By Daniel Dominguez

Meta’s latest AI research introduces BuilderBot, a tool designed to fuel creativity in the metaverse by generating immersive objects from voice commands alone.

Building for the metaverse will require major breakthroughs in artificial intelligence. Meta’s AI labs are already advancing research and development as part of a long-term effort to enable the next era of computing.

According to Meta, BuilderBot lets you describe a world and then generates aspects of that world for you. The company adds that, as the technology advances, users will be able to create nuanced worlds to explore and share experiences with others using just their voice.

BuilderBot is part of a larger AI project called Project CAIRaoke, which focuses on developing the conversational AI necessary to create these virtual worlds.

One necessary step in advancing conversational AI is understanding the full scope of the problem. Many people see the numerous recent advances in natural language understanding (NLU), such as BART and GPT-3, and conclude that the challenge of understanding and generating human-like text has been solved. AI for understanding is well researched and developed across the industry: it is used to extract meaning from various input modalities, through tasks such as automatic speech recognition, image classification, and NLU.

AI for interaction, by contrast, is how we use our understanding of the world to interact with other people through technology. That interaction can take the form of a text message, a voice command, haptic feedback, an image, a video, an avatar face, or a combination of these.

Researchers and engineers across the industry agree that good conversational systems need a solid understanding layer powered by AI models. But many feel that interaction is an engineering problem rather than an AI problem: an engineer who knows the state of the world can write elaborate logic to handle the required interaction. This engineering approach makes it easy to understand how the system works and to debug the logic quickly when necessary.
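To make the “interaction as an engineering problem” view concrete, here is a minimal sketch, assuming a hypothetical voice-driven scene builder: hand-written rules inspect a known world state and the user’s utterance, and each reply reports which rule fired. That traceability is what makes this style easy to understand and debug. The names WorldState, Rule, and respond are illustrative only and are not part of BuilderBot or any Meta API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical world state for a voice-driven scene builder (illustrative only).
@dataclass
class WorldState:
    objects: List[str] = field(default_factory=list)

# A hand-written rule: a condition on (state, text) and the action to run when it matches.
@dataclass
class Rule:
    name: str
    condition: Callable[[WorldState, str], bool]
    action: Callable[[WorldState, str], str]

def add_object(state: WorldState, text: str) -> str:
    # Everything after the leading "add " is treated as the object description.
    obj = text[len("add "):].strip()
    state.objects.append(obj)
    return f"Added {obj}."

def list_objects(state: WorldState, text: str) -> str:
    return ", ".join(state.objects) or "Nothing yet."

# Rules are checked in order; the first match wins.
RULES: List[Rule] = [
    Rule("empty_input", lambda s, t: not t, lambda s, t: "Sorry, I didn't catch that."),
    Rule("add", lambda s, t: t.startswith("add "), add_object),
    Rule("list", lambda s, t: t == "what is here", list_objects),
]

def respond(state: WorldState, utterance: str) -> Tuple[str, str]:
    """Return (reply, name of the rule that fired) so every reply is traceable."""
    text = utterance.strip().lower()
    for rule in RULES:
        if rule.condition(state, text):
            return rule.action(state, text), rule.name
    return "I can't do that yet.", "fallback"

state = WorldState()
print(respond(state, "add a palm tree"))  # ('Added a palm tree.', 'add')
print(respond(state, "What is here"))     # ('a palm tree', 'list')
```

The flip side of this design is that every new kind of request needs another hand-written rule, so the logic grows with each behavior the system must support.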

The canonical approach for AI-powered assistants requires four sets of inputs and outputs, one for each layer of the pipeline: NLU, dialog state tracking (DST), dialog policy (DP) management, and natural language generation (NLG). It also requires defined standards for each layer’s inputs and outputs. For NLU, for example, a traditional conversational AI system requires defined ontologies.
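The four-layer pipeline in the paragraph above can be sketched as a toy, assuming a hypothetical scene-building domain. Each layer is a plain function, the dataclasses passed between them stand in for the defined input/output standards, and a small hand-written ontology of intents and slots constrains what NLU may emit. All names here (ONTOLOGY, nlu, dst, dp, nlg, turn) are illustrative assumptions; production systems typically use trained models for NLU and often for DST as well.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# Hypothetical ontology: the only intents and slots the NLU layer may emit.
ONTOLOGY: Dict[str, Set[str]] = {
    "add_object": {"object"},
    "set_sky": {"time_of_day"},
}

@dataclass
class NLUOutput:  # contract between NLU and DST
    intent: str
    slots: Dict[str, str]

@dataclass
class DialogState:  # contract between DST and DP
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)

@dataclass
class Action:  # contract between DP and NLG
    name: str
    args: Dict[str, str]

def nlu(utterance: str) -> NLUOutput:
    # Toy keyword matcher standing in for a trained NLU model.
    text = utterance.strip().lower()
    if text.startswith("add "):
        return NLUOutput("add_object", {"object": text[len("add "):]})
    for sky in ("sunset", "night"):
        if sky in text:
            return NLUOutput("set_sky", {"time_of_day": sky})
    return NLUOutput("unknown", {})

def dst(state: DialogState, parsed: NLUOutput) -> DialogState:
    # Reject output outside the ontology; otherwise merge new slots into the tracked state.
    allowed = ONTOLOGY.get(parsed.intent)
    if allowed is None or not set(parsed.slots) <= allowed:
        return DialogState(intent="unknown", slots=dict(state.slots))
    return DialogState(intent=parsed.intent, slots={**state.slots, **parsed.slots})

def dp(state: DialogState) -> Action:
    # Dialog policy: pick the next system action from the tracked state.
    if state.intent == "add_object":
        return Action("confirm_add", {"object": state.slots["object"]})
    if state.intent == "set_sky":
        return Action("confirm_sky", {"time_of_day": state.slots["time_of_day"]})
    return Action("ask_rephrase", {})

# NLG via fixed templates, one per action the policy can choose.
TEMPLATES: Dict[str, str] = {
    "confirm_add": "Adding {object} to your world.",
    "confirm_sky": "Changing the sky to {time_of_day}.",
    "ask_rephrase": "Sorry, could you rephrase that?",
}

def nlg(action: Action) -> str:
    return TEMPLATES[action.name].format(**action.args)

def turn(state: DialogState, utterance: str) -> Tuple[str, DialogState]:
    # One user turn through the full NLU -> DST -> DP -> NLG pipeline.
    new_state = dst(state, nlu(utterance))
    return nlg(dp(new_state)), new_state

state = DialogState()
for utterance in ("add a palm tree", "Make it sunset", "hello"):
    reply, state = turn(state, utterance)
    print(reply)
```

Because every layer has a fixed contract in this sketch, supporting a new request means updating the ontology, the tracker, the policy, and the templates together.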

Meta expects that building for the metaverse will bring breakthroughs in areas such as self-supervised learning, and the company is building what it describes as the world’s most powerful AI supercomputer to drive future AI research.