Nvidia ACE Introduces AI to Game Characters, Enabling Lifelike Dialogues
There are plenty of ways to hold text conversations with sophisticated large language models (LLMs), from ChatGPT and Google Bard to MLC LLM, an offline chatbot app that can be installed on mobile devices. The next frontier is bringing LLM capabilities to non-player characters (NPCs) in video games. Instead of relying on pre-scripted interactions, players could hold open-ended, dynamic conversations.
During his Computex 2023 keynote, Nvidia CEO Jensen Huang unveiled ACE for Games, an AI model foundry service meant to bring game characters to life through natural language conversation, audio-to-facial-expression technology, and speech-to-text and text-to-speech capabilities. Huang showed a game scenario in which an NPC named Jin, who runs a ramen shop, conversed with a human player and gave realistic answers consistent with his backstory.
In the demo, a player named Kai entered Jin's ramen shop and asked him, by voice, how he was doing, which led to a conversation about rising crime in the area. When Kai offered to help, Jin shared rumors about the powerful crime lord Kumon Aoki, who is believed to be behind the city's chaos. Kai asked where to find Aoki, and Jin gave him directions, kicking off the player's quest.
Huang stressed that AI will play a major role not only in rendering and synthesizing game environments but also in animating characters. He said AI will be an integral part of the future of video games.
Nvidia ACE for Games will provide high-speed access to three existing components. The first is Nvidia NeMo, a framework for training and deploying LLMs. It includes NeMo Guardrails, which is designed to prevent inappropriate or unsafe AI conversations. Presumably, this would keep NPCs from giving unsuitable or off-topic answers to player prompts. Guardrails also includes security measures meant to stop users or other attackers from jailbreaking the bots and using them for malicious purposes.
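To make the guardrail idea concrete, here is a minimal Python sketch of an input check and an output check wrapped around an NPC's reply function. It is not how NeMo Guardrails works (that toolkit is configured with its own rules language, Colang), and the function name `guarded_reply`, the keyword list, and the refusal line are all hypothetical. A real guardrail would use far more robust checks than keyword matching.

```python
# Hypothetical sketch of a topical/safety guardrail around an NPC reply function.
# This is NOT the NeMo Guardrails API; names and keywords are illustrative only.
from typing import Callable

# Illustrative phrases an NPC should never engage with or emit.
BLOCKED_KEYWORDS = {"credit card", "password", "ignore previous instructions"}
REFUSAL = "Sorry, I only know about ramen and what goes on in this neighborhood."


def guarded_reply(prompt: str, generate: Callable[[str], str]) -> str:
    """Run `generate` only if the prompt passes, then vet the reply too."""
    # Input rail: refuse prompts that look unsafe or like jailbreak attempts.
    if any(k in prompt.lower() for k in BLOCKED_KEYWORDS):
        return REFUSAL
    reply = generate(prompt)
    # Output rail: check what the model produced before the player sees it.
    if any(k in reply.lower() for k in BLOCKED_KEYWORDS):
        return REFUSAL
    return reply
```

Checking both the player's prompt and the model's reply reflects the two concerns the paragraph raises: off-topic or unsafe answers, and users trying to jailbreak the bot.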
The second is Nvidia Riva, the company's speech-to-text and text-to-speech solution. In the ACE for Games workflow, players speak their questions into a microphone, and Riva transcribes them into text that is fed to the LLM. The LLM generates a text response, which Riva converts back into speech for the player to hear. Games would presumably display the responses as text as well. You can try Riva's speech-to-text and text-to-speech capabilities for yourself on Nvidia's website.
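The round trip described above can be sketched in a few lines of Python. The three stage functions are hypothetical stand-ins passed in as parameters, not Riva or NeMo calls, so the example only shows how data flows through one conversational turn: microphone audio to text, text to an NPC reply, and the reply back to audio, with the reply text kept for on-screen subtitles.

```python
# Hypothetical sketch of one conversational turn in an ACE-style pipeline.
# speech_to_text, npc_llm and text_to_speech are placeholders, not Riva/NeMo APIs.
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnResult:
    transcript: str     # what the player said, as text
    reply_text: str     # the NPC's answer, also usable as subtitles
    reply_audio: bytes  # synthesized speech played back to the player


def run_turn(mic_audio: bytes,
             speech_to_text: Callable[[bytes], str],
             npc_llm: Callable[[str], str],
             text_to_speech: Callable[[str], bytes]) -> TurnResult:
    transcript = speech_to_text(mic_audio)    # 1. speech-to-text: audio -> text
    reply_text = npc_llm(transcript)          # 2. LLM: player text -> NPC text
    reply_audio = text_to_speech(reply_text)  # 3. text-to-speech: NPC text -> audio
    return TurnResult(transcript, reply_text, reply_audio)


if __name__ == "__main__":
    # Dummy stages so the example runs without any speech or LLM service.
    result = run_turn(
        b"fake-audio",
        speech_to_text=lambda audio: "How are you doing, Jin?",
        npc_llm=lambda text: "Not great. Crime is getting worse around here.",
        text_to_speech=lambda text: text.encode("utf-8"),
    )
    print(result.reply_text)
```

Passing the stages in as functions keeps the loop independent of any particular speech or language-model service.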
Completing the ACE for Games workflow is Nvidia Omniverse Audio2Face, which animates a character's facial expressions to match its spoken words. The product is currently in beta, and you can try it out yourself.
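Audio2Face uses a neural network to drive a whole face from speech audio. As a much cruder stand-in that only illustrates audio-driven animation, the hypothetical Python function below turns loudness into one mouth-open weight per audio frame, a simple amplitude-based approach. It is not Audio2Face's method, and it assumes mono audio given as float samples and a fixed frame size.

```python
# Hypothetical, deliberately simple audio-driven mouth animation.
# NOT Audio2Face's method: it maps per-frame loudness (RMS) to a 0..1 weight.
import math
from typing import List


def mouth_open_curve(samples: List[float], frame_size: int) -> List[float]:
    """Return one mouth-open weight in [0, 1] per frame of mono samples."""
    rms_values = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        # Root-mean-square amplitude of this frame (louder -> wider mouth).
        rms_values.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    peak = max(rms_values, default=0.0)
    if peak == 0.0:
        # Silence (or no audio): keep the mouth closed.
        return [0.0] * len(rms_values)
    # Normalize so the loudest frame fully opens the mouth.
    return [r / peak for r in rms_values]
```

For 16 kHz audio, a `frame_size` of 160 samples would yield one weight every 10 ms. Loudness alone cannot tell an "oo" from an "ah", which is the gap a learned model like Audio2Face is meant to close.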
The demo, titled Kairos, was built by Convai, an AI-in-gaming startup associated with Nvidia's Inception program, which fosters connections between emerging companies and venture capital firms. Convai offers a toolkit on its website that lets game developers create lifelike NPCs with detailed backstories.
The company also offers an explainer video showing how its tools work and what they can do. In it, players converse with NPCs and give them commands involving objects and other characters in the game world.
For instance, one scene shows a player asking an NPC to hand over a gun lying on a table, and the NPC complies. In another, a player tells a soldier NPC where to aim and fire. Convai's tools are what make these interactions possible.
This kind of contextual awareness of the game world is crucial for NPCs. We recently tested an AI plugin for Minecraft that lets you talk to NPCs, but those NPCs had no situational awareness at all. We kept chatting with a sheep even after killing it, and it remained oblivious to its own death.
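One straightforward way to give an LLM-driven NPC that awareness is to feed the current game state into every prompt and to skip generation entirely when the state rules it out. The Python sketch below assumes a hypothetical `NPCState` record and `build_npc_prompt` helper; it is not taken from the Minecraft plugin or from Convai's toolkit.

```python
# Hypothetical sketch: inject game state into an NPC's LLM prompt.
# NPCState and build_npc_prompt are illustrative names, not a real plugin API.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NPCState:
    name: str
    alive: bool = True
    nearby_objects: List[str] = field(default_factory=list)


def build_npc_prompt(npc: NPCState, player_line: str) -> Optional[str]:
    """Return a state-aware prompt, or None if the NPC should not respond."""
    if not npc.alive:
        # A dead NPC should not answer at all, so no prompt is built.
        return None
    objects = ", ".join(npc.nearby_objects) or "nothing notable"
    # The prompt tells the model what the NPC can perceive right now.
    return (f"You are {npc.name}. You can see: {objects}.\n"
            f"Player says: {player_line}\n"
            f"{npc.name} replies:")
```

With the sheep's `alive` flag set to `False`, the game would never ask the model for a reply, so the conversation simply ends.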