
Generative AI can help bring tomorrow's gaming NPCs to life

Say goodbye to dialog trees and say hello like a normal person.


Elves and Argonians clipping through walls and stepping through tables, blacksmiths who won’t acknowledge your existence until you take a single step to the left, Draugr that drop into rag-doll seizures the moment you put an arrow through their eye: Bethesda’s long-running Elder Scrolls RPG series is beloved for many reasons, but the realism of its non-playable characters (NPCs) is not among them. Yet the days of hearing the same rote quotes and watching the same half-hearted search patterns perpetually repeated by NPCs are quickly coming to an end, thanks to the emergence of generative chatbots that are helping game developers craft more lifelike characters and in-game action.

“Game AI is seldom about any deep intelligence but rather about the illusion of intelligence,” Steve Rabin, Principal Software Engineer at Electronic Arts, wrote in the 2017 essay The Illusion of Intelligence. “Often we are trying to create believable human behavior, but the actual intelligence that we are able to program is fairly constrained and painfully brittle.”

Just as with other forms of media, video games require the player to suspend their disbelief for the illusions to work. That’s not a particularly big ask given the fundamentally interactive nature of gaming. “Players are incredibly forgiving as long as the virtual humans do not make any glaring mistakes,” Rabin continued. “Players simply need the right clues and suggestions for them to share and fully participate in the deception.”

Take Space Invaders and Pac-Man, for example. In Space Invaders, the descending enemies remained steadfast on their zig-zag path towards Earth’s annihilation regardless of the player’s actions, with the only change coming as a speed increase when they got close enough to the ground. There was no enemy intelligence to speak of; only the player’s skill in leading targets would carry the day. Pac-Man, on the other hand, used enemy interactions as a tentpole of gameplay.

Under normal circumstances, the Ghost Gang coordinates to track and trap The Pac, unless the player gobbles up a Power Pellet and starts vengefully hunting down Blinky, Pinky, Inky and Clyde. That simple, two-state behavior, essentially a fancy if-then statement in C, proved revolutionary for the nascent gaming industry and, in the form of finite-state machines (FSMs), became a de facto method of programming NPC reactions for years to come.

A finite-state machine is a mathematical model that abstracts a theoretical “machine” capable of existing in any number of states (ally/enemy, alive/dead, red/green/blue/yellow/black) but occupying exclusively one state at a time. It consists “of a set of states and a set of transitions making it possible to go from one state to another one,” Viktor Lundstrom wrote in 2016’s Human-like decision making for bots in mobile gaming. “A transition connects two states but only one way so that if the FSM is in a state that can transit to another state, it will do so if the transition requirements are met. Those requirements can be internal like how much health a character has, or it can be external like how big of a threat it is facing.”

Like the light switches in Half-Life and Fallout, or the electric generators in Dead Island, FSMs are either on, off or in a rigidly defined alternative state (real-world examples include a traffic light or your kitchen microwave). These machines can transition back and forth between states in response to the player’s actions, but half measures like dimmer switches and low-power modes do not exist in these universes. There are few limits on the number of states that an FSM can have beyond the logistical challenges of programming and maintaining them all, as you can see in the Ghost Gang’s behavioral flowcharts in Jared Mitchell’s blog post, AI Programming Examples. Lundstrom points out that an FSM “offers lots of flexibility but has the downside of producing a lot of method calls,” which tie up additional system resources.
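To make the idea concrete, here is a minimal Python sketch of a two-state ghost in the spirit of the Pac-Man behavior described above. It is an illustration only, not the arcade game's actual logic: the class name, the state names and the three-tick timer are all assumptions made for the example.

```python
from enum import Enum, auto


class GhostState(Enum):
    CHASE = auto()       # hunt the player
    FRIGHTENED = auto()  # run away after a Power Pellet is eaten


class Ghost:
    """Hypothetical two-state ghost; names and timings are illustrative."""

    FRIGHTENED_TICKS = 3  # assumed duration, not the arcade value

    def __init__(self):
        self.state = GhostState.CHASE
        self.timer = 0

    def update(self, pellet_eaten: bool) -> GhostState:
        # External requirement: the player ate a Power Pellet.
        if pellet_eaten:
            self.state = GhostState.FRIGHTENED
            self.timer = self.FRIGHTENED_TICKS
        # Internal requirement: the frightened timer has run out.
        elif self.state is GhostState.FRIGHTENED:
            self.timer -= 1
            if self.timer == 0:
                self.state = GhostState.CHASE
        return self.state
```

The ghost occupies exactly one state at any moment, and each transition fires only when its requirement is met, which is the property Lundstrom describes.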

Alternately, game AIs can be modeled using decision trees. “There are usually no logical checks such as AND or OR because they are implicitly defined by the tree itself,” Lundstrom wrote, noting that the trees “can be built in a non-binary fashion making each decision have more than two possible outcomes.”
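A rough Python sketch of such a tree follows. The questions, branch labels and action names are invented for illustration. Note that the first question has three outcomes rather than two, and that "many enemies AND low health" is never written as an explicit check: it is simply the path through the tree that ends at "flee".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Union


@dataclass
class Decision:
    """Inner node: a question whose answer selects one of several branches."""
    question: Callable[[dict], str]
    branches: Dict[str, Union["Decision", str]] = field(default_factory=dict)


def decide(node: Union[Decision, str], world: dict) -> str:
    # Walk down the tree until a leaf (a plain action name) is reached.
    while isinstance(node, Decision):
        node = node.branches[node.question(world)]
    return node


def threat_level(world: dict) -> str:
    if world["enemies"] == 0:
        return "none"
    return "low" if world["enemies"] < 3 else "high"


def health_band(world: dict) -> str:
    return "hurt" if world["health"] < 50 else "healthy"


# Hypothetical NPC tree: three-way split on threat, then a split on health.
tree = Decision(threat_level, {
    "none": "patrol",
    "low": "attack",
    "high": Decision(health_band, {"hurt": "flee", "healthy": "take_cover"}),
})
```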

Behavior trees are a logical step above that and offer players contextual actions to take by chaining multiple smaller decision actions together. For example, if the character is faced with the task of passing through a closed door, they can either perform the action to turn the handle to open it or, upon finding the door locked, take the “composite action” of pulling a crowbar from inventory and breaking the locking mechanism.
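The door example can be sketched with the two composite nodes most behavior-tree implementations provide, usually called a selector and a sequence. This is a simplified Python illustration under assumed names (`turn_handle`, `draw_crowbar`, `break_lock`); real engines also track a "running" status across frames, which is omitted here.

```python
from typing import Callable

Action = Callable[[dict], bool]  # returns True on success, False on failure


def sequence(*children: Action) -> Action:
    """Succeeds only if every child succeeds, in order."""
    return lambda ctx: all(child(ctx) for child in children)


def selector(*children: Action) -> Action:
    """Tries children in order and stops at the first success."""
    return lambda ctx: any(child(ctx) for child in children)


def turn_handle(ctx: dict) -> bool:
    if ctx["locked"]:
        return False  # the handle will not turn
    ctx["open"] = True
    return True


def draw_crowbar(ctx: dict) -> bool:
    return "crowbar" in ctx["inventory"]


def break_lock(ctx: dict) -> bool:
    ctx["locked"] = False
    ctx["open"] = True
    return True


# Try the handle first; fall back to the composite crowbar action.
pass_through_door = selector(
    turn_handle,
    sequence(draw_crowbar, break_lock),
)
```

Calling `pass_through_door` with a locked door and a crowbar in the inventory fails the first branch and succeeds on the second; without the crowbar, the whole tree reports failure and the character has to try something else.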

“Behavior trees use what is called a reactive design where the AI tends to try things and makes its decisions from things it has gotten signals from,” Lundstrom explained. “This is good for fast phasing games where situations change quite often. On the other hand, this is bad in more strategic games where many moves should be planned into the future without real feedback.”

From behavior trees grew GOAPs (Goal-Oriented Action Planners), which we first saw in 2005’s F.E.A.R. An AI agent equipped with GOAP picks from any number of goals, which are prioritized based on environmental factors, and uses the actions available to it to work towards the chosen one. “This prioritization can in real-time be changed if as an example the goal of being healthy increases in priority when the health goes down,” Lundstrom wrote. He asserts that they are “a step in the right direction” but suffer the drawback that “it is harder to understand conceptually and implement, especially when bot behaviors come from emergent properties.”
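A toy version of that idea, in Python, might look like the following. Everything here is assumed for illustration: the action table, the priority formula and the fact names are invented, and a plain breadth-first search stands in for whatever search a production planner would use.

```python
from collections import deque

# Hypothetical actions: name -> (preconditions, effects) over boolean facts.
ACTIONS = {
    "find_medkit": ({}, {"has_medkit": True}),
    "use_medkit": ({"has_medkit": True}, {"healthy": True, "has_medkit": False}),
    "draw_weapon": ({}, {"armed": True}),
    "attack": ({"armed": True}, {"enemy_down": True}),
}


def pick_goal(health: int) -> dict:
    # Priorities shift at run time: survival outranks attacking as health drops.
    priorities = {"healthy": 100 - health, "enemy_down": 50}
    return {max(priorities, key=priorities.get): True}


def plan(state: dict, goal: dict) -> list:
    """Breadth-first search for the shortest action sequence reaching the goal."""
    queue = deque([(state, [])])
    seen = set()
    while queue:
        current, steps = queue.popleft()
        if all(current.get(k) == v for k, v in goal.items()):
            return steps
        key = frozenset(current.items())
        if key in seen:
            continue  # this world state was already expanded
        seen.add(key)
        for name, (pre, effects) in ACTIONS.items():
            if all(current.get(k) == v for k, v in pre.items()):
                queue.append(({**current, **effects}, steps + [name]))
    return []  # no plan found
```

With high health the agent plans to draw its weapon and attack; once health falls far enough, the same planner produces a find-then-use-medkit plan instead, without any hand-written transition between the two.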

Radiant AI, which Bethesda developed first for Elder Scrolls IV: Oblivion and then adapted to Skyrim, Fallout 3, Fallout 4 and Fallout: New Vegas, operates on a similar principle to GOAP. Whereas NPCs in Oblivion were only programmed with five or six set actions, resulting in highly predictable behaviors, by Skyrim, those behaviors had expanded to location-specific sets, so that NPCs working in mines and lumber yards wouldn’t mirror the movements of folks in town. What’s more, the character’s moral and social standing with the NPC’s faction in Skyrim began to influence the AI’s reactions to the player’s actions. “Your friend would let you eat the apple in his house,” Bethesda Studios creative director Todd Howard told Game Informer in 2011, rather than reporting you to the town guard like they would if the relationship were strained.

Naughty Dog’s The Last of Us series offers some of today’s most advanced NPC behaviors for enemies and allies alike. “Characters give the illusion of intelligence when they are placed in well thought-out setups, are responsive to the player, play convincing animations and sounds, and behave in interesting ways,” Mark Botta, Senior Software Engineer at Ripple Effect Studios, wrote in Infected AI in The Last of Us. “Yet all of this is easily undermined when they mindlessly run into walls or do any of the endless variety of things that plague AI characters.”

“Not only does eliminating these glitches provide a more polished experience,” he continued, “but it is amazing how much intelligence is attributed to characters that simply don’t do stupid things.”

You can see this in both the actions of enemies, whether they’re human Hunters or infected Clickers, or allies like Joel’s ward, Ellie. The game’s two primary flavors of enemy combatant are built on the same base AI system but “feel fundamentally different” from one another thanks to a “modular AI architecture that allows us to easily add, remove, or change decision-making logic,” Botta wrote.

The key to this architecture was never referring to the enemy character types in the code but rather, “[specifying] sets of characteristics that define each type of character,” Botta said. “For example, the code refers to the vision type of the character instead of testing if the character is a Runner or a Clicker … Rather than spreading the character definitions as conditional checks throughout the code, it centralizes them in tunable data.” Doing so empowers the designers to adjust character variations directly instead of having to ask for help from the AI team.
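In code, that approach might look something like the Python sketch below. It is not Naughty Dog's implementation: the field names, the tuning values and the `can_detect` function are assumptions chosen to show the pattern of branching on a characteristic instead of a character type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterTuning:
    vision_type: str       # "sighted" or "blind" (assumed categories)
    hearing_radius: float  # in meters, illustrative


# Designers would edit this table; the values are invented for illustration.
TUNING = {
    "runner": CharacterTuning(vision_type="sighted", hearing_radius=8.0),
    "clicker": CharacterTuning(vision_type="blind", hearing_radius=20.0),
}


def can_detect(tuning: CharacterTuning, distance: float,
               in_line_of_sight: bool, target_is_noisy: bool) -> bool:
    # The logic tests characteristics and never mentions "runner" or "clicker".
    if tuning.vision_type == "sighted" and in_line_of_sight:
        return True
    return target_is_noisy and distance <= tuning.hearing_radius
```

Adding a new enemy variant then means adding a row to the table, not touching `can_detect`.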

The AI system is divided into high-level logic (aka “skills”) that dictates the character’s strategy and the low-level “behaviors” that the character uses to carry it out. Botta points to a character’s “move-to behavior” as one example of the latter. So when Joel and Ellie come across a crowd of enemy characters, whether each of them approaches by stealth or by force is determined by that character’s skills.

“Skills decide what to do based on the motivations and capabilities of the character, as well as the current state of the environment,” he wrote. “They answer questions like ‘Do I want to attack, hide, or flee?’ and ‘What is the best place for me to be?’” Once the character makes that decision, the lower-level behaviors trigger to perform the action. This could be Joel automatically ducking into cover and drawing a weapon or Ellie scampering off to a separate nearby hiding spot, avoiding obstacles and enemy sight lines along the way (at least for the Hunters; Clickers can hear you breathing).
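The split between deciding and doing can be sketched in a few lines of Python. The thresholds, dictionary keys and behavior names below are hypothetical; only the two-layer shape (a skill picks the strategy, behaviors such as move-to execute it) comes from Botta's description.

```python
def choose_skill(character: dict, world: dict) -> str:
    """High-level skill selection: answers 'attack, hide, or flee?'"""
    if character["health"] < 25:
        return "flee"
    if world["enemies_alerted"] or not character["stealthy"]:
        return "attack"
    return "hide"


def move_to(character: dict, spot: str) -> str:
    """Low-level behavior: relocate the character (pathfinding omitted)."""
    character["position"] = spot
    return f"move to {spot}"


# Each skill is carried out by one or more low-level behaviors.
BEHAVIORS = {
    "attack": lambda c, w: [move_to(c, w["cover"]), "draw weapon"],
    "hide": lambda c, w: [move_to(c, w["hiding_spot"])],
    "flee": lambda c, w: [move_to(c, w["exit"])],
}


def act(character: dict, world: dict):
    skill = choose_skill(character, world)
    return skill, BEHAVIORS[skill](character, world)
```

Given the same room, a non-stealthy character ends up behind cover with a weapon drawn while a stealthy one heads for the hiding spot, even though both run through the same `act` function.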

Generative AI systems have made headlines recently, due in large part to the runaway success of next-generation chatbots from Google, Meta, OpenAI and others, but generative techniques have been a mainstay of game design for years. Dwarf Fortress and Deep Rock Galactic just wouldn’t be the same without their procedurally generated levels and environments. But what if we could apply those generative principles to dialog creation too? That’s what Ubisoft is attempting with its new Ghostwriter AI.

“Crowd chatter and barks are central features of player immersion in games – NPCs speaking to each other, enemy dialogue during combat, or an exchange triggered when entering an area all provide a more realistic world experience and make the player feel like the game around them exists outside of their actions,” Ubisoft’s Roxane Barth wrote in a March blog post. “However, both require time and creative effort from scriptwriters that could be spent on other core plot items. Ghostwriter frees up that time, but still allows the scriptwriters a degree of creative control.”

The process isn’t all that different from messing around with public chatbots like Bing Chat and Bard, albeit with a few important distinctions. The scriptwriter first comes up with a character and a general idea of what that person would say. That gets fed into Ghostwriter, which returns a rough list of potential barks. The scriptwriter can then choose a bark and edit it to meet their specific needs. The system generates these barks in pairs, and selecting one over the other serves as a quick training and refinement method: Ghostwriter learns from the preferred choice and, after a few thousand repetitions, begins generating more accurate and desirable barks from the outset.
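Ubisoft has not published Ghostwriter's internals, so the following is only a guess at the general shape of the data such a pairwise workflow would produce. The `Preference` record and `record_choice` helper are hypothetical names; the point is simply that every pick yields a (chosen, rejected) example that a model could later be trained on.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Preference:
    prompt: str    # the writer's character and intent
    chosen: str    # bark the writer picked
    rejected: str  # bark the writer passed over


def record_choice(log: List[Preference], prompt: str,
                  pair: Tuple[str, str], picked: int) -> None:
    """Store one pairwise decision; picked is 0 or 1."""
    log.append(Preference(prompt, pair[picked], pair[1 - picked]))
```

For example, `record_choice(log, "cheerful fishmonger", ("Fresh cod, just in!", "Fish. Buy it."), 0)` appends a record marking the first bark as preferred.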

“Ghostwriter was specifically created with games writers, for the purpose of accelerating their creative iteration workflow when writing barks [short phrases],” Yves Jacquier, Executive Director at Ubisoft La Forge, told Engadget via email. “Unlike other existing chatbots, prompts are meant to generate short dialogue lines, not to create general answers.”

“From here, there are two important differences,” Jacquier continued. “One is on the technical aspect: for using Ghostwriter writers have the ability to control and give input on dialogue generation. Second, and it’s a key advantage of having developed our in-house technology: we control on the costs, copyrights and confidentiality of our data, which we can re-use to further train our own model.”

Ghostwriter’s assistance doesn’t just make scriptwriters’ jobs easier, it in turn helps improve the overall quality of the game. “Creating believable large open worlds is daunting,” Jacquier said. “As a player, you want to explore this world and feel that each character and each situation is unique, and involve a vast variety of characters in different moods and with different backgrounds. As such there is a need to create many variations to any mundane situation, such as one character buying fish from another in a market.”

Writing 20 different iterations of ways to shout “fish for sale” is not the most effective use of a writer’s time. “They might come up with a handful of examples before the task might become tedious,” Jacquier said. “This is exactly where Ghostwriter kicks in: proposing such dialogs and their variations to a writer, which gives the writer more variations to work with and more time to polish the most important narrative elements.”

Ghostwriter is one of a growing number of generative AI systems Ubisoft has begun to use, including voice synthesis and text-to-speech. “Generative AI has quickly found its use among artists and creators for ideation or concept art,” Jacquier said, but clarified that humans will remain in charge of the development process for the foreseeable future, regardless of coming AI advancements. “Games are a balance of technological innovation and creativity and what makes great games is our talent – the rest are tools. While the future may involve more technology, it doesn’t take away the human in the loop.”

Per a recent Market.us report, the value of generative AI in the gaming market could as much as septuple by 2032, growing from around $1.1 billion in 2023 to nearly $7.5 billion. Those gains will be driven by improvements to NPC behaviors, productivity gains from automating digital asset generation and procedurally generated content creation.
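As a quick sanity check on those figures, the short Python snippet below computes the growth multiple and the compound annual growth rate implied by the two endpoints. It assumes a nine-year span (2023 to 2032) and uses the rounded dollar values quoted above, so the result is approximate: a bit under sevenfold, or roughly 24 percent a year.

```python
start_value = 1.1    # billions of USD in 2023, as quoted from the report
end_value = 7.5      # billions of USD in 2032, as quoted from the report
years = 2032 - 2023  # assumed span between the two estimates

multiple = end_value / start_value
# Compound annual growth rate implied by the two endpoints.
cagr = multiple ** (1 / years) - 1

print(f"growth multiple: {multiple:.1f}x")
print(f"implied annual growth: {cagr:.1%}")
```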

And it won’t just be major studios cranking out AAA titles that will benefit from the generative AI revolution. Just as we are already seeing dozens, even hundreds, of mobile apps built atop ChatGPT mushrooming up on Google Play and the App Store for myriad purposes, these foundational models (not necessarily Ghostwriter itself but its inevitable open-source derivatives) are poised to spawn countless tools which will in turn empower indie game devs, modders and individual players alike. And given how quickly the need to know how to program in proper code rather than natural language is falling off, our holodeck immersive gaming days could be closer than we ever dared hope.
