(A reminder: I’m editing these notes a week after the fact, just making them readable, but I’ve left the original voice and roughness of this piece intact because I feel the live-blogging depth is more interesting than a concise, thoughtful analysis would be. I’m happy to answer any questions people have, though, and I encourage you to contact the original presenters as well for more thorough details.)
Halo 3 Objective Trees: A Declarative Approach to Multiagent Coordination by Damian Isla (Bungie)
Damian Isla is stealing from me, I swear. (Kidding!) Unfortunately for me, this is totally the talk he gave at GDC 2008. He’s sketching out an Encounter Manager – an AI that designs encounters and orders AIs around. He’s still talking about it in terms of squad tactics rather than narrative, though – how groups move around, retreat, create “spice” variety. I remember this sort of thing being awesome back in Halo. Isla divides the encounter space up into territories, based on the player’s approach. He lays out a 3-step, 3-pronged encounter that he says is “simple”. Within each territory, he cites “tasks”: the language mission designers use to tell squad AI what to do. Tasks identify a squad’s territory and its behavior – aggressiveness, rules of engagement, inhibitions. These aren’t individual instructions per se; these are captain’s orders. In fact, each territory is a set of discrete points, like waypoints, that the low-level individual AI cycles through Fight->Flee->Cover inside. So it’s easy to switch territories on the AI – it just tells the low-level AI to use points somewhere else. Very elegant. Ah, they did do it old-school style in Halo – I was right! A really neat demo of the island map where the marines springboard between rocks along the beach, switching territories. Also a really interesting demo of how the marines move and choose points – they move a lot more than I realized as a player. He draws a pretty strict line between the Individual/Squad AI (AI engineers and designers) and the Task/Mission AI (mission designers and Isla). In Halo 2 they just used an FSM with testable script transitions to try and solve this. But the scripts got too complicated for some missions because non-deterministic branching led to almost complete state machine interconnectivity. The FSM’s main limitation is that transitions have to be explicit, and that can get crazy.
Imperative or declarative? Imperative is line-by-line straight “how”, whereas declarative, like regular expressions, just describes what it is looking for. Imperative has more flexibility, but declarative is more maintainable and simpler. Declarative is just flat-out better for designers, but Isla says we usually do imperative. The Sims did a simple declarative model with its environment-based AI: agents so simple they just climb the “happy-scape” of the object-space. In behavior trees, tasks are declarative: they have priorities, can be made up of sub-tasks, and have a finite capacity (because the territories and goals have finite capacity). So he went back to behavior trees again. In an aside, Isla points out that what makes behavior trees unique is that the decisions are just a prioritized list (although equal priorities split individuals evenly), which is just simpler than weighting or percentages. Plus it’s nice that behaviors are self-describing and explicitly know only about themselves, with no external knowledge. They fit “objective trees” for his tasks well. As Isla puts it, it’s a dynamic plinko machine that’s run thousands of times! It doesn’t sound like he’s evaluating every tree every frame – just the conditions of the current tree until it’s finished or failed – but that’s likely a developer call. Next, he runs through the algorithm, which is straightforward. I love how descriptive Isla is – he doesn’t shy away from the math, but he keeps it understandable and his techniques simple. The most interesting part is the cost function that determines which task should be chosen by which squad – positional awareness of territories is important, but that would require inter-agent “communication” to avoid making other squads look stupid. Instead, he weights areas by “all groups’ distance covered” – the cost of a distribution is the sum of every individual’s distance – and then takes a greedy approach.
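Here’s a minimal sketch of that cost-plus-greedy idea – my own reconstruction, not Isla’s actual math. I’m assuming Euclidean distance, one squad per task, and a cost equal to the summed member travel distance to each member’s nearest territory point; each squad in turn grabs its cheapest open task.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def squad_cost(members, territory):
    # Stand-in for "all groups' distance covered": each member walks to the
    # nearest point in the task's territory, and we sum those distances.
    return sum(min(dist(m, p) for p in territory) for m in members)

def greedy_assign(squads, tasks):
    """Each squad in turn takes its cheapest remaining task (greedy)."""
    assignments = {}
    remaining = dict(tasks)
    for squad_name, members in squads.items():
        best = min(remaining, key=lambda t: squad_cost(members, remaining[t]))
        assignments[squad_name] = best
        del remaining[best]     # finite capacity: one squad per task here
    return assignments

squads = {"alpha": [(0, 0), (1, 0)], "bravo": [(10, 0)]}
tasks = {"left_flank": [(0, 2)], "right_flank": [(10, 2)]}
print(greedy_assign(squads, tasks))
# {'alpha': 'left_flank', 'bravo': 'right_flank'}
```

Greedy means no squad ever considers another squad’s plans, which is exactly the “avoid inter-agent communication” trick: the summed-distance cost alone keeps squads from crossing the map past each other.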
Bungie made refinements throughout the process based on designer feedback. They added task “filters”, type-conditions that limit who can use a task, and used infinite costs to represent a bad filter match. They could keep tasks open or closed forever, independent of whether the initial condition was still true. Death and living counts were explicit conditions too, because they were so frequent. Bungie also explicitly supported one-time assignment to a task. In examples, Isla showed how to do leaders with leader filters and no-leader filters, and then created a “squad broken” task that would activate when a leader was killed. (This could have been a broken override for all tasks that shuts them off explicitly for 10-20 seconds, but I think it was just a high-priority task option.) Second, Isla showed player pickup in vehicle encounters: a vehicle filter, detection of a player who needs a vehicle, and then a pickup task. Designers can add this task at the top of the tree at the very end and the transitions just work themselves out.
Badness summary: designers require training for declarative languages, and there was a sometimes awkward relationship between the overall level script and the objectives – particularly syncing dialogue with objective changes, and tying together enemy and ally fronts. He tried matching the two fronts in concert with a scripted version, because the enemies and allies were two separate behavior trees. And lastly, it’s not always intuitive to bucket by squad – in Halo 3, individuals couldn’t change squads when they changed roles (like grabbing a new weapon). But designers liked it, the system took requests well, and it matched how designers think about encounters. It was a great quick prototyping tool. And it scaled well, so everyone used it for all cases (rather than mixing with script, say). Interestingly, it was also driven from the UI up: the interface came first, and then they chose the data and the trees. Declarative showed its chops – less direct control and flexibility for the author (reliant instead on the AI), but a more concise representation, more manageable, and ultimately better results.
Question: How was training? Interestingly, the whole system was requested by design – design put forward the initial tree idea and owned the training, which really drove its success. Question (on coordination): The most hackiness was the front coordination – they just sort of globally script the major steps and gate both sides until they reach major points. Question: You’re using global knowledge – might that not work for many players? Isla says true, but you just design and tweak to make it realistic, and give the player feedback from the AI about the whys (animation, alert state, etc). He adds a note about awareness activation conditions on tasks. Question: Could an influence map handle the fronts issue? The issue is that designers like to be really specific about how things fall apart. It might work well for allies, though.
Navigating Detailed Worlds with a Complex, Physically Driven Locomotion: NPC Skateboarder AI in EA’s skate by Mark Wesley from EA
After coffee, talks on paths and path following. How do you make skating bots? Build skate paths in an editor. Unfortunately I missed some of the setup details, but there’s a path editor, and it looks like they are just creating short “skater lines” using curves and about 5 velocity points, and the skaters just navigate directly between the ends of the lines. In fact, the skater tricks are encoded into the path itself. Wesley even had path recording from player input to make it easier – the creator activates recording, does the trick, and that creates the path. Skaters use local checking to navigate the board along the generated path. Then on top of this there are skater profiles in the AI that influence when they do the tricks on the path, use path following, and do dynamic avoidance. EA also had automated testing – the AI controller could be attached to anyone, including the player, so it was easy to soak test the paths. It’s a large open world – do they need a lot of paths? Yes, Wesley says, and EA relied on QA to build them: 5,000 paths, a quarter-million nodes, 465 kilometers. They streamed the paths in, which was simpler than it sounds because they were small and discrete from each other. Pros: it worked and was understandable, it was easy to script to constrained paths, and otherwise the random path constraints gave nice emergent behavior. Cons: skaters were constrained to the path, it requires a lot of data, and paths can be invalidated if the underlying world moves. Wesley had a few automated tools to fix paths when art changed, but EA mostly just did paths once the levels were done.
The Rise of Potential Fields in Real Time Strategy Bots by Johan Hagelbäck from Blekinge Institute of Technology in Sweden
Next up in the whirlwind is Hagelbäck. More bot building! He’s pulling potential fields from robotics, where they’re used primarily in obstacle avoidance. It’s an attractive or repelling charge at a point, looking a bit like an influence map once summed up. You can generate paths using the map as a cost. It looks like Hagelbäck uses it to get the destination as well as the path. His domain was ORTS, an RTS. First, find the game objects, then put driving forces on the map (both from objects and goals), and then build your fields – static ones from geometry, semi-static ones from goals, and the really dynamic ones from agents. A couple of tweaks try to get better fields and to control oscillations using the field itself. After the latest version, their bot was 99.25% effective against other bots in a tournament. It’s interesting how it aggregates positionally to handle oscillation and can deal with incomplete information.
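The basic mechanics are easy to sketch: sum an attractive charge from the goal and repelling charges from obstacles at each candidate cell, then greedily step to the highest-charged neighbor. The falloff functions and weights below are placeholders I picked, not Hagelbäck’s tuned values.

```python
import math

def charge(cell, goal, obstacles):
    # Attractive charge grows as we approach the goal; each obstacle adds
    # a repelling charge that decays quickly with distance.
    attract = -math.hypot(cell[0] - goal[0], cell[1] - goal[1])
    repel = sum(4.0 / (1.0 + math.hypot(cell[0] - o[0], cell[1] - o[1]))
                for o in obstacles)
    return attract - repel

def step(pos, goal, obstacles):
    """One tick: hill-climb to the neighboring grid cell with the best charge."""
    neighbors = [(pos[0] + dx, pos[1] + dy)
                 for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    return max(neighbors, key=lambda c: charge(c, goal, obstacles))

print(step((0, 0), (5, 0), []))        # (1, 0): straight at the goal
print(step((0, 0), (5, 0), [(1, 0)]))  # sidesteps the repelling obstacle
```

The oscillation problem he mentions falls out of exactly this greedy step: two agents can repel each other back and forth forever, which is why his tweaks feed the field back into itself.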
A Cover-Based Approach to Multi-Agent Moving Target Pursuit from Alejandro Isaza
Isaza is looking at target-interception pathing on arbitrary, undirected, known graphs, with a turn-based model, one target, and several pursuers. The problem with MTS and A* is that they can reduce distance but not mobility, and there’s no team coordination. They can work, but they’re not appropriate. In the demo you can see that A* just chases forever and never catches the target. When he says “cover” he means “corner”: the AI should progressively cut off territory from the target. It’s an interesting idea, particularly for squad shooters or horror games – you can definitely make pursuers look a bit dumb now. Is it more fun, though, if the player has to avoid them as they close in? Isaza’s trying to solve it by maximizing his “cover” graph rather than only minimizing path distance. First, divide the space in two, then determine which part is covered and which is not. This gives automatic coordination if you share cover graphs and maximize the team’s cover. To get the cover graph, do a breadth-first expansion simultaneously from each agent on both sides, prioritizing time-to-reach, which takes O(vertices) time, and stop the expansion where the agents’ graphs meet. The cover graph is a measurement of what the agent can get to before the target. In a refinement, Isaza also incorporates risk using an offset – a willingness to sacrifice a certain percentage of cover to get closer. 10% seems to be optimal from his data. I’m not sure I got all the steps here clearly, but there seem to be some really good opportunities here, particularly with, say, large vehicles trying to block off ground forces – tasks that are less about one agent’s survival, since the algorithm assumes all agents are invulnerable. But it could also be incorporated with other techniques to be generally useful. Something I expect to see implemented for certain goal types in all AIs in the future.
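The cover graph reduces to “which vertices can I reach before the target”. Here’s a sketch on an undirected graph with unit edge costs – note I’ve collapsed the paper’s simultaneous expansion into two plain BFS passes for clarity, which is the same result computed less cleverly.

```python
from collections import deque

def bfs_dists(graph, src):
    dist = {src: 0}
    q = deque([src])
    while q:
        v = q.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                q.append(w)
    return dist

def cover_set(graph, pursuer, target):
    """Vertices the pursuer reaches strictly before the target: its 'cover'."""
    dp, dt = bfs_dists(graph, pursuer), bfs_dists(graph, target)
    inf = float("inf")
    return {v for v in graph if dp.get(v, inf) < dt.get(v, inf)}

# A corridor 0-1-2-3-4: the pursuer at 0 covers its end of the corridor.
# Maximizing cover, not minimizing distance, is what lets a team of
# pursuers progressively cut territory off from the target.
corridor = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(cover_set(corridor, 0, 4))   # {0, 1}
```

Team coordination then follows for free: the team’s cover is the union of the individual sets, and each pursuer moves to grow that union rather than to shrink its own distance.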
Talking with NPCs: Towards Dynamic Generation of Discourse Structures from Christina R. Strong from UCSC
The last session at the conference is on language. Strong’s investigating dialogue generation. She starts by defining “dramatic beats” – the smallest unit of dramatic action that changes character relationships and story state, aka a little piece of interaction. Facade was authored with this in mind. She’s automatically generating discourse structures using beats – not just sentences, conversation. To do this, she needs to explicitly represent social games, characters, and social relationships. So she uses planning to make an FSM for a beat. This matters because Facade‘s beats are all hand-authored. Mass Effect tried to reduce dialogue through design, but still had 28,000 lines. She wants to minimize the amount of writing the author has to do: explicitly represent backstory, character, relationships, and higher-level author goals, creating a wider variety of conversations than a human author could. Plus, a good story should have affinity beats as in Facade – getting the player to side with one character or another. And a beginning, middle, and end.
So she uses Hierarchical Task Network planning. The highest-level tasks are author goals; the lowest level is dialogue. She also uses forward state progression (?), which allows for functional effects like tension. Her approach generates linear plans for specific situations, but doesn’t incorporate player interaction automatically, unfortunately (probably because player choices are non-deterministic). To plan the conversations, she first chooses the characters, then identifies the steps required to get the desired result. In the middle parts of the conversation, she pads with support, question, or past-information nodes, to then force the player to make a choice. She incorporates an annotation operator so that the FSM can represent all player choices. (Although I’m not clear here why the whole conversation has to be actively mapped instead of dynamically generated to respond to the player. I believe that’s what Façade was doing.)
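An HTN decomposition of that shape is easy to mock up. The task names and method table below are purely illustrative – not Strong’s actual operators – but they show the structure: author-level tasks expand recursively until only primitive dialogue nodes remain.

```python
# Each compound task maps to one or more candidate decompositions.
METHODS = {
    "affinity_beat": [["opening", "middle", "player_choice", "closing"]],
    "middle":        [["support"], ["question"], ["past_info"]],
}

def decompose(task, choose=lambda options: options[0]):
    """Expand a task into a linear plan of primitive dialogue nodes."""
    if task not in METHODS:
        return [task]                      # primitive: an actual dialogue node
    plan = []
    for subtask in choose(METHODS[task]):  # pick one candidate decomposition
        plan.extend(decompose(subtask, choose))
    return plan

print(decompose("affinity_beat"))
# ['opening', 'support', 'player_choice', 'closing']
```

Swapping the `choose` function (here it naively takes the first option) is where preconditions and state would come in; her observation that one new information node can double the conversation count follows directly from the multiplication of options at each `choose` step.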
She’s using small chunks of pre-written dialogue from a library, presumably because you can get away with less dialogue by matching it more closely to goals than the old scripted situations did. In her work so far, they have generated 120 stories and over 2,000 unique dialogue FSMs – each with different information or told in a different way. It’s interesting that you could do this with vocals as well as text – it’s similar to what I was planning at one point to handle dialogue in the Encounter Manager as beats. Her most interesting conclusion was that adding just one new information node to a character can double the number of conversations. Strong wants to explore “mixed-initiative planning” because her current preconditions aren’t always good enough. And she wants to find ways to increase player interaction. Very interesting work.
Question: What about incorporating character actions? A great question – it seems possible. Facade plans actions and dialogue simultaneously. Question: Are you really helping the author do less? (Can we measure the savings? Hehe, I’ll bet Far Cry 2 is doing exactly this just out of necessity, but a good question.) (Lines are annotated with tags in both approaches.) Strong points out the generated dialogue is more dynamic than pre-written dialogue because it can use dynamic information, so it might not matter. I’d like to see someone actually try to compare the work; depending on the scale and linearity of the game, I’d expect this approach to be easier. But it’s questionable whether it would work with vocals – whether planning out lines of vocals like this will sound natural. It’s possible voice actors would find it difficult to act out these non-linear lines.
Learning and Playing in Wubble World by Wesley Kerr
The last lecture is a language-construction proposal. Kerr’s building a sentence-scene corpus. He wants to build a free game for kids, have the kids play the game using his language, record it, and use that data to build a word-identification library. Why kids? Cheap labor, successful outreach, patience with dumb agents, and an intrinsic desire to teach. http://www.wubble-world.com is his site, a collection of mini-games. You talk with your wubble and it learns language from you. You tell it about game objects, actions, etc. Kerr is parsing the semantic sentence construction of the kids, using specializing weights to learn common phrases. Active wubble question-asking gives the interactions impetus. Problem: what if kids aren’t accurate? Yep, this happened, particularly with things like the size and color of objects. But concepts are what they’re looking for, and they measure the probability distribution and gradually get closer to the right concept. It was more of an issue with sample size, because it was hard to differentiate, but this will go away at scale.
Generally, the wubbles were able to acquire accurate language concepts. The conclusion is that games aren’t just entertaining and able to teach the player – they can teach the AIs, too. Kids love it, the online learning gets results, and it’s massive and cheap. The biggest catch is that it’s specialized to the provided environment. They have other games they are trying as well. In one, Kerr requires team coordination chatter and uses voice-to-text translators to analyze the output and understand the relations between objects. Another game has an active player avatar that can demonstrate verbs directly in the world for the wubble. Question: Was there “language griefing”? It didn’t help the griefers solve the task – it just made it harder for them – but it will be a problem.
The Past, Present, Future of Game AI by Steve Rabin of Nintendo
Looking forward to this talk by Steve Rabin. In his career he’s talked to over 100,000 game players at Nintendo, helping them out. He’s been at Surreal, WizBang, and Gas Powered Games. Now he’s at Nintendo doing development support, as well as editing the AI Game Programming Wisdom books and instructing at DigiPen and the UW extension. Going from past to future, Rabin starts with the AI in Pac-Man, a game that took 17 months to create, with individual ghost AI, no randomness, a wave-patterned attack/retreat cycle, and implicit enemy cooperation. Then SimCity, with cellular automata and influence maps. Virtual pets in 1995: adaptive, emotional, with memory. Remember Tamagotchi? Creatures, which is still complex even today. And Thief, with a new sensing model. Half-Life – integrating AI into the storyline using script. In 2000 came The Sims – smart terrain and smart objects, similar to affordances. Black & White, with neural nets, empathy learning, and gesture recognition (AI Wisdom 1). Fable – player reputations (not the first, but an example). Halo 2 and behavior trees. Nintendogs and Brain Age, with speech recognition that works in any language. 2005 – F.E.A.R., with the STRIPS system and enemies using the environment – moving to more declarative languages rather than procedural ones. Forza also used neural nets to drive cars that players could train, too. And Facade – interactive story, natural language processing, and ignoring the player reasonably, similar to Nintendogs. And last, he can’t help throwing in gesture recognition in the Wii remote.
In the present: the living cities in GTA 4, Euphoria and animation, the Spore creature creator’s dynamic skinning and animating, Fable 2‘s dog and family sim, Left 4 Dead‘s AI Director creating experience, mood, and tension, and Far Cry 2‘s dynamic narrative. All very dependent on the guys in charge understanding AI and using it. Our current AI has adequate movement, sensing, and behavior trees. It needs help with voice and declaratives. What holds our AIs back? Lack of attention, experience, and design vision – not a lack of CPU time or good algorithms. Really, most important: why are AIs still allowed to suck? For Rabin, it’s because AI is not required as the key driver of fun (see MMOs), and because of designers and scripters who don’t get it.
Where are we going? Costs are getting higher, risk is greater. Nintendo is trying to ease that up, but that’s just one company. How do we compete back? Middleware, build-once-and-reuse, and procedural content. But deeper AI needs expression, which costs money. Procedural content? Music, story, creature names, dialogue? Sure. But can procedural be original and reliable? SpeedTree is a success story. Meanwhile, we see new experiences and new interfaces expanding the market. AI also allows for new experiences. That could be the key. There are diminishing returns on graphics, and input, physics, and animation are hot right now, but the future is AI and game design working together (of course, at this conference *grin*). Take parts of the AI and make them the hook. Rabin points to Portal and all of these other physics games coming out now. We know this can work – see AI pets.
Yes, there are the core game AI problems: save time/money, CPU/RAM, ease of authoring (complexity, scalability, readability, debuggability, robustness). But there’s also gameplay – emergent, adaptive, novel. Even though we’re always fighting how well cheating works, AI will win. The challenge is that 50% of AI programmers feel the problem is working with designers. In the near term, the best AI programmers will get better collaboration, and agents will get a larger vocabulary. Raising the AI-programmer bottom requires awareness, education, proven solutions, and middleware. Clearly we’ll eventually get to full actors – physical, biological bodies, the subtleties of performance – so let’s push for it now.
So, predictions: dynamic personality, style, and emotion, leading to synthesized motion in true performances. Unfortunately, it’s not clear how game AI gets better with the additional cores that are coming. Instead, watch for new gameplay. Manipulate and toy with intelligent sims – organize, create, nurture, teach. Or just interact with intelligent sims – command, conquer, persuasion, negotiation, relationships, learning. Or get inside the intelligent simulation – be part of an evolving story. Rabin also asks about Chris Hecker’s goal of a Photoshop of AI – the decomposition of AI into its core elements. What would this be? A training tool? F.E.A.R.-style scripting?
If we had perfect AI, what games could we make? For designers, the deep future can be modeled with a simple technique: if you could make a game using real humans, what would you make? The humans could be mates, opponents, companions, mentors, or psychiatrists. Maybe the game would be murder mysteries, a reality TV show, playing God, or leading a battle. And with a real human writer, you would have a dungeon master. This potential to recreate us, ourselves, is what makes AI so powerful.
For the researchers, tips on getting research into commercial games. Start with a problem, not a solution. Prove the solution is better than the others – more efficient, faster in all ways, player-approved, new gameplay – and designer-tweakable. Then make a great demo, give the code away for free, and get it in front of game developers. Shoot for the stars and tackle the extremely hard problems the commercial guys can’t. Go crazy. (And he’s absolutely right here. Go guys!)
Architecting Believable Characters the Spielberg Way by Borut Pfeifer from EALA
And lastly Borut Pfeifer, talking about his next title, Project LMNO, and the future of believable characters. He’s worked on it for several years, including in Radical’s spinoff research studio. To get these characters, with meaningful choice and emergent gameplay, what has to happen? The player has to understand the AI’s motivation – characters have to have a consistent rationale/motivation, and the player has to learn it, eventually. All at high fidelity. Pfeifer points to the “suicidal goblin problem”: it’s dumb AI, but players never cite it as dumb, because the priorities of that character are very clear. We just need to bring this sort of clarity to smart AI.
Starting with the “Spielbergness” approach, let’s run through the problems Project LMNO is struggling with. The goal is very clear expression, using technology that really creates that ultimate character. He examines the problems he’s seen come up. First, conflicting motivations. Watching a scene from Munich – the first murder – we see the actors’ deep internal conflict, on screen. The player needs to see this to reason about NPC behavior and see NPCs as realistic. Plus, the NPC should convey knowledge it has that the player doesn’t (“there’s someone over there!”). He recommends separating “executing behavior” and “showing intent” from each other. But when they run in parallel, how do you avoid the complexity explosions? (Good points!)
Next, reactions, via the golden idol scene from Raiders of the Lost Ark, which shows body and face reactions, comprehension leading to emotion, consideration, thought – all giving non-interactive feedback. Usually we do this with the same decision-making that executes behavior. But there are many different kinds of reactions – a subtle overlay growing into full body, big full-body shocks falling off into subtle behavior, important reactions in the middle of non-interruptible actions – and you may have to stop a reaction to do something else, then come back to it later. With each reaction, the player needs to see the connection to the world and get feedback if an internal change in the NPC is happening. Not only that, reactions can cause or direct behavior change and denote that the change is happening. We can separate reactions into pieces – initial, sustained, overlay, full body – but that needs lots of parameters: physical state, emotional state. And each reaction may only use a subset of the pieces, or change the entire flavor of something already active. Watch out! Complexity explosion again!
Let’s switch to attention, looking at character focus in Catch Me If You Can – the airport pilot scene where he sneaks past the agents and escapes on the plane. Here agents are searching, distracted, and focused on objects. That all requires attitudes/behaviors, and player awareness of those attitudes. Does each world object create a distraction or not? How do we handle multiple priorities for the targets of the current behavior or reaction? What about multiple, procedural targets? Or non-active behaviors that might need to know about possible attention targets? How is all this communicated, and how are priorities handled?
And then, meaning, in E.T., with Elliott and E.T. learning to communicate in Elliott’s bedroom. Objects in the scene have different semantic meanings to the different characters – Show, Play, Eat, Scary, Threatening. This makes them applicable to different behaviors for different characters. The player needs to know what meaning the NPC is applying to the target, and how to change that meaning if possible. The NPC has to perform that meaning – communicate it. That means controlling how, where, and when the NPC learns, and then showing that back to the player.
Plus, emotional simulation, in the car crowd scene in War of the Worlds. There’s a variety of short- and long-term clips of the crowd and characters – frightened, panicked, violent, angry, intimidated. Each has a different impact on different NPCs, based on where they started in the scene. What’s the point of all this complexity? New gameplay possibilities – believable, consistent, richer characters. The player needs to know the NPC’s emotional state, and the affordances for NPC behavior that come from that state, which the player can manipulate. Ideally these are as natural as possible. Emotional state has to show and hide based on mood – how does the player track it when it’s not available? Is it too much to keep track of? Pfeifer’s not thinking float sliders with traits – they’re not directable enough. He wants emotional states that designers author rules about, creating the transitions in and out: an FSM above a normal AI system. It can affect everything below it – goals/actions become active/inactive, attention targets and frequency, the tone of reactions and movement, and filtered interpretations of objects and characters.
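That “FSM above a normal AI system” might look roughly like this – the states, events, and goal sets here are mine, purely to show the shape, not anything from Project LMNO. Designers author the transition rules, and the current emotional state filters which goals the layer below is allowed to activate.

```python
# Designer-authored transitions: (current state, event) -> next state.
TRANSITIONS = {
    ("calm", "threatened"):     "frightened",
    ("frightened", "cornered"): "violent",
    ("frightened", "safe"):     "calm",
}

# Each emotional state gates the goals available to the lower-level AI.
ACTIVE_GOALS = {
    "calm":       {"wander", "chat"},
    "frightened": {"flee", "hide"},
    "violent":    {"attack"},
}

class EmotionFSM:
    def __init__(self, state="calm"):
        self.state = state

    def on_event(self, event):
        # Unhandled events leave the state alone rather than failing.
        self.state = TRANSITIONS.get((self.state, event), self.state)

    def goals(self):
        return ACTIVE_GOALS[self.state]

npc = EmotionFSM()
npc.on_event("threatened")
print(npc.state, sorted(npc.goals()))   # frightened ['flee', 'hide']
```

The same state lookup could just as easily gate attention frequency or reaction tone, which is the “affects everything below it” part.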
Lastly, Physicality – pathfinding, collision, world registration, object interaction, social nuance as in public/private, indoor/outdoor. Player needs to know when/how this has changed and to believe it.
So there are three recurring problems: control and feedback between layers (the mind-body problem), the complexity explosion of assets, and interleaving authored data where necessary. The last two also form a feedback loop, because procedural data requires authored knowledge to tune and case-study. With mind-body, layers can fail to talk to each other, so communication about failure needs to be good, and higher-level layers should deal with failure. This is fundamentally a software architecture problem. As for complexity and authoring – well, all these sims are intersecting at various times, and some events are more crucial to the experience than others (an NPC looking at you in a tense situation versus wandering around while you explore the environment). How can we Spielbergify these important moments?
Pfeifer presents the notion of the “sparse matrix” – an abstract authoring idea, a possibility space of n dimensions with emotional states, physical state, history of behaviors, and physical limitations represented. Which combined factors do you need right now to perform within the matrix? Does the agent need to override behavior/performance for any new combination of factors? We also need to easily define reasonable defaults and fallbacks. Pfeifer proposes search, filtered interpretation, and context authoring to help.
You need to search your set of these things. GOAP, sure – but behavior, animation planning, any content, all need to be searched. Think YouTube data tags. Building and adding elements must be easy – ideally it’s declarative with no connections, though it still requires a lot of run-time visualization. Filtered interpretation means reducing the number of things to search on: build abstract sets of your things to temporarily reduce the set and get more usable results. This mapping can be shared between characters and things, as long as the source data set remains the same. It’s almost a way of organizing people’s behaviors (or whatever) hierarchically, so you reduce the complexity for the higher levels when the complexity doesn’t matter to those levels. And it reduces content needs, potentially. You aren’t removing the context, but abstracting it out, and only specializing when needed – using overrides when it really should be a specialized context (“a car is about to hit me!”). When authoring these contexts, there are several kinds. Global: high level, like “are you in combat?”, detected globally in code. Regional: one or a few factors affecting a few behaviors, like “what kind of space am I in, public or private?” – regional context is primarily used in script and filtering/mapping. And lastly, local context – context within a single behavior. An example here is the classroom: if people are inside, it’s a different social behavior set than if the room is empty. Unless someone is being chased by the cops – then it’s totally different and there should be a global override. And Pfeifer believes this has to be driven through data, emergently.
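The global/regional/local layering reads like a simple precedence chain. Here’s a toy version of the classroom example – the context checks and behavior-set names are invented for illustration, not from the talk.

```python
def behavior_set(being_chased, room_type, room_occupied):
    # Global context: code-detected, high level, overrides everything --
    # the "chased by the cops" / "a car is about to hit me" case.
    if being_chased:
        return "panic"
    # Regional context: what kind of space am I in (public/private, etc.)?
    if room_type == "classroom":
        # Local context: the same room demands a different social behavior
        # set depending on whether anyone is inside it.
        return "quiet_lesson" if room_occupied else "casual"
    return "default"

print(behavior_set(False, "classroom", True))   # quiet_lesson
print(behavior_set(True,  "classroom", True))   # panic: global override wins
```

In practice Pfeifer wants this driven by data rather than hard-coded like this, but the precedence – global beats regional beats local, with sane defaults at the bottom – is the shape of it.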
So… the whole AI process: game events get tracked, translated into context in design and code, then filtered into meanings, which are turned into behaviors. Each system passes this down, can search and modulate it, and passes failure back up, all just to get the intention performed. Solved! Well, not quite. They are trying to build an AI around it, and hope to have results to show in Project LMNO.
Question: In AI, is simulation or authoring winning? He says he’s just trying to pick his knobs, because ultimately you have to be able to override anything. Euphoria isn’t stylistic, for example. Games don’t do adverbs yet – that’s what he’s trying to bring. (A nice point, good meme.) Question: This is very much about social acts, and there’s the translation to the performance level – what about the reverse: player actions translating to social acts, giving the player emotional control? Pfeifer calls for picking the low-hanging fruit, because there are easy things players do, but player modeling is difficult, so they are trying to limit their contexts and conditions and build up larger meaning slowly over time, in part because the player can only track so much. Question: Cultural emotion? Project LMNO is using non-humans to help create separation from this. Spielberg doesn’t mention working on it explicitly, so it may be that emotions are universal.