(I’m editing these notes a week after the fact, just making them readable, but I’ve left the original voice and roughness of this piece intact because I feel the live-blogging depth is more useful than a concise, thoughtful analysis would be. I’m happy to answer any questions people have, though, and I encourage you to contact the original presenters as well for more thorough details.
I’ll be posting Thursday and Friday’s notes shortly. Enjoy!)
I’m at the Artificial Intelligence and Interactive Digital Entertainment Conference 2008 (AIIDE) as I write this. Here’s the program. I’m not sure what the conference etiquette is here, but I’m planning to live blog as the event continues. The big push this week seems to be into proceduralism and “Drama Management”, what I’ve been calling Encounter Management, so I’m looking forward to that tomorrow. I cover a lot of ground fast in the notes here, particularly on the more interesting talks, so please give me a little more leeway on the editing, etc.
Panel Discussion: Realistic Human Characters with Borut Pfeifer (Project LMNO), Michael Mateas (UCSC), and Richard Evans (Sims 3)
I just caught the end of a panel on Realistic Characters. The questions were wide-ranging and wandering, as much about proceduralism as about AI or behavior. There was an interesting question about having 2 human reactions rather than just one. As in, if you tap someone on the shoulder, there are 2 reactions – the “reptilian brain” jump and then the spin around “huh”. Usually we do this with one animation, but maybe it should be 2 separate states. The procedural animation discussion focused on whether you could blend code as opposed to just animation data, unfortunately only raising questions. Also, some discussion of player-human interaction – once characters get real in games, shouldn’t the player’s verbs get better? Shouldn’t the AI be two-way? This was an interesting point, and one that surfaced a couple of times through the conference.
Intelligent Trading Agents for Massively Multi-player Game Economies by John Reeder from Stanford.
He’s looking at game auctions like those in World of Warcraft and Eve. Developers can control item frequency, “taxes” and fees, and the ways players make money, but there aren’t great levers for developers to push, and players can easily drive the economy into the ground. He’s looking for a way to be the anonymous federal bank of the market. In low population you can provide liquidity. (I’m getting reminded of the economic crisis.) He cites TAC Classic and TAC supply chain management papers as other sources, for those interested. He focused his research on Eve because of its economic focus and the production steps of complex materials. So he pulls from real market economics to structure his AI bot – market orders (stock requests) and limit orders (sale proposals?), buy and sell orders that create a mid-spread balance sheet. Steps are Collect Data, Infer Transactions, List of Transactions, Generate an order book, and then put the item up. Just by collecting the data he can simulate what the whole of the market is doing by watching items and catching when they disappear, assuming disappearances that aren’t based on expired time are sales. This helps him identify the true price, much like a human player would. Focusing on minimizing cost and maximizing profit, he evaluates the agents on their gains over time. His process sounds good but it’s not clear how he defines success. Ah – he’s using reinforcement learning on the evaluation, and he’s trying to find the optimal algorithm. He shows that reinforcement learning does generate a better result than traditional fixed market techniques. Seems like a straightforward and reasonable application of reinforcement learning, although he’s not taking the final step of showing how designers can control the market significantly better with this work. Interestingly, having his bot guess prices ahead of time didn’t help, because the time cycles were so short and the markets move so fast.
He also plans to compare the bot’s success to human traders, although from a design point of view there’s no need for it to be superior. Question: is it cheating if you tie the economy to real money? *chuckle* Where’s an ethicist when you need one?
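To make the “infer transactions” step concrete, here is a minimal sketch of how I understood it: compare two snapshots of the market and treat any listing that vanished before its expiry time as a sale. The `Listing` fields, the function names, and the averaging used for the price estimate are my own assumptions for illustration, not Reeder’s actual data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Listing:
    # Hypothetical fields; real auction data will look different.
    order_id: int
    item: str
    price: float
    expires_at: int  # tick at which the listing would time out

def infer_sales(previous, current, now):
    """Listings present in the old snapshot, gone from the new one,
    and not yet expired are assumed to have been sold."""
    still_listed = {listing.order_id for listing in current}
    return [
        listing for listing in previous
        if listing.order_id not in still_listed and listing.expires_at > now
    ]

def estimate_price(sales):
    """Average inferred sale price, or None when nothing sold."""
    if not sales:
        return None
    return sum(sale.price for sale in sales) / len(sales)

# Tiny example: order 1 sold, order 2 simply expired, order 3 is still up.
previous = [
    Listing(1, "ore", 10.0, expires_at=100),
    Listing(2, "ore", 12.0, expires_at=5),
    Listing(3, "ore", 14.0, expires_at=100),
]
current = [Listing(3, "ore", 14.0, expires_at=100)]
sales = infer_sales(previous, current, now=10)
```

A real bot would also have to deal with cancelled listings, which look exactly like sales under this assumption.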
These 20 minute sessions are fast. This is going to be a long set of posts!
Learning to be a Bot: Reinforcement Learning in Shooter Games by Michelle McPartland
She’s trying to reduce code through reinforcement learning, although that dodges the production issues with it (we don’t care about how much code we ship!). I missed her other reason, unfortunately. She brings up that there are 2 versions of reinforcement learning, Tabular and Generalization, which I wasn’t aware of (shows how often I’ve built this stuff). She’s using the Tabular method, with the Sarsa-Lambda algorithm. There are 4 parameters: learning rate, greedy value, discount factor, and trace factor. It’s been researched in racing, RPGs, fighting, and squads in FPSs. Her research is in FPS bot training for navigation and combat. It’s funny, researchers use FPSs for the same reason we create them: it’s one of the simplest designs. She chooses her own discrete reward values, which seems unfortunate (is getting items or killing someone more rewarding more frequently?), but I’m probably missing something. She did run several different reward models. Basically it seems like it comes down to what you measure, rather than how much. Intuitively, it seems like that means the model is too simple. The stories of the reinforcement learning gaming the reward system are pretty funny – camping the item spots because that’s what’s rewarded. For navigation, she didn’t find she needed planning – reactive behavior worked quite well. Good reinforcement for current obstacle avoidance models. Even given how simple this is, it’s interesting that she’s able to fix the rules and just vary the rewards to get such different results from the bot. Biggest takeaway is that the bots got very different personalities. But that’s part of what makes reinforcement learning hard to use – small changes lead to big emergent differences. It seems like the biggest potential here is the thing that I recall Unreal did: not less code but more unpredictability and different bot styles. A good question: could it also be used to help train the player?
I’m also surprised they used so few trials – is that common? If I had these sims and was going for data, I’d run it for a thousand different variants and train them against each other. Ah, the learning trials were 500 iterations, but they didn’t generate more than a few bots. I’d actually love to see this research applied to a Starcraft bot, which is a notorious problem.
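For anyone else who hasn’t built this stuff lately, here is a rough sketch of tabular Sarsa(lambda) showing where the 4 parameters plug in. It is a textbook-style version with accumulating traces on a made-up 5-cell corridor, not McPartland’s bot; the corridor, the reward of 1 at the goal, and all parameter values are assumptions for illustration.

```python
import random
from collections import defaultdict

def sarsa_lambda(env_step, start_state, actions, episodes=200,
                 alpha=0.1,    # learning rate
                 epsilon=0.1,  # "greedy value": chance of a random action
                 gamma=0.9,    # discount factor
                 lam=0.8,      # trace factor
                 seed=0, max_steps=100):
    rng = random.Random(seed)
    q = defaultdict(float)  # the table: (state, action) -> value

    def choose(state):
        if rng.random() < epsilon:
            return rng.choice(actions)
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])

    for _ in range(episodes):
        traces = defaultdict(float)
        state, action = start_state, choose(start_state)
        for _ in range(max_steps):
            next_state, reward, done = env_step(state, action)
            next_action = choose(next_state)
            target = reward if done else reward + gamma * q[(next_state, next_action)]
            delta = target - q[(state, action)]
            traces[(state, action)] += 1.0  # accumulating eligibility trace
            for key in list(traces):
                q[key] += alpha * delta * traces[key]
                traces[key] *= gamma * lam  # older choices get less credit
            if done:
                break
            state, action = next_state, next_action
    return q

def corridor_step(state, action):
    """Hypothetical world: cells 0..4, reward 1.0 for reaching cell 4."""
    next_state = min(4, max(0, state + action))
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done

q = sarsa_lambda(corridor_step, start_state=0, actions=[-1, 1])
```

Changing what `corridor_step` rewards is the whole game here, which fits her finding that the reward model, not the rules, shaped the bots’ personalities.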
Agent Learning using Action-Dependent Learning Rates in Computer Role-Playing Games from Maria Cutumisu
She’s looking at improving NPCs that lack adaptability so that we can produce more captivating stories. She seems to be focusing on combat “stories” though. In Neverwinter Nights, she’s looking at generating adaptive behaviors without programming. Cool! First finding was that fixed learning rates or decaying learning rates learned too slowly in games – action is too fast. Actions can also be lost if they aren’t learned early. So she created ALeRT to replace the Sarsa-lambda algorithm, using action-dependent learning rates, trend detection, and an exploration rate that decreases/increases with wins and losses. This creates a pretty interesting return for reinforcement learning – she’s essentially added an AI to her reinforcement rewards to better tune them. If it works, this is actually pretty useful – it would allow people to be much more hands off in highly dynamic game environments. Well, her data says it’s not always better than basic reinforcement learning. Over time it’s actually less optimal. But it adapts substantially better, which in the game’s highly dynamic choice-space is more valuable. The most interesting of this set, I’d say. Against or with humans, I expect it would do much better. Its higher complexity does make it harder to implement, though.
And we’re on to the Singular techniques section. In a switch up, Horswill hops in first. He wrote twig, a procedural animation and simple physics library, open source and running on XNA, to be a platform for some emotion research. It’s pretty cool – “fast, AI-friendly, scriptable and authorable” and open source. He shows a great demo of parent-child attachment, including some great mother-leg grabbing, then running off to play, then running back. The library’s based on Jakobsen’s work on the Hitman engine from GDC 2001 on Advanced Character Physics. The trick is you represent character bodies as particles – point masses connected by massless rods and represented only with position and time. Thus you can easily capture the state of the system with only 2 frames. So if something is out of the way you just move it. Then you use rigid distance constraints as springs to keep it together. Moving hands becomes really easy. It emulates real life: constraints, physics, and the IK sim just handle everything else for you. No joint angles, just Cartesian coordinates. It needs help to handle joint constraints, which is a bit harder without angles, but it’s worth it. Horswill’s got a posture control system modeling the spine. I had a friend actually propose something very similar to this idea before, but..
OK, problem one, no conservation of mass and energy. The forces are directly applied. They break things using real physics. Horswill cites Perlin’s gait generation – move the center and use ballistic motion to drive the legs when they get stretched too far. twig also has sensing and attention. Collisions generate “pain” and attention and a gaze reaction. He’s using the library to create a web comic – used XNA’s pipeline and Google SketchUp to put the art together (although it takes C# to define collision volumes and prop actions). Interestingly, holding some object, the “something” can just drag the arm around. (How does he keep from breaking the arm? Common Sense character relative? Nope, just loose snapping, unfortunately) He’s added behaviors such as Hold and Write for props, Walk, Gesture, Speak, etc. The Hug, reach, grapple, drag stuff is usually really hard and he makes it sound trivial. Scripting is through an RPC interface or read from a text file. He cites some related papers: Jakobsen, Verlet, PhysX doing “Position-based physics”. Procedural character control from Badler, Goldberg & Perlin, etc. Well, ok, I didn’t know this was this far along yet.
Problems: twig’s bad at accurate simulation, photorealism, handling complicated collisions, or path planning. But twig’s good at rough-and-ready character behavior and relatively expressive, dynamic motion (even though it’s inaccurate), and it seems expandable. Neat to see this stuff in open source, but most of these problems are the core things that need to work well in games, so this still likely has a long way to go for complex commercial applications.
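Since the particle trick is easier to see in code, here is a minimal 2D sketch of the general Jakobsen-style approach as I understand it: Verlet integration from two frames of positions, followed by a few relaxation passes that push particles back to their rod lengths. This is not twig’s code (twig is C# on XNA); the function name, the fixed gravity, and the pinned-particle handling are my own assumptions.

```python
import math

GRAVITY = (0.0, -9.8)

def step(pos, prev, rods, pinned, dt, iterations=5):
    """Advance one frame. pos/prev are lists of (x, y) for the current and
    previous frame; rods are (i, j, rest_length). Returns (new_pos, new_prev)."""
    new = []
    for i, ((x, y), (px, py)) in enumerate(zip(pos, prev)):
        if i in pinned:
            new.append([x, y])  # pinned particles never move
        else:
            # Verlet: velocity is implied by the difference of the two frames.
            new.append([2 * x - px + GRAVITY[0] * dt * dt,
                        2 * y - py + GRAVITY[1] * dt * dt])
    for _ in range(iterations):
        for i, j, rest in rods:
            dx = new[j][0] - new[i][0]
            dy = new[j][1] - new[i][1]
            dist = math.hypot(dx, dy)
            wi = 0.0 if i in pinned else 1.0
            wj = 0.0 if j in pinned else 1.0
            if dist == 0.0 or wi + wj == 0.0:
                continue
            # Share the length error between the two movable ends.
            scale = (dist - rest) / dist / (wi + wj)
            new[i][0] += dx * scale * wi
            new[i][1] += dy * scale * wi
            new[j][0] -= dx * scale * wj
            new[j][1] -= dy * scale * wj
    return [tuple(p) for p in new], list(pos)

# A one-rod "arm": particle 0 is the pinned shoulder, particle 1 the hand.
pos = [(0.0, 0.0), (1.0, 0.0)]
prev = list(pos)
rods = [(0, 1, 1.0)]
for _ in range(50):
    pos, prev = step(pos, prev, rods, pinned={0}, dt=0.01)
```

Dragging the hand somewhere is then just overwriting `pos[1]` before the constraint pass, which is presumably why “moving hands becomes really easy”.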
Combining Model-Based Meta-Reasoning and Reinforcement Learning for Adapting Game-Playing Agents from Patrick Ulam
As in, “How can agents act in highly complex non-deterministic limited information environments?” Ulam proposes combining existing approaches. Use symbolic reasoning with numeric machine learning, localized failures, and adaptation. In model-based meta-reasoning the agent reasons about its failure, and then tries to target and correct it. Big weakness: a lot of expert knowledge is needed to use this. Reinforcement learning, on the other hand, requires little knowledge and can get optimal, but is slow and suffers from the “curse of dimensionality”; it doesn’t scale. Not so good for these environments. So he combines both – uses meta-reasoning to localize the failure, and then reinforcement learns against the failed solution. He tested this approach in FreeCiv – defending a city for 100 turns using the TMKL model – Tasks, Methods, and Knowledge separation. I.e. what is done by the computation, how a computation is done, and the concepts used (i.e. data). The adaptations and failure database itself is first designed by an expert. The model-based reasoning tries to find the task in the database that failed. The reinforcement learning is used to help build new libraries. Model-based in general does pretty well, but here he got slightly better results with this hybrid. The hybrid does reduce the amount of knowledge engineering necessary and reduces training time, but you still need an appropriate model. It’s interesting how focused Ulam is on the gameplay-side possibilities for this – but this is an interesting step along the path to smart design tools. Most of this research can’t be put into development until it’s much more developed and has better interfacing, and no one seems to be focusing on that end of it yet. Are they leaving that up to development?
TAP: An Effective Personality Representation for Inter-Agent Adaptation in Games by Chek Tien Tan.
Tactical Agent Personality – synthesizing adaptive inter-agent behavior. Oh, he just criticized the Diablo 2 Necromancer pet AI as being too blindly dumb and then said “there’s no Blizzard AI guys here, right?” (Must.. keep… from… laughing…) “Let’s fix that” he says, by trying to measure player and/or NPC motivation, extracting agent motivation from its actions. For players he just tracks number of uses over time, for NPCs he uses chance-to-use. Then for each interlude, he calculates an error value to back propagate into a neural network. Whoa. Where did that come from? OK. Think I’m fully over my game programmer reaction to unnecessarily complex AI techniques. For now. I’m here to be proven wrong. So he’s using the neural network to predict what the agent’s goal is. His latest research is to switch his weights from the actions to the action transitions, so that he can build one-step deep combo sequences. Wouldn’t this skew the neural net badly in situations like fighting games, where combos go to 2-5 steps? Plus, one-step combos are almost always designer-created, so the expert knowledge is there without needing the neural network to find it. A more interesting use would be for long term strategies – aggregating big picture patterns in an RTS and then predicting tech, econ, or attack, for example. Although here, it seems reinforcement learning does pretty well too, and expert knowledge might even be the best design (we want to train rather than win). Maybe for helping designers predict emergent behavior? A good question on why he isn’t searching for the higher personality traits that the net could communicate back to the user. But I’m not convinced this is a fruitful solution for games yet.
Offline Planning with Hierarchical Task Networks in Video Games by John-Paul Kelly
After lunch off-site, we’re back for some tools-related help. “A method that uses AI Planning to generate game scripts automatically” in Oblivion. Scripting is manual off-line planning by a human being – takes time, is costly, error prone, and doesn’t scale well (depends on the design of course, but generally true). He says we could try on-line planning (UT, Fear), but scripts are standard, easy to implement broadly, fast, and can’t be interfered with by the user, so let’s consider off-line. Can AI tools generate scripts automatically?
Hard but interesting: almost “Can you script fun with AI?” His approach – represent the daily plans of NPCs in an Oblivion town as scripts that map actions in a plan to AI behavior packages. The game programmer writes a hierarchical task network, then the planner runs through the HTN, and sends the results to the translator. The script interprets the translation and sends it to the game. The gain here is that the hierarchical task network is an abstraction of game knowledge and can be reused cross-character. Each node is an abstract task -> method -> atomic tasks. It’s hierarchical goals, basically, like a decision tree. Attributes drive the goals chosen, and can be character or world based. (It’d be nice if they were not just linear decays in the planner, but a more realistic algorithm.) Used JSHOP2, an open source HTN planner. The plan that comes out represents all NPCs, because they are just interactions. The translator then creates an individual script on a per-character basis. Then when the script gets executed, the script checks the plan is still valid in the world state, updates NPC state, activates AI to implement actions, and sends messages between individuals.
OK, in practice not as much fun driven as living world driven. Not necessarily very reactive to the player. It’s offline, so it’s really just pre-generating scripts we’d normally do by hand, and the planner can be debugged to handle the interactions correctly every time. It’s interesting because I was just reading that Oblivion had to simplify their AI script systems down because the player interactions were breaking it. So this is ignoring the player and making the scripts more complicated. Whoops. It’s still a very creative idea. Because the scripts check world states it’s more robust than pre-scripted usually is (fixes eating even though we stole food off the table for example – although this might have more to do with the backend package?). Catch is if the script is checking world conditions and not the AI, then the script needs a lot of extra “ifs” – contingent planning – and the player can still break it during play. Seems not too difficult to set up for very predictable attribute behavior, but hardly a replacement for dynamic AI. Really, how well can you do the contingent planning in script? Could also be very good for quick-dying combat, just to do an initial pass. Could you just grab a new script when the current one failed? It might be more interesting to make the scripts themselves more akin to swappable AI packages. I’ll mention it to him.
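To show what “abstract task -> method -> atomic tasks” means in practice, here is a toy HTN decomposition in Python. It is a generic depth-first sketch, not JSHOP2 and not Kelly’s Oblivion domain; the hungry-NPC state, the task names, and the meal price are all made up for illustration.

```python
def htn_plan(state, tasks, operators, methods):
    """Depth-first HTN decomposition. Returns a list of atomic tasks, or None."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    name, args = head[0], head[1:]
    if name in operators:  # atomic task: apply it to a copy of the state
        new_state = operators[name](dict(state), *args)
        if new_state is None:
            return None
        tail = htn_plan(new_state, rest, operators, methods)
        return None if tail is None else [head] + tail
    for method in methods.get(name, []):  # abstract task: try each method
        subtasks = method(state, *args)
        if subtasks is not None:
            result = htn_plan(state, subtasks + rest, operators, methods)
            if result is not None:
                return result
    return None

# Atomic tasks return the updated state, or None if they cannot run.
def walk(state, who, place):
    state["location"] = place
    return state

def take_food(state, who):
    if state["location"] != "home" or state["food_at_home"] <= 0:
        return None
    state["food_at_home"] -= 1
    state["holding_food"] = True
    return state

def buy_meal(state, who):
    if state["location"] != "tavern" or state["gold"] < 2:
        return None
    state["gold"] -= 2
    state["holding_food"] = True
    return state

def eat(state, who):
    if not state["holding_food"]:
        return None
    state["holding_food"] = False
    state["hunger"] = 0
    return state

# Methods return subtasks when their attribute check passes, else None.
def eat_at_home(state, who):
    if state["food_at_home"] > 0:
        return [("walk", who, "home"), ("take_food", who), ("eat", who)]
    return None

def eat_at_tavern(state, who):
    if state["gold"] >= 2:
        return [("walk", who, "tavern"), ("buy_meal", who), ("eat", who)]
    return None

OPERATORS = {"walk": walk, "take_food": take_food, "buy_meal": buy_meal, "eat": eat}
METHODS = {"satisfy_hunger": [eat_at_home, eat_at_tavern]}

state = {"location": "field", "food_at_home": 0, "gold": 5,
         "hunger": 8, "holding_food": False}
plan = htn_plan(state, [("satisfy_hunger", "npc")], OPERATORS, METHODS)
```

The returned list of atomic tasks is the sort of thing a translator would then turn into a per-character script; the precondition checks inside the operators are roughly the extra “ifs” that script would need to repeat at run time.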
Simulation-Based Story Generation with a Theory of Mind from Hsueh-Min Chang and Von-Wun Soo
Next! Can agents generate stories through their interactions? Arbitrary plans probably, but not usually story – they need some new AI. Chang believes it’s a mental-level interaction among agents. Chang separates Agent and Patient – influencer and influencee – and defines a social plan as a character plan whose goal or subgoal is to change the mind of others. Their characters use theory of mind to infer others’ goals and change them. The influencer creates a plan using the game’s public knowledge of character attributes (envy, fear, etc.) to find actions that manipulate the influencee’s attributes to get the desired result. Their tests model Othello, creating fairly manipulative characters, for sure. Visible feedback from the influencer would be regularly required, which could make the social behavior seem silly, right – “I’m trying to fool you!”. And of course, the plan has to be redone if the conditions change or the actions are not verifiable by the influencer to have happened. Most interesting here, this idea of a “social villain” could create a “new” kind of information game – players competing against (known or unknown) social agents. “New” as in done by Crawford in Legacy of Siboot, but that was 20 years ago, so who remembers that? There’s definitely potential here. Question: Do you have to replan after every action because of interfering plans? “Yes.” More evidence this works best with only one active agent and the player.
Automatic Generation of Game Level Solutions as Storyboards with David Pizzi from the University of Teesside
And finally, Pizzi, who was working with Alex Whittaker, an Eidos AI programmer. Pizzi points out game designers are reluctant to use planning to procedurally generate content because of QA issues. But planning can provide all the alternative solutions that a designer might not see. His research was used in Hitman: Blood Money, where they needed particularly good level planning tools. He interprets the levels using a STRIPS interpreter on game actions. Then he plans out a possible level solution using heuristic search planning, and generates a storyboard to show the result. Storyboards are normally expressive, easy to produce, universal, and easy to understand. But they describe animation or possible situations, aka a plan. The issue he ran into was that each plan implies a new drawing. He solved this by dynamically generating each storyboard panel from parts. Analyze the world state, pick the next action in the plan, and load the corresponding panel template, filling in the details of the panel dynamically. Draw the result using automatic composition techniques, drawing an atmosphere, environment and then actor layers. Seems like it could be very useful for some games. Combinatorial explosion of possible panels? They had 90 possible actions, but only 24 templates – each template can cover multiple actions. Solutions usually numbered 25 to 40 actions for an area. In the planner, iteration led to a temporal element to account for NPCs that moved around, and designers could select walkthroughs by player’s play style as well. He shows the designer’s tool. The sheer number of possible plans here just seems to make it useless as a comprehensive check – many of the plans would be nearly exactly the same. Being able to filter down the plans would help, but then you start to miss the QA aspect of it checking all aspects of your level design. Also might be cumbersome to import the level into the tool, rule by rule.
They are actually looking at the final level, this is more for design planning, pre-production level planning. It’s kind of neat to see the planning tree and walk through the options at each level though. And to rate how complex the plans coming off each potential choice in the level are, so you can balance them in the design, a critical part of Hitman design.
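For readers who haven’t met STRIPS, here is a small sketch of the representation: each action has preconditions, an add list, and a delete list over a set of facts, and a planner searches for an action sequence that reaches the goal. Note that I’m using plain breadth-first search here, not the heuristic search planner from the talk, and the facts and action names are invented, not taken from the Hitman tool.

```python
from collections import deque
from typing import FrozenSet, NamedTuple

class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]     # facts that must hold before the action
    add: FrozenSet[str]     # facts the action makes true
    delete: FrozenSet[str]  # facts the action makes false

def make_action(name, pre, add, delete=()):
    return Action(name, frozenset(pre), frozenset(add), frozenset(delete))

def strips_plan(initial, goal, actions):
    """Breadth-first forward search; returns the shortest list of action names."""
    start, goal = frozenset(initial), frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:
            return path
        for action in actions:
            if action.pre <= state:
                next_state = (state - action.delete) | action.add
                if next_state not in seen:
                    seen.add(next_state)
                    frontier.append((next_state, path + [action.name]))
    return None

# Hypothetical level rules: two routes into an office.
ACTIONS = [
    make_action("take_uniform", {"at_gate"}, {"has_uniform"}),
    make_action("enter_kitchen", {"at_gate", "has_uniform"}, {"in_kitchen"}, {"at_gate"}),
    make_action("climb_window", {"at_gate", "has_rope"}, {"in_office"}, {"at_gate"}),
    make_action("walk_to_office", {"in_kitchen"}, {"in_office"}, {"in_kitchen"}),
]

disguise_route = strips_plan({"at_gate"}, {"in_office"}, ACTIONS)
rope_route = strips_plan({"at_gate", "has_rope"}, {"in_office"}, ACTIONS)
```

Each action name in the resulting list is what would get mapped to a storyboard panel template, and enumerating every path instead of stopping at the first is where the explosion of near-identical plans comes from.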
Modeling Culturally and Emotionally Affected Behavior given by Vadim Bulitko.
It’s 3:00, time for social intelligence. My social intelligence is telling me we should have a short break first, but to shut up about it. (Yes, clearly working.) So, Bulitko asks, “why model culture?” Because, say, you’re trying to be more realistic in virtual trainers for soldiers going to Iraq. His demo shows a conversation that would normally be pre-created. Bulitko says that would be laborious and error-prone. Why not model “procedural culture”? He defines “schemas” that are a framework of cultural knowledge and “shared symbols” that are shared icons of a culture. We can build a library of schemas and create plans to show what steps a cultured NPC would support, i.e. rate highly. As an aside, in this plan we can also model how an NPC buys into a player’s promise-commitment by measuring the player’s cultural commitment, “trust”, a predictor of whether the player will follow through.
He combines this with procedural emotional behaviors. Using Smith and Lazarus’ appraisal theory, emotions are modeled by the appraisals of current state with respect to goals and beliefs. Tie your procedural culture plan output to a higher decision level that maps to desirable and undesirable emotions, and then you can get one weight to evaluate each potential plan. Combining plan layers models humanistic emotional reaction alongside cultural affronts – i.e. disrespecting Islam.
Basically this is just a dual layer plan evaluation system, emphasizing culture and emotion, rather than just emotion as games normally handle it (behaviors weighted by emotional impact). It seems like a pretty obvious approach and could be applied to any other situation with two factors to incorporate. Qs: How do you select the plan’s weights? In this case, he used a human expert who watches the results and says yea or nay, and then you debug what happened. With training, he might be able to have experts tune the numbers without playing.
Otello: A Next-Generation Reputation System for Humans and NPCs from Michael Sellers.
Advantage of 20 minute conversations: we cover a lot. Downside: tech setup between presentations isn’t fast. Here’s something from within the industry. Well, the web industry. Otello’s a web site combining Facebook and Digg, because Digg doesn’t rank stories based on who your friends are. Sellers separates out opinion (what you know) from reputation (what you heard from someone). Reputation is outside of your control. It’s what someone else heard from your associates about you. So he wants you to rate relationships by value and confidence, while keeping them one-directional and transitive. These ratings could be done by humans or AI bots, and by being transitive, humans could choose to “trust” the output of these search bots. He uses a weighted recursive aggregation search using the relationship value and confidence, as well as a “social friction” value to restrict the length of the search. This search can trace the network to find early adopters, key connectors, and sharing-killers. The whole site could be used for news stories, safe chat with kids, limited visibility of an individual’s content (or anything that drives funding, really). He’s looking for a game connection, but admits it’s been trouble in the past (Facebook friends aren’t great gameplay because you just end up competing for thousands of friends you don’t know). Mostly an idea and a sales pitch, but there might be some ideas in this. Interesting proposal for using this approach to relationships as a score in an MMO, to enable UO-style stores, taverns, or theater feedback and gameplay, aka the gameplay Sims Online tried to capture that the Sims does well.
Modeling the Dynamics of Non-Player Characters’ Social Relations in Video Games by Vincent Corruble from the University of Paris.
Working with Quantic Dream (of Indigo Prophecy) and an AI middleware company, Corruble had worked in the past towards improving dialogue in adventure games through measuring Emotions, Experience, and Personality. He wanted to add another layer: Social Relations. Usually this is scripted; he’s trying to go procedural. Starting from emotional/personality/social relations models from psychology, he’s searching for a model that is comprehensive, manageable, computational and implementable. He settles on the same emotional appraisal as the planning above (Bulitko), as well as measuring personality attributes (Big Five, EISN? Models) and social roles (Brown and Levinson – Liking, Dominance, Solidarity, Familiarity). His AI is event-driven. Given an event, first the raw emotions the event causes are selected. Then the personality is overlaid to get the character’s true reaction. Then the character’s overall emotional state is aggregated, and the AI behavior is changed (modified by the social role). The social role reacts to your emotions and other people’s emotions (i.e. solidarity, dominance when they are same/different). Seems like they still have quite a ways to go, and there’s not much detail here to go on yet, particularly on how the social role contributes to the model. It’s interesting, though, to see a model that tries to take into account so many diverse character factors.
The invited talk of the afternoon is David Cope, an AI programmer/artist and professor. Cope, faced with a bad case of composer’s block, decided one day that instead of writing an opera by hand he’d try to get a computer to construct Bach’s chorales for him! The chorales had a lot of competing rules, but there are a lot of examples and the structure is generally well understood. For example, each has independent running music lines, but there’s always a repeating note because they are 3 chords for 4 parts. So he wrote a program that followed the rules and came up with something that seemed to work. But it was missing something. And he wanted to handle different rules as well. He really just wanted to analyze the rules arising from a database of music. As simply as possible. So he started over, creating a Bach database, organizing it by grouping only the chords that came from the same previous note. So as the program chose notes, it could search and know where it could go next. The rules are embedded in the data organization. And he noticed Bach doesn’t follow the rules! Well, only 5% of the time. (Take that, my music theory teacher – I get to break a few too!) His program’s algorithm – Eliza – randomly samples all of Bach’s chorales. It’s never breaking the rules, because it’s only using Bach’s notes. There’s no sophisticated knowledge. Just Bach. Can’t get more expert knowledge than that. The results were better. But there were still problems.
First, his program created phrases of uncertain lengths and direction-less music. No place to take a breath, no shape or form. So the program needs a model to limit phrase length. At first he tried to program form and create rules in the code. But then he went back to sampling the data the same way, pulling out models of a Bach phrase. This whole thing really reminds me of Encounter Manager and story arcs – matching story beats rather than music beats from data, so to speak. To measure phrase, Eliza just matches the first and last chord and Bach’s overall phrase length and then just fills individual notes in as above. He then added overall form, modulation, key changes, balance, theme, shape, etc. He just scaled bigger and bigger. Recursively building it up. I’m sure Lisp was great for this. What’s particularly interesting here is that the singer’s breath time comes from Bach’s own sense of timing, automatically, not the program’s inherent design. He’s put 5000 of these songs on his website. He can’t even listen to them all.
Pushing beyond Bach chorales, most music doesn’t have 4 voices without rests. Different instruments, textures, styles, volumes, form. Can we emulate all that? We’d have to track character, section, cadence, variation and repeats and more all within the beats themselves. Cope gave it a shot. The Rachmaninoff piano piece he shows here is stunningly pretty, with incredible variation given his small sample set. “It means, Rachmaninoff wasn’t dead.” Poetic. The meaning of art recreated procedurally, beautifully. He does all this data by hand, meticulously copying and checking each note.
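Here is roughly how I picture the “rules are embedded in the data organization” idea, as a toy sketch: index every chord by the chord that preceded it in the corpus, then build a phrase by sampling only transitions that actually occur, keeping attempts whose length and final chord match a phrase model. The three-phrase “corpus” of Roman-numeral chords is invented, and the retry loop is my own simplification, not Cope’s method.

```python
import random
from collections import defaultdict

def build_transitions(phrases):
    """Group each chord's observed successors under that chord."""
    table = defaultdict(list)
    for phrase in phrases:
        for current, following in zip(phrase, phrase[1:]):
            table[current].append(following)
    return table

def generate_phrase(table, start, end, length, rng, max_tries=1000):
    """Sample corpus transitions until a phrase has the wanted first chord,
    last chord, and length. Returns None if no attempt fits."""
    for _ in range(max_tries):
        phrase = [start]
        while len(phrase) < length and phrase[-1] in table:
            phrase.append(rng.choice(table[phrase[-1]]))
        if len(phrase) == length and phrase[-1] == end:
            return phrase
    return None

# Stand-in corpus; real data would be chords transcribed from the chorales.
CORPUS = [
    ["I", "IV", "V", "I"],
    ["I", "vi", "IV", "V", "I"],
    ["I", "ii", "V", "I"],
]
table = build_transitions(CORPUS)
phrase = generate_phrase(table, start="I", end="I", length=4, rng=random.Random(0))
```

Every adjacent pair in the output occurs somewhere in the corpus, so the output can never break a “rule” the source didn’t break, and a too-small corpus shows up immediately as dead ends or copies of the input.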
He didn’t start composing his opera until a month and a half before the premiere, and he completed it in 3 days, to the best reviews of his life.
A question on whether he’s tried to do this with text. He tried it to create a world anthem, with music. In essence, it worked, although he didn’t like the data. Close to being deemed the world anthem by the UN, but it’s a terrible piece of music. The whole program right now is about 20,000 lines of code, optimized down. It’s passed the “Turing Test” with 9 out of 10 faculty at a school of music. He’s had a lot of trouble getting it performed, because people know he can always make more – there’s no sense of rarity, of special. The program isn’t rare, so it’s not valuable to perform. So he has destroyed the databases. It takes a long time and a lot of work to put the databases together because they have to match the style you are going for. And yes, they perform it now a bit more, but in a sense the program is dead. We took a vote on whether it is creative. I voted yes – I’ve done it before by hand, and I believe it’s little different. Cope says he believes it’s not and he’s written a book on that question. I ask him about the difficulty caused when the database sample size, the expert knowledge, is too small, and he says that it’s definitely an issue, but it’s very case-specific – some data is easier than others because it’s significantly different enough from itself, while some just falls apart. But there’s no shortcut for the sample size problems the encounter approach can run into, unfortunately.
A great talk to end a day that covered a lot of ground on a lot of different topics. His technique is very similar to the technique I originally drafted to do procedural interactive stories, before I knew anyone else was even attempting it. It’s still surprising to see how many people have tried it and how often it’s worked. Gives me some good hope. The most useful part of this approach is how straightforward it is and how no beat needs to be generated before all the ones before it have been played. Less use in music, but a significant win in game narrative. Well, off to rest and get ready for Thursday.