A way to deal with enormous branching factors in strategy games (togelius.blogspot.com)
119 points by magoghm on March 25, 2016 | 45 comments



Link to the actual paper: http://julian.togelius.com/Justesen2016Online.pdf

I've written an AI for a similar game, where you can't always even iterate over all your moves for this turn, let alone do a multi-ply tree search.

Using their terminology of Action and Turn, my method was to first evaluate individual Actions with a simpler evaluation function, applying a cutoff so that only a reasonable number of Actions had to be evaluated; then combine that reduced group of Actions into a reasonable subset of full Turn moves; then use the full evaluation function to find the best Turn.
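
A rough sketch of that two-stage idea in Python (the game-state API here — legal_actions, apply, cheap_eval, full_eval — is hypothetical, just to show the structure):

    from itertools import permutations

    def best_turn(state, actions_per_turn=3, keep_top=6):
        # Stage 1: score each individual Action with the cheap evaluation
        # and keep only the most promising ones.
        ranked = sorted(state.legal_actions(),
                        key=lambda a: cheap_eval(state.apply(a)),
                        reverse=True)
        candidates = ranked[:keep_top]

        # Stage 2: combine the surviving Actions into candidate Turns and score
        # the resulting end-of-turn states with the full evaluation.
        # (A real game would also need to re-check legality mid-sequence.)
        best_score, best_seq = float("-inf"), None
        for seq in permutations(candidates, actions_per_turn):
            s = state
            for action in seq:
                s = s.apply(action)   # assumes apply() returns a new state
            score = full_eval(s)
            if score > best_score:
                best_score, best_seq = score, seq
        return best_seq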

Writing a good evaluation function for a complex game is hard, so I picked a whole bunch of inputs and then used a genetic algorithm to find weights for them. (Let the various AIs play entire games, and let the winners breed.)
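
The weight-tuning step could look roughly like this (a minimal sketch; play_game, returning truthy if the first weight vector wins, and the feature count are placeholders):

    import random

    def evolve_weights(n_features, pop_size=50, generations=200):
        # Each individual is a weight vector for the evaluation function's inputs.
        pop = [[random.uniform(-1, 1) for _ in range(n_features)]
               for _ in range(pop_size)]
        for _ in range(generations):
            # Pair the AIs off, let each pair play a full game, keep the winners.
            random.shuffle(pop)
            winners = [a if play_game(a, b) else b
                       for a, b in zip(pop[0::2], pop[1::2])]
            # Breed the winners back up to full population size:
            # uniform crossover plus a little Gaussian mutation.
            children = []
            while len(winners) + len(children) < pop_size:
                p1, p2 = random.sample(winners, 2)
                child = [random.choice(g) + random.gauss(0, 0.05)
                         for g in zip(p1, p2)]
                children.append(child)
            pop = winners + children
        return pop[0]   # any survivor of the final generation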

Works okay, but still can't beat a decent human player very often. (There's enough randomness in the game that the better player doesn't always win.)

In this paper, their Online Evolution beat the other four computer strategies, but they don't mention whether it can beat good human players. If it can't, it's not clear to me whether Online Evolution is a good algorithm. Beating their other four algorithms doesn't seem to be a very high bar.


Small nitpick regarding Hearthstone: Its branching factor is really not that high. Higher than Chess, but not in the same category as Civilization, which gives you thousands to tens of thousands of potential inputs every turn.

In Hearthstone, you have a specific number of options you can play (anywhere between 0 and 10-ish), and each option may have one or more targets (usually again between 0 and 10-ish). This sounds high, but you're able to discard a lot of those options much more easily than you would in Go/Chess (e.g. damaging yourself is seldom a good idea).

There are often multiple options you can play per turn, but each option has to be evaluated under the current state of the board, since playing a single option can radically alter the game state. So a much fairer approach is to look at a branching factor of about 15, with multiple successive "turns" (e.g. each option is a single "turn", and you can play multiple turns before ceding control to the opponent).


I think the branching factor is somewhat higher given that the order of actions makes a big difference. Also remember that when playing a card you have to choose where to place it, which can multiply the branching factor by up to 7.

If both of you have 3 minions on the board, each of your minions has a choice of 4 targets, which is already up to 4^3 = 64 choices. Add in the order of attacks, and the fact that you can play any card in your hand before or after attacking, and the branching factor has the potential to explode.
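
To put a rough number on it (an upper bound only; many orderings collapse to the same outcome, and card plays are excluded):

    import math

    attackers, targets_each = 3, 4
    assignments = targets_each ** attackers   # 4^3 = 64 ways to assign targets
    orderings = math.factorial(attackers)     # 3! = 6 attack orders
    print(assignments * orderings)            # 384 attack sequences before any cards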


For nearly all cases (except some you can specify manually and/or discover through deep learning), you can throw away positioning, which is why I didn't mention it.

As for the order of actions, I did mention that. The basis of my argument is that you shouldn't consider an entire player's turn as a "turn"; instead, consider each action as a turn.


Part of the search space, though, should also consider the global state of the game, which includes things like your own deck, and that adds a further complication when forecasting branching.

And if you want to get truly smart, you also have to consider what the opponent might do, which, as viewed by the player, opens up the search space to a lot more cards both in their hand and in their deck.

So yes, each turn the branching options to play is relatively small compared to the other examples. However, the evaluation of future moves and the "true" board state is much, much harder and has a huge branching factor.


I think that's basically what their algorithm does:

> We use an evolutionary algorithm to evolve sequences of actions that could make up a single turn. The fitness function is the quality of the board state of the end of the turn, as measured by some straightforward metrics.

My interpretation of this (and your suggestion) is that they take the number of possible actions in one turn, and prune them according to how they affect the board state. That seems to be what you're describing.
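
Something along these lines, as I read it (a rough sketch only, not the paper's actual implementation; the game-state API and parameters are guessed, and a real version would have to repair illegal action sequences produced by crossover):

    import random

    def online_evolution(state, turn_length, pop_size=100, generations=50):
        # Each genome is one candidate Turn: a fixed-length sequence of actions.
        def random_turn():
            s, seq = state, []
            for _ in range(turn_length):
                action = random.choice(s.legal_actions())
                seq.append(action)
                s = s.apply(action)
            return seq

        def fitness(seq):
            # Quality of the board state at the end of the turn.
            s = state
            for action in seq:
                s = s.apply(action)
            return evaluate_board(s)

        pop = [random_turn() for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            survivors = pop[:pop_size // 2]
            children = []
            while len(survivors) + len(children) < pop_size:
                # One-point crossover of two surviving turns, plus a point mutation.
                # (Repairing now-illegal actions is omitted here.)
                p1, p2 = random.sample(survivors, 2)
                cut = random.randrange(1, turn_length)
                child = p1[:cut] + p2[cut:]
                child[random.randrange(turn_length)] = random.choice(state.legal_actions())
                children.append(child)
            pop = survivors + children
        return max(pop, key=fitness)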

Evaluating those pruned action sets across multiple turns seems like the next logical step.


They claim that pruning branches based on potential actions is a novel approach, but it seems like common sense to me, and I'd be surprised if several other games weren't already implementing this in some fashion.


I wonder how long it will take until we find a way to accelerate the rate of learning. If you give a user (experienced with games) a tablet with the game FTL, chances are they will beat it in 10-30 attempts. AI, on the other hand, still takes thousands of playthroughs to get good at a game.


Well, to be fair, even the youngest gamer has spent years and years as a baby/toddler focused almost solely on learning how to use his/her brain to reason and make quick strategic decisions. As any parent knows, babies know almost nothing and have to learn it all from scratch, and watching them figure out the world and improve their skills is very insightful to anyone interested in AI. And we often forget how much time and hard work we all invested as kids to perfect our (basic) abilities...


A good example is when you throw humans into a game with physics they aren't familiar with, e.g. a 4-dimensional game, or QWOP. Humans suddenly don't seem so amazing then, also taking thousands of attempts.


I feel like the ways humans and AI are trained is analogous to vertical / horizontal integration, respectively. I can't quite describe why I feel this.


An AI typically starts with no prior experience or ability to reason. It is essentially brute-forcing approaches and is basically succeeding based on the birthday paradox.

https://en.wikipedia.org/wiki/Hacker_koan#Uncarved_block


The funny thing about that link is that the original anecdote is far more enlightening than the "koan" it was turned into.


What does the birthday paradox have to do with this?


I assume because it would take fewer attempts than one would intuitively expect before the AI finds a successful strategy, thus severely cutting down on the amount of time (or number of iterations) before a successful strategy is found.

Like with the birthday paradox, one would intuitively think it might take hundreds of people in a room before two had the same birthday, but of course we know that is indeed not the case.


The birthday paradox involves the possibility of random processes coinciding at some point, which feels a bit different from what the AI is doing.

Birthday paradox: different people are coming into the room with different randomly-distributed birthdays; how many people do you expect need to come in before two people in the room will have the same birthday?

Ordinary random search: there are several people in the room with birthdays that are unknown to you. How many birthdays do you expect you'll have to guess before you name a birthday that is actually the birthday of someone present?
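
To make the distinction concrete, a toy simulation: with 23 people, some pair shares a birthday about half the time, but 23 blind guesses hit one particular birthday only about 6% of the time.

    import random

    TRIALS, DAYS, N = 100_000, 365, 23

    def shared_birthday():
        # Birthday paradox: do any two of the N people share a birthday?
        days = [random.randrange(DAYS) for _ in range(N)]
        return len(set(days)) < N

    def guessed_birthday():
        # Ordinary random search: do N blind guesses hit one particular birthday?
        target = random.randrange(DAYS)
        return any(random.randrange(DAYS) == target for _ in range(N))

    print(sum(shared_birthday() for _ in range(TRIALS)) / TRIALS)   # about 0.51
    print(sum(guessed_birthday() for _ in range(TRIALS)) / TRIALS)  # about 0.06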


I've played it probably fifty times, and I've only gotten to the boss level three times.

But it's possible that I'm just bad at video games.


Are you playing on Easy or Normal? If you got that result while playing on Normal, then you're not bad at video games.


Ha! On easy maybe.


So you've now moved the computational complexity to the fitness function, which needs to consider future moves as well?


Not necessarily. If there are factors of a board position that make it strong regardless of future moves, then why can't that information be encoded in the fitness function without having to play future moves forward?


Sure, but how would you know? Sacrificing pieces in chess is a typical example: negative expected value in the short term, but typically positive in the long term when done intentionally. But generally, if you can come up with a robust fitness function, a genetic algorithm is one method, although there are often better alternatives that give faster results. In my experience, genetic algorithms can be a great start to get something working, but they are typically slow, which matters if you need to simulate a few billion games... Interesting article anyway, thanks.


The fitness function is pretty simple, essentially a weighted piece counter. It's true that it does not look ahead and take future states into account, so it might fall into a trap, but empirically it's doing very well!
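
I.e. roughly this shape, I'd guess (invented feature names, not the actual function from the paper):

    def evaluate_board(state, weights):
        # Weighted piece counter: a few simple material/health sums,
        # each multiplied by an evolved weight. No lookahead at all.
        features = [
            sum(u.value for u in state.my_units),
            sum(u.health for u in state.my_units),
            -sum(u.value for u in state.enemy_units),
            -sum(u.health for u in state.enemy_units),
        ]
        return sum(w * f for w, f in zip(weights, features))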


Seems to me this is the basic difference between a search-based and a situational approach to evaluating a game position. Backgammon is another great example, and one that I think has a higher branching factor than even Go does. Minimax only works when you can construct a tree of possible outcomes that is reasonably searchable. Many "contests," like backgammon and Go, cannot be represented as a tree of outcomes so neatly. So you have no choice but to fall back on algorithms that evaluate the current position without full knowledge of possible outcomes, i.e. heuristics, neural nets, etc.


In a game like Advance Wars a player turn consists of moving all your units, and very often the order in which you move your units is important, so the branching factor would be even greater than what they estimate.


This seems like a less sophisticated variant* of the AlphaGo approach. In both cases, Monte Carlo Tree Search (MCTS) is constrained by a model of likely future moves. Here, they use a genetic algorithm. In AlphaGo, this is called the "policy network".

* This is not to say that these authors copied AlphaGo -- my uninformed guess is that this general approach of constraining MCTS has been known for a very long time.


If I understood correctly, they are not using any MCTS; I believe they are not doing any look-ahead, only greedily selecting the best turn.


Correct - we are not using MCTS and the approach has nothing to do with AlphaGo.


Are there theoretical reasons people use genetic algorithms for problems like this? It seems like you lose any local knowledge that something like annealing or stochastic gradient descent type algorithms use, but it's not clear to me that there are many advantages of using it versus some sort of random sampling. I guess in the case where your merit function is decomposable as, for example, the product or sum of a number of lower-dimensional functions, then it makes sense - but if that's true then I would think there would also be other approaches that would work even better. That is, if genetic algorithms work better than random search, then your problem has some structure that would make it amenable to techniques that explicitly take advantage of that structure, thus performing even better.


You don't lose local knowledge, no. You're basically doing a bunch of parallel stochastic hill-climbings that can share information. In fact, with suitable parameters, a GA can reduce to a stochastic hill climber.
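
For instance, shrink the population to one and drop crossover, and you're left with an ordinary stochastic hill climber (a toy sketch on a Rastrigin-style landscape with many related local maxima):

    import math, random

    def fitness(x):
        # Rastrigin-style landscape: many related local maxima around one global peak.
        return -sum(xi * xi - 10 * math.cos(2 * math.pi * xi) + 10 for xi in x)

    def hill_climb(x, steps=5000, sigma=0.3):
        # Degenerate GA: population of one, mutation only, keep the child if it's no worse.
        for _ in range(steps):
            child = [xi + random.gauss(0, sigma) for xi in x]
            if fitness(child) >= fitness(x):
                x = child
        return x

    print(hill_climb([random.uniform(-5, 5) for _ in range(3)]))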

You're right that the problem structure is key. There's a key theorem called the No Free Lunch Theorem for Search which says that will always be the case. But on search problems with many related local maxima, evolution will often outperform a series of stochastic gradient ascents, because the information in one hill can be applied to another.

Of course, search and knowledge are always two sides of the same coin. The less knowledge you encode, the more search you'll have to do. But GAs allow you to encode knowledge too, in their operators, particularly.

There's been quite a bit of work done on the kinds of fitness landscapes that evolutionary algorithms are better suited for.

[NB: Used 'ascent' terminology to avoid confusion - it's a convention only, of course, energy minimization or fitness maximization]


Thank you, very good points! If you have the time to suggest some references I'd greatly appreciate it :)


Go wasn't "solved" with MCTS. They ignore that there is caching of good moves (a good move 5 turns from now is probably good now too). In StarCraft, moving 36 degrees to the left or not doesn't make a big difference, so there are many move combinations which will produce nearly identical output. Additionally, AlphaGo did use an evaluation metric to measure the board value. Evolution of strategies just sounds like genetic algorithms all over again.


On downtime, I've been playing the new Fire Emblem, which has made me really wonder how the AI works on the simple 3DS hardware. This is a nice complementary read.


At least with the older 3DS Fire Emblem (and to be fair I didn't play it too much), it didn't seem particularly clever. Mostly just choosing whichever player character is most vulnerable, moving into range (probably using a standard A* pathfinding algorithm), and attacking with its best option (plus some healing and a few stage-specific things).
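
I.e. something like this loop, presumably (an entirely guessed API, just to spell out the heuristic):

    def enemy_phase(enemy_units, player_units, grid):
        for unit in enemy_units:
            # Target whichever player character looks most vulnerable.
            reachable = [p for p in player_units if grid.in_move_range(unit, p)]
            if not reachable:
                continue
            target = min(reachable, key=lambda p: p.hp + p.defense)
            # Move into range along a shortest path (A* or similar), then attack
            # with whichever weapon a simple damage formula rates highest.
            grid.move(unit, grid.path_to(unit, target))
            weapon = max(unit.weapons, key=lambda w: expected_damage(unit, w, target))
            unit.attack(target, weapon)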

Pretty standard A.I. from what I could tell. But my memory might not be that great; it was a while ago that I played it.

Is the new one significantly different from that? What stuck out for you as being particularly clever?


Hmm, I guess the way the enemies would line up for attack stances (being adjacent to one another to deal extra damage) caught my attention. That's probably the most notable feature I've seen in the game.

At the same time, you've laid out how the game works quite plainly, and I've read enough about Pac-Man to know that video games can seem more clever than they are!


While the AI problem of Fire Emblem isn't easy, it's likely an order of magnitude smaller than the Hero Academy one.

IIRC, in Fire Emblem every character gets a small number of actions per turn, whereas in Hero Academy you have a pool of action points that you can spend among all your characters.

I mean, there's some complexity beyond that simple description in FE (which hero moves first, and their positioning and abilities can affect the effectiveness of subsequent moves), but fundamentally the tree you must traverse is a lot wider and deeper in Hero Academy.


The pool itself is not that bad for the search (range searches such as R-trees work), but then you end up with a variant of the packing problem to pick the optimal options and their order. That alone is NP-complete.

But the really hard problem is the evaluation of moves. Even for simple games, encoding positional and temporal advantage is hard. This is really important for things such as build and research orders. Once weights are known, you may use any number of asymptotically optimal scheduling algorithms.


You need to reason abstractly about your moves rather than enumerate every possible combination. If you pick a general course of action and then implement the details, the branching factor becomes less of an issue. One way to do that might be to learn a representation of the action space.


Isn't this similar to A* or alpha-beta search, but with a fitness function and the randomness of evolutionary algorithms?

I feel like it might apply well to a strategy game, but for something like chess it would not fare as well. Mixing two great move sequences in chess would probably result in disaster.


Very cool to see this (I worked on Hero Academy). When you say the randomness was removed, does that mean players' hands were a specific fixed shuffle, or something else?

How does it fare against good human players? Would it do much worse with randomness added back?


I'm not a hater; I like reading about different approaches, it's cool. But I don't think anything is being learned here? I wish the system would get better at the game the more games it plays. From skimming the paper, I do not think that is the case here.


Interesting way of approaching the problem.

I didn't realize the Go solution "just" used Monte Carlo branching. Still impressive, but it does not seem as revolutionary as it's been portrayed.


The insight was to use CNNs to prune the search space. In a sense, the AlphaGo algorithm is to MCTS as importance sampling is to regular Monte Carlo. However, even without the search part, AlphaGo crushes MCTS-based Go algorithms (see https://www.tastehit.com/blog/google-deepmind-alphago-how-it...). That in and of itself is quite extraordinary.
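
The pruning part of that is simple to sketch (a generic illustration of policy-guided node expansion, not DeepMind's code): the policy model ranks the candidate moves and the search only expands the top few.

    def expand(node, policy, max_children=5):
        # Instead of expanding every legal move, ask the policy model for move
        # priors and only add the highest-probability moves to the search tree.
        moves = node.state.legal_moves()
        priors = policy.predict(node.state)   # move -> probability (hypothetical API)
        for move in sorted(moves, key=lambda m: priors[m], reverse=True)[:max_children]:
            node.add_child(move, prior=priors[move])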


Another way to deal with large branching spaces is MCMC.


If you can't find an optimal approach in the forest of possibilities, compile a list of all actions and start searching from the defeat scenarios?



