It's interesting to watch the videos they link of DeepMind playing against top-level Stratego masters [0]. I usually find Stratego to be a bit of a dull game (less elegant and more drawn out than Go and chess), but I'm a sucker for watching top-level AIs play.
Its skills for bluffing are both fascinating and a bit scary.
It seems like it would be easier for AI to do, since it doesn't have any tells (it's easier to have a poker face when you don't have a face at all).
I remember playing poker as a kid and experimenting with using body language to pretend my cards were good or bad. I don't think any professional players use that approach (they just have sunglasses and a straight face), but I wonder if AI could beat humans even more consistently if it developed a way to convey tells and fake tells?
All poker AIs developed so far approach Nash equilibrium -- it's just a "perfect" strategy that wins by default because it makes no mistakes. Since you make mistakes against that strategy and poker is a zero-sum game, you lose by default.
No poker bots that I know of have yet developed "exploitative" strategies, where they deviate from the Nash equilibrium strategy to exploit opponent mistakes.
Back when I played professionally (2012-2016, with poker AIs becoming relevant in 2015/16), the standard was to use a bot to study the best default strategies and then apply expert human judgement to deviate from them against bad opponents.
They absolutely do. Since it's world cup time I'll use a contrived soccer example.
Consider a penalty kick where the kicker can kick left or right. The goalie has to jump one direction; if they jump the wrong way, a goal is scored. Both players know that this kicker is great at kicking to the left side of the goal but rather "meh" at kicking to the right, so if the kicker kicks left and the goalie jumps left, there's still a 20% chance of scoring, but if the kicker kicks right and the goalie jumps right, there's only a 5% chance of scoring.
There is a Nash equilibrium for the kicker, and it can't be "always kick left" because then the goalie would "always jump left" which would give the kicker an advantage if it kicked right.
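To make that concrete, here's a small sketch (my code, using the numbers from the example above, not anything from the comment) that solves the 2x2 penalty-kick game by finding the kicker's mix that leaves the goalie indifferent:

    P_LEFT_SAVED = 0.20   # goal probability when kicker goes left and goalie guesses left
    P_RIGHT_SAVED = 0.05  # goal probability when kicker goes right and goalie guesses right
    P_WRONG = 1.00        # goal probability when the goalie jumps the wrong way

    # The kicker kicks left with probability p chosen so the goalie is indifferent:
    #   p*P_LEFT_SAVED + (1-p)*P_WRONG == p*P_WRONG + (1-p)*P_RIGHT_SAVED
    p = (P_WRONG - P_RIGHT_SAVED) / (2 * P_WRONG - P_LEFT_SAVED - P_RIGHT_SAVED)

    # The goalie's indifference condition gives the same mixing probability here.
    q = p

    # Expected scoring rate at equilibrium (same whichever way the goalie jumps).
    score_rate = q * (p * P_LEFT_SAVED + (1 - p) * P_WRONG) + (1 - q) * (
        p * P_WRONG + (1 - p) * P_RIGHT_SAVED)

    print(f"kicker goes left {p:.1%} of the time")        # ~54.3%
    print(f"equilibrium scoring rate: {score_rate:.1%}")  # ~56.6%

The kicker still favours their strong side, but only about 54% of the time; a pure "always kick left" strategy would be exploitable, exactly as described.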
Similarly, the Nash equilibrium for poker can't be to always fold a weak hand. That leaves money on the table: opponents would learn to always fold against a raise, which means the player could pick up easy money by raising with a weak hand.
Bluffing isn't really something one needs to compensate for. As a game mechanic, bluffing simply means that any hand could be representing the maximum or the minimum value, but if you are making bets based on the mathematics of your own hand, this isn't so pertinent.
There is a new player in the professional poker scene that people call "Casino Eric". He combines reverse tells with insults and jabs at the opponent. This helps him make up for a lack of pure technical skills when compared to top online players like Linus Love. This is a meta game that people have played for a long time, but "Casino Eric" is known in professional circles as someone who is naturally trying to perfect this.
Unless it's completely random, you're just giving away information: e.g. do the opposite of what their body language suggests, notice that they only exhibit body language when their hand is really bad or really good, or distinguish voluntary from involuntary body language.
You only use the "tell" when you believe there to be a pattern.
Otherwise, if it is truly random, I realize I can't get any info there and I ignore it, which then leads to me just having a straight face.
Is that heads-up only? Early massive overbets are nearly a coin toss in heads-up, so you break even on a call and win on a fold, which is an equilibrium.
However at a large table, you are going to get called only by the person who thinks they have the best hand, which is a lot better than the average hand of a typical opponent.
In multiplayer you see it where ranges are narrowed, like in 3-bet pots or on the turn/river.
It matters less than you'd think because overbets imply you have a polarized range (nuts or air). You generally pick the bluffs to be hands that have cards blocking the best calling hand combinations.
Okay, that makes sense. I think it still matters somewhat, because if someone else has the nuts, then they automatically know you are bluffing, and the more players still in the pot, the more likely one of them has the nuts.
Fascinating games. But I played a lot of Stratego as a kid, and I remember 1 was the strongest piece and 9 was the weakest. In these videos it seems 10 is strongest and 2 is weakest and that's making it confusing to watch. Did the pieces change sometime in the past 40 years or am I imagining things?
I've just watched a video (https://www.youtube.com/watch?v=HYQbGHgaWbM) that talks about "original" Stratego, a game created in 1947 by Jacques Johan Mogendorff. It might be nostalgic for you.
"European versions of the game give the Marshal the highest number (10), while the initial American versions give the Marshal the lowest number (1) to show the highest value (i.e. it is the #1 or most powerful tile). More recent American versions of the game, which adopted the European system, caused considerable complaint among American players who grew up in the 1960s and 1970s."
Yeah this is one of the reasons why I find it more dull than chess.
There is an incentive to just not move your pieces, so that the other player thinks they're bombs. As a result, players only activate 2-3 pieces at a time.
In chess, on the other hand, you are constantly moving your pawns toward the other side for promotion, or otherwise trying to activate and coordinate all of your pieces for an attack.
It makes me think that if deepmind for Stratego was trained to not lose instead of win, then the top strategy might be shuffling pieces and letting the enemy come to attack. No human would ever have the patience to play that way though.
> It makes me think that if deepmind for Stratego was trained to not lose instead of win, then the top strategy might be shuffling pieces and letting the enemy come to attack. No human would ever have the patience to play that way though.
Tournament Stratego uses a clock, which reduces some of the issue there. It's not hard to beat a player that does what you suggest; just send some middling pieces after each piece that moves. You'll take the weak pieces and reveal the strong pieces.
It is much more defensive than chess in general though, as moving a piece and capturing with a piece both give the other player information.
> It's not hard to beat a player that does what you suggest; just send some middling pieces after each piece that moves. You'll take the weak pieces and reveal the strong pieces.
But taking a strong piece means revealing a stronger piece of your own. That's why I think the best strategy is to put almost all your weak pieces up front and wait for your enemy to reveal their pieces. Scouts, especially, are cannon fodder that you sacrifice to find out information about the enemy and that you need to get rid of so that you get room to maneuver.
Having scouts in the mid/late game is important to make them play more cautiously with their spy, since a scout can take out a spy from any straight-line path.
If you know where the enemy's spy is, it's either already dead or it has served its purpose by killing your Marshal.
So the idea is, in the midgame, to make educated guesses as to the position of the spy (e.g. the piece that stays close to the enemy general, or that moves towards your Marshal), and to sacrifice scouts in the hope of killing the spy?
Interesting tactic (and if that’s their common usage, why are they called scouts? ‘Assassin’ might be a better name)
That is one of several uses for scouts. When the opponent has scouts, it's safer to keep your spy behind one of the lakes, since they will occasionally fly across the map just to reveal a piece; you don't want your spy getting "accidentally" killed by a scout either. If the opponent sacrifices all of their scouts early you can move the spy around more easily.
Since killing the opponent's Marshal with the spy is such a huge advantage, losing your spy is a big disadvantage.
“scouts are the only pieces allowed to both move and attack in the same turn. A scout can move any number of open squares forward, backward or sideways into an attack position. Once in position, it can then attack”
That doesn’t say in any way that that attack has to be in the direction of movement. So, you could move a scout 3 squares forward and then attack leftwards.
Different editions of Stratego have different wordings for scouts. Some rules are silent on moving and attacking in the same turn (implying that scouts can't move & attack in the same turn), some rules specifically state that the scout cannot move and attack in the same turn. Some rules say they can move and attack in a straight line, and some use the wording you quote.
The ISF rule is that the attack must be in-line with the move[1].
A common variation of the game is to let the aggressor win battles where both pieces have the same value. The default is a draw. This promotes aggression.
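To make the difference concrete, here's a minimal toy sketch (my own, not from the comment; it assumes European numbering where the higher rank wins and ignores the Spy/Bomb/Miner special cases) of battle resolution with and without the house rule:

    def resolve_battle(attacker_rank, defender_rank, aggressor_wins_ties=False):
        """Return which piece survives an attack under the chosen tie rule."""
        if attacker_rank == defender_rank:
            if aggressor_wins_ties:
                return "attacker"   # house rule: the aggressor takes the square
            return "neither"        # standard rule: both pieces are removed
        return "attacker" if attacker_rank > defender_rank else "defender"

    # Under the house rule, attacking with an equal-ranked piece is no longer a pure trade:
    print(resolve_battle(7, 7, aggressor_wins_ties=True))    # attacker
    print(resolve_battle(7, 7, aggressor_wins_ties=False))   # neither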
Why would you make a move that places your piece next to an opponent if it just gives them the potential advantage in a battle? Wouldn't it increase wariness?
The player in the first game had an advantage and didn't trade 10s, then for some crazy reason left his 10 exposed to the only piece approaching it. I played a ton of Stratego as a young adult, and I would never have thought that was close to an optimal strategy.
What makes you think that? The article says that one of the coauthors used to be a world champion, which suggests to me that there is some kind of competitive organized league.
I think there are probably very much diminishing returns.
A small scene is probably pretty damn good at the top. Having hundreds of thousands of competitive players helps, but even with a small sample you are probably likely to get at least some very, very strong players.
It's hard to think of a relevant real world example, but a fun corollary I'm familiar with is Fedex (Federico Perez Ponsa). He is a full chess Grandmaster, #461 in the world in chess amongst ~300k active FIDE players. By your "International Masters study hard" logic, when you drop him into Age of Empires 2, a game with ~500 competitive tournament players, his work ethic should dominate. But it turns out that the top ~100 AoE2 players are really damn good and practice a ton (easily chess IM amounts), and Fedex tops out around ~#50 in the world.
No matter how good the top players are relative to the competition tho, I feel like a large playerbase still raises the skill bar to a huge degree. There's a ratchet effect where someone figures something out, other people copy, and it breaks into public consciousness through influencers and popularizers. Then on the tail end, regular people regurgitate it for years like it's new information. (Getting sick of hearing about cognitive biases and product-market fit and dunning kruger, ffs). I've seen it with dota over the last 10 years. Pro players were always good, but now bad players are good and pro players are better. Not to mention the motivation that comes from seeing other people work hard.
When it comes to stratego, watching the linked games (from an armchair!), the human players looked relatively sloppy. Overusing scouts in the early game, too eager to trade, noticed a piece being forgotten about once. Not to say I'd be better, but it definitely looks like the scene is "for fun" and not so serious.
I, admittedly without any knowledge of the topic, would imagine the "for fun" aspect is probably a pretty dominant factor.
I play disc golf pretty regularly. Over the last 10 years, it has exploded in popularity.
I'm not sure that the explosion of popularity has resulted in better people performing in tournaments.
But what has changed, is how much money can be won by competing in disc golf, and how much money is available across the sport as a whole.
The increased prize money has dramatically changed the number of people and level of competition for people playing the game seriously as well as how seriously everyone involved in top level play takes the sport. This then has a trickle down effect in the number of people and seriousness of things like training clinics, professional teachers, professional and more intelligent course construction and analysis. It pays for more analysis into all aspects of gameplay to increase the competitive edge of performers at the top.
The increased player base, in turn, pays for most of this, as it increases the potential market for all of the above services. And the more serious top-level play is, and the higher the winning prize pools go, the more respectable the sport becomes in the public eye, which in turn creates a feedback loop whereby players appear more willing to spend money on the sport and on those services.
All of which increases the level of play at the top. And you can see it in the quality of new young athletes that are coming up in this new environment, and how much better they are and how much more they are able to learn from the more widely accessible resources than their equivalent counter-parts were 10 years ago.
It's been fascinating to watch, and very exciting. Particularly over the pandemic, the sport has gone from being called "frolf" on a golf course or in your local park, to "disc golf" with multi-million dollar professional contracts, dedicated disc golf resorts and private courses, training clinics, and dedicated PPV channels. Very cool : D
A counter-example might be the various competitive communities around different forms of board games, which tend to be very small but are often highly competitive and taken very seriously by dedicated communities; I don't know enough about the topic to discuss, though : )
That type of logic is unsafe - we don't know that Ponsa is playing AoE with the same intensity as chess.
In fact, the idea that someone can train with sufficient intensity to be a high-ranking chess master and then break into the top 50 of AoE at the same time suggests a lower skill saturation in the AoE world.
Call me a cynic, but the fact that after almost 10 years of AI hype we are still working our way down the list of popular board games is a bit of a downer for me. I mean, having AIs to play against in Stratego, Risk, Go, Diplomacy and what have you is certainly nice. But there are literally billions of dollars spent on these projects, and I've really come to the point where I just don't believe anymore that the current AI approaches will ever generalize to the real world, even in relatively limited scopes, without the need for significant human intervention and/or monitoring. What am I missing?
AI remains better than humans at anything that has well-defined rewards and a small time gap between action and feedback (either naturally, like poker, or by value-function engineering, like Go or chess).
The problem here is that it's missing the "glue" to more real world applications. This is where more humdrum software engineering comes in.
Diplomacy in this is much more interesting than Stratego or beating the next video game - it mixes cooperative game theory with NLP and reinforcement learning.
> The problem here is that it's missing the "glue" to more real world applications. This is where more humdrum software engineering comes in.
This is a bold statement. The world does not function based on "well-defined rewards". The concept of "common sense", which some consider table stakes for a human operating competently in our world, is mostly made up of things which are neither well-defined nor amenable to gathering a tremendous amount of training data. Current ML approaches require both.
Correct, but there are likely applications of AI in strategic reasoning that can be used in the real world (outside the obvious finance bots) by engineering a "translation layer" from real-world constraints to AI-compatible value functions and back.
But in general, yes, this is why since 2013 we haven't seen AI making anywhere near as massive strides in the real world as it does in boxed-in applications like games.
> The world does not function based on "well-defined rewards".
Humans can be neatly reduced to a "procreate, and make sure your progeny procreate" value function. A lot of apparent counterexamples to that turn out to make sense when 2nd order effects on progeny are considered.
It's been providing real value to me over the past day for practicing Spanish, explaining Machine Learning concepts, and doing fancy write-ups in LaTeX. And this one can't even use Google yet! (other research teams have already created models capable of doing so, it's only a matter of time until these innovations are brought together in one place)
If you took the word "Wikipedia", and threw all the letters up in the air, what are some possible things they'd spell once they fell on the ground again?
If you took the word "Wikipedia" and threw all the letters up in the air, some possible things they could spell once they fell on the ground again might include "kawipieda," "keiapwidi," "kipedwaia," "pwakieida," or "eipawikid." However, without being able to see the exact arrangement of the letters on the ground, it's impossible to say for sure what they would spell.
We don't know how well it is generalising to your input without knowing the training data. Failure cases however tell us a lot, such as the following:
Input: "If five cats can catch five mice in five minutes, how long will
it take one cat to catch one mouse?"
Output: "If five cats can catch five mice in five minutes, then it will take one cat one minute to catch one mouse. This is because the number of cats and mice are inversely proportional, so as the number of cats decreases, the time it takes for a cat to catch a mouse will increase. Similarly, as the number of mice decreases, the time it takes for a cat to catch a mouse will decrease. Therefore, if five cats can catch five mice in five minutes, then it will take one cat one minute to catch one mouse."
As you can see, it fails at simple logic (the correct answer is five minutes, since each cat catches one mouse in five minutes) because it fails to understand that the number of mice isn't what's fixed in the question. A human would immediately pick that up, because they reason beyond x=5 and y=5.
Are you sure a human would immediately catch this? The question is somewhat ambiguous and I bet if you posed this question to many people they would take the oversimplified non-gotcha approach and simply say one minute for one mouse just like the AI. Of course if you abstract out there are so many other variables at play but within the confines of a simple word question the answer is not necessarily incorrect.
You could probably test this by asking a few friends this question and see what they say. Outside of pure math problems you can get into an infinite regress defining the underlying first principles behind any given assumption.
I am not claiming anything other than that we do not know the training data, and therefore not much can be inferred about how well it generalises from some success case.
Quite interesting that it will make subtle errors in its otherwise reasonable-looking answer, e.g. "kipedwaia" has two "a"s; "kawipieda", "kipedwaia" and "pwakieida" have only two "i"s.
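For what it's worth, that kind of slip is easy to check mechanically by comparing letter multisets; a quick throwaway sketch (mine, just for illustration):

    from collections import Counter

    def is_anagram(candidate, original="Wikipedia"):
        """True if candidate uses exactly the letters of the original word."""
        return Counter(candidate.lower()) == Counter(original.lower())

    for s in ["kawipieda", "keiapwidi", "kipedwaia", "pwakieida", "eipawikid"]:
        print(s, is_anagram(s))
    # The three strings called out above fail the check; the other two are genuine anagrams.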
I have seen reports that it will happily hallucinate a plausible but wrong answer to all sorts of different prompts, intermixed with many mostly correct answers. It's interesting to think about how to place trust in such a system.
Peter Watts has a series called Rifters that explores this a little. "Smart gels", which are neural nets made up of a mishmash of cultured neurons and silicon, run most of society. They're trained just like neural nets today, and therefore their decision-making process is basically a black box. They do a great job, but no one is really sure how they get there; they work great and they're so much cheaper, so who cares.
Anyhow, spoiler alert: the neural nets running the virus response have been inadvertently trained, without anyone realizing, to prefer simple systems over complex ones. They decide that a planet wiped clean of life by the virus is infinitely simpler than the present one, and start helping the virus instead of stopping it.
So the short answer to your question is that I would not place much, if any, trust in systems like that, insofar as anything with high-stakes, real-world consequences is concerned.
This kind of AI is really useful, at least for creating video games. Game devs spend countless hours making AI which is still not as effective or realistic as real humans; a straightforward way to create good game AI from ML would be huge.
As others mentioned, AI is making headway in enterprise and accounting, achieving the "last mile" of human work: better image recognition for handwritten forms and mail, better content and sentiment analysis for reducing spam, robot-arm tasks which are more and more complex (yet still tame compared to humans).
"AI" hype is indeed overblown. If you think we're close to reaching the singularity or anything resembling skilled human work, you will almost certainly be disappointed. We probably have decades of slow improvement ahead: more and more of these "breakthroughs" which aren't really amazing compared to a five-year-old human and aren't really going to revolutionize industry, but will nonetheless have practical benefits.
Now game companies won't even own their ai! They will get to rent it by the minute from Google. The death of consumers having control over the software they pay for fucking suuuucks
Not necessarily. Right now, AIs like this take lots of compute and resources to train. But as we've seen with Stable Diffusion, it's likely that in coming years we will scale them down and create more open-source ones, so that indie devs can train them and players can run them on their own gaming GPUs.
I suspect that a real artificial intelligence will require a handful of specialized subsystems linked together in the right way. I feel like the current method of research is to find problems similar to solved ones but with a few new challenges, then look to solve those specific challenges. Doing this for ever more complex problems requires all sorts of unique solutions, and over time we find ways to combine and simplify those to get generalized systems. I don't think the systems of today will generalize as you desire, but this method of research seems like a good one? What do you think?
Enterprise deployments. When IBM's Watson gets deployed to run your building's elevator scheduling, we don't hear about it. Nor do we really hear about AI powering hospital backend systems. Or eg fraud modeling at Visa. Twitter assuredly was doing ML on their backend as well.
Or on Youtube as attention-retainers. It's really good at drip-feeding content over weeks/months, and changes what videos are recommended based on the duration of the viewing session.
These board games are models of real human problems. And these reasoning and tree searching tasks are very general, and humans perform these very often in work and in personal life.
I agree that these specific models are not going to be useful outside of board games. But in the future, when there is the opportunity for AIs to interact with the world for real, this kind of research will allow AIs to dramatically outperform humans on these tasks.
There is work on zero-shot learning and on procedurally generated environments like Procgen but it doesn't go mainstream because the results aren't as cool to most as some new record on a known board game or video game.
I agree with you. I watched every Lee Sedol Go game in the first match live. Lost some sleep due to the time difference. I was so excited. That was a long time ago. Now large language models get me more excited for progress in AI.
At least this new model is very different from the approach taken in AlphaGo.
The AI hype has been around for much longer than 10 years. Ray Kurzweil's book "The Singularity is Near" was first published in 2005, and Deep Blue beat Garry Kasparov in 1997.
There was a time when I thought that maybe there was something more to AI than a fancy statistical model for fitting non-linear data. But I'm now firmly of the belief that AI is precisely a very powerful statistical tool. I honestly think there was never any real strategy for getting AI to anything more than specialized learning for deeper inference using a lot of computational power.
Don't get me wrong, using AI for that purpose is pretty amazing (though it can also lead to some sketchy results if you don't know what you are doing [1]), but pretending it will lead to some "general AI" is nothing but hype IMO. And teaching AI to play these board games better than a grandmaster only serves to increase that hype.
I’m not an AGI fanboy. I agree that the current line of inquiry (ie deep learning) won’t get us there. I think neurosymbolic reasoning is needed. That work is still nascent, and worse, we don’t have great ways to connect our current paradigm to it.
Now thinking about it, is there any need for an AGI model? Is this a classic case of a (magic) solution looking for a problem?
There are for sure use cases for inference models in generalized (or rather ill-understood; or even highly dynamic) non-linear systems, and deep learning models kind of ace at that—given enough training data and a lot of computational power. However I’m not really sure what we will use AGI for.
Great article. I played Stratego a lot as a kid and it always felt simpler than chess, Go, or poker, so it's surprising that it has a much bigger game tree, unless you stop and think about it.
I'm curious about the comparisons to poker. I know the hot algorithm in poker solvers is counterfactual regret minimization. The article indicates that the feedback cycle is too long for those algorithms to work, but I'd be curious to learn more about the relationship between CFR and what's tried here, if any.
It's very hard to use CFR in Stratego because you can't represent the hidden information in memory. In poker, the hidden information is a vector of length 52C2 = 1326, so you can easily pass it around. Not so in Stratego, so you have to do something else. R-NaD has similar theoretical foundations, though.
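To illustrate that point with a toy sketch (mine, not from the paper): a belief over an opponent's hold'em hand fits in a single C(52, 2) = 1326-entry vector, and conditioning on public cards is just zeroing out impossible hands and renormalizing.

    from itertools import combinations

    RANKS = "23456789TJQKA"
    SUITS = "cdhs"
    DECK = [r + s for r in RANKS for s in SUITS]        # 52 cards

    hands = list(combinations(DECK, 2))                 # all 1326 hole-card combos
    belief = {h: 1.0 / len(hands) for h in hands}       # uniform prior over opponent hands

    # Seeing public cards (e.g. a flop) removes every hand containing them:
    flop = {"Ah", "Kd", "7c"}
    for h in hands:
        if flop & set(h):
            belief[h] = 0.0
    total = sum(belief.values())
    belief = {h: p / total for h, p in belief.items()}  # renormalize

    print(len(hands))                                   # 1326

In Stratego there's no analogous compact vector: the hidden state is the opponent's entire unrevealed setup, which is far too large to enumerate, hence the need for a different approach.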
I had Star Wars Stratego, which reversed the number values. It had characters from the OG trilogy and Episode 1. Takes me back! $80 on Amazon for that version :(
Dota is a pretty local game: 70% tactics, 20% strategy, maybe 10% information. Yes, there's the warding game, but an AI that pays no cost to look at heroes' inventories (humans need to spend attention and move their map) already has a huge advantage over humans in the imperfect-information part. And usually, fighting into imperfect information is the bad choice.
Stratego is 40% information, 40% strategy, maybe 10% tactics. If you know where the flag is, it's trivial to win in almost all situations. Fighting into imperfect information is literally the whole game.
Dota at the mid-casual and high-casual brackets (which is where you find most players) is also a social game. Establishing efficient leadership, communication and cooperation in a game gives you a huge advantage. At the low-casual and pro levels, funnily enough, it becomes more a game of skill and strategy.
(The old joke is that Dota is a 1 v 9 game, not a 5 v 5)
I periodically still drop in to Subspace/Continuum Trench Wars and in a multi hour play session will generally spend the first hour dogfighting and then get annoyed with the performance of my assigned team and switch to a command ship and spend a lot more time typing encouraging other pilots to do the sensible thing than I do actually fighting.
Both modes of play are fun, mind, but the parallels struck me as worth noting.
It's from things like properly last-hitting creeps, good reaction timing, good reaction decisions, and coordinating real-time actions with teammates at millisecond resolution.
It's obviously not about clicking fast, but it is about timing; sometimes a 100-millisecond difference in reaction time makes a huge difference in outcome. It is usually about making decisions on very small time scales. Do you retreat or continue? Use an ability or hold it? Can you overextend?
The only meaningful strategic decisions in Dota (ones you have a long time frame to decide and which affect the game for a long duration) are the draft (which the AI didn't really master; they reduced the hero pool to simplify) and item purchases, and there are only a handful of the latter (~6) in an entire game. Other decisions don't really have a long "memory", a minute or two at most. After two minutes, every other decision just reduces to the relative advantage between the teams.
There used to be one hero in Dota which made it a strategy game instead (Techies). But it was like playing a different game; everyone hated it and it was effectively removed. Techies was like playing Stratego against chess players: they obviously get pissed off by not playing the game they wanted.
There are larger strategic decisions that are significant in Dota: which area of the map to play, which objectives are important and when, what type of fights we will win (fast and bursty) and when we will take them. Oftentimes these are thought through at the beginning of the game and affect gameplay throughout.
Those are pretty tactical decisions, usually made in a time window of 20-30 seconds tops. It takes about 40 seconds to reach more than half the map from the base. Fighting decisions usually also end up tactical. I would say that 90% of the game-deciding decisions have a time window of less than 30 seconds to make, and 70% are in a time window of seconds. You never make a strategic decision to fight; you make a tactical decision when the map looks fitting. If you take fights because of long-term strategy (instead of deciding tactically) in Dota, you will lose.
I feel that I generally know well in advance where I will play, like before the game starts. At that point it just seems like semantics of what we call strategy vs tactics, since I think many of the things you see as opportunistic are things I decide in advance: I know when I'm going to fight, and I put myself in opportunistic positions at those times.
In Dota, the tactics have to do with the execution of abilities, oftentimes in coordination with other players and their abilities to get combo effects, while adapting to the situation as it unfolds.
As an avid dota player I wouldn't agree with your characterization that 70% of dota 2 is your definition of tactics. What I've noticed differentiates player MMR the most is the strategy applied to each context. It's rarely the execution that's the problem as you can gain such overwhelming advantages through strategy.
There's barely any long-term strategy in Dota; the only meaningful strategic decisions are items and heroes. Even ultimate usage has maybe a two-minute window of importance. Wards too. And maybe the decision to push high ground, given how many games are lost because of it, but it's usually the tactical errors making most of the difference there.
I'm immortal (6k MMR) and agree with the parent. I used to think similarly as you, because my mechanics were very good, but as I started to play with friends who are much lower ranked, I noticed that they can often execute things (e.g. ability usage) pretty well, but their overall play and strategy is desperately lacking. Things like playing the wrong areas of the map, spending their time inefficiently, never capitalizing on their strategic opportunities (e.g. not playing where they have vision, not playing in areas of the map that are near their objectives, not taking calculated risks based on available information).
I do think OpenAI Five derived some of its advantage from seamless ability usage and inhuman coordination in lane, but it also did some novel things strategically that challenged some of the established tenets of high-level play (e.g. it had a lot of mobility on the map, back when that was considered very inefficient).
I don’t think it is more challenging than StarCraft or Dota. Does the blog post claim that anywhere?
Stratego is way more challenging than poker, though. StarCraft/Dota/Stratego have the property that you can't represent their imperfect information as a vector in memory, whereas you can easily do that in Texas hold'em poker (there are only 52C2 = 1326 possible hands). So for those games, you have to use an approximate distribution rather than the exact one.
I’m an author on this paper (although my contributions were relatively minimal) and on the Player of Games paper (which did Poker, Chess, and Go).
> I don’t think it is more challenging than StarCraft or Dota. Does the blog post claim that anywhere?
My very naive question being: why tackle this after SC and Dota have already been done? What is the scientific interest? Stratego seems strictly simpler than both of these games. In what way is this an advancement over how SC/Dota AI were solved?
Great question! I would say the main reason this work is significant is that the StarCraft agent was bootstrapped from human replays, and the Dota agent did not have game-theoretic guarantees on minimizing exploitability (i.e. how far the strategy is from a Nash equilibrium). The R-NaD algorithm with neural nets (behind Stratego) starts from scratch and has game-theoretic guarantees.
In principle, AlphaStar's league approach (from StarCraft) could also be applied to Stratego, and it would be very interesting to compare the two approaches. Note that AlphaStar is more expensive: it required training N competing agents, with pairwise evaluation costing N^2, while Stratego's NeuRD trains a single agent.
Starcraft and Dota benefit a lot from having good micro. Stratego seems to be only macro. Micro is easyish for AI and requires less long-term thinking to get benefits from.
StarCraft especially only resembles a strategy game in GM (and maybe high Masters). Below that, the strategy is mostly to macro better so you have more units.
I remember seeing a version of the paper earlier in the year (it talked a lot about getting the bot to be aggressive to avoid stalemates).
Feels like the secret sauce has to be probability distributions guessing what all the pieces are.
Bluffing in stratego seems like it requires long-term planning (if you move a 2 like a 10, you have to keep treating it like that for the bluff to work).
On a related note, and surely interesting to the HN crowd: I invented Clesto, which is similar to Stratego but a bit more like chess, based on an old Chinese game: https://clesto.com/ - it is quicker to play and has open information.
Ha, amazing! I always thought it was a thing but then occurred to me that it was maybe just my friends and me who said this. I'm glad to know it's a thing.
Thinking back it may have been the sergeant who was the primary Hat Guy as I remember there was "Hat Guy" and later "Other Hat Guy".
Tangentially taking this opportunity to mention the far-superior "Lying and Cheating" version of Stratego, that (as far as I know) my father invented.
It makes the game so much more interesting, IMO. Played it a lot as a child.
Here are the basic rules, when a piece is attacked:
* The attacker says what their piece is, without showing it (they can lie)
* The defender says whether they believe that
* The defender says what their piece is, without showing it (they can lie)
* The attacker says whether they believe that
* ONLY IF someone calls a bluff is that piece revealed. Otherwise, it is treated as the piece it was claimed to be, and kept hidden.
* If someone calls a bluff, and they were right, then the other player loses a piece (reach over and remove any piece you like)
** If you pick their flag, then you win — game over.
* Likewise, if someone calls a bluff but is wrong, then *they* lose a piece.
* After all of that is resolved, do combat as normal, with pieces having either their revealed or not-revealed claimed value, as appropriate.
Once you resolve all this, there is no "memory" - you can claim it is a different piece in the future.
Some minutiae:
* You can move any piece as though it were a Scout (9), but when you do the move, the other player can call your bluff since you're essentially claiming it is a Scout at that moment. Resolve that bluff/call before completing the move.
* You could even call a bluff on *any* move someone makes, if you believe that piece is a bomb or flag (and thus cannot move).
* You can attack with a bomb! It's a two-step process: first you move (and they could call your bluff, if they know it is a bomb - see above). Then, when the attack happens, you say it *is* a bomb. Of course, your opponent may say their piece is a Miner, and if you haven't seen it, it's a dangerous proposition (since bombs are rare).
** You can also do a variant where bombs can't attack (by attacking, you are claiming it is *not* a bomb). I prefer the above version.
Overall, I find this version of the game a lot less boring. Since you'll probably get several pieces zapped over the course of the game, it affects your flag placement. Plus, you can move flags and bombs, making it more dynamic. Also, the "remember where things were" aspect is even more pronounced, since once a piece has been revealed, it loses all the power of being whatever-is-needed-right-now (assuming the other player has a good memory).
So, for instance, you can do something crazy like move your bomb as though it were a Scout, all the way across the board, onto an opponent's piece, but then claim it's a "5" instead for the attack. Then if it survives, just let it sit there, continuing to be a bomb in the future (causing havoc).
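Here's a rough sketch of the claim/challenge step as I read the rules above (my own toy code, not the parent's; it leaves out the flag-grab win condition and the normal combat that follows):

    def resolve_claim(truth, claim, challenged):
        """Resolve one side's announcement. Returns (effective_rank, revealed, loses_a_piece)."""
        if not challenged:
            # An unchallenged claim stands: the piece fights as whatever was announced
            # and stays hidden afterwards (and there's no "memory" of the claim).
            return claim, False, None
        if claim == truth:
            # Wrong call: the challenger loses a piece.
            return truth, True, "challenger"
        # Caught bluffing: the bluffer loses a piece of the challenger's choice.
        return truth, True, "bluffer"

    print(resolve_claim(truth="2", claim="10", challenged=True))    # ('2', True, 'bluffer')
    print(resolve_claim(truth="2", claim="10", challenged=False))   # ('10', False, None)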
We did something simpler in the same spirit, borrowing from poker rules. When you attack, you must reveal, but as the defender, you can remove your own piece without revealing what it was (even if it would have been the winning piece, although removing a winning piece is very seldom of actual strategic value). Even this slight tweak goes a long way towards spicing up the game.
One of my favourite things about watching AIs learn and play games I’ve played is seeing if they come across the same weird strategies I did.
I remember my brother and I as kids playing Stratego and discovering the “impenetrable bunker of bombs” to put your flag in. Which evolved to “put a scout in as a ruse” and later “don’t actually enclose it because now brother just assumes it’s enclosed.”
What happened to mastering StarCraft? Why did these guys give up after a mid-tier pro player defeated the bot handily on a live stream? They were super enthusiastic up until that point.
Just search for DeepMind StarCraft on YouTube. It won the first event, but humans came out on top during the second event, which turned out to be the last one, if memory serves.
[0] https://www.youtube.com/watch?v=HaUdWoSMjSY https://www.youtube.com/watch?v=L-9ZXmyNKgs https://www.youtube.com/watch?v=EOalLpAfDSs https://www.youtube.com/watch?v=MhNoYl_g8mo