Mastering Chess and Shogi by Self-Play with General Reinforcement Learning (arxiv.org)
539 points by dennybritz on Dec 6, 2017 | 270 comments



This is an incredible demonstration that the AG Zero expert iteration method is a general method. If you go back to the discussions of AG Zero just a month ago, there was a lot of skepticism that NNs would ever challenge Stockfish et al - they are just too good, too close to perfection, and chess is not well suited to MCTS and NNs. Well, it turns out that AG Zero doesn't work as well in chess: it works better, as it only takes 4 hours of training to beat Stockfish. This is going to be an impetus for researchers to explore solving many more MDPs than just chess or Go using expert iteration... ("There is no fire alarm.")


See the thing is though, Giraffe's evaluation actually was better than Stockfish's evaluation function, but it took much longer, and thus wasn't able to search as deep as Stockfish et al. So in a way, the real triumph of the AlphaGo series was the TPU and GPU army.


> So in a way, the real triumph of the AlphaGo series was the TPU and GPU army.

Eh. It's still searching many fewer positions than Stockfish is.


Unlike in most algorithms where correctness and performance are independent, chess engines can't be evaluated without testing performance at the same time; faster is not just faster, it changes the results.

So there is a tradeoff between the depth of the search and quality of evaluation. For traditional chess algorithms, better evaluation was rarely worth the cost; it would slow down the search so much that it didn't pay for itself.

But this performance tradeoff (like all optimizations) critically depends on hardware. Change the hardware and you change which optimizations are "worth it".

AlphaZero is clearly good at using TPU's to maximum effect. But what would its performance be in a CPU only environment? Maybe dumb but deeper searches still win there? This evaluation hasn't been done.

This isn't to say that the AlphaZero evaluation is "unfair". Rather that chess engines evolved to be too dependent on their environment. Getting maximum use out of CPU's is a strength, but not being able to use TPU's or even GPU's is a weakness.
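
To make that tradeoff concrete, here's a rough back-of-envelope sketch (the time budget, per-node costs, and branching factor of 35 are made-up numbers, and it ignores the pruning and move ordering that let real engines search far deeper than full-width):

    import math

    def effective_depth(time_budget_s, eval_cost_s, branching_factor=35):
        # Nodes you can afford in the budget, then the full-width depth they buy
        nodes = time_budget_s / eval_cost_s
        return math.log(nodes) / math.log(branching_factor)

    # Cheap handcrafted eval (~1 microsecond/node) vs. slow NN eval (~1 ms/node)
    print(effective_depth(60, 1e-6))  # ~5.0 plies full-width
    print(effective_depth(60, 1e-3))  # ~3.1 plies full-width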


Agree with this. Stockfish is fast enough to run on modern iPhone and Android phones. AlphaZero most probably is not.

But the fact that a generic algorithm absolutely destroys humans and their human-crafted programs is the most interesting part.

Yes, TPU + GPU army is a huge amount of computation power, but I'm sure there'll be research coming out trying to compress the algorithms enough to use the same computation power as Stockfish.


It searches fewer positions because it decides where to search using 4 TPUs, which are 180 teraflops each according to Google.


That's not clear, each (second generation) TPU is 45 FP16ish unspecific TFLOPs. A single board consists of 4 TPUs at 180 TOPs total. This is similar to the Dual P100 NVLINKed Quadro which is an absolutely killer HPC/DL card. I believe they have a similar Volta option, but that kind of HW is above my pay grade these days.

Further, they used 5,000 (first generation) TPUs at 90 INT8 TOPS each, page 4, to run the network during MCTS and 64 (second generation) TPUs to train this thing according to the methods. That's a nice mix of using INT8 for inference and FP16ish for training IMO.

In contrast, I personally own 8 GTX Titan XP class GPUs and 8 more GTX Titan XM GPUs across 4 desktops in my home network. I'd love to experiment with algorithms like this, but I suspect I'd get just about nowhere due to insufficient sampling. These algorithms are insanely inefficient at sampling at the beginning. So I guess I will seed the network with expert training data to see if that speeds things up.

That said, more brilliant work from David Silver's group! But not all of us have 5,000 TPUs/GPUs just sitting around so there's still a lot more work/research to make this more accessible to less sexy problems.


It's definitely worth a shot reproducing the results.

On the other hand, Google will make a shit load of money when they make TPUs available on gcloud. Papers like this are great marketing for them.


I wonder how much the hardware would cost to rent for a researcher not working at Google?


So 3 2nd generation TPUs are ~= 1 Volta class GPU ~= $3 per hour on-demand on AWS: https://aws.amazon.com/ec2/pricing/on-demand/ and ~$1 (75 cents at the moment with p3.8xlarge and its 4 GPUs) in spot: https://aws.amazon.com/ec2/spot/pricing/ if you take the time to build a robust framework.

And to make things simple, let's do it all in FP16 because INT8 on Volta ~= 1/2 a first generation TPU, but FP16 ~= 3 first generation TPUs at INT8 (sad, right?), an accident that occurred because P100 didn't support INT8, but consumer variants did.

So, 5,064/3 = 1,688 Volta GPUs ~= $5000 per hour, probably half that reserved, a quarter of that in spot.

Say you need a week to train this, so $200K-$800K...

You can buy DGX-1Vs off-label for about $75K. Say they cost $20K annually to host. Say you use them for 3 years, so the total TCO is ~$135K, which comes down to ~$0.64/hour per GPU.

Conclusion: p3.8xl spot instances are currently a steal! But I don't have ~$200K burning a hole in my pocket, so I guess I'm out of luck.
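
For anyone who wants to redo the arithmetic, a small sketch using the figures above (the prices and the 3-TPUs-per-Volta conversion are my rough 2017 estimates, not official numbers):

    tpus = 5_000 + 64       # first-gen inference TPUs + second-gen training TPUs
    gpus = round(tpus / 3)  # assume ~3 first-gen TPUs per Volta-class GPU

    for label, rate_per_gpu_hr in [("on-demand", 3.00), ("spot", 0.75)]:
        hourly = gpus * rate_per_gpu_hr
        weekly = hourly * 24 * 7
        print(f"{label}: ~${hourly:,.0f}/hour, ~${weekly:,.0f}/week")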


Google provides 1000 TPUs free of charge to researchers https://www.tensorflow.org/tfrc/


I don't think that the specific numbers are relevant for what deepnotderp and I were saying: that Giraffe already demonstrated the potential, and all that was missing was a boatload of compute.


P100 is ~20 FP16 TFLOPs, V100 is ~30. So 4 gen-2 TPUs are ~9 P100s or ~6 V100s.


TPUs aren't "cheating" though, as they can be used for generalized machine learning models, and not just Go.

Computer graphics is still an impressive achievement even when done on a GPU instead of a CPU.


I think his point is that if you devote X FLOPs to something, then a fair comparison would be to also give X FLOPs to the competitor. The specifics of how an algorithm works do not matter as much as the total resources used and the outcome.


A more fair comparison would be to cap the hardware used at a certain cost. That's much more reflective of the real world. There are plenty of tasks that perhaps you could do more efficiently on a CPU for a given number of operations, e.g. maybe some graphics operations, but in practice it's completely irrelevant because a GPU gives so much more performance for the given cost. There's nothing special about an operation, but dollars do matter.


Only if you're buying hardware based on the algorithm used. Useful chess programs need to actually run on people's phones, where performance on a cluster of ASICs is mostly meaningless.


I think we need to start capping total electricity and total $$$. I'd love to see AlphaZero 20W pitted against that other 20W supercomputer. When humans fall to that, be afraid(tm).

I'll even be charitable in order to simulate the existence of school/teachers/books: training from the start gets 2KW. But gameplay still gets capped to 20W.


Electricity isn't free though; why can't it simply be rolled into cost? Just assign it a standard cost per kW-hr and charge accordingly. This more accurately reflects economic incentives driving hardware development.


Sure, why not, but then how do we compare to a human?


I don't think you can, and such a comparison is not really needed here anyway. People are not chattel slaves and cannot be racked into data centers to solve boring problems.

Of course, you can hire people, and that has a well-defined cost, so it does all come down to money again.


Sure, but the whole point of the above idea is to compare our 20W computers to what we can build that eats 20W. And don't give Silicon Valley ideas about disrupting the lucrative Mechanical Turk ecosystem by scaling it up with ideas borrowed from growing veal because some VC sociopath will take it seriously. Just sayin'...


And I'm saying that this 20W limitation isn't particularly meaningful, as many organizations have way more power at their disposal to throw at a problem than that. The economics of a given solution, on the other hand, is applicable at all scales.


Meaningful in the sense that if an AI plays against humans, is it smarter at the same energy efficiency as humans?

We are comparing machine intelligence vs human intelligence.

It can be said that with more computational power, you can raise intelligence. Human brains consume more power relative to body size than those of any other animal.


This argument is about state-of-the-art chess, not chess as a mobile phone game. Humans are so bad at chess compared to the best programs now that even a smartphone app can't be defeated by people.

Also, mobile phones have Internet access, so there's no reason the algorithm has to run on the phone itself. It could run on TPUs in the cloud. It's common for many games to have server-side components. Though this isn't even necessary except maybe if Magnus Carlsen wants to play it.


I think you misunderstood. Sure, if you are willing to deal with the increased costs and lowered reliability you could write a chess program that required massive server resources.

But I don't think a lot of people would pay for that vs. having a program that just runs on their phone and still beats them. So, in practice, without a significant subscription fee you are going to be limited to cellphone hardware.

PS: In practice most games take about as much computing power from a server as a chat app, as companies need to pay for that hardware. Remember, 1,000,000+ times X gets big unless you keep X very low.


Again, this entire article and discussion is about state-of-the-art chess. As in, literally working to "solve" the game and develop optimal strategy. I don't understand what relevance casual mobile chess games have. Computer chess is already very far beyond human capabilities, and it can't be pressed further just using mobile phone hardware (nor is that a reasonable restriction).

It'd be like in a discussion about SpaceX's BFR designs to colonize Mars, someone comes in and questions why they're using retropropulsion since the requisite control systems are infeasibly expensive for amateur model rockets. It's a completely different discussion.


That's not why this is relevant. Given equivalent hardware it's still a worse solution for chess. The value is that you can get results of similar quality, given vastly more compute power, without 1,000+ years of analysis.

Otherwise the only takeaway is this failed to improve the state of the art.


"Equivalent hardware" is only relevant if we're talking about cost. When measured by that metric, the TPUs are indeed superior. Raw operations is an irrelevant metric given the existence of economic purpose-specific hardware that can perform a lot more of the operations required for matrix multiplication than for general computation. GPUs work exactly the same.


Again, cost is relative to the hardware you have. If you own a supercomputer already and you want to run chess on it for whatever reason, what matters is the performance you get from each algorithm on that hardware. If you're going to buy new hardware, its design depends on performance across every algorithm you expect to use.

So, the only case where chess performance per $ matters is if you are only ever going to use that hardware to run chess. In every other case, which is the vast majority of the time, you care about different metrics.


Why? Just do the computation in the cloud.


Several phones already have neural net acceleration hardware in them today, including the latest iPhones.


Some iPhones are manufactured with this, but again if you have paid for the hardware you care about performance on that hardware. If you have yet to buy anything then theoretical performance per $ becomes the meaningful metric.


Same with the Pixel 2. But the Pixel 2 appears to be a bit more powerful than the iPhone neural chip. The Pixel Visual Core is able to do 3 TOPS, but we really need the supported instructions and word size to truly compare.


But better evaluation gives you asymptotic speedups. You can give Stockfish several times its computation (which is already a lot, I mean, 64 threads, come on) and it doesn't make good use of it since it just runs into the search wall. If you gave Stockfish the equivalent in CPU power (and I'm not sure this is a fair hypothetical since part of the appeal of NNs is that they have such efficient hardware implementations, so it seems unfair to then grant a less efficient algorithm equivalent computing power by fiat), I'm not sure it would be restored to parity or superiority.


Absolutely. This required an exorbitant amount of compute, but DeepMind had to do novel, nontrivial research to make use of those resources.


Edit: DeepMind's victory over Stockfish didn't need novel research. Giraffe already demonstrated that the asymptotic speedup was possible; it just needed more compute.


The number of positions evaluated is the number evaluated. Speed doesn’t change that.

Speed probably made the initial self play training quicker though.


Compute absolutely matters. With tree search, there's a tradeoff between scoring cost and positions evaluated. AlphaZero can evaluate fewer positions because it uses a huge amount of compute to accurately score each position.

It's not just training. Training used 5,000 TPUs.


Can't those teraflops be applied to evaluating more positions instead of deciding which positions to evaluate?

It seems that the metric should be compute time, not positions evaluated.


Presumably they both had equal clock time - that is a standard chess rule so it would be surprising to see it different.


Wall clock != cpu clock

I can do more in the same wall time with a faster cpu(s); I can afford inefficiencies that the opponent cannot, and accomplish just as much.


Right, right, but my comparison was between Giraffe and AlphaGo, not between neural networks and Stockfish.


But it looks like AlphaGo is searching fewer positions per second than Giraffe did.

AlphaZero evaluates 80K positions per second, according to this paper, and the Giraffe paper says that Giraffe averaged 258,570 evaluations per second when running STS.

While we can't directly compare the compute power, this implies that AZ has learned a better representation.


It's unclear how much of the "better representation" was due to better algorithm vs. more compute/deeper NN.


Giraffe was trained until convergence. Maybe if there was more compute power then, a different model would have been used, but that's deep into the world of silly hypotheticals.


The point is you're starving the competitor of computing power: if you compare A and B, you can't give A 10x or more the compute power and assume a fair comparison. What's interesting is a demonstration that enough compute power lets NNs reach beyond human-level play. Though I don't think that was ever really in doubt.


While I'm glad to see you're excited about this, take note that this is still an approach which requires that an exact model is known, the state is fully visible, and the reward is perfectly definable and known. Progress in this setup isn't necessarily correlated with the kind of AI for which we'd need a fire alarm.


It's easy for many to think that solving Go and chess means we can also solve household work like cleaning, cooking and washing dishes, but that work is actually harder.


Next up: Google's Deepmind AI learns to perform arithmetic tabula rasa.

More seriously, it seems Deepmind and the AI community in general is having a Streetlight effect problem, i.e. looking for AI in what works now, rather than coming to terms with the hard challenges. This explains why there are so many papers on GANs. People are just doubling down on what works (where the streetlight is), rather than acknowledging that where we need to look for AI is dark. Since it's become such a cut-throat race to be the next one to say "we made a breakthrough!", it makes much more economic sense to solve simple problems and advertise them as huge challenges.


I wouldn't dismiss GANs so easily. Yann LeCun was singing odes to GANs - as the most interesting idea in the last decade. The interesting thing about GANs is that they don't use a predefined loss, but instead the discriminator acts as the loss function for the generator - thus, it is learning a loss fn instead of using human guesswork to create it. That's quite a powerful new idea. Applications of GANs include making simulated images look more real, which is essential for RL, generating 'artificial' training images for other tasks and using the discriminator as an image embedding generator or classifier.


I agree that the average Joe will misinterpret the significance of AlphaGo, to Google's benefit.

But most people in the research community already know how amazing it would be to make an affordable household robot or a search-and-rescue robot or a self-driving car. Many labs (including mine) are working on it. The streetlight adds a small bias, but the bigger problem is that we have no idea how to build human-level AI.


The biggest problem in robotics is vision. How do you translate pixels to a 3D scene graph with object attributes and correlate it with prior knowledge?

Do that in real time, on-device, without using a crazy amount of power because of batteries.

CNNs and faster GPUs are the biggest breakthroughs in that regard, but there's still a long way to go before we get to a human-level visual cortex.


Vision is part of the puzzle--a large part in the case of self-driving cars. But blind people are way better than computers at everyday tasks, so I don't think that it's the Big Problem.

Translating to 3D is low-level and relatively easy. That's not the reason why we don't have household robots/self-driving cars.

Framing vision as "object attributes" and "correlate to prior knowledge" might be a good approach for current research. But humans do more--we understand what we look at. We form concepts and models of the world that allow us to adapt to very novel situations.

The main reason why we haven't solved vision, language, playing chess like a human, etc is that NNs are a poor approximation of human concepts. I agree that we probably need more compute and better compute.


Yes, but it doesn't seem like much of a problem? Exploiting a breakthrough before moving on to harder problems isn't cheating, it's the smart thing to do. It might even turn out to be the fastest way to make progress on the harder problems.


Let's break this down and consider things carefully. To informed researchers, what is most surprising here is not that the AlphaGo Zero algorithm beat stockfish but that MCTS managed to outperform Alpha-beta search. I'll venture a hypothesis as to why this was.

Informed skepticism would have discounted MCTS against alpha-beta search but wouldn't have put much stock into the idea that neural networks couldn't learn better features than what has been painstakingly handcrafted. We know that given sufficient data and an appropriate architecture, neural nets have achieved better local minima than humans. This shouldn't be surprising anymore. A structurally adapted searcher will always do better in the domain it is adapted to. A cat is so good at being a cat, it doesn't even have to think about how to cat. Choice of optimization method, input pre-processing, loss function, hyper-parameters and architecture together define a search space, a structural prior and how to navigate it.

Returning to alpha-beta vs MCTS, my view is that earlier work on the chess search space being ill-suited to MCTS has not been invalidated once you account for the synergy between the neural net and search method brought about by the imitation learning approach. What might be happening here is the neural net not only learns to correct when it goes out of bounds, it also learns to account for missteps of MCTS!

The AlphaGo Zero chess program is clearly smarter than Stockfish from the perspective of its ability to better navigate the search space, but before talking about fire alarms there are some things to note.

Going by the paper, AlphaGo Zero does well if you hold compute fixed and adjust time, but how does it do as you move along both compute and time? This is of relevance to the general community, especially if AlphaGo Zero's skill degrades gracefully enough to allow it to be a better tutor than current engines.

Contrary to the no fire alarm claim, we should see sudden improvements everywhere due to how close joint, structured prediction, reinforcement and imitation learning are to each other. Unexpected improvement across a broad class of problems is a fire alarm. Right now, POMDP or games with hidden information and multiple interacting agents are still very difficult. Structured prediction is still difficult. Granted, this was before AGZ, but Neural Nets+MCTS had to be modified to Neural Self-Play before it could work just ok in poker-like games.

What we should take away is the power of combining searching and learning. I'll argue that what is now being called expert iteration was presaged in an antique 2006 paper [1] where Hal Daume et al discuss the power of a learning algorithm trained to imitate a search computed policy. Even with limited compute and data, you can use similar ideas under the learning to search framework. The imitation approach is what's consistently yielded great results, whether applied to neural nets or logistic regression.

[1] http://www.umiacs.umd.edu/~hal/docs/daume06searn-practice.pd...

https://link.springer.com/content/pdf/10.1007/s10994-009-510...
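
For anyone unfamiliar with the term, here is a minimal sketch of the expert iteration / learning-to-search loop described above. The helper functions (new_game, mcts_policy, sample_move, train_on) are hypothetical placeholders rather than anyone's actual API, and perspective handling for the game outcome is elided:

    def expert_iteration(network, n_iterations=100, games_per_iter=500):
        for _ in range(n_iterations):
            dataset = []
            for _ in range(games_per_iter):
                game, trajectory = new_game(), []
                while not game.is_over():
                    # "Expert": search guided by the current network's priors and values
                    visit_counts = mcts_policy(network, game.state, n_sims=800)
                    trajectory.append((game.state, visit_counts))
                    game.play(sample_move(visit_counts))
                z = game.result()  # e.g. +1 / 0 / -1
                dataset.extend((s, pi, z) for (s, pi) in trajectory)
            # "Apprentice": regress the policy toward the search's visit
            # distribution and the value head toward the game outcome
            train_on(network, dataset)
        return network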


Correction to the above: I stated Deepmind applied Neural Nets+MCTS and achieved ok results. I was actually misremembering two David Silver (Deepmind) papers as one. Smooth UCT modified UCT (popular brand of MCTS) to be able to handle imperfect information games. MCTS does not converge under imperfect information. Smooth UCT is strong at limit poker. Limit is much simpler than no-limit.

Neural Fictitious Self Play based on fictitious play (invented 1950s), is an approach to reinforcement learning using neural nets for function approximation. Typical RL methods like DQN are highly exploitable. Against strong programs, NFSP did okay, with a win rate of -50 mbb/h against the best bot it played against.

Looking not just at Deepmind, there's Deepstack. It's similar to AlphaGo OG, combining CFR+Neural nets. Deepstack did not win convincingly against humans at 2 player no limit hold em.

The general point I'm trying to make here is that Chess and Go are closer to checkers than to poker, which is itself a constrained game with known rules. I mention all this and this Deepmind paper: https://arxiv.org/pdf/1711.00832.pdf, to provide a sense of scale to those talking about smoke and fire alarms.


What do you think of Libratus which won quite convincingly against top players in no-limit Texas hold ‘em poker?

https://en.m.wikipedia.org/wiki/Libratus


Probably the wrong engine to test this with, then. Although it's interesting nonetheless. It's pretty well known that chess engines have this trade-off between searching and evaluating. Among the consistent top 3, I suppose Stockfish is the easiest to test, being open source and all. It's pretty well regarded that Komodo has the best evaluation function, though. Even if it doesn't keep up with the nodes/sec of Houdini and Stockfish, it's consistently up there with the top 3. The other chess engines don't even come close. (Fire is probably number 4 but is in a league of its own. Not quite good enough to challenge the top 3, but eats everything else.)

I know it's complicated, between the hardware differences, search method used, etc. But when claiming that NNs beat hand-crafted evaluation functions, keep in mind that Stockfish is probably not the best choice to compare against, since it has made different tradeoff choices to get more depth (which goes back to search method and hardware choices).


Your comments about Stockfish, Komodo etc are entirely subjective. "It's pretty well regarded". No it's not.

You can't disconnect the search part from the discussion, as the search selectivity is ALSO learned by the neural network.


Yeah, I'm quite confused that there's no mention of SEARN or LOLS or similar imitation learning algorithms in the references of the AlphaZero paper. The learning algorithm looks heavily derived from that 10-year-old idea.


I agree that Searn looks rather prophetic in retrospect.


They aren't the first to apply NNs to chess though. What are they doing differently? And does anyone else smell smoke?


It's certainly not the first NN chess program. You may remember one of OP author's Giraffe NN (https://arxiv.org/abs/1509.01549) which was essentially 'AlphaGo for chess'. But like the original AG, it struggles to learn and Lai had a lot less computation as a student than he does now at DM. What they're doing is applying AlphaGo Zero expert iteration with some simplifications and TPUs. And that pwns previous work like Giraffe the way AlphaGo Zero pwns AlphaGo. Quantity becomes a quality all its own.


Look at Figure 2, and remember that DM has access to a lot of hardware. At short thinking times, AlphaZero is weaker than Stockfish. This is equivalent to longer thinking times with weaker hardware, and it is likely that earlier applications of NNs to chess had hardware that was 1000-fold slower than what DM has access to. This means that even if their approach had been identical to DM's, they would not have seen better performance from NNs than from the classical alpha-beta approach.


In essence, MCTS + NN is just another form of tree search, like alpha-beta or its brute-force cousin minimax.

AlphaZero just tries to be smarter about which branches to evaluate so it can go deeper.

But I would love to see AlphaZero (trained) run side by side with Stockfish on iPhone hardware and defeat it. That would be a more apples-to-apples comparison.


They are a huge company (Google) with access to top, top, top talent (experts) and infinite hardware resources. I don't know why it would be surprising if they achieved performance that hadn't been achieved before.


>it only takes 4 hours of training to beat Stockfish

In that time I figure they used the equivalent of about 1000 cpu-years. Imagine the things we'll be able to achieve as we can do more and more computation in less and less time.


But how many "CPU hours" of human work were used to design Stockfish? You can't really compare that.

Some scientists say the brain has a power of several petaflops, so if you go by that, I guess the design of Stockfish was way less efficient.

You can't really compare things to CPU years, it doesn't make sense. Power consumption would be a better metric I think.


The best metric is total cost, including the cost of the hardware as well as the electricity. It might be worth prorating the hardware by the amount of time it spends on the task, too, assuming the hardware is general enough for many purposes (like TPUs are), vs say something like EFF's DES cracker which was not.
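
One way to operationalize that metric, as a sketch (the price, lifetime, power draw and electricity rate below are placeholder assumptions):

    def task_cost(hw_price, hw_lifetime_hours, task_hours, power_kw, price_per_kwh=0.12):
        prorated_hardware = hw_price * (task_hours / hw_lifetime_hours)
        electricity = power_kw * task_hours * price_per_kwh
        return prorated_hardware + electricity

    # e.g. a $75,000 box amortized over 3 years, drawing 3 kW, used for one week
    print(task_cost(75_000, 3 * 365 * 24, 7 * 24, 3.0))  # ~$540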


A ton of CPU has gone into Stockfish, if only for their distributed computing project fishnet: http://tests.stockfishchess.org/tests


To be a little more precise: Stockfish has used >5,667,382 CPU-hours (5.6 million CPU-hours) adding up just the participants who contributed >10,000 CPU-hours according to https://github.com/mcostalba/Stockfish/blob/master/Top%20CPU...


Yes, but that's training time. At runtime AlphaZero got orders of magnitude more computation power than Stockfish.

A fairer comparison would be to use as much computation as needed for training, but for runtime, use equal-wattage hardware.

E.g. 20W of CPU in a mobile phone running Stockfish vs. 20W of GPU in an Nvidia TX2 running AlphaZero vs. a 20W human brain.


> In that time I figure they used the equivalent of about 1000 cpu-years.

Are you using some kind of conversion factor from TPUs to CPUs? If so, what is it? And is it valid to do that?

You could convert the amount of time it took to render an hour's worth of gameplay from 1 GPU-hour to 50 CPU-days (or whatever), but is that really meaningful?


The conversion factor seems to be 1 TPU-hour ~ 500 CPU-hours in terms of flops. We can nitpick that number, but it won't change the conclusion that AlphaZero needs a boatload of compute.


I don't see how this is relevant though. A GPU also provides graphics rendering performance equivalent to some boatload of CPU-hours, but who cares? GPUs exist and are used for the tasks they are good at. TPU hardware isn't theoretical; it does exist and it is being mass-produced.

Yes, it needs a boatload of very simple compute (8 bit operations), the kind that CPUs are not even close to ideal at providing economically.


It needs huge amount of computations to perform a task which previously required a boatload of domain experts' brainpower too.


> doesn't work as well in chess: it works better

That's not how "as well" works.


Yes, it is; while the idiom standing on its own implicitly includes a leading “at least”, it is also idiomatic to use it in exactly the way used by the grandparent post, in an explicit contrast with better, where it comes with an implicit (or sometimes explicit) leading “merely” instead of “at least”.


It's unnecessary, though, and makes the point harder to read. "It works even better" would be a perfectly sufficient description. "It works not as well as but better" is an unnecessary rhetorical flourish.


The misdirection is being used as a rhetorical device — you're supposed to feel a brief confusion when you get to the colon; it's then quickly resolved.



Time and again Alpha shows it is much better at eval than Stockfish.

Alpha play feels "human" at least to this FM. This is fantastic news! It is what I would imagine a good correspondence GM would play like with engine assistance.

I already commented on Game 1 where Stockfish played extremely aggressively with 13. Ncxe5 ??! and 31. Qxc7 ?!

Game 3 is a positional masterpiece. Alpha is willing to play pawns + exchange down when it correctly evaluates that Black queen and rooks will be tied down.

This kind of long term thinking is beyond what regular engines perform.

Game 10 is also an impressive showing by Alpha. Alpha is willing to play down a piece and a pawn for 15 (30 ply) moves in a middle game beyond the reach of Stockfish's raw calculations.

If one could only get access to Alpha evals :) When do mere mortals get access to TPUs on Google Compute Engine?


Thanks for the analysis.

There's a project currently that emulates AlphaGo Zero using distributed computing / crowdsourcing: https://github.com/gcp/leela-zero . You can run it on the browser too and it will submit the games after: https://ntt123.github.io/leela-zero/

Hope such a project will be available soon for the chess variant.

Or maybe Deepmind will release this as a SaaS product?


Deepmind should release the TF compatible model with weights. And then it's just a matter of shrinking the model enough to run on desktop hardware.

But I don't know whether they'll do it. I hope they follow suit like other researchers who have github repos with code and models besides their papers. Really accelerates research.


So, 1. d4 for White, Berlin for Black. I got it


https://i.imgur.com/kwCyiHn.png

That was a bad move for white to play. It's easy to win when your opponent throws the game.

No human player would trade queens in that situation.


Are you thinking Qa5?


Yeah, or anything other than trading. You can see from the graph that it was all downhill from there; deservedly so.


Uhh... These games are actually broken. From the second link: https://imgur.com/a/P5tG6

See for yourself:

https://lichess.org/Zqwn4Gzk#87

https://lichess.org/Zqwn4Gzk#88

EDIT: Nope, I'm just a noob.



Ah, thanks.

I'm delighted. Chess seemed so simple. I had no idea there was a special pawn capture.


This looks like a fantastic site! Is there anything similar for Shogi?


One impressive statistic from the paper: AlphaZero analyzes 80,000 chess positions per second, while Stockfish looks at 70,000,000. Seventy million, three orders of magnitude higher. Yet AG0 beats Stockfish half the time as White and never loses with either color.

A stunning demonstration of generality indeed.


So ... what if you combined Stockfish and AG0, and let AG0 explore 70M positions instead of 80K? Would it improve even faster?


What if you combined a bus that gets you to work in 10 minutes and plane that gets you from Paris to Brazil, would it get you from Paris to Brazil in 10 minutes?


yes. In an imaginary and hypothetical sense. :)


The issue is you can’t evaluate positions that fast in AlphaZero (currently).


It would be interesting to see if there were some way to extract a couple of new heuristics from AlphaZero that could be implemented fast enough to incorporate in Stockfish's evaluator though. I suppose this is the age old problem of black-box models: _why_ does it think this?


I think that it is almost always possible to extract optimized models from a NN and implement them faster. I wonder if this can be generalized: NN to optimized fixed algorithm for max speed?

This has to already exist as it is very obvious.


I dunno, seems like Google would just do this instead of keep around the pesky neural net at runtime. There's an _awful_ lot of computation going on inside, and it's necessarily hugely interconnected. I'd be impressed if someone had already done it, but it seems a great avenue of research if not. I suppose it goes hand in hand with models for which you can actually _explain_ their results, which certainly is an active area of research.


There are well-known techniques that work pretty well to shrink neural nets a lot while keeping almost all of their performance. See Geoffrey Hinton's model distillation papers.

The first AlphaGo paper had a system that used tons of computation, and was followed up by one that used much less and worked even better. Not speaking for Google, but I think it's a bit of a race to publish great results first. I wouldn't be surprised to see something better than this that uses 1000 times less resources published in a year or two, just like what happened with Go. First prove it's possible, then figure out how to make it much more efficient.


A really good example of model distillation also comes from DM: their new realtime WaveNet used in Google Assistant. The first WaveNet was ungodly slow due to redundant computation; but even after that, it still was not realtime simply because the CNN is too deep and slow. But you need the CNN to be deep & big in order to train good audio generation. Model distillation to the rescue: take a wide fast small CNN and train it to imitate the slow deep WaveNet. Result: WaveNet quality realtime voice generation which can be deployed to the masses.
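
The core of distillation is short enough to sketch; this is a generic toy version (layer sizes, temperature, and the random stand-in data are arbitrary; it is not the WaveNet or AlphaGo setup):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    teacher = nn.Sequential(nn.Linear(64, 512), nn.ReLU(), nn.Linear(512, 10))
    student = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    T = 4.0  # temperature: softens the teacher's output distribution

    for step in range(1000):
        x = torch.randn(128, 64)  # stand-in for real training inputs
        with torch.no_grad():
            teacher_probs = F.softmax(teacher(x) / T, dim=-1)
        student_log_probs = F.log_softmax(student(x) / T, dim=-1)
        # Train the student to match the teacher's softened distribution
        loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()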


Thanks for these googlable hints. :)


"We also analysed the relative performance of AlphaZero’s MCTS search compared to the state-of-the-art alpha-beta search engines used by Stockfish and Elmo. AlphaZero searches just 80 thousand positions per second in chess and 40 thousand in shogi, compared to 70 million for Stockfish and 35 million for Elmo. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations – arguably a more “human-like” approach to search, as originally proposed by Shannon." <- Amazing!


Meanwhile a human player considers <1 position per second, so there’s a few orders of magnitude left to go in that direction.

But unsettlingly few, nonetheless.


Humans are also much weaker than AlphaZero in these three games. The difference in the numbers of positions searched might be responsible for a substantial part of that.


It'd be interesting to weaken AZ until it is on par with a human, and then compare moves evaluated. I'd suspect humans still evaluate significantly fewer moves.


Strong human players consider a lot more than 1 position per second in Chess...


If you look at the Stockfish project you will see many hardcoded weights in the configuration, found through experimentation. All these adjustments probably took years to achieve... and now AlphaGo Zero just self-learns everything and surpasses it.

Would be good to see Deepmind's solution play Arimaa and Stratego, and see what kind of strategy it comes up with. Or weird variations of Go.

Eventually this tech will make it into military strategy simulators and that's where things will get really messed up. 4 star generals will be replaced by bots.


I don't think this technique immediately applies to Stratego because it's not a perfect information game.

I suspect it would exceed the state of the art in Arimaa, since Arimaa is specifically designed to have a high branching factor (~17,281, compared to ~35 for chess), and this technique was designed to work well in high-branching-factor games (since Go is a high-branching-factor game, though much lower than Arimaa).


In that regard then Stratego would share some aspects with Starcraft, another incomplete information game.

DeepMind is actively working on a StarCraft bot. It would be interesting to see if they can put together a superintelligent StarCraft bot and then translate those results to Stratego.


I smell a rat.

The paper says:

'AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi'

In the first game, Stockfish's 9. Qe1 is one of the strangest moves I've ever seen, one which would never be considered by a human, let alone a superhuman.

11. Kh1 also makes little sense, but is not as bad. My Stockfish sees it as losing 0.2 pawns, which makes it highly suspect in such a position.

35. Nc4 is also a deeply puzzling move that my Stockfish sees as losing half a pawn immediately, and a whole pawn soon after.

50. g4 is also suspect.

52. e5 is insane.

This is bullshit.

Edit: bullshit is too much - see comments below.

Edit: Oh dear. We're doomed.

https://lichess.org/study/qiwMCyNQ


The Stockfish engine provided by lichess on the game you linked doesn't seem to mind those moves - it has most of them in the top few lines after a few seconds of thinking time.

Qe1 and Kh1 are fine if the plan is to prepare f4.

35. Nc4 stuck around at the #2 / #3 best move for as long as I ran that position.

Remember the Stockfish in the paper had 64 cores, so you'd have to run your Stockfish for a while to get it to arrive at the same principal variation.


Yeah that's right. I think this might say more about the efficacy of chess engines over a certain point vs human analysis rather than the 'bullshit' I called.

I'd certainly fancy my chances against this AI more than Stockfish on a lower power.


If I leave Stockfish to study for longer then Qe1 comes up in the analysis. Which makes me wonder whether SF gets weaker in some positions the more it's left to think.


Now I'm really intrigued.

SF plays really odd moves when left to its own devices for a time. As does this AI. So maybe chess looks really weird with play significantly better than the best humans.

It's actually really disturbing.


I think being able to play tactically perfect chess over 20 or so moves will often look weird to human strategic sensibilities. The computer sees every tiny exception to the patterns and heuristics you've incorporated into your gut feel about positions. In a way these moves are right just because they're right, and that's what's jarring - there's no _principle_ behind them that can be learned and generalised, which is something humans struggle with in all walks of life.


Except AlphaZero doesn't evaluate nearly as many moves as Stockfish (80Knps vs 70Mnps), so in a sense, it has exactly generalized a principle (or likely a whole lot of principles) that allows it to estimate positions much better than Stockfish.

Of course you are right about perfect play, but the human-like aspect is part of what is exciting about these new Alpha engines.


Yeah. I'm stunned.


There's definitely nothing fishy going on, although it'd be nice to see a fully loaded Stockfish on its full complement of 512 cores and a proper endgame tablebase to really slog it out with AlphaZero.


Back to zero at 41 ply... I shan't give a running commentary anymore.


It's fascinating isn't it? I'd love to see this vs Magnus.


Pretty sure I input the moves wrong now I'm looking at it. Humans really have no place in chess. :P


> which would never be considered by ... a superhuman

How would you know?


I don’t see it in my database, and it’s never been played on Lichess, even in bullet games.


Which, of course, is not evidence that a superhuman wouldn't consider such a move. AlphaGo also made unusual moves that looked like mistakes, but turned out to be insights.


Apologies, misread the parent comment!


Ask any GM.


How would a GM (human) know what would be played by a superhuman?


The edits on this comment make it the second best HN comment of all time.


What's the first?


The "Did you win the Putnam?" comment: https://news.ycombinator.com/item?id=35079

The whole thread is pretty hilarious. In another part of the same thread there is this comment:

we're in a similar space -- http://www.getdropbox.com (and part of the yc summer 07 program) basically, sync and backup done right (but for windows and os x). i had the same frustrations as you with existing solutions.

let me know if it's something you're interested in, or if you want to chat about it sometime.

drew (at getdropbox.com)

https://news.ycombinator.com/item?id=35103


hmm...13.Nce5 looks like the move no strong human would play, and I suspect even engines after going sufficiently deep wouldn't choose it (I haven't checked it though).


My perspective as FIDE master who has played Ruy Lopez Exchange type of positions for 30+ years.

9. Qe1 is a pretty normal maneuvering move

13. Ncxe5??! looks like a major howler.

Ask 100 strong chess players and 99 of them would completely ignore it. You are giving up a piece for two pawns in an open position and black has no real weaknesses. There is no real basis for a sacrifice.

This shouldn't work. The crazy thing is that Stockfish almost makes it work.

It is the kind of move you play when you absolutely must win and must win now.

The only reason Stockfish considered it is the white pawn on a5 giving additional tactics in breaking up the black pawn chain with a6 a couple of moves down. With the pawn on a4, Ncxe5 wouldn't be worth attempting.

The crazy thing is that being such a bully almost worked!

At move 28, White looks very solid, with 3 perfect pawns for the piece, plus Black has horrible weaknesses. 29. g3 is a bit suspect, but the next supercomputer move is

31. Qxc7. This has to be losing, but it is a typical computer bully move.

Most strong human players would prefer to defend h3 hole with Kg2 (on Qh5 f5 looks fine).

The idea is that Black's light-squared bishop is boxed in by the white pawns.

There must be a concrete reason why Stockfish did not play Kg2.

Overall the impression one gets is of very "human" play by Alpha and ultra aggressive play by Stockfish.

EDIT: so extremely impressive play by Alpha but a bit suspicious aggression by Stockfish.


> It is the kind of move you play when you absolutely must win and must win now.

I agree Ncxe5 looks crazy, but the weirder thing to me is that Stockfish offers a repetition the very next move. So it can't be caused by having high contempt (favouring wins over draws).


Thanks for the analysis


I wanted to contact the authors directly with a question, but can't seem to find contact info at the moment. I hope some of you might know enough to answer it.

I'm interested in applying this method, or a similar neural-network / tabula rasa based method to the game of Scrabble. I read the original AlphaGo Zero paper and they mentioned that this method works best for games of perfect information. The standard Scrabble AI right now is quite good and can definitely beat top experts close to 50% of the time, but it uses simple Monte Carlo simulations to evaluate positions and just picks the ones that perform better. It doesn't quite account for defensive considerations or other subtleties of the game. I was wondering if anyone who had more insight into MCTS and NN would be able to talk me through how to apply this to Scrabble, or if it even makes sense. One of the issues I can see currently would be very slow convergence; as it has a luck factor, the algorithm could make occasional terrible moves and still win games, and thus be "wrongly trained".


Step 1: millions (?) of dollars of hardware.


Two things to note:

1) AlphaZero beats AlphaGo Zero and AlphaGo Lee and starts tabula rasa

2) "Shogi is a significantly harder game, in terms of computational complexity, than chess (2, 14): it is played on a larger board, and any captured opponent piece changes sides and may subsequently be dropped anywhere on the board. The strongest shogi programs, such as Computer Shogi Association (CSA) world-champion Elmo, have only recently defeated human champions (5)"


Shogi is a fun game, it always feels a little sad that it doesn't get more exposure outside of Japan (and my understanding is that, by and large, in Japan it is considered an "old persons" game)

Because captured pieces change sides, there is less of an "endgame" scenario, and as a beginner (like me) it is very easy to put too many captured pieces back into play, which makes it hard to defend everything and essentially you end up giving them back to your opponent


It recently got renewed attention when Fujii Sota, a 14-year-old, turned pro at the youngest age since Kato Hifumi, and subsequently had a record-breaking 29-game winning streak.


I've been interested in learning both shogi and xiangqi for a while. If anyone knows a nice engine with graphical frontend for either game, I'd love to know. Wasn't able to find much the last time I looked.


The best place to play Shogi online against others is http://81dojo.com/


It briefly became popular in the otaku culture from an anime called Hunter X Hunter.


I'm curious to see if "San Gatsu no Lion" (the Lion of March) will spark interest. I highly recommend it to anyone interested in more slice-of-life/drama kinds of things. It's quite a beautiful anime/manga, even if the shogi isn't quite centre stage.


Recommendation seconded, Sangatsu no Lion is a lovely work. On the other hand, it has been running for 10 years (!), if it could spark interest like Hikaru no Go, it would have happened already.


Shion no Ou is another good shogi anime. I haven't read the manga.


As a chess player I find the win rate astonishing.

Given the drawish tendency at top level, among human players, in correspondence chess and also in the TCEC final, I thought that even absolutely perfect play wouldn't score so well against a decent Stockfish setup (which 64 cores and 1 minute per move should be).


I can’t see any reference to whether Stockfish was configured with an endgame tablebase. It’d be interesting to see results then, as you’d expect AlphaZero’s superior evaluation to give it an advantage out of the opening, but later in the game Stockfish would have access to perfect evaluations. Obviously there’s nothing stopping you from plugging a tablebase into AlphaZero but that feels wrong.


It's not clear that it had an opening book either. In any case it's not specified which one.


I'm not sure it's really fair to compare Stockfish to AlphaZero; AlphaZero used 24h of 5000 TPUs in compute time, and still needed 4 TPUs in real play, while Stockfish ran on just 64 threads and 1GB RAM. Nonetheless, still an impressive achievement.


Wait, how's the 24h x 5000 TPUs relevant? That is training time, and that training corresponds to years and years of hardcoding evaluations in Stockfish, not to compute time during the match.


Only 1GB RAM? Really?


Yes, this is really strange. Hash table size is a major contributing factor for strength of chess programs. It looks like a very artificial limitation.


Serious question: how does one evaluate the reproducibility of this paper's results?

Maybe I'm missing some things but:

- Are 1st gen TPUs even accessible? You have to fill out a form to learn more about those second generation TPUs: https://cloud.google.com/tpu/

- I can't find the source code

This does not look like a scientific paper, but a (very impressive) tech demo.


This is definitely a scientific paper. Pretty much no scientific paper comes with source code and the majority of scientific papers are not reproducible without an entire university department of resources anyway.


...and this attitude causes quite a bit of them to not be reproducible even when people try with the same resources.


> Pretty much no scientific paper comes with source code

Are we blindly accepting this as science now?


Yes? Sorry you've been out of the loop so long but science doesn't cater to your idealistic ideas of what it ought to be.


My main thing about source code and scientific papers is that it would just be so easy to release the source code along with the paper. Even if people don't reproduce the work, source code would often help in understanding it, as I'm often a little unclear on implementation details, which the source code would greatly clarify.


How do you replicate CERN experiments? The LHC? Hubble? LIGO? LISA? At least this paper is reproducible by people who have the compute, and many universities have super computers.

Even at home, you can verify the results by replaying the games against stockfish. You might not be able to replicate the setup at home, but that does not mean it is not science.


Comparing projects done in the open with multiple different universities, on public funds, with something done behind closed doors with only personnel from a commercial entity is pretty far-fetched.


Why? How are any of the factors you mention related to verifiability? How does being supported by public funds with academic personnel from multiple universities make LIGO any more verifiable for me at home? At least I can run these games against my stockfish, thus verifying the result. The method I cannot verify, but being able to verify the results is already more than most of science.


You maybe can't do the experiments from CERN, but you can do the calculations; most of the software they use for that is open source, IIRC.


There are too many details missing for the results to be reproducible.

Does it even qualify as a tech demo if the result only exists in DeepMind's lab?


This raises an interesting concept. If you cannot reproduce an experiment because of lack of resources, can you believe it? Or is this the equivalent of 'photoshopping your results'?

A similar problem exists in cosmology. Can you verify the multiverse model if you only have one universe to experiment in?

As the RAM and TPU power requirements to run certain models/algorithms increase, machine learning is becoming more obscure. Not only can we not understand how an AI is reaching its conclusions (inscrutability), we cannot even probe it (by tweaking parameters, etc.) to find weak points (inaccessibility). This is actually a good thing. Where humans cannot tread, there can be no evil?


At least in computer chess such experiments were typically demonstrated by winning the World Championship. (And sometimes they failed...cough Deep Blue cough)


Definitely a scientific paper. It's obvious from the way they formatted it that they're going to submit this to Nature or Science.


This then may suggest that there’ll be this detail-light manuscript in the journal and a 50-100 page supplemental document available to download, with all the details to reproduce (hopefully)


There's a project currently that emulates AlphaGo Zero using crowdsourcing: https://github.com/gcp/leela-zero . You can run it on the browser too and it will submit the games after: https://ntt123.github.io/leela-zero/

Hope such a chess project like this will be available in the future.



Stockfish plays like an ambitious amateur in the first game, giving away a piece for two pawns on move 13.

Perhaps this move was justified though, as later in the same game Stockfish gets a position which is at worst drawn, likely winning. Some moves later, however, around move 40, Stockfish gets its own knight trapped and the game is over.

This is not the kind of chess we normally see from Stockfish.


Yeah, that game was kind of different from the others - in the other games the feeling I got was that over time AlphaGo's pieces got increasingly effective while Stockfish's pieces would get bottled up and lose their mobility.


Very happy to see this result. It's like a moral victory for humans, as AlphaGo is more human-like (discounting Monte Carlo search) than Stockfish. Maybe deep learning will give us the next Euler, Newton, or Einstein.


Shogi, chess and Go are "perfect information games", meaning you can see the whole game state. It's a whole different thing to be able to solve games where you don't see everything (based on uncertainty).


Is it really though?

A big class of imperfect information games can be modeled by having a record of everything the agent has seen so far. Then it has exactly the same, if not more, information available than a human player in the same position. We know that with equal information AIs can make better decisions than humans (see also, AlphaGo :] ) so at that point the AI could reasonably be expected to achieve superhuman performance.

The "imperfect information games are harder for AI" crowd are going to be surprised by just how badly humans deal with imperfect information. AIs have a much better memory than humans do, and much more potential to use actual probability which humans are truly shocking at utilising (although neural networks don't seem to utilise this edge; so far).


The difficulty of imperfect information is from cross cutting through information sets and partial observability. With perfect information games like chess or Go, one can solve subgames with guarantees that the equilibrium is the same as for the full game. This is not the case for games like poker, which is why they have been difficult. In addition to that, for n > 2 players, there are no longer theoretical guarantees about converging to a nash equilibrium, which makes designing theory guided algorithms harder. Though empirical performance with n=3 of CFR is encouraging, I know of no results for n > 3.

Earlier this year, DeepStack, a system combining neural nets with search, competed live against humans without any side being dominant. Search policy guided training might improve its results, which are impressive compared to even 5 years ago, but this highlights how much more demanding imperfect information games are.
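
As a taste of why these games call for different machinery, here is a minimal regret-matching sketch, the update rule at the heart of CFR, for one-shot rock-paper-scissors against a fixed opponent. Real CFR applies this at every information set with counterfactual weighting; the opponent mix here is an arbitrary assumption:

    import random

    N_ACTIONS = 3  # rock, paper, scissors
    PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # PAYOFF[mine][theirs]

    def strategy(regrets):
        pos = [max(r, 0.0) for r in regrets]
        total = sum(pos)
        return [p / total for p in pos] if total > 0 else [1.0 / N_ACTIONS] * N_ACTIONS

    regrets = [0.0] * N_ACTIONS
    opponent = [0.5, 0.3, 0.2]  # fixed (exploitable) opponent mix

    for _ in range(10_000):
        mine = random.choices(range(N_ACTIONS), weights=strategy(regrets))[0]
        theirs = random.choices(range(N_ACTIONS), weights=opponent)[0]
        # Accumulate regret: how much better each alternative would have done
        for a in range(N_ACTIONS):
            regrets[a] += PAYOFF[a][theirs] - PAYOFF[mine][theirs]

    print(strategy(regrets))  # converges toward always playing paper vs. this opponent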


Yep, this. Btw there are some encouraging results for n=4 using sequence form replicator dynamics (which are implementing a form of CFR) in Kuhn poker. Toy example but the game gets large fast with n=4. Don't know of any results with n > 4.

http://mlanctot.info/files/papers/aamas14sfrd-cfr-kuhn.pdf


I'm not sure DeepMind would publish a paper in which they describe a winning high-stakes online no-limit hold'em player. The ethics would be quite shady. For all we know, they might have already done that just to see if it works.


Could work, but it hasn't been widely demonstrated yet. I really hope we can tackle such games/RL tasks.


You mean, like poker? https://www.cmu.edu/news/stories/archives/2017/january/AI-be...

Actually machines can have an even higher advantage in those cases, because they can be much better at estimating probabilities than humans. Think of card counting, for example.


I disagree. Computers have been outplaying the best humans at chess for two decades, but they only recently beat the top players at 2-player NLHE and only with the aid of massive computational power during training.

Furthermore, techniques like monte-carlo tree search used in AlphaGo don't work very well for poker - You can't just try and find the "best move" from the current game state, or you will end up playing a highly-exploitable strategy. You essentially have to solve the entire game every time (or completely in advance) to make sure you are playing a balanced strategy.

Only the Counter-Factual Regret Minimization algorithm has been able to achieve this level of play in Heads Up, and right now it looks hard to scale to poker games with more players, like the full-ring games you see at the World Series of Poker, for example. We still have a ways to go in Poker AI.


In 2015 Heads-Up Limit Hold'em was solved: http://science.sciencemag.org/content/347/6218/145.full


Math can be a perfect-information game too if you just start with axioms. Even when starting from conjectures, the rules for manipulating statements are fully transparent.


For those complaining about the TPU resources used during self-training, it is worth noting that Stockfish has used over 10,000 CPU hours for tuning its parameters. See https://github.com/mcostalba/Stockfish/blob/master/Top%20CPU...


This understates it a bit. More like 10 million CPU hours according to that link.


What an amazing result! Evaluating fewer positions (by a factor of about 1,000), AlphaZero still beats Stockfish.

In the figure on its preferred openings I find it very interesting that it doesn't like the Ruy Lopez very much over training time (there is a small bump but that is transient). I am hardly a chess expert but I know that it was very favored at the world championships so maybe the chess world will be turned upside down by this result now?

Positing that the chess world is bigger than the Go world (in terms of interest and finances) there is probably going to be a race to replicate these results "at home" and train yourself before your competitors :)


What would be a good starting point to learn about the AI behind that for a "normal" programmer? There seem to be so many resources now that it's hard to choose. Combination of hands-on plus theory would be good.


Coursera - Andrew Ng's course => Classic starting point, a very thorough and digestible introduction to neural networks. I found he covered the "how the heck do I use this?" part rather well. :)

From there, Coursera has a paid(?) DL course by Andrew Ng, or there's fast.ai, which looks good.

Good luck!


The keyword is "reinforcement learning".


I know the names of the general concepts, I was wondering if someone has concrete recommendations on where to start and which books/frameworks are sort of beginner-friendly.


Try "Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems" for the fundamentals.

http://shop.oreilly.com/product/0636920052289.do

For reinforcement learning, I hear Barto and Sutton is very readable, but I haven't read it myself. You can just pick the concepts up by reading papers. The introduction in the Deep Q-Learning paper is not great, but it's how I first learned the concept.

http://ufal.mff.cuni.cz/~straka/courses/npfl114/2016/sutton-... https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
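If it helps to see the core idea before reading the DQN paper: deep Q-learning is essentially the tabular Q-learning update with a neural net as the function approximator. A tabular sketch in Python (the env interface here is hypothetical: reset(), step(action) -> (next_state, reward, done), and a list env.actions):

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        """Tabular Q-learning: the update rule that DQN approximates with a network.
        `env` is any object with the (hypothetical) interface described above."""
        Q = defaultdict(float)  # (state, action) -> estimated return
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Epsilon-greedy exploration.
                if random.random() < epsilon:
                    action = random.choice(env.actions)
                else:
                    action = max(env.actions, key=lambda a: Q[(state, a)])
                next_state, reward, done = env.step(action)
                best_next = max(Q[(next_state, a)] for a in env.actions)
                # Bellman backup toward reward + discounted best next value.
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q

Sutton and Barto cover exactly this in the early chapters, so it's a good companion to the papers.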


You can check this Reinforcement Learning Course by David Silver on YouTube: https://www.youtube.com/watch?v=2pWv7GOvuf0&t=836s

By the way, I believe David Silver was the lead researcher on AlphaZero.


While this sounds impressive, I'll believe it when AlphaZero wins TCEC.


It beat the winner of TCEC 2016, Stockfish, with a record of 28 wins, 72 draws, 0 losses. That's zero losses.


If I run SF on my desktop computer it will kill SF running on my phone; that doesn't prove anything. Comparing TPUs and CPUs is hard, but they could have at least let SF run on what is considered a top-of-the-line setup with sensible settings (1GB of hash memory is very limited; 8GB is standard for rapid games on a quad-core CPU, let alone a 64-core one).


I can't figure out the reason for this stingy 1GB hash memory limit when using 64 cores. It pretty much negates the advantage of 64 cores vs., say, 4-6 cores.

A nefarious suggestion would be that setting a 1GB limit ensures that AlphaZero always has the edge in depth, as Stockfish would be forced to throw away long lines to stay within its hash memory.

Maybe someone who has read the Stockfish source code can comment on how it actually handles hash replacement.
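For intuition (I haven't gone through Stockfish's actual replacement code, so this is a generic sketch rather than its scheme): engines don't so much "prune" the hash as overwrite entries in a fixed-size table, often preferring to keep deeper results. Something like:

    from dataclasses import dataclass

    @dataclass
    class TTEntry:
        key: int = 0
        depth: int = -1
        score: int = 0
        best_move: int = 0

    class TranspositionTable:
        """Illustrative fixed-size hash table; sizes and layout here are made up."""
        def __init__(self, size_mb: int, bytes_per_entry: int = 32):
            self.n = (size_mb * 1024 * 1024) // bytes_per_entry
            self.entries = [TTEntry() for _ in range(self.n)]

        def store(self, key: int, depth: int, score: int, best_move: int):
            idx = key % self.n
            # Depth-preferred replacement: keep whichever result was more expensive to compute.
            if depth >= self.entries[idx].depth:
                self.entries[idx] = TTEntry(key, depth, score, best_move)

        def probe(self, key: int):
            e = self.entries[key % self.n]
            return e if e.key == key and e.depth >= 0 else None

The smaller the table, the sooner useful deep entries get overwritten and have to be re-searched, which is presumably where a 1GB limit hurts a 64-core search.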


Well, one explanation is that they wanted to win "convincingly", hence 1 minute per move and such a low hash memory allocation.


They didn't demonstrate that AlphaZero can beat Stockfish in a fair contest: i.e. take the amount of money they spent on Stockfish's CPU and RAM, buy a commodity GPU for AlphaZero, and then see.



I'm sorry, I thought we were discussing the paper.


On completely different hardware.


Back when AlphaGo was playing Lee Sedol I was thinking about a chess playing version in TCEC.

The interesting thing is TCEC assumes a bit about the structure of the chess program. That is, the TCEC win-adjudication rule says that if both programs agree that one program is 6.5 pawns ahead for 8 turns in a row, they judge that program to be the winner.

But programs like AlphaZero don't have an evaluation function that operates in conventional units (like centipawns).


You can convert winning percentages to centipawns, so that's not a problem.


Could you explain your proposed conversion process?


Here's a relevant section from Deepmind's paper:

> We also measured the head-to-head performance of AlphaZero against each baseline player. Settings were chosen to correspond with computer chess tournament conditions: each player was allowed 1 minute per move, resignation was enabled for all players (-900 centipawns for 10 consecutive moves for Stockfish and Elmo, 5% winrate for AlphaZero). Pondering was disabled for all players.


Houdini, for example, calibrates its evaluation so that +1.00 corresponds to a win in about 75% of blitz games and +1.50 to about a 90% chance of winning (http://www.cruxis.com/chess/houdini.htm). Anyway, this is not a problem at all; the adjudication rule was only introduced so less electricity is wasted when the position is a clear win/loss.
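For the conversion question above, one simple approach (a sketch, not what any particular engine actually uses) is a logistic curve with a tunable scale. The scale below is chosen to reproduce the "+1.00 ≈ 75%" point; Houdini's real calibration differs a bit further out:

    import math

    SCALE_CP = 100 / math.log(3)  # ~91 centipawns, so that +1.00 pawn ~= 75% win chance

    def cp_to_winprob(cp: float) -> float:
        """Map a centipawn evaluation to an approximate win probability."""
        return 1.0 / (1.0 + math.exp(-cp / SCALE_CP))

    def winprob_to_cp(p: float) -> float:
        """Inverse mapping, so a win-probability value head can report 'centipawns'."""
        return SCALE_CP * math.log(p / (1.0 - p))

    print(cp_to_winprob(100))           # ~0.75
    print(round(winprob_to_cp(0.95)))   # ~268, i.e. clearly winning

With an inverse mapping like that, an engine whose value head outputs a win probability could still satisfy a centipawn-based adjudication rule like TCEC's.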


I hope they change the TCEC hardware specs to include GPU so this might be able to happen.


How can we fairly evaluate TPU engines vs. CPU engines?


I wonder if being an expert at one game makes it easier to become an expert at another. If so, then maybe individual games act like example datasets, and a converged network could pick up new tasks after just a few examples.


Really interesting question. Some strategic concepts may transfer, say, from chess to chess variants. However, a simple change in the rules can have a huge impact in the game mechanics as anyone who has tried chess variants [1] knows.

[1] https://en.wikipedia.org/wiki/List_of_chess_variants

The answer may be that it is hard enough to become an expert at anything, but there may be some serendipitous (how to make this precise?) overlap.


Well, it's not doing anything like that for now. Even though the algorithm, in an abstract sense, is the same for all three games, in fact it's a new network for each of the three games, with architecture and input features adapted to the game, and then trained from scratch.


It would be very interesting to see someone try something like transfer learning from one game to another.


It doesn't seem to like the Sicilian Defense (1.e4 c5), which is the most popular opening by human players. I wonder if this will change opening theory?


It looks as if it doesn't play 1.e4 much as white. Since these statistics are for self-play games, that means it won't get a lot of opportunities to play 1.e4 c5 as black. Still, it does seem as if it likes the Ruy Lopez and French better as black than it does the Sicilian. (It would be nice to see a little opening "tree" with move probabilities, rather than this list of 12 most-popular-among-humans openings.)

[EDITED to add:] A couple of other remarks:

Playing against Stockfish, the Sicilian seems to give it more wins as white and more losses as black than any of the other openings listed here.

What's shown here are two particular versions of the Sicilian; for all we know there's a lot more 1.e4 c5 in its self-play than the graphs suggest (e.g., maybe as white it prefers 2.c3 or 2.Nc3 or something). Eyeballing those graphs, these 12 openings account for substantially less than half of AlphaZero's self-play games.


A list of openings in recent world champions: https://www.chess.com/blog/ih8sens/world-championship-openin...

Queen's gambit is there.


That's stunning. I thought that was one of the strongest openings for black.


It's not the strongest opening, but it's an asymmetric opening system that introduces imbalance into the position, so it tends to be less drawish than a symmetric opening system.

This creates the psychological effect of slightly turning the knob from "Black is playing for equality" to "Black is playing for counter-play".


It seemed to play a lot of the English Opening... that also seems strange to me.


I think in a way it’s an opening that rewards preparation and theory. With near perfect play expected on both sides, what seem like sharp games to humans are quite easily navigable.


I thought it was interesting that it seems to like the English Opening. It's not popular, but Bobby Fischer played it in the world championship against Spassky.


Combinations appear in Sicilian often. If you're playing black, that's a good thing depending on your rating.


So when are they going to apply this to Atari games, or anything else? The next step is to have one AI figure out the rules by training a GAN that imitates player behavior, and have the other AI be AlphaZero, tweaking the GAN's inputs to generate different moves to win. Voila... almost-general-purpose AI that can learn to play any game.


The main problem is that we still lack good generative models and good ways of interrogating them. GANs are unstable and difficult to apply to time series, VAEs suffer from posterior collapse, WaveNet/PixelRNN grow with the input size and overemphasize the details, and RNNs are hard to train because we lack good training algorithms. Generally, small errors tend to compound in step-wise predictions because NNs do not generalize very well and gradients tend to vanish and shatter. Just in terms of the computation time needed to roll out the future, MCTS is probably a million times more suitable in domains where the rules are simple enough to be hand-coded and evaluated quickly (such as Go and chess) than in domains where you need a complex learned model.


To expand on eref's comment a little: you absolutely could apply this or MCTS to ALE (and Guo et al 2014 did it very nicely). After all, the ALE is deterministic and simulatable by definition, so of course you can explore the game tree and reset the simulation as necessary. But people aren't much interested in this approach because using the ALE as a 'simulator' is cheating as far as testing full-strength AI techniques (we don't have simulators of the real world, after all), and the ALE games themselves (unlike Go) are of little intrinsic interest so there's no real benefit to engaging in cheating.


Didn't they start it all by playing atari games?


Is this a library or something I can download and try training myself (on a small scale)?

I'm not in a position to read the paper right now, so my apologies if that's covered in there. I want to ask just in case it's not, while this is still on the front page.


No. DM only occasionally releases software. Expert iteration is simple enough that someone can code it up on their own and there's already a few clones, so if anyone cares to train their own, it's doable, although it may take a while.


"a while" is a bit of an understatement.

Leela Zero (the main AlphaGo Zero replication project) is a crowd-sourced computation effort that's going to take a fairly long time to get anywhere.

And from this paper: > "Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs (15) to generate self-play games and 64 second-generation TPUs to train the neural networks."


You don't have to start from zero though. It's cool that it works with Google-scale resources, but it seems like it would be faster to initialize with a neural net first trained to mimic the moves of an existing chess or Go AI, and then improve it from there.

>"Why is the net wired randomly?", asked Minsky. "I do not want it to have any preconceptions of how to play", Sussman said. Minsky then shut his eyes. "Why do you close your eyes?", Sussman asked his teacher. "So that the room will be empty." At that moment, Sussman was enlightened.
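Koan aside, the warm start being suggested is just behaviour cloning: supervised pretraining of the policy on an existing engine's moves before switching to self-play. A rough PyTorch sketch (all shapes and sizes here are made up for illustration, and whether this reaches the same final strength is an open question):

    import torch
    import torch.nn as nn

    # Hypothetical encoding: board planes flattened to a vector, a fixed move vocabulary.
    N_INPUT, N_MOVES = 8 * 8 * 14, 4672

    policy_net = nn.Sequential(
        nn.Linear(N_INPUT, 1024), nn.ReLU(),
        nn.Linear(1024, N_MOVES),
    )
    optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def behavior_cloning_step(positions, engine_moves):
        """One supervised step: nudge the net toward the moves an existing engine chose.
        positions: float tensor [batch, N_INPUT]; engine_moves: long tensor [batch]."""
        optimizer.zero_grad()
        loss = loss_fn(policy_net(positions), engine_moves)
        loss.backward()
        optimizer.step()
        return loss.item()

    # After pretraining, hand policy_net to the self-play/MCTS loop instead of
    # random weights and continue with the usual training procedure.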


The problem is that it isn't entirely clear whether this produces equal quality results. You might end up on a lower optimization plateau.


I don't think it's clear that would work well. AlphaZero did significantly better than the original versions of AlphaGo (which did learn from existing human games). And even training nets initialized that way would still take a fairly large amount of computational resources.

As for that koan, I'm not convinced it's very applicable here. My interpretation of the koan is that the entire setup (training process, structure, etc.) all encode domain knowledge. In this case, I think AlphaZero's domain knowledge is transferable enough that I don't think it's relevant.


I'm pretty sure starting from zero is the point of Leela Zero. If they started from Stockfish, it wouldn't be a replication of AlphaZero.


What is its win percentage against itself on each side of the board in each game? Is chess a draw for its style of play? Is there a first move advantage for the other games with its play style?


So AlphaGo Zero used 4 TPUs while AlphaZero used 1500. It’s not immediately obvious to me why there is this massive difference. Can anyone elaborate?


Both used 4 TPUs at playing time. At training time, AlphaGo Zero used an unspecified amount of computing resources; AlphaZero used 5,000 TPUs for self-play.


Ah, thanks for clearing that up! Makes sense.


I'm only a fairly pedestrian chess player, but I looked at one of these games between AlphaZero and Stockfish, and aside from the endgame, AlphaZero played in a manner that almost seemed alien. It seemed to completely ignore various little rules of thumb, which is to be expected in hindsight but fairly mind-blowing when you actually watch a game.


Here's an HTML version of the paper:

https://www.arxiv-vanity.com/papers/1712.01815/

Table 2 is broken, but the rest is much more readable if you're on a phone.


The more interesting metric going forward is performance at a given power budget (not unlike with motorsports). The TPUs are consuming sooo much power here! Most interesting real-world problems are power-limited, including in nature (e.g. metabolic limits).


When a lot of money is on the line you can use a lot of resources.



This paper compares AlphaZero to the 20 block version of AlphaGo Zero that was trained for 3 days. Am I right in thinking that this version was significantly less strong than the 40 block version? If so, does it matter?


Wasn't Stockfish gimped for this competition? No opening book, no endgame tablebases, low RAM, etc.? If so, then this AI did not in fact beat the computer chess champion.


Is there an SDK or compiler for using Google's TPUs beyond just using TensorFlow? Is the TPU backend of TensorFlow based on CUDA, OpenCL, plain C or something else?


As a Shogi enthusiast (but complete beginner), I'd like to have seen more Shogi details in the article. Nevertheless there's plenty of other things to geek out on...


Great result, but without access to source code this is not a scientific paper.


There is only one way for a human to win at chess against these computers; and it involves violence against the chess board.


Did Magnus play against this? Is there a way we can see the game?


No, he didn't play it. As far as I know, computers are already far ahead of humans in chess, so further progress here wouldn't really make a difference.


It would be interesting in one way though: Magnus says he hates playing against computers, because "it's like being beaten by an idiot". Modern chess engines still make moves that are somewhat strategically weak, but they make up for it with amazing tactics.

It would be interesting to hear if Magnus thought AlphaZero played less like an idiot.


You're right that it's pointless. The paper has the game with Stockfish so that's good enough for me.


source code?


See, Mom? Self play is a good thing.


A lot of the graphs in the paper seem to level out as they hit the level of the opponent. It makes me wonder to what extent AlphaGo Zero is merely optimizing to beat flaws in existing opponents' current implementations (even if "existing opponents" == all available opponents' data and algorithms today) rather than finding generalizable insights into the underlying game. Because wouldn't you expect that, unless we are at the theoretical limit of perfect chess, a tabula rasa approach would exceed existing best practice significantly, especially with the massive computation advantage it has?

Not that there's anything wrong with that; AlphaGo Zero supposedly optimized for the "just enough" win rather than the crushing win. It doesn't even mean Stockfish is doomed--I suspect Stockfish could beat it in a future heads-up match provided that Zero didn't have time to retrain, but that a retrained Zero (having the benefit of optimizing against a new Stockfish) would be able to supersede it once again.


> A lot of the graphs in the paper seem to level out as they hit the level of the opponent.

DM is no longer investing much in the AG research program; Silver said the team has been disbanded already. If you look at the Go graph in this or the first AG0 paper, Zero was still getting better at Go when they shut it down; it hadn't converged. They just didn't want to tie up the TPUs. I don't think it's a coincidence that the graphs tend to stop after they reach superiority.

(Also, as Houshalter says, one of the critical aspects is that this is pure self-play ie the NNs never play against the existing engines except for evaluation. So it's all independent from-scratch reinvention.)


It's not. It learns entirely through self-play and never learns from playing its opponent. Diminishing returns aren't unusual and happen in every domain. These AIs are probably playing close to the limit of what is possible, just not quite there yet.


Are there popular games where the best human players are not near the limit of what is possible? Obviously you can construct one to be hard for humans (large 3SAT problems, or even big arithmetic problems), but I wonder if there is one that people enjoy.


Humans are nowhere near the limit of what is possible in chess, as evidenced by how much better computers are at the game.


Presumably tlb meant what is humanly possible...


I'd assume that for pretty much any nontrivial game the best human players are nowhere near the limit of what's possible. Humans can play perfect tic-tac-toe, but for everything in the realm of Go, chess, poker, bridge, etc., the theoretical ideal is far beyond the current best human players.


Seems like it flattens, but they only trained for a few hours. What would happen with 100x more training?


Elo ratings level out eventually for a given pool of opponents. If a player already wins every game against all available opponents, there's no evidence that can tell you whether it suddenly got twice as good.

If tracking improvements past the state of the art is important, I think they'd have to freeze the algorithm every 400 Elo or so and rate the improved versions against the last snapshot.

(Doesn't really apply to the stockfish case, but it does to the other two games.)
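For reference, the standard Elo formulas make the saturation concrete: once the expected score against every available opponent is close to 1, each additional win barely moves the rating.

    def elo_expected(ra: float, rb: float) -> float:
        """Expected score of player A against player B under the Elo model."""
        return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))

    def elo_update(ra: float, rb: float, score: float, k: float = 16) -> float:
        """New rating for A after scoring `score` (1 win, 0.5 draw, 0 loss) against B."""
        return ra + k * (score - elo_expected(ra, rb))

    # 400 points above the strongest available opponent the expected score is ~0.91,
    # so a win is worth ~1.5 points and further wins tell you almost nothing.
    print(elo_update(3800, 3400, 1.0) - 3800)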


Certainly a significant achievement. Also, kind of interesting that the AlphaGo team spent a lot of energy to convince us Go is much harder than Chess, only to turn around and tell us that it is amazing that it can also win at Chess.


> only to turn around and tell us that it is amazing that it can also win at Chess.

What they're demoing here is a single, general formula for mastering multiple games. Start with empty AG0, then teach it chess from scratch until it is the strongest player on the planet.

Go back to an empty slate, with exactly the same "untrained" AG0, and now teach it Go, to the same result. No fine-tuning for the domain of the game you are training -- it is general(ized).

That's the gist I'm getting from this.

Question for someone who has time to read the paper: can you train it to master chess and Go at the same time, or is it one or the other? I'm assuming the latter.

edit: check out the graph on the 4th page. AlphaZero, which can master chess and shogi, can beat AlphaGo Zero, the implementation specifically designed for Go, at its own game.


Question: do you think you are using the same parts of the brain to play chess and Go? What counts is not using the same neurons, but using the same neural algorithm.


> question for someone who has time to read the paper: can you train it to master chess and go at the same time? or is it one or the other? I'm assuming the latter.

I'm sure you could with a multi-headed NN. But what would be the point? There's very little transfer of knowledge between the games, especially once you get past the very basics.


The point is that real problem domains are not neatly partitioned and labeled.

I don't know what kind of input the NN itself gets, but computer vision is enough to translate a photo of a chessboard into a usable symbolic representation. Still, it would be nice to have a black-box-ish computer program that figures out what game is at hand and how to play it.

The next variation is have the adversary start playing a chess variant and have the machine recognize it (assuming honesty) and play it to significant skill. Then "real life Pong" where the size and aerodynamics of the ball are unknown to it. This is the gist of human intelligence: answering questions is significantly easier than figuring out what the question is.


> Go back to an empty slate, with the same exactly "untrained" AG0, and now teach it Go, to the same result. No fine-tuning for the domain of the game you are training -- it is general(ized).

Not quite -- different input features, which implies slightly different network architecture at least at the front.



