Mastering Chess and Shogi by Self-Play with General Reinforcement Learning (arxiv.org)
539 points by dennybritz on Dec 6, 2017 | 270 comments



This is an incredible demonstration that the AG Zero expert iteration method is a general method. If you go back to the discussions of AG Zero just a month ago, there was a lot of skepticism that NNs would ever challenge Stockfish et al - they are just too good, too close to perfection, and chess is not well suited to MCTS and NNs. Well, it turns out that AG Zero doesn't work as well in chess: it works better, as it only takes 4 hours of training to beat Stockfish. This is going to be an impetus for researchers to explore solving many more MDPs than just chess or Go using expert iteration... ("There is no fire alarm.")


See the thing is though, Giraffe's evaluation actually was better than Stockfish's evaluation function, but it took much longer, and thus wasn't able to search as deep as Stockfish et al. So in a way, the real triumph of the AlphaGo series was the TPU and GPU army.


> So in a way, the real triumph of the AlphaGo series was the TPU and GPU army.

Eh. It's still searching many fewer positions than Stockfish is.


Unlike in most algorithms where correctness and performance are independent, chess engines can't be evaluated without testing performance at the same time; faster is not just faster, it changes the results.

So there is a tradeoff between the depth of the search and quality of evaluation. For traditional chess algorithms, better evaluation was rarely worth the cost; it would slow down the search so much that it didn't pay for itself.

But this performance tradeoff (like all optimizations) critically depends on hardware. Change the hardware and you change which optimizations are "worth it".

AlphaZero is clearly good at using TPU's to maximum effect. But what would its performance be in a CPU only environment? Maybe dumb but deeper searches still win there? This evaluation hasn't been done.

This isn't to say that the AlphaZero evaluation is "unfair". Rather that chess engines evolved to be too dependent on their environment. Getting maximum use out of CPU's is a strength, but not being able to use TPU's or even GPU's is a weakness.
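
To make that tradeoff concrete, here's a rough back-of-envelope sketch (the time budget, per-node costs, and branching factor of 35 are made-up numbers, and it ignores the pruning and move ordering that let real engines search far deeper than full-width):

    import math

    def effective_depth(time_budget_s, eval_cost_s, branching_factor=35):
        # Nodes you can afford in the budget, then the full-width depth they buy
        nodes = time_budget_s / eval_cost_s
        return math.log(nodes) / math.log(branching_factor)

    # Cheap handcrafted eval (~1 microsecond/node) vs. slow NN eval (~1 ms/node)
    print(effective_depth(60, 1e-6))  # ~5.0 plies full-width
    print(effective_depth(60, 1e-3))  # ~3.1 plies full-width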


Agree with this. Stockfish is fast enough to run on modern iPhone and Android phones. AlphaZero most probably is not.

But the fact that a generic algorithm absolutely destroys humans and their human-crafted programs is the most interesting part.

Yes, TPU + GPU army is a huge amount of computation power, but I'm sure there'll be research coming out trying to compress the algorithms enough to use the same computation power as Stockfish.


It searches fewer positions because it decides where to search using 4 TPUs, which are 180 teraflops each according to Google.


That's not clear, each (second generation) TPU is 45 FP16ish unspecific TFLOPs. A single board consists of 4 TPUs at 180 TOPs total. This is similar to the Dual P100 NVLINKed Quadro which is an absolutely killer HPC/DL card. I believe they have a similar Volta option, but that kind of HW is above my pay grade these days.

Further, they used 5,000 (first generation) TPUs at 90 INT8 TOPS each, page 4, to run the network during MCTS and 64 (second generation) TPUs to train this thing according to the methods. That's a nice mix of using INT8 for inference and FP16ish for training IMO.

In contrast, I personally own 8 GTX Titan XP class GPUs and 8 more GTX Titan XM GPUs across 4 desktops in my home network. I'd love to experiment with algorithms like this, but I suspect I'd get just about nowhere due to insufficient sampling. These algorithms are insanely inefficient at sampling at the beginning. So I guess I will seed the network with expert training data to see if that speeds things up.

That said, more brilliant work from David Silver's group! But not all of us have 5,000 TPUs/GPUs just sitting around so there's still a lot more work/research to make this more accessible to less sexy problems.


It's definitely worth a shot reproducing the results.

On the other hand, Google will make a shit load of money when they make TPUs available on gcloud. Papers like this are great marketing for them.


I wonder how much the hardware would cost to rent for a researcher not working at Google?


So 3 2nd generation TPUs are ~= 1 Volta class GPU ~= $3 per hour on-demand on AWS: https://aws.amazon.com/ec2/pricing/on-demand/ and ~$1 (75 cents at the moment with p3.8xlarge and its 4 GPUs) in spot: https://aws.amazon.com/ec2/spot/pricing/ if you take the time to build a robust framework.

And to make things simple, let's do it all in FP16 because INT8 on Volta ~= 1/2 a first generation TPU, but FP16 ~= 3 first generation TPUs at INT8 (sad, right?), an accident that occurred because P100 didn't support INT8, but consumer variants did.

So, 5,064/3 = 1,688 Volta GPUs ~= $5000 per hour, probably half that reserved, a quarter of that in spot.

Say you need a week to train this, so $200K-$800K...

You can buy DGX-1Vs off-label for about $75K. Say they cost $20K annually to host. Say you use them for 3 years, so the total TCO is ~$135K, which comes down to ~$0.64/hour per GPU.

Conclusion: p3.8xl spot instances are currently a steal! But I don't have ~$200K burning a hole in my pocket, so I guess I'm out of luck.
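
For anyone who wants to redo the arithmetic, a small sketch using the figures above (the prices and the 3-TPUs-per-Volta conversion are my rough 2017 estimates, not official numbers):

    tpus = 5_000 + 64       # first-gen inference TPUs + second-gen training TPUs
    gpus = round(tpus / 3)  # assume ~3 first-gen TPUs per Volta-class GPU

    for label, rate_per_gpu_hr in [("on-demand", 3.00), ("spot", 0.75)]:
        hourly = gpus * rate_per_gpu_hr
        weekly = hourly * 24 * 7
        print(f"{label}: ~${hourly:,.0f}/hour, ~${weekly:,.0f}/week")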


Google provides 1000 TPUs free of charge to researchers https://www.tensorflow.org/tfrc/


I don't think that the specific numbers are relevant for what deepnotderp and I were saying: that Giraffe already demonstrated the potential, and all that was missing was a boatload of compute.


P100 is ~20 FP16 TFLOPs, V100 is ~30. So 4 gen-2 TPUs are ~9 P100s or ~6 V100s.


TPUs aren't "cheating" though, as they can be used for generalized machine learning models, and not just Go.

Computer graphics is still an impressive achievement even when done on a GPU instead of a CPU.


I think his point is that if you devote X FLOPs to something, then a fair comparison would be to also give X FLOPs to the competitor. The specifics of how an algorithm works do not matter as much as the total resources used and the outcome.


A more fair comparison would be to cap the hardware used at a certain cost. That's much more reflective of the real world. There are plenty of tasks that perhaps you could do more efficiently on a CPU for a given number of operations, e.g. maybe some graphics operations, but in practice it's completely irrelevant because a GPU gives so much more performance for the given cost. There's nothing special about an operation, but dollars do matter.


Only if you're buying hardware based on the algorithm used. Useful chess programs need to actually run on people's phones, where performance on a cluster of ASICs is mostly meaningless.


I think we need to start capping total electricity and total $$$. I'd love to see AlphaZero 20W pitted against that other 20W supercomputer. When humans fall to that, be afraid(tm).

I'll even be charitable in order to simulate the existence of school/teachers/books: training from the start gets 2KW. But gameplay still gets capped to 20W.


Electricity isn't free though; why can't it simply be rolled into cost? Just assign it a standard cost per kW-hr and charge accordingly. This more accurately reflects economic incentives driving hardware development.


Sure, why not, but then how do we compare to a human?


I don't think you can, and such a comparison is not really needed here anyway. People are not chattel slaves and cannot be racked into data centers to solve boring problems.

Of course, you can hire people, and that has a well-defined cost, so it does all come down to money again.


Sure, but the whole point of the above idea is to compare our 20W computers to what we can build that eats 20W. And don't give Silicon Valley ideas about disrupting the lucrative Mechanical Turk ecosystem by scaling it up with ideas borrowed from growing veal because some VC sociopath will take it seriously. Just sayin'...


And I'm saying that this 20W limitation isn't particularly meaningful, as many organizations have way more power at their disposal to throw at a problem than that. The economics of a given solution, on the other hand, is applicable at all scales.


Meaningful in the sense that if an AI plays against humans, is it smarter at the same energy efficiency as humans?

We are comparing machine intelligence vs human intelligence.

It can be said that with more computational power, you can raise intelligence. Human brains consume more power relative to body size than those of any other animal.


This argument is about state-of-the-art chess, not chess as a mobile phone game. Humans are so bad at chess compared to the best programs now that even a smartphone app can't be defeated by people.

Also, mobile phones have Internet access, so there's no reason the algorithm has to run on the phone itself. It could run on TPUs in the cloud. It's common for many games to have server-side components. Though this isn't even necessary except maybe if Magnus Carlsen wants to play it.


I think you misunderstood. Sure, if you are willing to deal with the increased costs and lowered reliability you could write a chess program that required massive server resources.

But I don't think a lot of people would pay for that vs. having a program that just runs on their phone and still beats them. So, in practice, without a significant subscription fee you are going to be limited to cellphone hardware.

PS: In practice most games take about as much computing power from a server as a chat app, as companies need to pay for that hardware. Remember, 1,000,000+ times X gets big unless you keep X very low.


Again, this entire article and discussion is about state-of-the-art chess. As in, literally working to "solve" the game and develop optimal strategy. I don't understand what relevance casual mobile chess games have. Computer chess is already very far beyond human capabilities, and it can't be pressed further just using mobile phone hardware (nor is that a reasonable restriction).

It'd be like in a discussion about SpaceX's BFR designs to colonize Mars, someone comes in and questions why they're using retropropulsion since the requisite control systems are infeasibly expensive for amateur model rockets. It's a completely different discussion.


That's not why this is relevant. Given equivalent hardware it's still a worse solution for chess. The value is that you can get results of similar quality, given vastly more compute power, without 1,000+ years of analysis.

Otherwise the only takeaway is this failed to improve the state of the art.


"Equivalent hardware" is only relevant if we're talking about cost. When measured by that metric, the TPUs are indeed superior. Raw operations is an irrelevant metric given the existence of economic purpose-specific hardware that can perform a lot more of the operations required for matrix multiplication than for general computation. GPUs work exactly the same.


Again, cost is relative to the hardware you have. If you own a supercomputer already and you want to run chess on it for whatever reason, what matters is the performance you get from each algorithm on that hardware. If you're going to buy new hardware, its design depends on performance across every algorithm you expect to use.

So, the only case where chess performance per $ matters is if you are only ever going to use that hardware to run chess. In every other case, which is the vast majority of the time, you care about different metrics.


Why? Just do the computation in the cloud.


Several phones already have neural net acceleration hardware in them today, including the latest iPhones.


Some iPhones are manufactured with this, but again if you have paid for the hardware you care about performance on that hardware. If you have yet to buy anything then theoretical performance per $ becomes the meaningful metric.


Same with the Pixel 2. But the Pixel 2 appears to be a bit more powerful than the iPhone neural chip. The Pixel Visual Core is able to do 3 TOPS, but we really need the supported instructions and word size to truly compare.


But better evaluation gives you asymptotic speedups. You can give Stockfish several times its computation (which is already a lot, I mean, 64 threads, come on) and it doesn't make good use of it since it just runs into the search wall. If you gave Stockfish the equivalent in CPU power (and I'm not sure this is a fair hypothetical since part of the appeal of NNs is that they have such efficient hardware implementations, so it seems unfair to then grant a less efficient algorithm equivalent computing power by fiat), I'm not sure it would be restored to parity or superiority.


Absolutely. This required an exorbitant amount of compute, but DeepMind had to do novel, nontrivial research to make use of those resources.


Edit: DeepMind's victory over Stockfish didn't need novel research. Giraffe already demonstrated that the asymptotic speedup was possible; it just needed more compute.


The number of positions evaluated is the number evaluated. Speed doesn’t change that.

Speed probably made the initial self play training quicker though.


Compute absolutely matters. With tree search, there's a tradeoff between scoring cost and positions evaluated. AlphaZero can evaluate fewer positions because it uses a huge amount of compute to accurately score each position.

It's not just training. Training used 5,000 TPUs.


Can't those teraflops be applied to evaluating more positions instead of deciding which positions to evaluate?

It seems that the metric should be compute time, not positions evaluated.


Presumably they both had equal clock time - that is a standard chess rule so it would be surprising to see it different.


Wall clock != cpu clock

I can do more in the same wall time with a faster cpu(s); I can afford inefficiencies that the opponent cannot, and accomplish just as much.


Right, right, but my comparison was between Giraffe and AlphaGo, not between neural networks and Stockfish.


But it looks like AlphaGo is searching fewer positions per second than Giraffe did.

AlphaZero evaluates 80K positions per second, according to this paper, and the Giraffe paper says that Giraffe averaged 258,570 evaluations per second when running STS.

While we can't directly compare the compute power, this implies that AZ has learned a better representation.


It's unclear how much of the "better representation" was due to better algorithm vs. more compute/deeper NN.


Giraffe was trained until convergence. Maybe if there was more compute power then, a different model would have been used, but that's deep into the world of silly hypotheticals.


The point is you're starving the competitor of computing power: if you compare A and B, you can't give A 10x or more the compute power and assume a fair comparison. What's interesting is a demonstration that enough compute power lets NNs reach beyond human-level play. Though I don't think that was ever really in doubt.


While I'm glad to see you're excited about this, take note that this is still an approach which requires that an exact model is known, the state is fully visible, and the reward is perfectly definable and known. Progress in this setup isn't necessarily correlated with the kind of AI for which we'd need a fire alarm.


It's easy for many to think that solving Go and chess means we can also solve household work like cleaning, cooking and washing dishes, but that work is actually harder.


Next up: Google's Deepmind AI learns to perform arithmetic tabula rasa.

More seriously, it seems Deepmind and the AI community in general is having a Streetlight effect problem, i.e. looking for AI in what works now, rather than coming to terms with the hard challenges. This explains why there are so many papers on GANs. People are just doubling down on what works (where the streetlight is), rather than acknowledging that where we need to look for AI is dark. Since it's become such a cut-throat race to be the next one to say "we made a breakthrough!", it makes much more economic sense to solve simple problems and advertise them as huge challenges.


I wouldn't dismiss GANs so easily. Yann LeCun was singing odes to GANs - as the most interesting idea in the last decade. The interesting thing about GANs is that they don't use a predefined loss, but instead the discriminator acts as the loss function for the generator - thus, it is learning a loss fn instead of using human guesswork to create it. That's quite a powerful new idea. Applications of GANs include making simulated images look more real, which is essential for RL, generating 'artificial' training images for other tasks and using the discriminator as an image embedding generator or classifier.


I agree that the average Joe will misinterpret the significance of AlphaGo, to Google's benefit.

But most people in the research community already know how amazing it would be to make an affordable household robot or a search-and-rescue robot or a self-driving car. Many labs (including mine) are working on it. The streetlight adds a small bias, but the bigger problem is that we have no idea how to build human-level AI.


The biggest problem in robotics is vision. How do you translate pixels to a 3D scene graph with object attributes and correlate it with prior knowledge?

Do that in real time, on-device, without using a crazy amount of power because of batteries.

CNNs and faster GPUs are the biggest breakthroughs in that regard, but there's still a long way to go before we get to a human-level visual cortex.


Vision is part of the puzzle--a large part in the case of self-driving cars. But blind people are way better than computers at everyday tasks, so I don't think that it's the Big Problem.

Translating to 3D is low-level and relatively easy. That's not the reason why we don't have household robots/self-driving cars.

Framing vision as "object attributes" and "correlate to prior knowledge" might be a good approach for current research. But humans do more--we understand what we look at. We form concepts and models of the world that allow us to adapt to very novel situations.

The main reason why we haven't solved vision, language, playing chess like a human, etc is that NNs are a poor approximation of human concepts. I agree that we probably need more compute and better compute.


Yes, but it doesn't seem like much of a problem? Exploiting a breakthrough before moving on to harder problems isn't cheating, it's the smart thing to do. It might even turn out to be the fastest way to make progress on the harder problems.


Let's break this down and consider things carefully. To informed researchers, what is most surprising here is not that the AlphaGo Zero algorithm beat stockfish but that MCTS managed to outperform Alpha-beta search. I'll venture a hypothesis as to why this was.

Informed skepticism would have discounted MCTS against alpha-beta search but wouldn't have put much stock into the idea that neural networks couldn't learn better features than what has been painstakingly handcrafted. We know that given sufficient data and an appropriate architecture, neural nets have achieved better local minima than humans. This shouldn't be surprising anymore. A structurally adapted searcher will always do better in the domain it is adapted to. A cat is so good at being a cat, it doesn't even have to think about how to cat. Choice of optimization method, input pre-processing, loss function, hyper-parameters and architecture together define a search space, a structural prior and how to navigate it.

Returning to alpha-beta vs MCTS, my view is that earlier work on the chess search space being ill-suited to MCTS has not been invalidated once you account for the synergy between the neural net and search method brought about by the imitation learning approach. What might be happening here is the neural net not only learns to correct when it goes out of bounds, it also learns to account for missteps of MCTS!

The AlphaGo Zero chess program is clearly smarter than Stockfish from the perspective of its ability to better navigate the search space, but before talking about fire alarms there are some things to note.

Going by the paper, AlphaGo Zero does well if you hold compute fixed and adjust time, but how does it do as you move along both compute and time? This is of relevance to the general community, especially if AlphaGo Zero's skill degrades gracefully enough to allow it to be a better tutor than current engines.

Contrary to the no fire alarm claim, we should see sudden improvements everywhere due to how close joint, structured prediction, reinforcement and imitation learning are to each other. Unexpected improvement across a broad class of problems is a fire alarm. Right now, POMDP or games with hidden information and multiple interacting agents are still very difficult. Structured prediction is still difficult. Granted, this was before AGZ, but Neural Nets+MCTS had to be modified to Neural Self-Play before it could work just ok in poker-like games.

What we should take away is the power of combining searching and learning. I'll argue that what is now being called expert iteration was presaged in an antique 2006 paper [1] where Hal Daume et al discuss the power of a learning algorithm trained to imitate a search computed policy. Even with limited compute and data, you can use similar ideas under the learning to search framework. The imitation approach is what's consistently yielded great results, whether applied to neural nets or logistic regression.

[1] http://www.umiacs.umd.edu/~hal/docs/daume06searn-practice.pd...

https://link.springer.com/content/pdf/10.1007/s10994-009-510...
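
For anyone unfamiliar with the term, here is a minimal sketch of the expert iteration / learning-to-search loop described above. The helper functions (new_game, mcts_policy, sample_move, train_on) are hypothetical placeholders rather than anyone's actual API, and perspective handling for the game outcome is elided:

    def expert_iteration(network, n_iterations=100, games_per_iter=500):
        for _ in range(n_iterations):
            dataset = []
            for _ in range(games_per_iter):
                game, trajectory = new_game(), []
                while not game.is_over():
                    # "Expert": search guided by the current network's priors and values
                    visit_counts = mcts_policy(network, game.state, n_sims=800)
                    trajectory.append((game.state, visit_counts))
                    game.play(sample_move(visit_counts))
                z = game.result()  # e.g. +1 / 0 / -1
                dataset.extend((s, pi, z) for (s, pi) in trajectory)
            # "Apprentice": regress the policy toward the search's visit
            # distribution and the value head toward the game outcome
            train_on(network, dataset)
        return network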


Correction to the above: I stated Deepmind applied Neural Nets+MCTS and achieved ok results. I was actually misremembering two David Silver (Deepmind) papers as one. Smooth UCT modified UCT (popular brand of MCTS) to be able to handle imperfect information games. MCTS does not converge under imperfect information. Smooth UCT is strong at limit poker. Limit is much simpler than no-limit.

Neural Fictitious Self Play based on fictitious play (invented 1950s), is an approach to reinforcement learning using neural nets for function approximation. Typical RL methods like DQN are highly exploitable. Against strong programs, NFSP did okay, with a win rate of -50 mbb/h against the best bot it played against.

Looking not just at Deepmind, there's Deepstack. It's similar to AlphaGo OG, combining CFR+Neural nets. Deepstack did not win convincingly against humans at 2 player no limit hold em.

The general point I'm trying to make here is that Chess and Go are closer to checkers than to poker, which is itself a constrained game with known rules. I mention all this and this Deepmind paper: https://arxiv.org/pdf/1711.00832.pdf, to provide a sense of scale to those talking about smoke and fire alarms.


What do you think of Libratus which won quite convincingly against top players in no-limit Texas hold ‘em poker?

https://en.m.wikipedia.org/wiki/Libratus


Probably the wrong engine to test this with, then. Although it's interesting nonetheless. It's pretty well known that chess engines have this trade-off between searching and evaluating. Among the consistent top 3, I suppose Stockfish is the easiest to test, being open source and all. It's pretty well regarded that Komodo has the best evaluation function, though. Even if it doesn't keep up with the nodes/sec of Houdini and Stockfish, it's consistently up there with the top 3. The other chess engines don't even come close. (Fire is probably number 4 but is in a league of its own. Not quite good enough to challenge the top 3, but eats everything else.)

I know it's complicated, between the hardware differences, search method used, etc. But when claiming that NNs beat hand-crafted evaluation functions, keep in mind that Stockfish is probably not the best choice to compare against, since it has made different tradeoff choices to get more depth (which goes back to search method and hardware choices).


Your comments about Stockfish, Komodo etc are entirely subjective. "It's pretty well regarded". No it's not.

You can't disconnect the search part from the discussion, as the search selectivity is ALSO learned by the neural network.


Yeah, I'm quite confused that there's no mention of SEARN or LOLS or similar imitation learning algorithms in the references of the AlphaZero paper. The learning algorithm looks heavily derived from that 10-year-old idea.


I agree that Searn looks rather prophetic in retrospect.


They aren't the first to apply NNs to chess though. What are they doing differently? And does anyone else smell smoke?


It's certainly not the first NN chess program. You may remember one of OP author's Giraffe NN (https://arxiv.org/abs/1509.01549) which was essentially 'AlphaGo for chess'. But like the original AG, it struggles to learn and Lai had a lot less computation as a student than he does now at DM. What they're doing is applying AlphaGo Zero expert iteration with some simplifications and TPUs. And that pwns previous work like Giraffe the way AlphaGo Zero pwns AlphaGo. Quantity becomes a quality all its own.


Look at Figure 2, and remember that DM has access to a lot of hardware. At short thinking times, AlphaZero is weaker than Stockfish. This is equivalent to longer thinking times with weaker hardware, and it is likely that earlier applications of NNs to chess had hardware that was 1000-fold slower than what DM has access to. This means that even if their approach had been identical to DM's, they would not have seen better performance from NNs than from the classical alpha-beta approach.


In essence, MCTS + NN is just another form of tree search, like alpha-beta or its brute-force cousin minimax.

AlphaZero just tries to be smarter about which branches to evaluate so it can go deeper.

But I would love to see AlphaZero (trained) run side by side with Stockfish on iPhone hardware and defeat it. That would be a more apples-to-apples comparison.


They are a huge company (Google) with access to top, top, top talent (experts) and infinite hardware resources. I don't know why it would be surprising if they achieved performance that hadn't been achieved before.


>it only takes 4 hours of training to beat Stockfish

In that time I figure they used the equivalent of about 1000 cpu-years. Imagine the things we'll be able to achieve as we can do more and more computation in less and less time.


But how many "CPU hours" of human work were used to design Stockfish? You can't really compare that.

Some scientists say the brain has a power of several petaflops, so if you go by that, I guess the design of Stockfish was way less efficient.

You can't really compare things to CPU years, it doesn't make sense. Power consumption would be a better metric I think.


The best metric is total cost, including the cost of the hardware as well as the electricity. It might be worth prorating the hardware by the amount of time it spends on the task, too, assuming the hardware is general enough for many purposes (like TPUs are), vs say something like EFF's DES cracker which was not.
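
One way to operationalize that metric, as a sketch (the price, lifetime, power draw and electricity rate below are placeholder assumptions):

    def task_cost(hw_price, hw_lifetime_hours, task_hours, power_kw, price_per_kwh=0.12):
        prorated_hardware = hw_price * (task_hours / hw_lifetime_hours)
        electricity = power_kw * task_hours * price_per_kwh
        return prorated_hardware + electricity

    # e.g. a $75,000 box amortized over 3 years, drawing 3 kW, used for one week
    print(task_cost(75_000, 3 * 365 * 24, 7 * 24, 3.0))  # ~$540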


A ton of CPU has gone into Stockfish, if only for their distributed computing project fishnet: http://tests.stockfishchess.org/tests


To be a little more precise: Stockfish has used >5,667,382 CPU-hours (5.6 million CPU-hours) adding up just the participants who contributed >10,000 CPU-hours according to https://github.com/mcostalba/Stockfish/blob/master/Top%20CPU...


Yes, but that's training time. At runtime AlphaZero got orders of magnitude more computation power than Stockfish.

A fairer comparison would be to use as much computation as needed for training, but for runtime, use equal-wattage hardware.

E.g. 20W of CPU in a mobile phone running Stockfish vs. 20W of GPU in an Nvidia TX2 running AlphaZero vs. a 20W human brain.


> In that time I figure they used the equivalent of about 1000 cpu-years.

Are you using some kind of conversion factor from TPUs to CPUs? If so, what is it? And is it valid to do that?

You could convert the amount of time it took to render an hour's worth of gameplay from 1 GPU-hour to 50 CPU-days (or whatever), but is that really meaningful?


The conversion factor seems to be 1 TPU-hour ~ 500 CPU-hours in terms of flops. We can nitpick that number, but it won't change the conclusion that AlphaZero needs a boatload of compute.


I don't see how this is relevant though. A GPU also provides graphics rendering performance equivalent to some boatload of CPU-hours, but who cares? GPUs exist and are used for the tasks they are good at. TPU hardware isn't theoretical; it does exist and it is being mass-produced.

Yes, it needs a boatload of very simple compute (8 bit operations), the kind that CPUs are not even close to ideal at providing economically.


It needs huge amount of computations to perform a task which previously required a boatload of domain experts' brainpower too.


> doesn't work as well in chess: it works better

That's not how "as well" works.


Yes, it is; while the idiom standing on its own implicitly includes a leading “at least”, it is also idiomatic to use it in exactly the way used by the grandparent post, in an explicit contrast with better, where it comes with an implicit (or sometimes explicit) leading “merely” instead of “at least”.


It's unnecessary, though, and makes the point harder to read. "It works even better" would be a perfectly sufficient description. "It works not as well as but better" is an unnecessary rhetorical flourish.


The misdirection is being used as a rhetorical device — you're supposed to feel a brief confusion when you get to the colon; it's then quickly resolved.



Time and again Alpha shows it is much better at eval than Stockfish.

Alpha play feels "human" at least to this FM. This is fantastic news! It is what I would imagine a good correspondence GM would play like with engine assistance.

I already commented on Game 1 where Stockfish played extremely aggressively with 13. Ncxe5 ??! and 31. Qxc7 ?!

Game 3 is a positional masterpiece. Alpha is willing to play pawns + exchange down when it correctly evaluates that Black queen and rooks will be tied down.

This kind of long term thinking is beyond what regular engines perform.

Game 10 is also an impressive showing by Alpha. Alpha is willing to play down a piece and a pawn for 15 (30 ply) moves in a middle game beyond the reach of Stockfish's raw calculations.

If one could only get access to Alpha evals :) When do mere mortals get access to TPUs on Google Compute Engine?


Thanks for the analysis.

There's a project currently that emulates AlphaGo Zero using distributed computing / crowdsourcing: https://github.com/gcp/leela-zero . You can run it on the browser too and it will submit the games after: https://ntt123.github.io/leela-zero/

Hope such a project will be available soon for the chess variant.

Or maybe Deepmind will release this as a SaaS product?


Deepmind should release the TF compatible model with weights. And then it's just a matter of shrinking the model enough to run on desktop hardware.

But I don't know whether they'll do it. I hope they follow suit like other researchers who have github repos with code and models besides their papers. Really accelerates research.


So, 1. d4 for White, Berlin for Black. I got it


https://i.imgur.com/kwCyiHn.png

That was a bad move for white to play. It's easy to win when your opponent throws the game.

No human player would trade queens in that situation.


Are you thinking Qa5?


Yeah, or anything other than trading. You can see from the graph that it was all downhill from there; deservedly so.


Uhh... These games are actually broken. From the second link: https://imgur.com/a/P5tG6

See for yourself:

https://lichess.org/Zqwn4Gzk#87

https://lichess.org/Zqwn4Gzk#88

EDIT: Nope, I'm just a noob.



Ah, thanks.

I'm delighted. Chess seemed so simple. I had no idea there was a special pawn capture.


This looks like a fantastic site! Is there anything similar for Shogi?


One impressive statistic from the paper: AlphaZero analyzes 80,000 chess positions per second, while Stockfish looks at 70,000,000. Seventy million, three orders of magnitude higher. Yet AG0 beats Stockfish half the time as White and never loses with either color.

A stunning demonstration of generality indeed.


So ... what if you combined Stockfish and AG0, and let AG0 explore 70M positions instead of 80K? Would it improve even faster?


What if you combined a bus that gets you to work in 10 minutes and plane that gets you from Paris to Brazil, would it get you from Paris to Brazil in 10 minutes?


yes. In an imaginary and hypothetical sense. :)


The issue is you can’t evaluate positions that fast in AlphaZero (currently).


It would be interesting to see if there were some way to extract a couple of new heuristics from AlphaZero that could be implemented fast enough to incorporate in Stockfish's evaluator though. I suppose this is the age old problem of black-box models: _why_ does it think this?


I think that it is almost always possible to extract optimized models from a NN and implement them faster. I wonder if this can be generalized: NN to optimized fixed algorithm for max speed?

This has to already exist as it is very obvious.


I dunno, seems like Google would just do this instead of keep around the pesky neural net at runtime. There's an _awful_ lot of computation going on inside, and it's necessarily hugely interconnected. I'd be impressed if someone had already done it, but it seems a great avenue of research if not. I suppose it goes hand in hand with models for which you can actually _explain_ their results, which certainly is an active area of research.


There are well-known techniques that work pretty well to shrink neural nets a lot while keeping almost all of their performance. See Geoffrey Hinton's model distillation papers.

The first AlphaGo paper had a system that used tons of computation, and was followed up by one that used much less and worked even better. Not speaking for Google, but I think it's a bit of a race to publish great results first. I wouldn't be surprised to see something better than this that uses 1000 times less resources published in a year or two, just like what happened with Go. First prove it's possible, then figure out how to make it much more efficient.


A really good example of model distillation also comes from DM: their new realtime WaveNet used in Google Assistant. The first WaveNet was ungodly slow due to redundant computation; but even after that, it still was not realtime simply because the CNN is too deep and slow. But you need the CNN to be deep & big in order to train good audio generation. Model distillation to the rescue: take a wide fast small CNN and train it to imitate the slow deep WaveNet. Result: WaveNet quality realtime voice generation which can be deployed to the masses.
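
The core of distillation is short enough to sketch; this is a generic toy version (layer sizes, temperature, and the random stand-in data are arbitrary; it is not the WaveNet or AlphaGo setup):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    teacher = nn.Sequential(nn.Linear(64, 512), nn.ReLU(), nn.Linear(512, 10))
    student = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    T = 4.0  # temperature: softens the teacher's output distribution

    for step in range(1000):
        x = torch.randn(128, 64)  # stand-in for real training inputs
        with torch.no_grad():
            teacher_probs = F.softmax(teacher(x) / T, dim=-1)
        student_log_probs = F.log_softmax(student(x) / T, dim=-1)
        # Train the student to match the teacher's softened distribution
        loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()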


Thanks for these googlable hints. :)


"We also analysed the relative performance of AlphaZero’s MCTS search compared to the state-of-the-art alpha-beta search engines used by Stockfish and Elmo. AlphaZero searches just 80 thousand positions per second in chess and 40 thousand in shogi, compared to 70 million for Stockfish and 35 million for Elmo. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations – arguably a more “human-like” approach to search, as originally proposed by Shannon." <- Amazing!


Meanwhile a human player considers <1 position per second, so there’s a few orders of magnitude left to go in that direction.

But unsettlingly few, nonetheless.


Humans are also much weaker than AlphaZero in these three games. The difference in the numbers of positions searched might be responsible for a substantial part of that.


It'd be interesting to weaken AZ until it is on par with a human, and then compare moves evaluated. I'd suspect humans still evaluate significantly fewer moves.


Strong human players consider a lot more than 1 position per second in Chess...


If you look at the Stockfish project you will see many hardcoded weights in the configuration, found through experimentation. All these adjustments probably took years to achieve... and now AlphaGo Zero just self-learns everything and surpasses it.

Would be good to see Deepmind's solution play Arimaa and Stratego, and see what kind of strategy it comes up with. Or weird variations of Go.

Eventually this tech will make it into military strategy simulators and that's where things will get really messed up. 4 star generals will be replaced by bots.


I don't think this technique immediately applies to Stratego because it's not a perfect information game.

I suspect it would exceed the state of the art in Arimaa, since Arimaa is specifically designed to have a high branching factor (~17,281, compared to ~35 for chess), and this technique was designed to work well in high-branching-factor games (since Go is a high-branching-factor game, though much lower than Arimaa).


In that regard then Stratego would share some aspects with Starcraft, another incomplete information game.

DeepMind is actively working on a StarCraft bot. It would be interesting to see if they can put together a superintelligent StarCraft bot and then translate those results to Stratego.


I smell a rat.

The paper says:

'AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi'

In the first game, Stockfish's 9. Qe1 is one of the strangest moves I've ever seen, one which would never be considered by a human, let alone a superhuman.

11. Kh1 also makes little sense, but is not as bad. My Stockfish sees it as losing 0.2 pawns, which makes it highly suspect in such a position.

35. Nc4 is also a deeply puzzling move that my Stockfish sees as losing half a pawn immediately, and a whole pawn soon after.

50. g4 is also suspect.

52. e5 is insane.

This is bullshit.

Edit: bullshit is too much - see comments below.

Edit: Oh dear. We're doomed.

https://lichess.org/study/qiwMCyNQ


The Stockfish engine provided by lichess on the game you linked doesn't seem to mind those moves - it has most of them in the top few lines after a few seconds of thinking time.

Qe1 and Kh1 are fine if the plan is to prepare f4.

35. Nc4 stuck around at the #2 / #3 best move for as long as I ran that position.

Remember the Stockfish in the paper had 64 cores, so you'd have to run your Stockfish for a while to get it to arrive at the same principal variation.


Yeah that's right. I think this might say more about the efficacy of chess engines over a certain point vs human analysis rather than the 'bullshit' I called.

I'd certainly fancy my chances against this AI more than Stockfish on a lower power.


If I leave Stockfish to study for longer then Qe1 comes up in the analysis. Which makes me wonder whether SF gets weaker in some positions the more it's left to think.


Now I'm really intrigued.

SF plays really odd moves when left to its own devices for a time. As does this AI. So maybe chess looks really weird with play significantly better than the best humans.

It's actually really disturbing.


I think being able to play tactically perfect chess over 20 or so moves will often look weird to human strategic sensibilities. The computer sees every tiny exception to the patterns and heuristics you've incorporated into your gut feel about positions. In a way these moves are right just because they're right, and that's what's jarring - there's no _principle_ behind them that can be learned and generalised, which is something humans struggle with in all walks of life.


Except AlphaZero doesn't evaluate nearly as many moves as Stockfish (80Knps vs 70Mnps), so in a sense, it has exactly generalized a principle (or likely a whole lot of principles) that allows it to estimate positions much better than Stockfish.

Of course you are right about perfect play, but the human-like aspect is part of what is exciting about these new Alpha engines.


Yeah. I'm stunned.


There's definitely nothing fishy going on, although it'd be nice to see a fully loaded Stockfish on its full complement of 512 cores and a proper endgame tablebase to really slog it out with AlphaZero.


Back to zero at 41 ply... I shan't give a running commentary anymore.


It's fascinating isn't it? I'd love to see this vs Magnus.


Pretty sure I input the moves wrong now I'm looking at it. Humans really have no place in chess. :P


> which would never be considered by ... a superhuman

How would you know?


I don’t see it in my database, and it’s never been played on Lichess, even in bullet games.


Which, of course, is not evidence that a superhuman wouldn't consider such a move. AlphaGo also made unusual moves that looked like mistakes, but turned out to be insights.


Apologies, misread the parent comment!


Ask any GM.


How would a GM (human) know what would be played by a superhuman?


The edits on this comment make it the second best HN comment of all time.


What's the first?


The "Did you win the Putnam?" comment: https://news.ycombinator.com/item?id=35079

The whole thread is pretty hilarious. In another part of the same thread there is this comment:

we're in a similar space -- http://www.getdropbox.com (and part of the yc summer 07 program) basically, sync and backup done right (but for windows and os x). i had the same frustrations as you with existing solutions.

let me know if it's something you're interested in, or if you want to chat about it sometime.

drew (at getdropbox.com)

https://news.ycombinator.com/item?id=35103


hmm...13.Nce5 looks like the move no strong human would play, and I suspect even engines after going sufficiently deep wouldn't choose it (I haven't checked it though).


My perspective as FIDE master who has played Ruy Lopez Exchange type of positions for 30+ years.

9. Qe1 is a pretty normal maneuvering move

13. Ncxe5??! looks like a major howler.

Ask 100 strong chess players and 99 of them would completely ignore it. You are giving up a piece for two pawns in an open position and black has no real weaknesses. There is no real basis for a sacrifice.

This shouldn't work. The crazy thing is that Stockfish almost makes it work.

It is the kind of move you play when you absolutely must win and must win now.

The only reason Stockfish considered it is the white pawn on a5 giving additional tactics in breaking up the black pawn chain with a6 a couple of moves down. With the pawn on a4, Ncxe5 wouldn't be worth attempting.

The crazy thing is that being such a bully almost worked!

At move 28, White looks very solid, with 3 perfect pawns for the piece, plus Black has horrible weaknesses. 29. g3 is a bit suspect, but the next supercomputer move is

31. Qxc7. This has to be losing, but it is a typical computer bully move.

Most strong human players would prefer to defend h3 hole with Kg2 (on Qh5 f5 looks fine).

The idea is that Black's light-squared bishop is boxed in by the white pawns.

There must be a concrete reason why Stockfish did not play Kg2.

Overall the impression one gets is of very "human" play by Alpha and ultra aggressive play by Stockfish.

EDIT: so extremely impressive play by Alpha but a bit suspicious aggression by Stockfish.


> It is the kind of move you play when you absolutely must win and must win now.

I agree Ncxe5 looks crazy, but the weirder thing to me is that Stockfish offers a repetition the very next move. So it can't be caused by having high contempt (favouring wins over draws).


Thanks for the analysis


I wanted to contact the authors directly with a question, but can't seem to find contact info at the moment. I hope some of you might know enough to answer it.

I'm interested in applying this method, or a similar neural-network / tabula rasa based method to the game of Scrabble. I read the original AlphaGo Zero paper and they mentioned that this method works best for games of perfect information. The standard Scrabble AI right now is quite good and can definitely beat top experts close to 50% of the time, but it uses simple Monte Carlo simulations to evaluate positions and just picks the ones that perform better. It doesn't quite account for defensive considerations or other subtleties of the game. I was wondering if anyone who had more insight into MCTS and NN would be able to talk me through how to apply this to Scrabble, or if it even makes sense. One of the issues I can see currently would be very slow convergence; as it has a luck factor, the algorithm could make occasional terrible moves and still win games, and thus be "wrongly trained".


Step 1: millions (?) of dollars of hardware.


Two things to note:

1) AlphaZero beats AlphaGo Zero and AlphaGo Lee and starts tabula rasa

2) "Shogi is a significantly harder game, in terms of computational complexity, than chess (2, 14): it is played on a larger board, and any captured opponent piece changes sides and may subsequently be dropped anywhere on the board. The strongest shogi programs, such as Computer Shogi Association (CSA) world-champion Elmo, have only recently defeated human champions (5)"


Shogi is a fun game, it always feels a little sad that it doesn't get more exposure outside of Japan (and my understanding is that, by and large, in Japan it is considered an "old persons" game)

Because captured pieces change sides, there is less of an "endgame" scenario, and as a beginner (like me) it is very easy to put too many captured pieces back into play, which makes it hard to defend everything and essentially you end up giving them back to your opponent


It recently got renewed attention when Fujii Sota, a 14-year-old, turned pro at the youngest age since Kato Hifumi, and subsequently had a record-breaking 29-game winning streak.


I've been interested in learning both shogi and xiangqi for a while. If anyone knows a nice engine with graphical frontend for either game, I'd love to know. Wasn't able to find much the last time I looked.


The best place to play Shogi online against others is http://81dojo.com/


It briefly became popular in the otaku culture from an anime called Hunter X Hunter.


I'm curious to see if "San Gatsu no Lion" (the Lion of March) will spark interest. I highly recommend it to anyone interested in more slice-of-life/drama kinds of things. It's quite a beautiful anime/manga, even if the shogi isn't quite centre stage.


Recommendation seconded, Sangatsu no Lion is a lovely work. On the other hand, it has been running for 10 years (!), if it could spark interest like Hikaru no Go, it would have happened already.


Shion no Ou is another good shogi anime. I haven't read the manga.


As a chess player I find the win rate astonishing.

Given the drawish tendency at top level, among human players, in correspondence chess and also in the TCEC final, I thought that even absolutely perfect play wouldn't score so well against a decent Stockfish setup (which 64 cores and 1 minute per move should be).


I can’t see any reference to whether Stockfish was configured with an endgame tablebase. It’d be interesting to see results then, as you’d expect AlphaZero’s superior evaluation to give it an advantage out of the opening, but later in the game Stockfish would have access to perfect evaluations. Obviously there’s nothing stopping you from plugging a tablebase into AlphaZero but that feels wrong.


It's not clear that it had an opening book either. In any case it's not specified which one.


I'm not sure it's really fair to compare Stockfish to AlphaZero; AlphaZero used 24h of 5000 TPUs in compute time, and still needed 4 TPUs in real play, while Stockfish ran on just 64 threads and 1GB RAM. Nonetheless, still an impressive achievement.


Wait, how's the 24h x 5000 TPUs relevant? That is training time, and that training corresponds to years and years of hardcoding evaluations in Stockfish, not to compute time during the match.


Only 1GB RAM? Really?


Yes, this is really strange. Hash table size is a major contributing factor for strength of chess programs. It looks like a very artificial limitation.


Serious question: how does one evaluate the reproducibility of this paper's results?

Maybe I'm missing some things but:

- Are 1st gen TPUs even accessible? You have to fill out a form to learn more about those second generation TPUs: https://cloud.google.com/tpu/

- I can't find the source code

This does not look like a scientific paper, but a (very impressive) tech demo.


This is definitely a scientific paper. Pretty much no scientific paper comes with source code and the majority of scientific papers are not reproducible without an entire university department of resources anyway.


...and this attitude causes quite a bit of them to not be reproducible even when people try with the same resources.


> Pretty much no scientific paper comes with source code

Are we blindly accepting this as science now?


Yes? Sorry you've been out of the loop so long but science doesn't cater to your idealistic ideas of what it ought to be.


My main thing about source code and scientific papers is that it would just be so easy to release the source code along with the paper. Even if people don't reproduce the work, source code would often help in understanding it, as I'm often a little unclear on implementation details, which the source code would greatly clarify.


How do you replicate CERN experiments? The LHC? Hubble? LIGO? LISA? At least this paper is reproducible by people who have the compute, and many universities have super computers.

Even at home, you can verify the results by replaying the games against stockfish. You might not be able to replicate the setup at home, but that does not mean it is not science.


Comparing projects done in the open with multiple different universities, on public funds, with something done behind closed doors with only personnel from a commercial entity is pretty far-fetched.


Why? How are any of the factors you mention related to verifiability? How does being supported by public funds with academic personnel from multiple universities make LIGO any more verifiable for me at home? At least I can run these games against my stockfish, thus verifying the result. The method I cannot verify, but being able to verify the results is already more than most of science.


You maybe can't do the experiments from CERN, but you can do the calculations; most of the software they use for that is open source, IIRC.


There are too many details missing for the results to be reproducible.

Does it even qualify as a tech demo if the result only exists in DeepMind's lab?


This raises an interesting concept. If you cannot reproduce an experiment because of lack of resources, can you believe it? Or is this the equivalent of 'photoshopping your results'?

A similar problem exists in cosmology. Can you verify the multiverse model if you only have one universe to experiment in?

As the RAM and TPU power requirements to run certain models/algorithms increase, machine learning is becoming more obscure. Not only can we not understand how an AI is reaching its conclusions (inscrutability), we cannot even probe it (by tweaking parameters, etc.) to find weak points (inaccessibility). This is actually a good thing. Where humans cannot tread, there can be no evil?


At least in computer chess such experiments were typically demonstrated by winning the World Championship. (And sometimes they failed...cough Deep Blue cough)


Definitely a scientific paper. It's obvious from the way they formatted it that they're going to submit this to Nature or Science.


This then may suggest that there’ll be this detail-light manuscript in the journal and a 50-100 page supplemental document available to download, with all the details to reproduce (hopefully)


There's a project currently that emulates AlphaGo Zero using crowdsourcing: https://github.com/gcp/leela-zero . You can run it on the browser too and it will submit the games after: https://ntt123.github.io/leela-zero/

Hope such a chess project like this will be available in the future.



Stockfish plays like an ambitious amateur in the first game, giving away a piece for two pawns on move 13.

Perhaps this move was justified though, as later in the same game Stockfish gets a position which is at worst drawn, likely winning. Some moves later, however, around move 40, Stockfish gets its own knight trapped and the game is over.

This is not the kind of chess we normally see from Stockfish.


Yeah, that game was kind of different from the others - in the other games the feeling I got was that over time AlphaGo's pieces got increasingly effective while Stockfish's pieces would get bottled up and lose their mobility.


Very happy to see this result. It's like a moral victory for humans, as AlphaGo is more human-like (discounting Monte Carlo search) than Stockfish. Maybe deep learning will give us the next Euler, Newton, or Einstein.


Shogi, chess and Go are "perfect information games", meaning you can see the whole game state. It's a whole different thing to be able to solve games where you don't see everything (based on uncertainty).


Is it really though?

A big class of imperfect information games can be modeled by having a record of everything the agent has seen so far. Then it has exactly the same, if not more, information available than a human player in the same position. We know that with equal information AIs can make better decisions than humans (see also, AlphaGo :] ) so at that point the AI could reasonably be expected to achieve superhuman performance.

The "imperfect information games are harder for AI" crowd are going to be surprised by just how badly humans deal with imperfect information. AIs have a much better memory than humans do, and much more potential to use actual probability which humans are truly shocking at utilising (although neural networks don't seem to utilise this edge; so far).


The difficulty of imperfect information is from cross cutting through information sets and partial observability. With perfect information games like chess or Go, one can solve subgames with guarantees that the equilibrium is the same as for the full game. This is not the case for games like poker, which is why they have been difficult. In addition to that, for n > 2 players, there are no longer theoretical guarantees about converging to a nash equilibrium, which makes designing theory guided algorithms harder. Though empirical performance with n=3 of CFR is encouraging, I know of no results for n > 3.

Earlier this year, DeepStack, a system combining neural nets with search, competed live against humans without any side being dominant. Search policy guided training might improve its results, which are impressive compared to even 5 years ago, but this highlights how much more demanding imperfect information games are.
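
As a taste of why these games call for different machinery, here is a minimal regret-matching sketch, the update rule at the heart of CFR, for one-shot rock-paper-scissors against a fixed opponent. Real CFR applies this at every information set with counterfactual weighting; the opponent mix here is an arbitrary assumption:

    import random

    N_ACTIONS = 3  # rock, paper, scissors
    PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # PAYOFF[mine][theirs]

    def strategy(regrets):
        pos = [max(r, 0.0) for r in regrets]
        total = sum(pos)
        return [p / total for p in pos] if total > 0 else [1.0 / N_ACTIONS] * N_ACTIONS

    regrets = [0.0] * N_ACTIONS
    opponent = [0.5, 0.3, 0.2]  # fixed (exploitable) opponent mix

    for _ in range(10_000):
        mine = random.choices(range(N_ACTIONS), weights=strategy(regrets))[0]
        theirs = random.choices(range(N_ACTIONS), weights=opponent)[0]
        # Accumulate regret: how much better each alternative would have done
        for a in range(N_ACTIONS):
            regrets[a] += PAYOFF[a][theirs] - PAYOFF[mine][theirs]

    print(strategy(regrets))  # converges toward always playing paper vs. this opponent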


Yep, this. Btw there are some encouraging results for n=4 using sequence form replicator dynamics (which are implementing a form of CFR) in Kuhn poker. Toy example but the game gets large fast with n=4. Don't know of any results with n > 4.

http://mlanctot.info/files/papers/aamas14sfrd-cfr-kuhn.pdf


I'm not sure DeepMind would publish a paper in which they describe a winning high-stakes online no-limit hold'em player. The ethics would be quite shady. For all we know, they might have already done that just to see if it works.


Could work, but it hasn't been widely demonstrated yet. I really hope we can tackle such games/RL tasks.


You mean, like poker? https://www.cmu.edu/news/stories/archives/2017/january/AI-be...

Actually machines can have an even higher advantage in those cases, because they can be much better at estimating probabilities than humans. Think of card counting, for example.


I disagree. Computers have been outplaying the best humans at chess for two decades, but they only recently beat the top players at 2-player NLHE and only with the aid of massive computational power during training.

Furthermore, techniques like monte-carlo tree search used in AlphaGo don't work very well for poker - You can't just try and find the "best move" from the current game state, or you will end up playing a highly-exploitable strategy. You essentially have to solve the entire game every time (or completely in advance) to make sure you are playing a balanced strategy.

Only the Counter-Factual Regret Minimization algorithm has been able to achieve this level of play in Heads Up, and right now it looks hard to scale to poker games with more players, like the full-ring games you see at the World Series of Poker, for example. We still have a ways to go in Poker AI.


In 2015 Heads-Up Limit Hold'em was solved: http://science.sciencemag.org/content/347/6218/145.full


Math can be a perfect-information game too if you just start with axioms. Even when starting from conjectures, the rules for manipulating statements are fully transparent.


For those complaining about the TPU resources used during self-training, it is worth noting that Stockfish has used over 10,000 CPU hours for tuning its parameters. See https://github.com/mcostalba/Stockfish/blob/master/Top%20CPU...


This understates it a bit. More like 10 million CPU hours according to that link.


What an amazing result! Evaluating fewer positions (by a factor of about 1,000), AlphaZero still beats Stockfish.

In the figure on its preferred openings I find it very interesting that it doesn't like the Ruy Lopez very much over training time (there is a small bump but that is transient). I am hardly a chess expert but I know that it was very favored at the world championships so maybe the chess world will be turned upside down by this result now?

Positing that the chess world is bigger than the Go world (in terms of interest and finances) there is probably going to be a race to replicate these results "at home" and train yourself before your competitors :)


What would be a good starting point to learn about the AI behind that for a "normal" programmer? There seem to be so many resources now that it's hard to choose. Combination of hands-on plus theory would be good.


Coursera - Andrew Ng's course => Classic starting point, a very thorough and digestible introduction to neural networks. I found he covered the "how the heck do I use this?" part rather well. :)

From there, Coursera has a paid(?) DL course by Andrew Ng, or there's fast.ai, which looks good.

Good luck!


The keyword is "reinforcement learning".


I know the names of the general concepts, I was wondering if someone has concrete recommendations on where to start and which books/frameworks are sort of beginner-friendly.


Try "Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems" for the fundamentals.

http://shop.oreilly.com/product/0636920052289.do

For reinforcement learning, I hear Barto and Sutton is very readable, but I haven't read it myself. You can just pick the concepts up by reading papers. The introduction in the Deep Q-Learning paper is not great, but it's how I first learned the concept.

http://ufal.mff.cuni.cz/~straka/courses/npfl114/2016/sutton-... https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
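If it helps to see the core idea before reading the DQN paper: deep Q-learning is essentially the tabular Q-learning update with a neural net as the function approximator. A tabular sketch in Python (the env interface here is hypothetical: reset(), step(action) -> (next_state, reward, done), and a list env.actions):

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        """Tabular Q-learning: the update rule that DQN approximates with a network.
        `env` is any object with the (hypothetical) interface described above."""
        Q = defaultdict(float)  # (state, action) -> estimated return
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Epsilon-greedy exploration.
                if random.random() < epsilon:
                    action = random.choice(env.actions)
                else:
                    action = max(env.actions, key=lambda a: Q[(state, a)])
                next_state, reward, done = env.step(action)
                best_next = max(Q[(next_state, a)] for a in env.actions)
                # Bellman backup toward reward + discounted best next value.
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q

Sutton and Barto cover exactly this in the early chapters, so it's a good companion to the papers.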


You can check this Reinforcement Learning Course by David Silver on YouTube: https://www.youtube.com/watch?v=2pWv7GOvuf0&t=836s

By the way, I believe David Silver was the lead researcher on AlphaZero.


While this sounds impressive, I'll believe it when AlphaZero wins TCEC.


It beat the winner of TCEC 2016, Stockfish, with a record of 28 wins, 72 draws, 0 losses. That's zero losses.


If I run SF on my desktop computer it will kill SF running on my phone; that doesn't prove anything. Comparing TPUs and CPUs is hard, but they could have at least let SF run on what is considered a top-of-the-line setup with sensible settings (1GB of hash memory is very limited; 8GB is standard for rapid games on a quad-core CPU, let alone a 64-core one).


I can't figure out the reason for this stingy 1GB hash memory limit when using 64 cores. It pretty much negates the advantage of 64 cores vs., say, 4-6 cores.

A nefarious suggestion would be that setting a 1GB limit ensures that AlphaZero always has the edge in depth, as Stockfish would be forced to throw away long lines to stay within its hash memory.

Maybe someone who has read the Stockfish source code can comment on how it actually handles hash replacement.
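For intuition (I haven't gone through Stockfish's actual replacement code, so this is a generic sketch rather than its scheme): engines don't so much "prune" the hash as overwrite entries in a fixed-size table, often preferring to keep deeper results. Something like:

    from dataclasses import dataclass

    @dataclass
    class TTEntry:
        key: int = 0
        depth: int = -1
        score: int = 0
        best_move: int = 0

    class TranspositionTable:
        """Illustrative fixed-size hash table; sizes and layout here are made up."""
        def __init__(self, size_mb: int, bytes_per_entry: int = 32):
            self.n = (size_mb * 1024 * 1024) // bytes_per_entry
            self.entries = [TTEntry() for _ in range(self.n)]

        def store(self, key: int, depth: int, score: int, best_move: int):
            idx = key % self.n
            # Depth-preferred replacement: keep whichever result was more expensive to compute.
            if depth >= self.entries[idx].depth:
                self.entries[idx] = TTEntry(key, depth, score, best_move)

        def probe(self, key: int):
            e = self.entries[key % self.n]
            return e if e.key == key and e.depth >= 0 else None

The smaller the table, the sooner useful deep entries get overwritten and have to be re-searched, which is presumably where a 1GB limit hurts a 64-core search.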


Well, one explanation is that they wanted to win "convincingly", hence 1 minute per move and such a low hash memory allocation.


They didn't demonstrate that AlphaZero can beat Stockfish in a fair contest: i.e. take the amount of money they spent on Stockfish's CPU and RAM, buy a commodity GPU for AlphaZero, and then see.



I'm sorry, I thought we were discussing the paper.


On completely different hardware.


Back when AlphaGo was playing Lee Sedol I was thinking about a chess playing version in TCEC.

The interesting thing is TCEC assumes a bit about the structure of the chess program. That is, the TCEC win-adjudication rule says that if both programs agree that one program is 6.5 pawns ahead for 8 turns in a row, they judge that program to be the winner.

But programs like AlphaZero don't have an evaluation function that operates in conventional units (like centipawns).


You can convert winning percentages to centipawns, so that's not a problem.


Could you explain your proposed conversion process?


Here's a relevant section from Deepmind's paper:

> We also measured the head-to-head performance of AlphaZero against each baseline player. Settings were chosen to correspond with computer chess tournament conditions: each player was allowed 1 minute per move, resignation was enabled for all players (-900 centipawns for 10 consecutive moves for Stockfish and Elmo, 5% winrate for AlphaZero). Pondering was disabled for all players.


Houdini, for example, calibrates its evaluation so that +1.00 corresponds to a win in about 75% of blitz games and +1.50 to about a 90% chance of winning (http://www.cruxis.com/chess/houdini.htm). Anyway, this is not a problem at all; the adjudication rule was only introduced so less electricity is wasted when the position is a clear win/loss.
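For the conversion question above, one simple approach (a sketch, not what any particular engine actually uses) is a logistic curve with a tunable scale. The scale below is chosen to reproduce the "+1.00 ≈ 75%" point; Houdini's real calibration differs a bit further out:

    import math

    SCALE_CP = 100 / math.log(3)  # ~91 centipawns, so that +1.00 pawn ~= 75% win chance

    def cp_to_winprob(cp: float) -> float:
        """Map a centipawn evaluation to an approximate win probability."""
        return 1.0 / (1.0 + math.exp(-cp / SCALE_CP))

    def winprob_to_cp(p: float) -> float:
        """Inverse mapping, so a win-probability value head can report 'centipawns'."""
        return SCALE_CP * math.log(p / (1.0 - p))

    print(cp_to_winprob(100))           # ~0.75
    print(round(winprob_to_cp(0.95)))   # ~268, i.e. clearly winning

With an inverse mapping like that, an engine whose value head outputs a win probability could still satisfy a centipawn-based adjudication rule like TCEC's.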


I hope they change the TCEC hardware specs to include GPU so this might be able to happen.


How can we fairly evaluate TPU engines vs. CPU engines?


I wonder if being an expert at one game makes it easier to become an expert at another. If so, then maybe individual games act like example datasets, and a converged network could pick up new tasks after just a few examples.


Really interesting question. Some strategic concepts may transfer, say, from chess to chess variants. However, a simple change in the rules can have a huge impact in the game mechanics as anyone who has tried chess variants [1] knows.

[1] https://en.wikipedia.org/wiki/List_of_chess_variants

The answer may be that it is hard enough to become an expert at anything, but there may be some serendipitous (how to make this precise?) overlap.


Well, it's not doing anything like that for now. Even though the algorithm, in an abstract sense, is the same for all three games, in fact it's a new network for each of the three games, with architecture and input features adapted to the game, and then trained from scratch.


It would be very interesting to see someone try something like transfer learning from one game to another.


It doesn't seem to like the Sicilian Defense (1.e4 c5), which is the most popular opening by human players. I wonder if this will change opening theory?


It looks as if it doesn't play 1.e4 much as white. Since these statistics are for self-play games, that means it won't get a lot of opportunities to play 1.e4 c5 as black. Still, it does seem as if it likes the Ruy Lopez and French better as black than it does the Sicilian. (It would be nice to see a little opening "tree" with move probabilities, rather than this list of 12 most-popular-among-humans openings.)

[EDITED to add:] A couple of other remarks:

Playing against Stockfish, the Sicilian seems to give it more wins as white and more losses as black than any of the other openings listed here.

What's shown here are two particular versions of the Sicilian; for all we know there's a lot more 1.e4 c5 in its self-play than the graphs suggest (e.g., maybe as white it prefers 2.c3 or 2.Nc3 or something). Eyeballing those graphs, these 12 openings account for substantially less than half of AlphaZero's self-play games.


A list of openings in recent world champions: https://www.chess.com/blog/ih8sens/world-championship-openin...

Queen's gambit is there.


That's stunning. I thought that was one of the strongest openings for black.


It's not the strongest opening, but it's an asymmetric opening system that introduces imbalance into the position, so it tends to be less drawish than a symmetric opening system.

This creates the psychological effect of slightly turning the knob from "Black is playing for equality" to "Black is playing for counter-play".


It seemed to play a lot of the English Opening... that also seems strange to me.


I think in a way it’s an opening that rewards preparation and theory. With near perfect play expected on both sides, what seem like sharp games to humans are quite easily navigable.


I thought it was interesting that it seems to like the English Opening. It's not popular, but Bobby Fischer played it in the world championship against Spassky.


Combinations appear in Sicilian often. If you're playing black, that's a good thing depending on your rating.


So when are they going to apply this to Atari games, or anything else? The next step is to have one AI figure out the rules by training a GAN that imitates player behavior, and have the other AI be AlphaZero, tweaking the GAN's inputs to generate different moves to win. Voila... almost-general-purpose AI that can learn to play any game.


The main problem is that we still lack good generative models and good ways of interrogating them. GANs are unstable and difficult to apply to time series, VAEs suffer from posterior collapse, WaveNet/PixelRNN grow with the input size and overemphasize the details, and RNNs are hard to train because we lack good training algorithms. Generally, small errors tend to compound in step-wise predictions because NNs do not generalize very well and gradients tend to vanish and shatter. Just in terms of the computation time needed to roll out the future, MCTS is probably a million times more suitable in domains where the rules are simple enough to be hand-coded and evaluated quickly (such as Go and chess) than in domains where you need a complex learned model.


To expand on eref's comment a little: you absolutely could apply this or MCTS to ALE (and Guo et al 2014 did it very nicely). After all, the ALE is deterministic and simulatable by definition, so of course you can explore the game tree and reset the simulation as necessary. But people aren't much interested in this approach because using the ALE as a 'simulator' is cheating as far as testing full-strength AI techniques (we don't have simulators of the real world, after all), and the ALE games themselves (unlike Go) are of little intrinsic interest so there's no real benefit to engaging in cheating.


Didn't they start it all by playing atari games?


Is this a library or something I can download and try training myself (on a small scale)?

I'm not in a position to read the paper right now, so my apologies if that's covered in there. I want to ask just in case it's not, while this is still on the front page.


No. DM only occasionally releases software. Expert iteration is simple enough that someone can code it up on their own and there's already a few clones, so if anyone cares to train their own, it's doable, although it may take a while.


"a while" is a bit of an understatement.

Leela Zero (the main AlphaGo Zero replication project) is a crowd-sourced computation effort that's going to take a fairly long time to get anywhere.

And from this paper: > "Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs (15) to generate self-play games and 64 second-generation TPUs to train the neural networks."


You don't have to start from zero though. It's cool that it works with Google-scale resources, but it seems like it would be faster to initialize with a neural net first trained to mimic the moves of an existing chess or Go AI, and then improve it from there.

>"Why is the net wired randomly?", asked Minsky. "I do not want it to have any preconceptions of how to play", Sussman said. Minsky then shut his eyes. "Why do you close your eyes?", Sussman asked his teacher. "So that the room will be empty." At that moment, Sussman was enlightened.
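Koan aside, the warm start being suggested is just behaviour cloning: supervised pretraining of the policy on an existing engine's moves before switching to self-play. A rough PyTorch sketch (all shapes and sizes here are made up for illustration, and whether this reaches the same final strength is an open question):

    import torch
    import torch.nn as nn

    # Hypothetical encoding: board planes flattened to a vector, a fixed move vocabulary.
    N_INPUT, N_MOVES = 8 * 8 * 14, 4672

    policy_net = nn.Sequential(
        nn.Linear(N_INPUT, 1024), nn.ReLU(),
        nn.Linear(1024, N_MOVES),
    )
    optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def behavior_cloning_step(positions, engine_moves):
        """One supervised step: nudge the net toward the moves an existing engine chose.
        positions: float tensor [batch, N_INPUT]; engine_moves: long tensor [batch]."""
        optimizer.zero_grad()
        loss = loss_fn(policy_net(positions), engine_moves)
        loss.backward()
        optimizer.step()
        return loss.item()

    # After pretraining, hand policy_net to the self-play/MCTS loop instead of
    # random weights and continue with the usual training procedure.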


The problem is that it isn't entirely clear whether this produces equal quality results. You might end up on a lower optimization plateau.


I don't think it's clear that would work well. AlphaZero did significantly better than the original versions of AlphaGo (which did learn from existing human games). And even training nets initialized that way would still take a fairly large amount of computational resources.

As for that koan, I'm not convinced it's very applicable here. My interpretation of the koan is that the entire setup (training process, structure, etc.) all encode domain knowledge. In this case, I think AlphaZero's domain knowledge is transferable enough that I don't think it's relevant.


I'm pretty sure starting from zero is the point of Leela Zero. If they started from Stockfish, it wouldn't be a replication of AlphaZero.


What is its win percentage against itself on each side of the board in each game? Is chess a draw for its style of play? Is there a first move advantage for the other games with its play style?


So AlphaGo Zero used 4 TPUs while AlphaZero used 1500. It’s not immediately obvious to me why there is this massive difference. Can anyone elaborate?


Both used 4 TPUs at playing time. At training time, AlphaGo Zero used an unspecified amount of computing resources; AlphaZero used 5,000 TPUs for self-play.


Ah, thanks for clearing that up! Makes sense.


I'm only a fairly pedestrian chess player, but I looked at one of these games between AlphaZero and Stockfish, and aside from the endgame, AlphaZero played in a manner that almost seemed alien. It seemed to completely ignore various little rules of thumb, which is to be expected in hindsight but fairly mind-blowing when you actually watch a game.


Here's an HTML version of the paper:

https://www.arxiv-vanity.com/papers/1712.01815/

Table 2 is broken, but the rest is much more readable if you're on a phone.


The more interesting metric going forward is performance at a given power budget (not unlike with motorsports). The TPUs are consuming sooo much power here! Most interesting real-world problems are power-limited, including in nature (e.g. metabolic limits).


When a lot of money is on the line you can use a lot of resources.



This paper compares AlphaZero to the 20 block version of AlphaGo Zero that was trained for 3 days. Am I right in thinking that this version was significantly less strong than the 40 block version? If so, does it matter?


Wasn't Stockfish gimped for this competition? No opening book, no endgame tablebases, low RAM, etc.? If so, then this AI did not in fact beat the computer chess champion.


Is there an SDK or compiler for using Google's TPUs beyond just using TensorFlow? Is the TPU backend of TensorFlow based on CUDA, OpenCL, plain C or something else?


As a Shogi enthusiast (but complete beginner), I'd like to have seen more Shogi details in the article. Nevertheless there's plenty of other things to geek out on...


Great result, but without access to source code this is not a scientific paper.


There is only one way for a human to win at chess against these computers; and it involves violence against the chess board.


Did Magnus play against this? Is there a way we can see the game?


No, he didn't play it. As far as I know, computers are already far ahead of humans in chess, so further progress here wouldn't really make a difference.


It would be interesting in one way though: Magnus says he hates playing against computers, because "it's like being beaten by an idiot". Modern chess engines still make moves that are somewhat strategically weak, but they make up for it with amazing tactics.

It would be interesting to hear if Magnus thought AlphaZero played less like an idiot.


You're right that it's pointless. The paper has the game with Stockfish so that's good enough for me.


source code?


See, Mom? Self play is a good thing.


A lot of the graphs in the paper seem to level out as they hit the level of the opponent. It makes me wonder to what extent AlphaGo Zero is merely optimizing to beat flaws in existing opponents' current implementations (even if "existing opponents" == all available opponents' data and algorithms today) rather than finding generalizable insights into the underlying game. Because wouldn't you expect that, unless we are at the theoretical limit of perfect chess, a tabula rasa approach would exceed existing best practice significantly, especially with the massive computation advantage it has?

Not that there's anything wrong with that; AlphaGo Zero supposedly optimized for the "just enough" win rather than the crushing win. It doesn't even mean Stockfish is doomed--I suspect Stockfish could beat it in a future heads-up match provided that Zero didn't have time to retrain, but that a retrained Zero (having the benefit of optimizing against a new Stockfish) would be able to supersede it once again.


> A lot of the graphs in the paper seem to level out as they hit the level of the opponent.

DM is no longer investing much in the AG research program; Silver said the team has been disbanded already. If you look at the Go graph in this or the first AG0 paper, Zero was still getting better at Go when they shut it down; it hadn't converged. They just didn't want to tie up the TPUs. I don't think it's a coincidence that the graphs tend to stop after they reach superiority.

(Also, as Houshalter says, one of the critical aspects is that this is pure self-play ie the NNs never play against the existing engines except for evaluation. So it's all independent from-scratch reinvention.)


It's not. It learns entirely through self-play and never learns from playing its opponent. Diminishing returns aren't unusual and happen in every domain. These AIs are probably playing close to the limit of what is possible, just not quite there yet.


Are there popular games where the best human players are not near the limit of what is possible? Obviously you can construct one to be hard for humans (large 3SAT problems, or even big arithmetic problems), but I wonder if there is one that people enjoy.


Humans are nowhere near the limit of what is possible in chess, as evidenced by how much better computers are at the game.


Presumably tlb meant what is humanly possible...


I'd assume that for pretty much any nontrivial game the best human players are nowhere near the limit of what's possible. Humans can play perfect tic-tac-toe, but for everything in the realm of Go, chess, poker, bridge, etc., the theoretical ideal is far beyond the current best human players.


Seems like it flattens, but they only trained for a few hours. What would happen with 100x more training?


Elo ratings level out eventually for a given pool of opponents. If a player already wins every game against all available opponents, there's no evidence that can tell you whether it suddenly got twice as good.

If tracking improvements past the state of the art is important, I think they'd have to freeze the algorithm every 400 Elo or so and rate the improved versions against the last snapshot.

(Doesn't really apply to the stockfish case, but it does to the other two games.)
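For reference, the standard Elo formulas make the saturation concrete: once the expected score against every available opponent is close to 1, each additional win barely moves the rating.

    def elo_expected(ra: float, rb: float) -> float:
        """Expected score of player A against player B under the Elo model."""
        return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))

    def elo_update(ra: float, rb: float, score: float, k: float = 16) -> float:
        """New rating for A after scoring `score` (1 win, 0.5 draw, 0 loss) against B."""
        return ra + k * (score - elo_expected(ra, rb))

    # 400 points above the strongest available opponent the expected score is ~0.91,
    # so a win is worth ~1.5 points and further wins tell you almost nothing.
    print(elo_update(3800, 3400, 1.0) - 3800)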


Certainly a significant achievement. Also, kind of interesting that the AlphaGo team spent a lot of energy to convince us Go is much harder than Chess, only to turn around and tell us that it is amazing that it can also win at Chess.


> only to turn around and tell us that it is amazing that it can also win at Chess.

What they're demoing here is a single, general formula for mastering multiple games. Start with empty AG0, then teach it chess from scratch until it is the strongest player on the planet.

Go back to an empty slate, with exactly the same "untrained" AG0, and now teach it Go, to the same result. No fine-tuning for the domain of the game you are training -- it is general(ized).

That's the gist I'm getting from this.

Question for someone who has time to read the paper: can you train it to master chess and Go at the same time, or is it one or the other? I'm assuming the latter.

edit: check out the graph on the 4th page. AlphaZero, which can master chess and shogi, can beat AlphaGo Zero, the implementation specifically designed for Go, at its own game.


Question: do you think you are using the same parts of the brain to play chess and Go? What counts is not using the same neurons, but using the same neural algorithm.


> question for someone who has time to read the paper: can you train it to master chess and go at the same time? or is it one or the other? I'm assuming the latter.

I'm sure you could with a multi-headed NN. But what would be the point? There's very little transfer of knowledge between the games, especially once you get past the very basics.


The point is that real problem domains are not neatly partitioned and labeled.

I don't know what kind of input the NN itself gets, but computer vision is enough to translate a photo of a chessboard into a usable symbolic representation. Still, it would be nice to have a black-box-ish computer program that figures out what game is at hand and how to play it.

The next variation is have the adversary start playing a chess variant and have the machine recognize it (assuming honesty) and play it to significant skill. Then "real life Pong" where the size and aerodynamics of the ball are unknown to it. This is the gist of human intelligence: answering questions is significantly easier than figuring out what the question is.


> Go back to an empty slate, with the same exactly "untrained" AG0, and now teach it Go, to the same result. No fine-tuning for the domain of the game you are training -- it is general(ized).

Not quite -- different input features, which implies slightly different network architecture at least at the front.



