Jeff Dean’s ML System Architecture Blueprint (medium.com/syncedreview)
231 points by trcytony on July 26, 2018 | 29 comments



I work in this field, but nothing state of the art (relatively simple LSTM models and GAN models). While I found the article informative, it was also a little depressing to see how far research goes beyond what I am working on. I spend about 8 hours a week off-work-hours studying and reading papers and I find it difficult to keep up.


I think it makes more sense to focus on the benchmarks. Benchmarks change less often than the underlying algorithms/models, and they are easier to follow.

Once a model performs consistently well on a given benchmark over several years, then it makes sense to get more into the details.

For example, in the NLP field in 2018 there is a focus on multi-task models. Some studies (don't have the refs at hand, sorry) suggest that different models generalize differently (and sometimes better) when trained on several tasks at once.
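
For a concrete (if simplified) picture of what that means, here is a rough PyTorch-style sketch of a shared encoder with one head per task; the layer sizes and task names are made up for illustration, not taken from any particular paper:

    import torch.nn as nn

    class MultiTaskModel(nn.Module):
        """Shared text encoder with one output head per task."""
        def __init__(self, vocab_size, hidden=256, n_tags=10, n_classes=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
            # Task-specific heads share the encoder's representation.
            self.tagging_head = nn.Linear(hidden, n_tags)      # e.g. POS tagging
            self.classify_head = nn.Linear(hidden, n_classes)  # e.g. sentiment

        def forward(self, tokens, task):
            out, _ = self.encoder(self.embed(tokens))
            if task == "tagging":
                return self.tagging_head(out)          # per-token logits
            return self.classify_head(out[:, -1, :])   # per-sentence logits

Training then just alternates mini-batches from the different tasks, so the shared encoder sees gradients from all of them.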

Anyway, those papers and models are the result of teams of researchers working on the problem full time, with tons of data at their disposal. If any sane individual were able to keep up with the state of the art, it wouldn't be a research field, I guess :-)


Although a neat thought, is the number of papers on arXiv really something worth comparing to Moore's law? Like at that point, what can't you compare to Moore's law...


I once asked Gordon Moore what the software equivalent of Moore's law was, he responded without pause: "the number of bugs doubles every 18 months".


Number of things being compared to Moore's law?


IMHO, the two are incomparable. The exponential growth rate of Moore's Law was driven largely by a linear rate of shrinkage in 2D, which drove up clock rates geometrically as microarchitecture component distances shrank two-fold (until CMOS heat finally fought back). ML has no similar geometrically driven basis that will keep its rate of growth superlinear.

I suspect this plot is Dean's way of paying homage to Patterson, since he and Hennessy were famous for similar plots describing CPU performance in their two architecture textbooks.


Each graduating PhD takes on several students. They all publish.


I suspect that only a minority of ML PhD graduates stay in academia. So even though the growth is exponential, the base is probably much closer to 1 than to the student:professor ratio.
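
As a toy illustration of why the base matters (the numbers are entirely made up, only the shape is the point): if each graduate yields r new researchers who stay and keep publishing, the field grows roughly like r^t per generation.

    # Toy model: researchers after t "generations" if each graduate
    # yields r new researchers who stay and publish (numbers illustrative).
    for r in (1.1, 1.5, 5.0):   # 5.0 ~ a student:professor ratio
        print(r, [round(r ** t) for t in range(6)])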


Moreover, an exponential increase in the number of papers on a subject might mean something, but it could easily mean something other than the optimistic scenario.


Google really seems to be leading the pack with investments in silicon (e.g. the TPU and the recently announced Edge TPU [1]). Other traditional silicon companies (Nvidia, Intel) seem to get it, but I have yet to see investments from other tech companies (Amazon, FB, Netflix).

[1] https://techcrunch.com/2018/07/25/google-is-making-a-fast-sp...


Microsoft Research is pushing along as well; they just don't publicize it as much.

Azure has already deployed FPGAs for networking and accelerated ML (Project Catapult and Project Brainwave); they believe that being able to deploy changes on the fly, including changes to the workload, is worth more than the efficiency gain of using an ASIC.

tbh, I do agree with using FPGAs over ASICs given the speed at which the tech is moving. Google has already cycled through 3 versions of the TPU.


The underlying operation of multiplying a shitload of numbers together hasn't changed at all. Google's revisions have essentially been more and way more multipliers, respectively.
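
For what it's worth, that op is essentially one big multiply-accumulate, which is why a TPU-style accelerator is mostly a grid of MAC units; here's a minimal NumPy stand-in for what the hardware speeds up (shapes are arbitrary examples):

    import numpy as np

    # The workhorse op behind most neural-net layers: y = x @ W + b.
    # A TPU-style systolic array is essentially dedicated hardware for
    # running this multiply-accumulate at very high throughput.
    x = np.random.randn(128, 1024).astype(np.float32)   # batch of activations
    W = np.random.randn(1024, 4096).astype(np.float32)  # layer weights
    b = np.zeros(4096, dtype=np.float32)
    y = x @ W + b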

The main problem with FPGAs is having to deal with the vendors and their evil tools. Those people have no idea what good software looks like, and no business being in my cloud. I guess if you're Microsoft and you already have demonstrated a 40-year history of having no taste in software then you'd be OK putting FPGA tooling into a datacenter. I personally wouldn't even execute that stuff in a sandbox, much less allow it to reprogram my platform.


ASIC tools are very similar to FPGA tools. You always simulate on an FPGA first before sending off the RTL for an ASIC.


and yet all 3 versions of the TPU are ASICs, not FPGAs :)


And? That only supports the argument for FPGAs over ASICs (unless Google is harboring some deep secret about a cheap chip tapeout process).


I heard that Google was actually late to the game on this, not realizing the value of GPUs over (just) distributed computing when the DNN trend started. Now they even have custom silicon...


I think it was more a case that Google had a large existing investment in CPU-based compute.

A lot of Dean's initial work was getting models to train effectively on that kind of hardware using distributed SGD.
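
Something like the following sketch, if you squint: a synchronous data-parallel variant in NumPy on a toy linear model (the real DistBelief-era setups were asynchronous with parameter servers, which this doesn't capture):

    import numpy as np

    def worker_grad(w, X, y):
        """Gradient of mean squared error for a linear model on one data shard."""
        return 2.0 * X.T @ (X @ w - y) / len(y)

    rng = np.random.default_rng(0)
    shards = [(rng.normal(size=(256, 10)), rng.normal(size=256)) for _ in range(8)]
    w = np.zeros(10)
    for step in range(100):
        # Each "worker" (a CPU machine) computes a gradient on its shard;
        # the gradients are then averaged and applied centrally.
        grads = [worker_grad(w, X, y) for X, y in shards]
        w -= 0.01 * np.mean(grads, axis=0)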


The blueprint calls for the AI to train on a data set of everything Jeff Dean does or thinks for the entirety of his life.


No love for bad Jeff Dean jokes here I guess.


“Jeff Dean once shifted a bit so hard it ended up on another computer.”


“Chuck Norris can kill you. Jeff Dean can kill -9 you.”


Did not understand. Please explain.


kill -9 sends SIGKILL to a process, which cannot be caught or ignored and results in the immediate termination of the program (except in a couple of very specific cases, such as a process stuck in uninterruptible sleep).
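
For illustration, the same thing from Python (the pid here is hypothetical; plain kill sends SIGTERM, kill -9 sends SIGKILL):

    import os, signal

    pid = 12345                   # hypothetical process id
    # Plain kill sends SIGTERM, which a process can catch and handle.
    os.kill(pid, signal.SIGTERM)
    # "kill -9" sends SIGKILL, which cannot be caught or ignored.
    os.kill(pid, signal.SIGKILL)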


Thanks :)


Interesting. There is a kind of obvious conflict between cloud resources and hardware-intensive applications like ML. The zeitgeist is obviously swinging.


Am I the only one thrown off by Jeff Dean wearing a suit?


It's called respecting your audience. Everyone else in the room probably looked like they were from a 1970s movie.


Hahaha


First, -1 to the authors for publishing the paper in IEEE Micro, which is behind a paywall. We need to start a mass boycott of all IEEE journals, considering they are as bad as Elsevier but have successfully painted themselves as the good guys. Most of the paywalled ML papers I come across are from IEEE. In any case, we should expect better from Google Brain authors than agreeing to publish behind any paywall!

Second, I fail to see any real takeaway or key new insight. The number of papers grows exponentially in many fields during their early years.



