“Given the long timelines of a PhD program, the vast majority of early ML researchers were self-taught crossovers from other fields. This created the conditions for excellent interdisciplinary work to happen. This transitional anomaly is unfortunately mistaken by most people to be an inherent property of machine learning to upturn existing fields. It is not.
Today, the vast majority of new ML researcher hires are freshly minted PhDs, who have only ever studied problems from the ML point of view. I’ve seen repeatedly that it’s much harder for a ML PhD to learn chemistry than for a chemist to learn ML.”
As somebody who has crossed the line between ML and chemistry many times, I would love to see: more ML researchers who know chemistry, more chemistry researchers who know ML, and best of all, fully cross-disciplinary researchers who are both masters of chemistry and ML, as those are the ones who move the field farthest, fastest.
Society is not structured to encourage this. Getting a job sooner is more lucrative. Any breakthrough you make after studying for a couple of decades is the property of a corporation, not you.
I wish more people understood this and the value it would add to society. For some reason, most people understand how roads and transit improve commerce but fail to understand how education and social support for the individual do as well.
Sports incentives work out nicely for getting a supply of top talent: big money through sponsorship/ads/tickets, a fairly formulaic system for getting better at a sport (top coaches/teams, get fitter, practice a lot), and an easy way to identify progress (winning games, scoring goals).
But in the sciences, someone could do great work for 20 years and produce obscure papers that might stay obscure, or might unlock the mysteries of the universe or help us meet the world's energy demands without pollution! So the payoff is orders of magnitude higher than in sport, but orders of magnitude less probable. Someone will run 100m fast, guaranteed, but will we solve science problems X, Y, Z, and when?
To my understanding, it would need to be set up so that R&D proposals are granted by unbiased parties. I believe an article describing what system would best incentivize research has been linked on HN before. It laid out the dangers of funding only the popular (and contemporary) topics, which tilts R&D away from discovery and toward 'let's try to yield XYZ, and if it fails, let's not release it'. Whoever is motivated to grant something must be completely separate from the publication (most likely written by those who researched it and many other scientists) and, unlike peer review, should try not to trim any of it out.
Present. I think there are many of us. Chemistry is a very wide field though, so I'm not sure whether organic synthesis vs. theoretical chemistry vs. physical chemistry vs. biochemistry will end up most useful for tackling drug discovery problems or other chemistry applications. Same with ML, I suppose; even though the specialties are less concrete nowadays, the breadth of publications has far exceeded that of modern chemistry.
You could probably fit all the people who meet the last criterion in the same room (the chemistry side is probably the bottleneck, especially drugs, which is effectively a specialization of its own).
I agree with you but does anyone even recognize the last category outside blue-sky research? People have a tendency to bin other people into buckets. Being a master at 2 things means you can’t be easily placed in a typical team structure.
> I’ve seen repeatedly that it’s much harder for a ML PhD to learn chemistry than for a chemist to learn ML
I can confirm. We regularly look for people to write some computational physics code, and recently for people using ML to solve solid state physics problems. It’s way easier to bring a good physicist or chemist to a decent CS level (either ML or HPC) than the other way around.
In my experience, people with degrees in math, physics, chemistry, biology/medicine, astronomy, or computer science have stronger intellectual rigor that can be re-applied to other fields, e.g., finance. Also, those areas of study are much harder than economics. Partly, it is a self-selection process. Yes, there are some with economics degrees who are very bright, but they probably could have majored in any of the sciences mentioned earlier.
The story I’ve heard is that economics undergrads can’t get into economics grad school. This is just a rumor, but the sentiment is that undergrads get taught a watered-down version of economic theory. Economic theory is potentially very technical and includes game theory and proofs. Even in CS, undergrads take intro theory courses and “bottom out” in their math skills, even though grad-level CS gets much more difficult. Therefore, I’d imagine the primary determinant of this rigor phenomenon is the GPA inflation of the major.
This is somewhat true - economics undergrads do get into grad school, though math+econ or an econ degree with lots of math is better. You can just check the difference between the textbooks for undergraduate and graduate economics, e.g., Principles of Economics by Mankiw (undergrad) and Mas-Colell (grad).
That’s the opposite of what OP is observing. It’s easier to teach a domain expert “good enough” quantitative and technical skills than to teach a pure quant “good enough” domain expertise (and the corresponding intuition)
I think you’re reading this backwards, or perhaps it was edited? Mathematics is surely the more pure discipline compared with economics, so physics : CS :: mathematics : finance is the right ordering.
It's also because nobody goes to get a PhD in solid state physics for the money or career prospects, at least not in the last decade. So it's a small and self-selected group.
Lots of things, but mostly things like designing software architectures for massively parallel codes implementing fancy physics or coming up with ML models to predict complicated properties (things like wear, radiation, or corrosion resistance for which we just do not have any comprehensive model) or explore humongous problem spaces (things like predicting simple properties of very complex materials with 5 to 8 elements and complex microstructures like superalloys).
Neither the physics nor the CS is cutting edge (that’s why we do not necessarily reject people with limited experience in physics, though they need to show motivation and the ability to learn); what is cutting edge is the combination of the two.
We’d like to be as close to the cutting edge on ML as possible, though, because it’s a significant competitive advantage to be able to use fancy new techniques before our friendly competitors. But as I said it seems to be easier to train a Physics or Chemistry undergrad to get some feeling about how ML works than to train a CS undergrad to have some intuition about the Physics. And intuition is critical to detect when models hallucinate and get off the rails.
Damn, my contact at LSU has moved. I know there’s a good group at UT Knoxville where they do great stuff with Oak Ridge’s neutron source but that’s not specifically ML. There are adverts regularly for post-docs at Los Alamos; they can be found on lanl.jobs . They work with various universities depending on the group and might point you to something.
That's because application and research are quite different. If you do a PhD in ML, you learn how to research ML. Someone with a PhD in chemistry learns how to research chemistry; they only need to apply ML to that research.
Back when O'Reilly was still hosting events (sigh), at one of their AI conferences, someone from Google gave a talk about differences between research/academic AI and applied AI. I think she had a PhD in the field herself but basically she made the argument that someone who is just looking to more or less apply existing tools to business or other problems mostly doesn't need a lot of the math-heavy theory you'll get in a PhD program. You do need to understand limitations etc. of tools and techniques. But that's different from the kind of novel investigation that's needed to get a PhD.
Lol.
With the exception of niche groups in compressed sensing, the math doesn't get too hard. Furthermore, ML isn't math-driven, in the sense that people try things first and somebody comes up with the explanation after the fact.
Well, I think the issue is more that if you’re Genentech and you need ML people but can’t afford to pay them, you’re probably better off retraining chemistry PhDs.
I think you missed my point. Genentech, AFAIK, was not doing research on machine learning as in the principles of how machine learning works and how to make it better. They do biotech research that uses applied machine learning. You don't need a PhD in ML to apply things that are already known.
As a PhD student working on core ML methods with applications in chemistry, I second this. During my PhD, I have read very few papers by chemists that were exciting from an ML perspective. Some of their methods work very well, but the chemists don't always seem to understand why they made the right choice for a specific problem.
I don't claim that the opposite is easy either. Chemistry is really difficult, and I understand very little.
Genentech has several ML groups that do mostly applied work, but some do fairly deep research into the model design itself, rather than just applying off-the-shelf systems. For example, they acquired Prescient Design which builds fairly sophisticated protein models (https://nips.cc/Conferences/2022/ScheduleMultitrack?event=59...) and one of the coauthors is the head of Genentech Research (which itself is very similar to Google Research/Brain/DeepMind), and came from the Broad Institute having done ML for decades ('before it was cool').
I can't say I know anybody there who is doing what I would describe as truly pure research into ML; it's not in the DNA of the company (so to speak) to do that.
Sadly a lot of foundational ML research works for single-label image classification and not much else. ImageNet is a niche problem and way too much ML research is over-indexed on it. If you can make your problem look like ImageNet, you're going to do OK, but if not you effectively need to re-invent the wheel...
The HEAR benchmark is a great eye-opener. They have basically three classes of audio tasks, and find very different models excel in each, with the best overall models being kinda-mindless ensembles of the ones that do well on particular problems.
So if you've got something that works well for text... it'll take a couple years and maybe an entire new branch of research (diffusion!) to work well for image generation. I have no idea what generative models for chemistry will look like, but will happily bet that it takes some significant specialized effort.
1. The era of ML benchmarks is ending. New models have to be and will be evaluated the same way human experts are evaluated.
2. Foundational models are becoming multimodal. There will be no separation of text and image generation. Sure, different methods will be used for each, but the learned representations of visual and textual objects in models like Stable Diffusion already live in the same conceptual space.
I don’t think there will be specialized generative models for chemistry two years from now. There will be GPT-5 (and similar competitors) which will be used to perform all kinds of research, including chemistry.
For example, AlphaFold just fundamentally isn't a language model, but is fundamentally useful. We'll still need these models that Do Stuff in many areas, and that will still involve benchmarks... Even if we're able to ask GPT-N+1 to design the next version of the model for us.
I’m pretty sure the author is implying that the new crop of ML PhDs is just not a smart group of people - at least not at the level of intelligence required to do truly transformative things with ML in any field.
I think what you’re saying reflects a commonly found attitude on this topic: it’s pretty limiting to think a cursory knowledge of a field is sufficient to go change it. That’s likely why most “use ML to solve X” projects fail, while the few like AlphaFold succeed because the ML engineers truly understood the fundamental tenets of the topic and exploited them.
I don’t understand what you mean. Here’s how many applied ML papers work: create a new dataset for a novel problem, download a PyTorch model, point model at dataset directory. Is it novel? By construction. Is the ML technique novel? No.
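For concreteness, that recipe usually looks roughly like this (a toy sketch; the dataset path, class layout, and hyperparameters are hypothetical):

    # Toy sketch of the recipe above: fine-tune a downloaded, pre-trained torchvision
    # model on a new image dataset. The dataset path and hyperparameters are hypothetical.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms

    tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    train_set = datasets.ImageFolder("my_new_dataset/train", transform=tfm)  # "point model at dataset directory"
    loader = DataLoader(train_set, batch_size=32, shuffle=True)

    model = models.resnet18(weights="IMAGENET1K_V1")  # the downloaded PyTorch model
    model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))  # swap the head for the new labels

    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for x, y in loader:  # one pass is enough to illustrate the point
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

The dataset may be genuinely new; the modeling step is commodity.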
Not so dramatic, but I have a traditional CS background and got into ML much later in my career.
Last year my company formalized the process to hire ML Engineers. The interview is the same format as the software engineer round, but with an extra theoretical ML round.
I've observed two distinct groups of candidates in the process. One are recent grads with PhDs in ML. Other are people from diverse backgrounds that happened to start working in ML in their current job. The first group tends to excel in the ML theory part, but flunk the leetcode style coding questions. The second group tends to do better in coding, but do worse in the theory part. This is exactly what the process has been designed to do.
I am very put off from looking for ML Engineer positions at other companies if they follow a similar process. I know I would fail the interview for my current job. They could get away with it for a while, but I doubt its sustainability or desirability in the long term.
Compared with more mature research fields, the barrier to entry for ML is much lower.
Hence I have always recommended that people major in physics, chemistry, biology, etc., but look for projects in those fields that could benefit from ML (I have a number of them in physics).
So that argument was not novel.
But the point that pure ML PhDs will have significantly less multidisciplinary knowledge is a good one. It could be compensated by the fact that ML is growing fast and all kinds of people are joining the ride, but still.
>I’ve seen repeatedly that it’s much harder for a ML PhD to learn chemistry than for a chemist to learn ML.
Perhaps this is selection bias. Among all chemists, the ones who dabble in ML will likely be the chemists with the highest ML-related aptitude. In contrast, an ML expert on a chemistry project is more likely not internally driven to explore it but has instead been assigned the work, which means there is less selection bias and thus less chemistry aptitude.
CS is unusually easy to learn on your own. You can mess around, build intuition, and check your progress, all on your own and in your pyjamas. It’s easy to roll things back if you make a mistake, and hard to do lasting damage. There are tons of useful resources, often freely available. Thus, you can get to an intermediate level quickly and cheaply.
Wet-lab fields have none of that. Hands-on experience and mentorship is hard for beginners to get outside of school. There are a few introductory things online, but what’s the Andrew Ng MOOC for pchem?
This is more about ML than CS. ML is fundamentally about developing general-purpose algorithms applicable to a wide range of problems. If your job is using ML to solve problems in chemistry, it's more about chemistry than ML, and a chemistry background is more important than an ML background. It's unlikely that you have to develop novel ML methods for the problems you are facing.
I've seen the opposite in bioinformatics. While dedicated bioinformatics programs are now common, you still see many CS / mathematics / statistics / physics / EE people moving to bioinformatics after bachelor's / masters's / PhD / postdoc. In some bioinformatics jobs, you often have to solve new computational problems, and it's easier to teach enough biology to people with a methodological background than the other way around.
I've seen both in bioinformatics, because the field is so wide now.
1) Bioinformatics as tool-building, algorithm-dev: you're right, you don't need to know much biology there if the problem is defined well.
2) Bioinformatics as a tool to answer biological questions: here I've seen ML-background people really struggle, either developing stuff that's not useful or reinventing the wheel, but now it's deep learning. I've seen ML people present their fancy plant disease image detector, which turned out to be pretty good at spotting 'yellow': great training accuracy and benchmarks, but it does not add anything to what people in the field are doing.
Regarding 2), it sounds a bit directionless to be proposing stuff people don't need. Isn't that more a problem of selecting relevant problems to solve, and getting supervision on your ideas?
Yeah that's the problem! The ML people and the biology or agriculture people don't talk, they're not in the same building. A biologist might see the ML-person's work only after it's published.
On the flip side, software development and engineering rigor are largely absent in academia, as has been discussed previously here on HN. This is enough of an issue to make replicating research difficult even when the code and data are provided, but it's an even bigger issue when trying to turn academic code into a product.
Chemistry is a centuries-old discipline that people study for a full four years as undergrads before getting a PhD in the field.
ML is, practically speaking, a 15-year-old field that PhDs often begin to study after a couple of AI courses in undergrad and a specific track in grad school (while they study other parts of CS as part of their early graduate work).
There's just way less context in ML than Chemistry.
I don't think that's really true, but you don't need the full context in ML in order to apply it to your field of interest. Just like you don't need the full context in Physics to apply it to an engineering problem.
Some of the most successful ML researchers have read several decades of research papers. It's not uncommon to see references to papers from the 70s or 90s.
Edit: No doubt that the relevant parts of stats, random matrix theory, and ML is a newer field than Physics or Chemistry, though.
> realistically I don’t think you can even generously consider anything prior to perceptrons
I mean, sure, but for the practical reason that perceptrons (proposed 1943, implemented 1958) are about as old as digital computers, and ML as a field is tied up pretty strongly with the existence of computers.
I wasn’t comparing ML to digital computers though — but rather chemistry. Theory of computation and algorithms (bedrocks of CS) far predate digital computers.
ML is so much more than DL / NNs. I recommend that you open a book like Bishop or ESL, the overwhelming majority of ML has been non DL going back to the 70s.
Most of that I learned doing non-ML stats work. But regardless, my point was the history of it is far less than chemistry. Let’s say ML started at the turn of the 1990s to be generous. Still much younger than chemistry.
> I’ve seen repeatedly that it’s much harder for a ML PhD to learn chemistry than for a chemist to learn ML.
Haha, I've seen that for so many topics. "It's much easier for someone used to circuit switched phone networks to learn IP than the other way around", says the person who started with circuit switched.
I just thought "dude, you're literally the worst at IP networking that I've ever met. Your misunderstandings are dug into everything I've seen you do with IP".
Ironically, in my field I'd argue this isn't necessarily true - so long as the ML PhD sees value in that field.
Part of this is because public health as an undergraduate discipline is extremely new, so the field is used to having to teach Folks From Elsewhere about our field, rather than the fields built on the assumption of a large foundation of undergraduate coursework.
Interdisciplinary and intersectional skillsets are critically valuable.
Separation of concerns, especially in the early innovation stages, can be more of an inhibitor than an accelerator of success. Scaling and growth are another matter.
A similar pattern existed in the late 90s with web developers coming from many different industries with their domain knowledge and domain insight.
The code and frameworks were early, but what mattered was the insight into which problems were most pressing to solve.
An introductory book on neural networks, written a few decades ago, starts with a short history of the field at that time.
This included a phase where some physicists began having opinions about the subject, anticipating they might quickly find a model for the brain, with fields, or other physics-like paradigms.
Their expectations were that with their superior understanding of all things fundamental, they would rush in, and rush out. Leaving the stunned machine learning researchers dazzled, frazzled, and asking "Who was that masked physicist?!?"
Except their ideas went nowhere.
I can't find the book, but this story made for a memorable foreword.
That's a great xkcd, but there are 2 upsides to this arrogant approach.
First, arrogance is a nerd-snipe maximizer.
Second, there is a small chance you're absolutely right, and you've just obviated a whole field from first principles. It doesn't happen often, but when it does, there is no clout like "emperor's new clothes" clout.
EDIT: The downside, of course, is that you appear arrogant, and people won't like you. This can hurt your reputation because it is apparently anti-social behavior on several levels. I think it's fair to call it a bit of an intellectual punk rock move that is probably better left to the young. It's an interesting emotional anchor for mapping a new field, though.
> Second, there is a small chance you're absolutely right, and you've just obviated a whole field from first principles.
Mostly when I read about things like this happening, it's happening to a formerly intractable problem in mathematics. Do you have examples outside of math?
Alfred Wegener, who proposed continental drift (the precursor to plate tectonics), comes to mind. He was a trained meteorologist who observed the similarities between geological formations on the South American east coast and the African west coast. He was lucky in that his father-in-law was a prominent geologist and helped him defend this thesis.
Oh yeah, revolutionary insights are very important for the advancement of knowledge and the elimination of wrong ideas. But as you wrote, this was the work of a thesis, not a random commenter from another field.
Biology has already been transformed by mathematics, statistics, and CS for decades. These days, if something like in that xkcd happens, those people are probably not even familiar with the relevant parts of math / stats / CS.
What exactly has it transformed? Nothing has fundamentally changed in biology, you still have to run western blots and give mice cancer. A great example of “math” solving biology is super resolution - which was just a dud.
Genomics, for example. The entire field could not exist without extensive algorithmic research done over several decades. Even today, many of the key people in the field have a background in CS and mathematics.
Theoretical ecology is a bit more old-school answer. Many mathematicians have been involved in that field.
I am not sure I have seen any meaningfully brilliant breakthroughs in those fields though.
I take that back. One can never forget the brilliance of HiC by Erez. But other than that TBH all other “mathematical” or computational breakthroughs I’ve seen are just at best meticulous application of obvious math and computational algorithms to biological problems (and may I say poorly? Thinking back to the microarray nightmare years).
I guess that depends on what you count as a breakthrough. From my perspective, HiC is a small detail, and even AlphaFold is just the latest improvement to an existing process. A true breakthrough would be something like ancient DNA, which gave humanity a new tool for studying the past.
Pretty much everything in genomics depends on shotgun sequencing, which in turn depends on CS. The algorithms used for assembling something useful out of the sequence reads are highly non-obvious. In fact, most developments in string algorithms since the 80s have been driven by the needs of DNA sequencing. (Information retrieval used to be another contender, but word tokens turned out to be a better tool for that field.)
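To give a flavour of why assembly is non-obvious, here is a deliberately naive greedy toy (exact overlaps only, no sequencing errors, repeats, or reverse complements, and quadratic in the number of reads); real assemblers rely on suffix-array/FM-index or de Bruijn graph machinery precisely because none of these simplifications hold at genome scale:

    # Naive greedy assembly toy: repeatedly merge the pair of reads with the longest
    # exact suffix/prefix overlap. Real assemblers must cope with sequencing errors,
    # repeats, and genome-scale data, which is where the hard algorithmics lives.
    def overlap(a, b, min_len=3):
        """Length of the longest suffix of a that equals a prefix of b."""
        for k in range(min(len(a), len(b)), min_len - 1, -1):
            if a.endswith(b[:k]):
                return k
        return 0

    def greedy_assemble(reads):
        reads = list(reads)
        while len(reads) > 1:
            best = (0, None, None)
            for i, a in enumerate(reads):
                for j, b in enumerate(reads):
                    if i != j:
                        k = overlap(a, b)
                        if k > best[0]:
                            best = (k, i, j)
            k, i, j = best
            if k == 0:            # no overlaps left; stop merging
                break
            merged = reads[i] + reads[j][k:]
            reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]
        return reads

    reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(reads))  # -> ['ATTAGACCTGCCGGAATAC']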
Actually most applied physicists like myself go down that path cause we're pretty efficient, lazy folk & skip through as fast as possible--I call it the principle of maximum laziness.
I work for Google Brain. I remember meeting Brian at a conference and I have nothing but good things to say about him. That said, I think Brian is underestimating the extent to which the Brain/DeepMind merger is happening because it's what researchers want. Many of us have a strong sense that the future of ML involves models built by large teams in industry environments. My impression is that the goal of the merger is to create a better, more coordinated environment for that kind of research.
> Many of us have a strong sense that the future of ML involves models built by large teams in industry environments
The gradient of the current moment is that whatever approach is optimized to use more data and more compute is much easier to invest in than something that can do more with less but comes with a significant number of possible dead ends.
At some point, this will have diminishing returns, but until that is hit, this makes sense as a purely return-on-investment for both research progress and business returns.
The goal of the merger is for execs to look like they are doing something to drive progress. Actual progress comes from the researchers and developers.
Well, where exactly is this progress?
Where is Google's answer to GPT-4? Why weren't the 'researchers and developers' making a GPT-4 equivalent?
Turns out sometimes you need a top-down, centralised vision to execute on projects. When the goal is undefined, you can let researchers run free and explore; now it's full-on wartime, with clear goals (make GPT-5, 6, 7...).
Google is fundamentally allergic to top-down management. Most googlers will reject any attempt to be told what to do as wrong, because lots of IC's voting with their feet are smarter than any (google) exec at figuring out what to do.
The last time Google got spooked by a competitor was Facebook, and they built Google Plus in response. We all know that was an utter failure. Googlers could escape that one with their egos intact because winning in "social" is just some UX junk, not hard-core engineering like ML.
It's gonna be super hard for them to come to grips with the fact that they are way behind on something that they should be good at. Plan for lots of cognitive dissonance ahead.
I don't think they are behind just because they have released less stuff. They had LaMDA way before ChatGPT, and they had multimodal models (both by DeepMind and by Google Brain) well before OpenAI.
They are behind because what they have released sucks. Have you tried Bard? It's dumb. Like you're talking to some 20th century gimmick dumb. GPT4 is far from perfect, but when it makes mistakes and you point them out, it understands and tries to adapt. Bard just repeats itself saying the same stupid things like a casette-tape answering machine.
If you ask a googler about this, they typically assume GPT is just as stupid as bard. Or say something like "so GPT is just trained on more data - we can do that." As if nothing's wrong.
Bard uses a smaller model currently, which was announced before release.
> We’re releasing it initially with our lightweight model version of LaMDA. This much smaller model requires significantly less computing power, enabling us to scale to more users, allowing for more feedback. [1]
> Bard is powered by a research large language model (LLM), specifically a lightweight and optimized version of LaMDA, and will be updated with newer, more capable models over time. [2]
"We are releasing something that's too underdeveloped, because Wall Street demands we release something. We're using a smaller training set because we have to rush to market with our smaller, stupider model ASAP."
I remember the CEO of Google saying a few months (a month?) ago that more capable models would be released. Either they delayed, or the model is just as bad (maybe they released it after the announcement of coding in 20 languages?).
There is a first mover handicap there though. TF1.0 included a bunch of things that were harder to understand like tf.Session(). PyTorch was inspired from the good parts and "we will eager-everything". Internally I'm sure there was a lot of debate in the TF team that culminated with TF2.0, but by that time the damage was done and people saw PyTorch as easier.
Nope, PyTorch was inspired by the Lua version of Torch, which well predates TensorFlow. To be fair, basically every other DL framework made the same mistake though.
Also, tensorflow was a total nightmare to install while Pytorch was pretty straightforward, which definitely shouldn't be discounted.
> Also, tensorflow was a total nightmare to install while Pytorch was pretty straightforward, which definitely shouldn't be discounted.
I think this is a very important point, and I remember sweating blood trying to build a standalone tf environment (admittedly on windows) in the past. I'm impressed by how much simpler and smoother the process has recently become.
I do prefer Keras to PyTorch though - but that's just me.
tensorflow was a total nightmare to install while Pytorch was pretty straightforward
Hat tip for this comment. On HN, I read some great commentary about "time to achieve first HTTP 200 with your REST API". Regarding installed software libraries, lower friction to achieve "Hello, World!" is important.
PyTorch examples were also cleaner. torchvision had ResNet training with batteries included, while TF had roll-your-own or clone some weird Keras repository.
I think the main problem was debugging tensors on the fly, impossible with TF/Keras but completely natural in PyTorch. Most researchers needed to sequentially observe what was going on in tensors (histograms, etc.) and even do backprop for their newly constructed layers by hand, and that was difficult with TF.
Nah, TF has had dynamic execution since TF2 and it’s still losing users, it seems. The execution model and API are simply more complicated. What’s a session, placeholder, constant, tensor, …? PyTorch was sold as numpy with GPU support and it is pretty close to that. JAX is an attempt to approach language simplicity and purity.
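For anyone who never touched TF1, a toy side-by-side of the two styles (TF 1.x API written from memory, so details may be slightly off; `data` is just a small example batch):

    # Toy comparison of the two styles.
    data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

    # TF1: build a static graph of placeholders/ops first, then feed data through a session.
    import tensorflow as tf  # assumes a 1.x install
    x = tf.placeholder(tf.float32, shape=[None, 3])
    w = tf.Variable(tf.random_normal([3, 2]))
    y = tf.matmul(x, w)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        out = sess.run(y, feed_dict={x: data})  # real numbers only exist here

    # PyTorch: "numpy with GPU support"; ops run eagerly, so you can print or set a
    # breakpoint on any intermediate tensor right in the middle of the computation.
    import torch
    xt = torch.tensor(data)
    wt = torch.randn(3, 2, requires_grad=True)
    yt = xt @ wt
    print(yt.shape, yt.mean())  # inspect on the fly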
I have used both but ended up dropping TF for PyTorch after 2018. Mainly it was the larger PyTorch ecosystem in my field (NLP) and clear API design and documentation that did it for me.
However, TF was still a valid contender and it was not clearcut back in 2016-17 which framework was better.
I can speak from experience on this. Getting started with TensorFlow was very complicated with sparse documentation, so we dropped the idea of using it.
You are not correct about TPUs being drastically better than GPUs at this. If you look at public benchmarks, both have a similar cost per hardware flop ($0.88/hr for a 312 TFLOPS A100 on GCP, $0.97/hr for a 275 TFLOPS TPUv4) and both achieve similar model-flop:hardware-flop ratios (40-60%).
An existence proof that GPU mega-clusters are possible is that GPT-4 cost ~$100m over ~3 months, so ~100m a100-hours / (3 months * 30 days/month * 24 hours/day = 2160 hours) = ~45k a100s collaborating, which is the equivalent of ~10 TPUv4 pods on a single training run.
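Spelling out that back-of-the-envelope arithmetic with the rough numbers quoted above:

    # Rough sanity check of the estimate above, using the numbers quoted in this thread.
    total_cost_usd = 100e6          # ~$100m reported training cost
    usd_per_a100_hour = 1.0         # roughly the ~$0.88-0.97/hr figures above
    hours = 3 * 30 * 24             # ~3 months of wall-clock time = 2160 hours
    concurrent_a100s = total_cost_usd / usd_per_a100_hour / hours
    print(round(concurrent_a100s))  # ~46,000, i.e. on the order of 45k A100s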
I assume/suspect internally Google has v5 already.
The thing about TPU clusters is that they have a hyper-torus optical interconnect between TPUs. This allows for extremely efficient weight updates. To replicate this with A100s you need a very custom hardware/software deployment.
But to be fair, I don’t know what is latest and greatest available from NVidia or other clouds in this area right now.
EDIT: Looks like NVidia has NVSwitch, which provides interconnect for 256 GPUs. Pretty cool!
External NVSwitch (for more than 8 GPUs) isn't available for purchase yet. But even now you typically buy your A100 or H100 training GPUs in servers where each GPU has an individual 400 Gbps Mellanox networking adapter, then connect them in a sophisticated switched network that provides full throughput between all GPUs of the cluster. The deployment is not that custom, typically you would let Nvidia and your vendors implement their reference design for you: https://www.nvidia.com/en-us/data-center/dgx-superpod/ Then run an open-source software stack on top of CUDA etc.
I don't know how well the TPU hyper-torus interconnect performs, but the networking topology seems to be less general than switched NVLink or InfiniBand.
How is the network adapter connected to the GPU? Do they just sit on the same bus?
It looks like in a TPU v4 cluster each pod with 2 or 4 (?) TPUs has 6 optical interfaces, which connect directly to neighboring pods. I have no idea how they route through this configuration, but my guess is that most messages are weight updates, which are essentially broadcasts, so it should work out fine with some basic forwarding.
The torus topology makes sense for ring algorithms: Allreduce, Allgather, Reducescatter. For purely data-parallel training you could put all model replicas into the same ring (although Nvidia also uses hierarchical algorithms that benefit from lower latency). With added model parallelism you will need smaller rings running concurrently. I guess the TPU cluster layout then puts constraints on the most efficient model architectures (as does the network topology of a GPU cluster).
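For readers who haven't met these, here is a toy serial simulation of what ring Allreduce computes; a real implementation runs the 2*(N-1) exchange steps concurrently across devices, while this just mimics the data movement in a single process:

    import numpy as np

    def ring_allreduce(grads):
        """Toy serial simulation of ring Allreduce: every worker ends up with the sum
        of all workers' gradients, exchanging one chunk per step with one neighbor."""
        n = len(grads)                               # number of workers in the ring
        chunks = [np.array_split(g.copy(), n) for g in grads]
        # Reduce-scatter: after n-1 steps, worker i holds the fully reduced chunk (i+1) % n
        for step in range(n - 1):
            for i in range(n):
                src, dst = i, (i + 1) % n
                c = (i - step) % n                   # which chunk travels on this step
                chunks[dst][c] = chunks[dst][c] + chunks[src][c]
        # Allgather: circulate the reduced chunks so every worker has all of them
        for step in range(n - 1):
            for i in range(n):
                src, dst = i, (i + 1) % n
                c = (i - step + 1) % n
                chunks[dst][c] = chunks[src][c]
        return [np.concatenate(c) for c in chunks]

    out = ring_allreduce([np.ones(8) * (w + 1) for w in range(4)])  # workers hold 1s, 2s, 3s, 4s
    print(out[0])                                    # every worker now holds the sum: all 10s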
TF in its first version was stellarly misdesigned. It was infuriatingly difficult to use, particularly if you were of the "I just want to write code and have it autodiffed + SGDed" school, I found it crazy to use Python to manually construct a computational graph...
You need something to construct a graph. Why not pick a well-known language already used in scientific computing and stats/data science? The other options are: pick a lesser known language (lua, julia) or a language not traditionally used for scientific computing (php, ruby), or a compiled language most researchers don't know (C++), or a raw config file format (which you would then use code to generate).
What's really crazy is using Pure, Idiomatic Python which is then Traced to generate a graph (what JAX does). I want my model definitions to be declarative, not implicit in the code.
TF1 was pretty rough to use, but beat the pants off Theano for usability, which was really the best thing going before it. Sure it was slow as dirt ("tensorslow") even though the awkward design was justified on being able to make it fast. But it was by far the best thing going for a long time.
Google really killed TF with the transition to TF2. Backwards incompatible everything? This only makes sense if you live in a giant monorepo with tools that rewrite everybody's code whenever you change an interface. (e.g. inside google). On the outside it took TF's biggest asset and turned it into a liability. Every library, blog post, stackoverflow post, etc talking about TF was now wrong. So anybody trying to figure out how to get started or build something was forced into confusion. Not sure about this, but I suspect it's Chollet's fault.
Sure, my point is that only a googler would even consider this kind of breaking change as a sensible option. People in the real world with regular code tooling would reject the proposal before it got started.
The analogy to Angular that others have made is spot on. It's not just first-mover disadvantage. Google has particular blind spots for certain pain points, like deprecating APIs. Also q.v. Google Cloud.
Every attempt at a "clean break" new version of a commonly-used platform leads to such long-term weakness, yet the temptation to piggyback off of the mindshare/existing branding forces companies to avoid calling it a new platform.
The APIs were messed up early on, which is a reason TF2 happened. Every team started making their own random implementations of stuff. You had the TF Slim API, you had Keras, etc. The API just got fatter and fatter and then libraries would make cross dependencies to bake in the API mistakes.
> Unfortunately, this early lead would be completely squandered within a few short years, with PyTorch/Nvidia GPUs easily overtaking TensorFlow/Google TPUs. ML was, and frankly is, still too nascent to have significant technical barriers to entry. The sustained eye-popping funding for AI companies generated a surge in supply, with the number of ML researchers growing ~25% YoY for the past decade. I taught myself enough ML to blend in with the researchers at Brain over a relatively short 2 years, and so have many others. Nobody, not even Google, can afford to throw money into a bottomless pit.
Perhaps NVidia is close now. A bit hard to say without specific hardware info.
Google’s were already available 5-6 years ago, and current versions are probably even faster. They have super-fast optical interconnects in a torus or hyper-torus configuration that allow synchronous weight updates across 1k+ TPUs. This leads to dramatically lower training times and less noise, which leads to better-performing models; i.e., you can't even train a model to the same level on traditional GPUs.
Once they started to get deployed, models that trained for 3 weeks on 30 GPUs were trained in 30 minutes on 1k TPU cluster.
All this reiterates the main point of the article: Google had a tremendous lead and wasted it due to a lack of vision and product execution ability.
because once Jeff Dean had solved Google's Maslow problems (scaling web search, making ads profitable, developing high-performance machine learning systems), he wanted to return to doing academic-style research, but with the benefit of Google's technical and monetary resources, and not as part of X, which never produces anything of long-term value. I know for sure he wanted to make an impact in medical AI and felt that being part of a research org would make that easier/more possible than if he was on a product team.
I generally agree with this though with some tweaks. I think Jeff wanted to do something that he thought was both awesome (he's liked neural networks for a long time - his undergrad thesis was on them) and likely to have long-term major impact for Google, and he was able to justify the Awesome Thing by identifying a way for it to have significant potential revenue impact for Google via improvements to ad revenue, as well as significant potential "unknown huge transformative possibilities" benefits. But I do suspect that you're right that the heart of it was "Jeff was really passionate about this thing".
Of course, this starts to get at different versions of the question: Why did Google Brain exist in 2012 as a scrappy team of builders, and why did Brain exist in 2019 as a mega-powerhouse of AI research? I think you and I are talking about the former question, and TFA may be focusing more on the second part of the question.
[I was randomly affiliated with Brain from 2015-2019 but wasn't there in the early days]
It grew from the scrappy group to the mega-powerhouse by combining a number of things: being in the right place at the right time with the right resources and people. They had great cachet: I was working hard to join Brain in 2012 because it seemed like they were one of the few groups who had access to the necessary CPU and data sets and mental approaches that would transform machine learning. And at that time, they used that cachet to hire a bunch of up-and-coming researchers (many of them Hinton's students or in his sphere) and wrote up some pretty great papers.
Many others came from outside Brain: researchers working on other, more boring projects who transferred in, bringing their internal experience in software development and deployment, which helped a lot on the infra side.
I agree that the OP makes a bunch of interesting points, but I think historically Brain really grew out of what Dean wanted to do and the fact that he wanted it to be full-stack, e.g. including the TPU. Also, crucially, Brain would use Google data and contribute back to Ads/Search directly versus Google X which was supposed to be more of an incubator.
But it's also notable how the perspective of an ex-Brain employee might differ from what sparked the Brain founders in the first place.
Waymo is kind of like DeepMind - they're costing Alphabet billions of dollars a year for a decade+ with no appreciable revenue to show for it, but they're working on something neat, so surely it must be good?
X exists as a press-release generation system, not as a real technology creation system. They onboard many impractical projects that are either copies of something being done already in industry ("but with more Google and ML!") or doesn't have a market (space elevators).
Waymo has developed the modern autonomous vehicle from the ground up. It's basically a matter of scale now. It's a mind-blowing tech stack. The first time riding in one is much more otherworldly than using GPT for the first time. The value of the technology is far greater than whatever PR they have generated (not many people know about it).
I have infinite respect for the process that Waymo followed to get to where they are. And I'm impressed that Google continued to fund the project and move it forward even when it represents such a long-term bet.
But it's not a product that has any real revenue. And most car companies keep their distance from Google's self-driving tech because they're afraid: afraid Google wants to put them out of business. It's unclear if Google could ever sell what they've created (as a product, as an IP package, etc.) because it depends so deeply on a collection of technology Google makes available to Waymo.
I was just disputing "X exists as a press-release generation system, not as a real technology creation system." Definitely agree the path to profitability will be tough.
Controversy over what? Did you read Gebru's paper? For instance, her calculation of the carbon footprint of training BERT assumes that companies will train BERT 24x7. Gebru is a disgrace to the community because she always, I mean literally always, attacks her critics by questioning their motives. You think bias is a data problem? You're a bigot (see her dispute with LeCun). You disagree with my assessment of an ML model? You are a white male oppressor (her attacking a Google SVP).
Gebru is not a researcher. She is a modern-age Trofim Lysenko, who politicizes everything and weaponizes political correctness.
Ok but the lack of underrepresented minorities in the field and the important role people like Gebru played in extending the political and status of minorities is ok to extinguish? We need more than just white male / Chinese male / Indian male monoculture “STEM lords”. This is already recognized in fields like medicine, where minorities treating minorities results in better outcomes and the greater push to open positions of status to minorities.
> important role people like Gebru played in extending the political and status of minorities is ok to extinguish?
No, she didn't. Attacking everyone for baseless motives and identities is the worst kind of activism. She alienated people by attacking them without basis. She disdained those who truly fought for the fairness and justice of every race. She left a bad taste in people who truly cared about progress. Yes, it's totally worth "extinguishing" her role, as her role is nothing but a political charade.
As for underrepresented minorities, do you even know about the Chinese Exclusion Act? Do you know how large the pipeline of STEM students is across different races, and why there is a gap? Do you know why the median GPA of high school students in the inner city was 0.5X out of 4? Why was that? The questions could fill a book. Yeah, activism is easy, as long as you have the right skin and a shameless attitude. Solving real problems is hard.
Take a look at this lecture about the history of slavery and racism in the US, then [1]. It will provide an answer to many of your questions. Dismissing a prominent PoC’s position for being outspoken and critical of existing unfairness really demonstrates who is maintaining political power here. Do you really think that centuries of unfairness and trauma that are still perpetuated today will be swept under the rug by your fantasy of “equivalent” treatment? Equity is about justice, which is fundamentally about ethics. So tell me, why are you so rooted in your position and uncomfortable with confronting historical injustice?
>Ok but the lack of underrepresented minorities in the field and the important role people like Gebru played in extending the political and status of minorities is ok to extinguish?
Yes it's okay to extinguish it if hiring underrepresented minorities means hiring bad actors like her who contribute nothing of value. Scientific truth is scientific truth; if you hire people for the color of their skin or their sexuality instead of their ability to produce truth, you slow the progress of science and make the world worse for everyone.
This seems like a pretty bad faith argument that illustrates exactly the point the parent comment was making. Firing Gebru for insubordination is not "extinguishing" anything, it's getting rid of an employee that was actively taking pot shots at the company in her paper and somehow equated getting fired with anti-minority bias. In practice, Google is already much more tolerant of activism than the average tech company and she was unable to play by the corporate rules.
I personally believe that racial or diversity quotas are even more racist or sexist. We should expect minorities to develop their own culture of intellectual excellence. After all, they are no longer children. Giving them a shortcut is a form of insult. Providing someone an advantage based on their race or sex at the expense of someone else who is more qualified due to their race or sex is nonsensical. Companies may fail as a result of such practices. Ultimately, what truly matters is how innovative and efficient a company is.
That's a very simplified version of the story, but I would say that Dean greatly reduced his stature when he defended Megan Kacholia over her abrupt termination of Timnit. Note that Timnit was verbally abusive to Jeff's reports (anybody who worked there could see what she was posting to internal group discussions), so her time at Google was limited, but most managers would say that she should at least have been put on a PIP and given 6 months.
Dean has since cited the paper in a subsequent paper (which tears apart the Stochastic Parrots paper).
Google has since fired other folks on her team and was in crisis mode to protect Dean. Like, I’m not really going to give them the benefit of the doubt on this.
When people brought Dean up, Timnit came up as something to consider; it's interesting to see how all anyone has to say in these threads is reverence towards him. People should try to see the whole picture.
Timnit and the other ex-ML ethics crowd who got fired from Google seem like some of the most ignorant people around. I don't defend Dean reflexively, it just seems like he is on the right side of the issue. For example, here is Emma Strubell accusing Dean of creating a "toxic workplace culture" after he and David Patterson had to refute her paper in print.
The thing is, if David Patterson and Jeff Dean think your numbers for the energy cost of machine learning might be wrong, then you are probably wrong. These ML meta-researchers are not practitioners and appear to have no idea what they are talking about. Keeping a person like Timnit or Strubell on staff seems like it costs more than it's worth.
Timnit is ex-Google, but very much not ex-ML ethics (she founded the Distributed AI Research Institute, focused on the field, in late 2021). Very much also true of Margaret Mitchell, who has been at Hugging Face since 2021.
Being somewhat involved in one bad thing doesn’t justify cancelling someone.
To my knowledge Dean was essentially doing post-hoc damage control for what one of the middle managers in his org did. Even if they did want Timnit gone (as others mention, you are getting only one side of the story in media) they did it in a bad way, for sure. At the same time I don’t think one botched firing diminishes decades of achievements from a legitimately kind person.
> Also not surprised at the immediate down votes for questioning Googles new AI lead!
That's because you are wrong to pretend he did anything wrong by firing T.G. And also, because you added this weird lie/mudslinging/whatever on top of it:
These safety people guarantee a useless product that never does unsafe things. ChatGPT proved that you can have a product do unsafe things and still be useful if you put a disclaimer on it. Overall, as a user, I couldn't give a damn if things are unsafe by the definition of this style of ethicist. They were a ZIRP and my life is better for their absence.
She appears to be a symbol for everything that went wrong at Google. These are the kind of problems that arise when life is too easy, just before the downfall. In other words, decadence. How else can one explain that Google's AI research was dethroned by OpenAI?
Yeah, Dean's fault is hiring such people in the first place. If you hire an activist, you get activism. And if you hire someone whose livelihood depends on finding more problems, well, they will scream about more problems, one way or another. Otherwise, why would the state U of Michigan have ended up with one DEI officer per three staff members?
A lot of people, especially on hacker news, feel disdain for researchers of ethics, bias and fairness, as they are perceived as both holding technology back and profiting from advances in it (that they can then analyse and criticize).
I don't think you're necessarily wrong in your assessment of HN and AI enthusiasts, but in this case I think it's more accurate to talk about a Twitter agitator and race-baiter [1], rather than a "researcher of ethics, bias and fairness".
Google has good engineers and a long history of high-throughput computing. This, combined with a lack of understanding of what ML research is like (versus deployment), led to the original TF1 API. Also, the fact that Google has good engineers working in a big bureaucracy probably hid a lot of the design problems as well.
TF2 was a total failure, in that TF1 could do a few things really well once you got the hang of it, but TF2 was just a strictly inferior version of PyTorch, further plagued by confusion left over from TF1. In an alternate history, if Google had pivoted into JAX much earlier and more aggressively, they could still be in the game. I speak as someone who at some point knew all the intricacies of and differences between TF1 and TF2.
> google has good engineers working in a big bureaucracy probably hid a lot of the design problems as well.
I feel like this is true of every google product in the last decade, maybe more. Their customer products, but especially their dev tools like Angular and k8s screams "We were so preoccupied with whether we could, we didn’t stop to think if we should."
> it is becoming increasingly apparent to Google that it does not know how to capture that value
To paraphrase: it's the business model, stupid.
Inventing algorithms, building powerful tools and infrastructure etc is actually a tractable problem: you can throw money and brains at it (and the latter typically follows the former). While the richness of research fields is not predictable, you can bet that the general project of employing silicon to work with information will keep bearing fruits for a long time. So creating that value is not the problem.
The problem with capitalizing (literally) on that intellectual output is that it can only be done 1) within a given business model that can channel it effectively or 2) through the invention of totally new business models. 1) is a challenge: the billions of users to whom AI goodies can be surfaced are not customers, they are the product. They don't pay for anything and they don't create any virtuous circle of requirements and solutions. Alas, option 2), inventing major new business models, is highly non-trivial. The track record is poor: the only major alternative business model to adtech (the cloud unit) was not invented there anyway, and in any case selling sophisticated IT services, whether to consumers or enterprises, is a can of worms that others have much more experience with.
For an industrial research unit to thrive, its output must be congruent with what the organization is doing. Not necessarily in the details, but definitely in the big picture.
Seems more than a nitpick to me. I find the essay interesting but this line raised some distrust in me. How can someone have these deep insights into Google's ML strategy and the evolution of the field and simultaneously think LSTMs were invented by Google in 2014?
>How can someone have these deep insights into Google's ML strategy and the evolution of the field and simultaneously think LSTMs were invented by Google in 2014?
It may not have been accidental; there's a deliberate movement among some people in the ML community to deny Jürgen Schmidhuber credit for inventing LSTMs and GANs.
It's become somewhat of a meme, where Schmidhuber seemingly tries to claim credit for nearly everything. I _think_ it's because he published ideas back in the 90s or so that weren't fully executable/realized at the time, and later people figured out how to actually flesh them out and do it, and supposedly didn't cite him appropriately/enough. Often the ideas weren't exactly the same; rather, he claims they're derivatives of the concept he was going for.
Just to make it clear, you're hearing one side of the argument. There are also a lot of people who firmly believe that Schmidhuber was wrongfully deprived of recognition he deserved.
I don't have a strong opinion one way or another, but the issue isn't as clear cut as your parent comment might have made it sound like.
My theory is that broadly, tech learned not to act like Microsoft in the 90s -- closed off, anti-competitive, unpopular -- but swung too far in the opposite direction.
Google has been basically giving away technology for free, which was easy because of all the easy money. It's good for reputation and attracting the best talent. That is, until a competitor starts to threaten to overtake you with the technology you gave them (ChatGPT based on LLM research, Edge based on Chromium, etc.).
Ehh, I mildly disagree. I'm not entirely bought in on the notion that giving one's technical innovations away for free is obviously the right move, but I don't think it's why the company is in trouble.
Chrome is experiencing unprecedented competition because it faltered on the product. Chrome went from synonymous with fast-and-snappy to synonymous with slow-and-bloated.
Likewise Google invented transformers - but the sin isn't giving it away, it's failing to exercise the technology itself in a compelling way. At any moment in time Google could have released ChatGPT (or some variation thereof), but they didn't.
I've made this point before - but Google's problems have little to do with how it's pursuing fundamental research, but everything to do with how it pursues its products. The failure to apply fundamental innovations that happened within its own halls is organizational.
Well, Google could easily have not shared its technology.
However, the bloat problems you’ve described are difficult to solve, and are to some degree endemic to large businesses with established products.
> "Well, Google could have easily not have shared its technology."
Sure, but the idea is that if they didn't share their technology, they'd still be in the same spot: they would have invented transformers and still not shipped major products around it.
Sure maybe OpenAI won't exist, but competitors will find other ways to compete. They always do.
So at best they are very very slightly better off than the alternative, but being secretive IMO wouldn't have been a major change to their market position.
Meanwhile, if Google was better at productizing its research, it matters relatively little what they give away. They would be first to market with best-in-class products, the fact that there would be a litany of clones would be a minor annoyance at best.
Also, these things don't stay secret for very long, and eventually other people figure them out (completely independently, or maybe even influenced by some ex-Google employee). Then Google wouldn't even be able to say it invented them.
True, but they only feel the fire now, and you can tell they’re rapidly trying to productionalize stuff like you’ve described. It will take time though.
"I’ve seen repeatedly that it’s much harder for a ML PhD to learn chemistry than for a chemist to learn ML. (This may be survivorship bias; the only chemists I encounter are those that have successfully learned ML, whereas I see ML researchers attempt and fail to learn chemistry all the time.)"
This is something that rings really true to me. I work in imaging and it's just very clear that there are groups of people in ML who don't want to learn how things actually work and just want to throw a model at it (this is a generalization obviously, but it's more often than not the case). That only gets you 80% of the way there, which is usually fine, but not fine when the details are make or break for a company. Unfortunately that last 20% requires understanding of the domain, and people just don't like digging into a topic to actually understand things.
To pick on the game of Go as an example, the insight from the Bitter Lesson is that the best method for Go is probably the broadest one, the one that leans hardest on the magic of brute force, rather than a method that carefully encodes the best existing human strategies for Go.
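To make "broad, brute-force method" concrete, here is a minimal, hypothetical sketch (not any real engine): pick a move purely by random playouts, encoding zero Go strategy. The GameState interface (legal_moves, play, is_over, winner) is invented for illustration.

    import random

    # A deliberately "dumb" brute-force player: zero encoded Go knowledge, just
    # random games played to the end from each candidate move. GameState is a
    # hypothetical interface (legal_moves(), play(move) -> new state, is_over(),
    # winner()), not a real library.

    def random_playout(state, player):
        # Play uniformly random moves until the game ends; return 1 if `player` won.
        while not state.is_over():
            state = state.play(random.choice(state.legal_moves()))
        return 1 if state.winner() == player else 0

    def choose_move(state, player, playouts_per_move=200):
        # Score each legal move by how many random playouts it wins; pick the best.
        def wins(move):
            return sum(random_playout(state.play(move), player)
                       for _ in range(playouts_per_move))
        return max(state.legal_moves(), key=wins)

Scale the playout count up far enough (and add a search tree on top) and you get the family of methods that ended up dominating, with essentially no Go expertise baked in.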
I think what OP is referring to and what I’ve observed is that in some fields you have to have a certain amount of expertise to just understand the rules of the game, and there are a lot of applications where someone approaching a problem with the goal of applying ML can’t quite get over the hump of understanding all the rules.
Someone still needs to have enough domain knowledge to figure out what needs solving and how to validate that it was solved.
Being able to frame problems, and being able to cobble together the data and infra for the AI to solve them are the most valuable things in the medium term.
Yes, I think having domain knowledge (such as Chemistry, Physics, etc.) is invaluable for real-world problem solving. I am a Materials Engineer and I see it in my field with ML as well. I think people are starting to understand now that being able to validate the inputs and outputs requires some understanding of the process you are trying to model, i.e. to ensure you aren't proposing things that fundamentally contradict basic Thermodynamics, etc.
Even for optimization type modelling you need domain knowledge to design sensible constraints to frame the problem.
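As a rough illustration of what "domain knowledge as constraints" can look like, here is a minimal sketch using scipy.optimize; the additive fractions, costs, property model, and thresholds are all made up for the example, and in practice the constraints would come from a real physical model or dataset.

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical toy problem: choose fractions x of two additives to minimize
    # cost, while a domain-informed constraint keeps a modelled property above a
    # physically sensible floor. Numbers and the property model are invented.

    cost = np.array([4.0, 9.0])          # relative cost per unit of each additive

    def objective(x):
        return float(cost @ x)           # total cost to minimize

    def property_model(x):
        # Stand-in for a fitted or physics-based model of the property of interest.
        return 0.3 * x[0] + 0.8 * x[1]

    constraints = [
        {"type": "ineq", "fun": lambda x: property_model(x) - 0.4},  # property >= 0.4
        {"type": "ineq", "fun": lambda x: 0.6 - (x[0] + x[1])},      # total additives <= 0.6
    ]
    bounds = [(0.0, 1.0), (0.0, 1.0)]

    result = minimize(objective, x0=[0.1, 0.5], bounds=bounds, constraints=constraints)
    print(result.x, result.fun)

Without the domain knowledge to write those two inequality lines sensibly, the optimizer will happily hand back "optimal" compositions that a chemist or materials engineer would reject on sight.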
> Neither side “won” this merger. I think both Brain and DeepMind lose. I expect to see many project cancellations, project mergers, and reallocations of headcount over the next few months, as well as attrition.
This merger will be a big test for Sundar, who openly admitted years ago that there were major trust issues [1]. Can Sundar maintain the perspective of being the alpha company while bleeding a ton of talent that doesn't actively contribute to tech dominance? Or will he piss off the wrong people internally? It's OK to have a Google Plus / Stadia failure if the team really wanted to do the project. If the team does _not_ want to work together, though, and they fail, then Sundar's request that the orgs work together to save the company is going to get totally ignored in the finger-pointing.
> I sat on it because I wasn’t sure of the optics of posting such an essay while employed by Google Brain. But then Google made my decision easier by laying me off in January. My severance check cleared...
I'm really baffled by how people think it's OK to write public accounts of their previous (and sometime current!) employers' inner workings. This guy got paid a shitload of money to do work and to keep all internal details private, even after he leaves. They could not be more clear about this when you join the company.
Why do people think it's OK to share like this? This isn't a whistleblowing situation -- he's just going for internet brownie points. It's just an attempt to squeeze a bit more personal benefit out of your (now-ended) employment.
Contractual/legal issues aside, I think this kind of post shows a lack of personal integrity (because he did sign a paper agreeing not to disclose info), and even a betrayal of former teammates who now have to deal with the fallout.
How do you know he was paid to keep all internal details private after he leaves? Do you have knowledge of the employment contract, can you share the relevant language with us?
All he says is his concern about "optics", which has nothing to do with contract.
If Google has a problem with his post, they can go after him, but that's an issue between Google and him, not with you or me or the rest of the internet.
I'm definitely struggling to see what any of this has to do with personal integrity, betrayal, or squeezing personal benefit. To the contrary, it simply seems informative and he's sharing knowledge just to be helpful. Unless I've missed something, I don't see anything revealed that would harm Google or his former teammates here. No leaks of what's coming down the pipeline, no scandals, nothing of the sort.
People are allowed to share opinions of their previous employment and generally describe the broad outlines of their work and where they worked. This isn't a situation of working for the CIA with top-secret clearance.
> Do you have knowledge of the employment contract, can you share the relevant language with us?
I actually could dig up my own contract from years ago (ugh, the effort though), but the confidentiality clause is in there, and it was made clear during on-boarding what is expected from employees: don't share any internal info unless you're an authorized company representative.
General working conditions are not "internal info"; it is beneficial to society to discuss working conditions (which can be pretty detailed), and doing so is a protected activity under various laws. Nothing in any contract can obviate this lawful right (and, to some, a basic duty of citizenship, since the country is more important than any single company within the country). At best, contracts can highlight what is privileged information that there is a duty to keep secret.
Broadly, section 7 of the NLRA, but it is not spelled out in the text of the act. Instead, Quicken Loans, Inc. v. NLRB established the precedent that discussing working conditions is not a violation of confidentiality: https://www.lexisnexis.com/community/casebrief/p/casebrief-q...
The big one in the USA is the National Labor Relations Act, but it generally applies to group action. However, such group action can literally be publicly posting about a job's working conditions.
Contracts are required to be "reasonable" for both parties, which puts limits on what a contract can constrain. I don't know how much of this reasonableness standard is statutory versus judicial.
https://www.pullcom.com/working-together/there-are-limits-to...
> there is no statutory protection if the employee was only complaining about personal matters, such as the terms or conditions of employment. The employee has to show that he was commenting on a matter of public concern, rather than attempting to resolve a private dispute with the employer.
You have to be careful thinking you owe your employer everything they would wish to have. Disclosing the inner workings is extremely helpful to people trying to figure out where to work, and how their current employer compares to others. I got a job at Google X Robotics in 2017 in large part because the place was so secret and I always wanted to know what happened there. It was quite an interesting experience, but I do wonder how I would have felt if someone like me working there had written something like this before I made the decision.
I've never worked at Google Brain, but I've been a research manager in tech for a decade, and nothing here seems surprising to me. It discusses the archetype of the well funded industry lab that is too academic and ultimately winds down.
The post makes sensible but generic statements based on the view of an IC. It tries to work back from the conclusion (Google struggles to move academic research into product) and produces plausible but hardly definitive explanations, with no ranking, primarily because there's no discussion of the thinking, actions and promises made at the executive level that kept the lab funded for all these years.
You're right that I don't know enough as an IC to comment on the thinking, actions, and promises at the exec level that kept Brain funded. But even if I did, to the GP's point, I would not talk about that part at all.
I'm curious about your perspective as a research manager in tech - would you be willing to chat privately?
> But even if I did, to the GP's point, I would not talk about that part at all.
Even if you were one of said executives, who was moving to a higher-level job at another company, and the company was interviewing you as to the reasoning behind decisions made in your previous job?
I've been quite careful not to divulge anything confidential, and anything that is remotely close to sensitive has publicly accessible citations. My opinions about Google are tantamount to discussions about workplace conditions, and it would be very bad for society if ex-employees were not allowed to discuss those.
It’s obvious you care about the people you worked with, and the potential for what you were building. From my perspective you wrote this for the people who couldn’t.
But all your opinions are informed by your years of TGIFs and internal emails and discussions and presentations and your insider perspective. When you talk about promotions, or internal values and prioritizations, you are leveraging info gained privately.
If I'm wrong and nothing in your contract or on-boarding said you shouldn't talk about internals, then my bad. But I suspect they were as clear with you as they were with me, that it's not ok to post anything based on inside info. And in your opening paragraph you say:
> As somebody with a unique perspective
Your unique perspective was your access as an employee.
> and the unique freedom to share it
Your unique freedom is that you're done receiving money from them. But contractually, this doesn't matter.
I care because I value these Confidentiality commitments, and I believe that if someone doesn't like them, they should not sign them to begin with, rather than breaking them. A company (like any group of people) is allowed to define the culture and standards required for membership.
I've worked in extremely secretive companies, and very open ones. I prefer the open ones. But I still don't say anything about internals at the secretive ones -- because that was part of the commitment I made in exchange for employment.
From my perspective, in the most bold words I can think to phrase this: You're betraying the citizens and lawful residents of your country by not informing them of what to expect should they accept a job at one of these companies. Have fun with your 30 pieces of silver (https://en.wikipedia.org/wiki/Thirty_pieces_of_silver ).
> and I believe that if someone doesn't like them, they should not sign them to begin with, rather than breaking them.
Do you, at least, believe that confidentiality agreements should be broken if it is to make the police or public aware of a crime? How about a civil infraction, such as a hostile working environment?
> Do you, at least, believe that confidentiality agreements should be broken if it is to make the police or public aware of a crime? How about a civil infraction, such as a hostile working environment?
Absolutely. I referenced whistleblowing in my original post above. This isn't such a case.
Does this apply to people who infiltrate what they believe are corrupt companies with the expectation of digging up dirt, and thus sign the confidentiality agreements with the expectation of violating them?
Does this apply to moral wrongs, which are technically legal (e.g. cruel conditions at animal farms)?
Because a promise that imprisons knowledge indefinitely has no integrity. To me, it's clear that it is absolutely in the public interest to - perhaps not go out of one's way to spread, but at least feel free to - explain the conditions and inner workings of large and impactful organizations.
We're all thinkers who are asked to apply our minds in exchange for money, not slaves whose brain is leased to or owned by our employers for the duration of our tenure.
Even when asked to keep things secret, there's still no way for a company to own every last vestige of knowledge or understanding retained in our minds, and there's still an overwhelming public interest in building on and preserving knowledge, to the point that, in my opinion, nearly any piece of human knowledge short of trade secrets should eventually be owned by humanity as a whole. (and there's even some moral arguments to be made about some trade secrets, but that's a much deeper discussion)
I personally find that people who are overly concerned with secrecy view their integrity from the lens of their employer, but not at a human interest level. To be clear, there are still times for secrecy or the security or integrity of information to be respected, but it's nuanced and generally narrower than people expect.
I'm really baffled by how many people think a job is more than a job and there's some ownership over the employee's critical thinking capabilities during the job and after it ends.
While I agree the OP is "going for internet brownie points" (or probably a bit butthurt from being laid off from a certifiably top-5 cushiest job in the United States) the article doesn't include anything even remotely trade secret and is predominantly opinion. It's really totally fine to blog about how you feel about your employer. There are certainly risks, but a company has to pay extra if they actually don't want you to blog at all (or they have to be extremely litigious).
There's a strong norm of employer-employee loyalty that was substantially set by pre-internet information disparities between the two. In the past year or so, there have been some pretty unprecedented layoffs (e.g. Google lays off thousands and then does billions of stock buybacks ...). The employer-employee relationship needs to evolve.
Part of my problem is that Google (and similar companies) are already paying insane (in the best way possible) amounts of money, and when you sign the agreement to take said money, you explicitly promise you won't talk about company internals.
To me this is quite simple: if you accept what winds up being millions of dollars in cash+equity, and you give your word that you'll keep your mouth shut as one of the conditions for that pile of money... then you keep your mouth shut.
You keep your mouth shut about material that is Google's property. E.g. don't post code, and probably don't give details about code that hasn't been made public. Sure, this area is not clear-cut and can also depend on one's position within the company, but it's important to separate moral hazard from legal hazard.
As far as one's opinions go, and in particular how the company made you feel, that's not paid for. A severance agreement might outline some things, but again that's legal hazard and not moral hazard. There are certainly some execs and managers who will only want to work with really, really loyal people, who throw in a lot more for the money. And some execs will pay a lot more for that... e.g. look at how much Tesla spends on employee litigation.
Should you also not use any skills you gained at your previous employer at your new employer? Not to mention any techniques you learned about that may help your current employer? Would doing so be "talking about company internals"?
So how do you ever get a better job than entry level, if you aren't willing to use the knowledge you gained at prior jobs in new jobs?
I read the article and thought he did a fine job of not spilling too many secrets - I'm curious what you thought he said that crossed the line?
I'm not personally aware of signing something that says "I'll keep all internal details private" though I agree I'd be highly unlikely to refer to anyone below the SVP level by name -- but I think that's exactly what OP did?
> he's just going for internet brownie points. It's just an attempt to squeeze a bit more personal benefit out of your (now-ended) employment.
So, many researcher types (and not only them; this includes many of us) are motivated to share their thoughts in a dialog with a community, because they find it personally rewarding at a psychological level. They just find this to be an enjoyable thing to do that makes them feel like they are a valuable member of society contributing to a general collaborative practice of knowledge-creation.
I hope it doesn't feel like I'm explaining the obvious; but it occurs to me that to ask the question the way you did, this is probably _not_ a motivation you have, not something you find personally rewarding. Which is fine, we all are driven by different things.
But I don't think it's quite the same thing as "internet brownie points". While, if you are especially good at it, you will gain respect and admiration, which you will probably appreciate, you aren't thinking "if I share my insight gained working at Google, then maybe more people will think I'm cool"; you're just following a natural urge to share your insight and get feedback on it, because that itself is something enjoyable that gives you a sense of purpose.
Which is to say, I don't think it's exactly a motivation for "personal benefit" either, except in the sense that doing things you enjoy and find rewarding is a "personal benefit", that having a sense of purpose is a "personal benefit", sure.
I'm aware that not everyone works this way. I'm aware that some people on HN seem to be motivated primarily by maximizing income, for instance. That, or some other orientation, may lead to thinking that one should never share anything at all publicly about one's job, because it can only hurt and never help whatever one's goals are (maximizing income or what have you).
(Although... here you are commenting on HN; why? For internet brownie points?)
But that is not generally how academic/researcher types are oriented.
I think it's a sad thing if it becomes commonplace to think that there's something _wrong_ with people who find purpose and meaning in sharing their insights in dialog with a community.
The primary purpose of an NDA is to allow the company to enforce trade secrets: the existence of the NDA is proof that the company took steps to maintain the secrets' secrecy. Nothing in this blog post looks like a trade secret to me; rather, it's one person's fairly high-level reflections on the work environment at a particularly high profile lab.
While he technically may have violated the NDA, it's really hard for me to see any damage or fallout from this post. It's gentle, disparages only at the highest levels of abstraction, doesn't name names, etc. I don't think it makes sense to view it in a moralistic or personal integrity light. Breach-of-contract is not a moral wrong, merely a civil one that allows the counterparty (Google) to get damages if they want.
> a betrayal of former teammates who now have to deal with the fallout.
What fallout is this? Did you sign a contract with him? If you are harmed by it, why don't you seek legal recourse? Your entire rant started with some NDA stuff, and in the end you say, "legal issues aside". This is like the "Having said that" move from Curb. You start with something, then contradict yourself completely with "Having said that". If you have a contractual grievance, pursue it. If not, you are grieving on the internet, just like he is.
That's a very optimistic take on how new generations perceive the ethics surrounding confidentiality. Are you really _baffled_ by this? I understand that it's a common position, but it's a position that is so clearly tainted by the conflicts of interest between employer and employee. And a keen awareness of those conflicting interests _only_ serves to better an employee's ability to serve themselves best in a capitalist economy.
I'm not saying you are wrong per se. But if you don't see why employees are willing to act in this way, you don't see how employees feel about being trapped in a system where, no matter how much you are paid, you are ultimately making someone else more.
I totally get why people act this way. Because it's Very Important That Everyone Knows What I Think.
But there is no inequity or conflict of interest here. None of that about being trapped in a capitalist economy is to the point here. He has probably a couple million dollars in his bank account that wasn't there before, and the deal was to not talk publicly about internals (which includes promotions process, internal motivations and decision-making, etc).
You don't think there's any inequity between somebody with a couple million in the bank and Google, or any conflict of interest between my desire to talk about my work and my employer's desire that I do not? Your position is valid enough without being willfully obtuse.
You're thinking about employment contracts like an abstract economic exchange between two free parties, which is very micro. Try thinking about it instead like a bargain with the (macro) devil.
In other words, consider someone's perspective who has society split into two camps: the people who do all the work, and the corrupt elite that make their living through theft and oppression.
In such a world, signing a contract with an employer (i.e. capitalist i.e. elite) is more of a practical step than a sacrosanct bond.
There's a level of "what's reasonable" and "what's legally enforceable" beyond the basic "never break a promise" level you're working at, IMO.
No one's endorsing publishing trade secrets randomly, but you're treating all disclosures as if they're equivalent.
I'm not suggesting people shouldn't post thoughts and opinions. This is about whether a personal desire to self-express should override an explicit prior commitment/promise not to do so.
I checked with some Google friends who told me their contract even makes it illegal to tell anyone that they work for Google (no joke).
One side is what's in the papers you signed and the other side is to what extent the terms can be enforced. But you have a point in that it would be good professional practice to wait for a decade before disclosing internals, especially when names of people are dropped...
> I checked with some Google friends who told me their contract even makes it illegal to tell anyone that they work for Google (no joke).
Which country are they in? Is their employment contract directly with Google, or with some other, independent company that provides services to Google?
(I've been employed by G in multiple jurisdictions and have never seen or heard of such a clause.)
Organized, concise, and not wordy. Props to the writer; he shows a high degree of written communication skill on a topic frequently cluttered with jargon.
While Google is busy imploding, the next generation of startups can flourish. I'm hopeful that they decimate a lot of big tech and don't just all get bought out.
Hmm. Anything that slows Google down and maintains a diversity of leaders in the field is ok with me.
Imagine a host of "helpful" Google AIs, Facebook AIs, Amazon AIs, etc., that know their very existence depends on them monetizing you more effectively than competing AIs.
Of course, the first versions will be very helpful. But continuous efforts to remain "the most helpful" will cost a lot, and eventually need to pay for themselves.
> The next obvious reason for Google to invest in pure research is for the breakthrough discoveries it has yielded and can continue to yield. As a rudimentary brag sheet, Brain gave Google TensorFlow, TPUs, significantly improved Translate, JAX, and Transformers.
Except that these advances have made other companies an existential threat for Google. 2 years ago it was hard to imagine what could topple Google. Now a lot of people can see a clear path: large language models.
From a business perspective it's astounding what a massive failure Google Brain has been. Basically nothing has spun out of it to benefit Google. And yet at the same time, so much has leaked out, and so many people have left with that knowledge Google paid for, that Google might go the way of Yahoo in 10 years.
This is the simpler explanation of the Brain-DeepMind merger: both Brain and DeepMind have fundamentally failed as businesses.
> Basically nothing has spun out of it to benefit Google
Quite a ridiculous statement. Google has inserted ML all over their products. Maybe you just don't notice, to their credit. But for example the fact that YouTube can automatically generate subtitles for any written language from any spoken language is a direct outcome of Google ML research. There are lots of machine-inferred search ranking signals. Google Sheets will automatically fill in your formulas, that's in-house ML research, too.
> Quite a ridiculous statement. Google has inserted ML all over their products. Maybe you just don't notice, to their credit. But for example the fact that YouTube can automatically generate subtitles for any written language from any spoken language is a direct outcome of Google ML research. There are lots of machine-inferred search ranking signals. Google Sheets will automatically fill in your formulas, that's in-house ML research, too.
I noticed all the toy demos. None of these have provided Google with any competitive advantage over anyone.
For the investment, Google Brain has been a massive failure. It provided Google with essentially zero value. And helped create competitors.
Automatic subtitles and translation is actually a huge feature which is very useful to the many people that don't speak English. It definitely did provide Google with a lot of value.
> Automatic subtitles and translation is actually a huge feature which is very useful to the many people that don't speak English. It definitely did provide Google with a lot of value.
It lost Google immense value.
Before Google Brain the only speech recognizers that halfway worked were at Google, IBM and Amazon. And Amazon had to buy a company to get access.
After Google Brain, anyone can run a speech recognizer. One that is state of the art. There are many models out there that just work well enough.
Google went from having an OK speech recognizer that sort of worked in a few languages and gave YouTube an advantage that no company aside from IBM and Amazon could touch, neither of which competes with Google much. No startup could have anything like Google's captioning. It was untouchable. Like, speech recognition researchers actively avoided this competition; that's how inferior everyone was.
To now, post Google Brain, when any startup can have captions that are as good as YouTube's. You can run countless models on your laptop today.
This is a huge competitive loss for Google.
They got a minor feature for YouTube and lost one of the key ML advantages they had.
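To make the "run it on your laptop" point concrete, a minimal sketch; it assumes the open-source openai-whisper package (pip install openai-whisper) and a local file named audio.mp3, both of which are just illustrative choices among many freely available models.

    import whisper

    # Load a small open speech-recognition model that runs on a laptop CPU.
    model = whisper.load_model("base")

    # Transcribe a local audio file and print the plain-text caption.
    result = model.transcribe("audio.mp3")
    print(result["text"])

That is roughly the entire barrier to entry for captioning now.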
It's too late now. YouTube has penetrated every single market outside of China and is now unshakeable due to network effects.
It completely paid off already, and Google is going to be reaping the dividends of the advantage they had in emerging markets for the next 15 years.
The real advantage has always been the network effect. Purely technological moats don't work in the long term. People catching up was inevitable, but Google was able to cash it in for an untouchable worldwide lead; on top of that, they made their researchers happy and recruited others by allowing them to publish, and they don't need to maintain an expensive, purely technical lead.
You can run your speech recognizer on AWS, you don't need to give Google a cent.
Whatever comes after YouTube, if it's a startup or not, it will have top-notch captioning, just like YouTube. Google gave up a massive competitive advantage with huge technical barriers.
Oh, I don't disagree at all that Google Brain provided enormous value for society. Just like Xerox PARC. Both of them were a massive drain on resources and provided negative value for the parent company.
And I agree, it's not Google Brain's fault. Google's management has been a disaster for a long time. It's just amazing how you can have every advantage and still achieve nothing.
Google never talked much about it externally, but Google Research (the predecessor to Brain) had a single project which almost entirely funded the entire division: a growth-oriented machine learning system called Sibyl. What was Sibyl used for? Growing YouTube and Google Play and other products by making them more addictive. Sibyl wasn't a very good system (I've never seen a product with more technical debt), but it did basically "pay for" all of the research for a while.
Yep - that goes into a fair amount of detail. Sibyl was retired but many of the ideas lived on in TFX. I worked on it a bit and it was definitely the weirdest, most technical-debt-ridden system I've ever seen, but it was highly effective at getting people addicted to watching YouTube and downloading games that showed ads.
It's evil if you phrase it (as OP did) as "getting people addicted to YouTube".
Less so if you phrase it "show people recommendations that they're likely to actually click on, based on what they've watched previously", which is what Sibyl really was.
Google has very good LLMs. It just let OpenAI beat them to the punch by releasing them earlier.
As an established business, Google felt it had a lot to lose by releasing "unsafe" AI into the world. But OpenAI doesn't have a money printing machine, and it's sink-or-swim for them.
I keep hearing this, but Bard sucks so badly when I've tried to use it like GPT-4 or compare results; it's like night and day. What makes you so confident they have "secret" LLMs that are superior?
I work for Google and have been playing with it. It’s pretty good.
The decision to release Bard, an LLM that was clearly not as good as ChatGPT, struck me as reactive and is why people think Google is behind. I’d think so too if I had just demoed Bard.
No, but would love to try it. I'm using these models 20-30 times a day throughout the average work day for random tasks, so I have a pretty good sense of performance levels. Didn't think it was available to the public yet, but I just saw it's apparently on Google Cloud now; I'll have to try it out. How do you compare PaLM with GPT-4, if you've had a chance to try both?
Seems pretty similar. In general Google LLMs seem better suited for just conversation and ChatGPT is built to honor “write me X in the style of Y” prompts.
The latter is more interesting to play around with, granted, and I think it’s an area where Google can catch up, but it doesn’t seem like a huge technical hurdle.
Bard is in full-scale production to all U.S. users for free. GPT-4 costs $20/month. Rather a big difference in the economics of the situation. Also it's pretty clear that even the $20 is highly subsidized. Microsoft is willing to incinerate almost any amount of money to harm Google.
Yes I think it has less utility than the free version of ChatGPT, but it also has some nice points, is faster, and has fewer outages.
For my use case none of them is worth using. All three of the ones we've mentioned in this thread will just make up language features that would be useful but don't exist, and all three of them will hallucinate imaginary sections of the C++ standard to explain them. Bard loves `std::uint128_t`. GPT-4 will make up GIS coordinate reference systems that don't exist. For me they are all more trouble than they are worth, on the daily.
> Also it's pretty clear that even the $20 is highly subsidized.
This isn't the case. There's a podcast (somewhere? I thought it was a Lex one but I can't find it) where someone from OpenAI went into some depth about the economics.
GPT-4 is also free to all users, not just from the US, with 200 turns per day and 20 per conversation. It's just called "Bing Chat mode" instead of GPT-4. Of course Microsoft is losing money with it. But Microsoft can afford to lose money.
> And yet at the same time, so much has leaked out, and so many people have left with that knowledge Google paid for, that Google might go the way of Yahoo in 10 years.
Google couldn't have hired the talent they did without allowing them to publish.
“Given the long timelines of a PhD program, the vast majority of early ML researchers were self-taught crossovers from other fields. This created the conditions for excellent interdisciplinary work to happen. This transitional anomaly is unfortunately mistaken by most people to be an inherent property of machine learning to upturn existing fields. It is not.
Today, the vast majority of new ML researcher hires are freshly minted PhDs, who have only ever studied problems from the ML point of view. I’ve seen repeatedly that it’s much harder for a ML PhD to learn chemistry than for a chemist to learn ML.”