I have published an addendum at the very bottom of an article I wrote about AlphaChip (https://vighneshiyer.com/misc/ml-for-placement/) that addresses this rebuttal from Google and the AlphaChip algorithm in general.
In short, I think the Nature authors have made some reasonable criticisms regarding the training methodology employed by the ISPD authors, but the extreme compute cost and runtime of AlphaChip still make it non-competitive with commercial autofloorplanners and AutoDMP. Regardless, I think the ISPD authors owe the Nature authors an even more rigorous study that addresses all their criticisms. Even if they just try to evaluate the pre-trained checkpoint that Google published, that would be a useful piece of data to add to the debate.
In the conclusion of the article, you said: "While I concede that there are things the ISPD authors could have done better, their conclusion is still sound. The Nature authors do not address the fact that CMP and AutoDMP outperform CT with far less runtime and compute requirements."
One key argument in the rebuttal against the ISPD article is that the resources used in their comparison were significantly smaller. To me, this point alone seems sufficient to question the validity of the ISPD work's conclusions. What are your thoughts on this?
Additionally, I noticed that the neutral tone of this comment is quite a departure from the strongly critical tone of your article toward the AlphaChip work (words like "arrogance", "disdain", "hyperbole", "belittling", "hostile" for AlphaChip authors, as opposed to "excellent" for a Synopsys VP.) Could you share where this difference in tone originates?
> One key argument in the rebuttal against the ISPD article is that the resources used in their comparison were significantly smaller. To me, this point alone seems sufficient to question the validity of the ISPD work's conclusions. What are your thoughts on this?
I believe this is a fair criticism, and it could be a reason why the ISPD Tensorboard shows divergence during training for some RTL designs. The ISPD authors provide their own justification for their substitution of training time for compute resources on page 11 of their paper (https://arxiv.org/pdf/2302.11014).
I do not think it changes the ISPD work's conclusions, however, since they demonstrate that CMP and AutoDMP outperform CT wrt QoR and runtime even though they use far fewer compute resources. Even if more compute resources are used and CT becomes competitive wrt QoR, it will still lag behind in runtime. Furthermore, Google has not produced evidence that AlphaChip, with their substantial compute resources, outperforms commercial placers (or even AutoDMP). In the recent rebuttal from Google (https://arxiv.org/pdf/2411.10053), the only claim on page 8 says Google VLSI engineers preferred RL over humans and commercial placers in a blind study conducted in 2020. Commercial mixed placers, if configured correctly, have become very good over the past 4 years, so perhaps another blind study is warranted.
> Additionally, I noticed that the neutral tone of this comment is quite a departure from the strongly critical tone of your article
I will openly admit my bias is against the AlphaChip work. I referred to the Nature authors as 'arrogant' and 'disdainful' with respect to their statement that EDA CAD engineers are just being bitter ML-haters when they criticize the AlphaChip work. I referred to Jeff Dean as 'belittling' and 'hostile' and using 'hyperbole' with respect to his statements against Igor Markov, which I think is unbecoming of him. I referred to Shankar as 'excellent' with respect to his shrewd business acumen.
Thank you for your thoughtful response. Acknowledging potential biases openly in a public forum is never easy, and in my view, it adds credibility to your words compared to leaving such matters as implicit insinuations.
That said, on page 8, the paper says that 'standard licensing agreements with commercial vendors prohibit public comparison with their offerings.' Given this inherent limitation, what alternative approach could have been taken to enable a more meaningful comparison between CT and CMP?
So I'm not sure what Google is referring to here. As you can see in the ISPD paper (https://vlsicad.ucsd.edu/Publications/Conferences/396/c396.p...) on page 5, they openly compare Cadence CMP with AutoDMP and other algorithms quantitatively. The only obfuscation is with the proprietary GF12 technology, where they can't provide absolute numbers, but only relative ones. Comparison against commercial tools is actually a common practice in academic EDA CAD papers, although usually the exact tool vendor is obfuscated. CAD tool vendors have actually gotten more permissive about sharing tool data and scripts in public over the past few years. However, PDKs have always been under NDAs and are still very restrictive.
Perhaps the Cadence license agreement signed by a corporation is different than the one signed by a university. In such a case, they could partner with a university. But I doubt their license agreement prevents any public comparison. For example, see the AutoDMP paper from NVIDIA (https://d1qx31qr3h6wln.cloudfront.net/publications/AutoDMP.p...) where on page 7 they openly benchmark their tool against Cadence Innovus. My suspicion is they wish to keep details about the TPU blocks they evaluated under tight wraps.
The UCSD paper says "We thank ... colleagues at Cadence and Synopsys for policy changes that permit our methods and results to be reproducible and sharable in the open, toward advancement of research in the field." This suggests that there may have been policies restricting publication prior to this work. It would be intriguing to see if future research on AlphaChip could receive a similar endorsement or support from these EDA companies.
Cadence in particular has been quite receptive to allowing academics and researchers to benchmark new algorithms against their tools. They have also been quite permissive with letting people publish TCL scripts for their tools (https://github.com/TILOS-AI-Institute/MacroPlacement/tree/ma...) that in theory should enable precise reproduction of results. From my knowledge, Cadence has been very permissive from 2022 onwards, so while Google's objections to publishing data from CMP may have been valid when the Nature paper was published, they are no longer valid today.
We're not just talking about academia—Google's AlphaChip has the potential to disrupt the balance of the EDA industry's duopoly. It seems unlikely that Google could easily secure the policy or license changes necessary to publish direct comparisons in this context.
If publicizing comparisons of CMPs is as permissible as you suggest, have you seen a publication that directly compares a Cadence macro placement tool with a Synopsys tool? If I were the technically superior party, I’d be eager to showcase the fairest possible comparison, complete with transparent benchmarks and tools. In the CPU design space, we often see standardized benchmarking tools like SPEC microbenchmarks and gaming benchmarks. (And IMO that's part of why AMD could disrupt the PC market.) Does the EDA ecosystem support a similarly open culture of benchmarking for commercial tools?
> Does the EDA ecosystem support a similarly open culture of benchmarking for commercial tools?
If only. The comparison in Cheng et al. is the only public comparison with CMP that I can recall, and it is pretty suss that this just so happens to be a very pro-commercial-autoplacer study. (And, Cheng et al. have cited 'licensing agreements' as a reason for not giving out the synthesized netlists necessary to reproduce their results.)
Reminded a bit of Oracle. They likewise used to (and maybe still?) prohibit any benchmarking of their database software against that of another provider. This seems to be a common move for solidifying a strong market position.
I am trying to understand what you mean here by potential to disrupt. AlphaChip addresses one out of hundreds of tasks in chip design. Macro placement is a part of mixed-size placement, which is handled just fine by existing tools, many academic tools, open-source tools, and Nvidia AutoDMP. Even if AlphaChip was commonly accepted as a breakthrough, there is no disruption here. Direct comparisons from the last 3 years show that AlphaChip is worse. Granted, Google is belittling these comparisons, but that's what you'd expect. In any case, evidence is evidence.
> Direct comparisons from the last 3 years show that AlphaChip is worse.
Do you have any evidence to claim this? The whole point of this thread is that the direct comparisons might have been insufficient, and even the author of "The Saga" article who's biased against the AlphaChip work agreed.
> Granted, Google is belittling these comparisons, but that's what you'd expect.
This kind of language doesn't help any position you want to advocate.
About "the potential to disrupt", a potential is a potential. It's an initial work. What I find interesting is that people are so eager to assert that it's a dead-end without sufficient exploration.
That's the ISPD paper referenced many times in this whole thread.
> Stronger Baselines
Re: "Stronger baselines", the paper "That Chip Has Sailed" says "We provided the committee with one-line scripts that generated significantly better RL results than those reported in Markov et al., outperforming their “stronger” simulated annealing baseline." What is your take on this claim?
As for 'regurgitating,' I don’t think it helps Jeff Dean’s point either. Based on my and vighneshiyer's discussion above, describing the work as "fundamentally flawed" does not seem far-fetched. If Cheng and Kahng do not agree with this, I believe they can publish another invited paper.
On 'belittle,' my main issue was with your follow-up phrase, 'that’s what you’d expect.' It comes across as overly emotional and detracts from the discussion.
Regarding the lack of follow-ups (that I am aware of), the substantial resources required for this work seem beyond what academia can easily replicate. Additionally, according to "the Saga" article, both non-Jeff Dean authors had left Google, but their Twitter/X/LinkedIn profiles suggest they have since returned and worked on this "Sailing Chip" paper.
Personally, I hope they reignite their efforts on RL in EDA and work toward democratizing their methods so that other researchers can build new systems on their foundation. What are your thoughts? Do you hope they improve and refine their approach in future work, or do you believe there should be no continuation of this line of research?
The point is that the Cheng et al results and paper were shown to Google and apparently okayed by Google points of contact. After this, complaining that Cheng et al didn't ask someone outside Google makes little sense. These far-fetched excuses and emotional wording by Jeff Dean leave a big cloud over the Nature work. If he were confident everything is fine, he would not bother.
To clarify "you'd expect" - if Jeff Dean is correct, he'd deny problems and if he's wrong he'd deny problems. So, his response carries little information. Rationally, this should be done by someone else with a track record in chip implementation.
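The "carries little information" point can be written as a Bayes update. This is only an illustrative sketch: the prior and the likelihoods below are hypothetical numbers chosen for the example, not estimates of anything in this dispute. When a denial is equally likely whether the claim is right or wrong, the posterior equals the prior.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule for a binary hypothesis given one piece of evidence."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1.0 - prior) * p_evidence_if_false)

PRIOR = 0.5  # hypothetical prior belief that the original work is sound

# A denial that is equally likely in both worlds does not move the estimate.
uninformative = posterior(PRIOR, 0.95, 0.95)

# For contrast: evidence that is more likely if the work is sound does move it.
informative = posterior(PRIOR, 0.95, 0.50)

print(f"after an equally-likely denial: {uninformative:.3f}")
print(f"after more diagnostic evidence: {informative:.3f}")
```

The second case is included only to show what diagnostic evidence looks like in the same formula, for example an independent replication whose outcome differs depending on who is right.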
Could you please point out the specific lines you are dissatisfied with? Is it something an additional publication cannot resolve?
Additionally, in case you forgot to answer, what is your wish for the future of this line of research? Do you hope to see it improve the EDA status quo, or would you prefer the work to stop entirely? If it is the latter, I would have no intention of continuing this conversation.
I am referring to direct comparisons in Cheng et al and in Stronger Baselines that everyone is discussing. Let's assume your point about "might have been insufficient". We don't currently have the luxury to be frequentists, as we don't have many academic groups reporting results for running Google code. From the Bayesian perspective, that's the evidence we have.
Maybe you know more such published papers than I do, or you know the reasons why there aren't many. Somehow this lack of follow-up over three years suggests a dead-end.
As for "belittle", how would you describe the scientific term "regurgitating" used by Jeff Dean? Also, the term "fundamentally flawed" in reference to a 2023 paper by two senior professors with serious expertise and track record in the field, that for some reason no other experts in the field criticize? Where was Jeff Dean when that paper was published and reported by the media?
Unless Cheng and Kahng agree with this characterization, Jeff Dean's timing and language are counterproductive. If he ends up being wrong on this, what's the right thing to do?
We're talking 16 GPUs for ~6 hrs for inference, and 48 hrs for pre-training. This is not an exorbitant amount of compute.
A GPU costs $1-2/hr on the cloud market. So, ~$100-200 for inference, and ~$800-1600 for pre-training, which amortizes across chips. Cloud prices are an upper bound -- most CS labs will have way more than this available on premises.
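The arithmetic behind those figures, as a quick sketch. It uses only the numbers quoted above (16 GPUs, ~6 hrs, 48 hrs, $1-2 per GPU-hour); the function name is just for illustration.

```python
NUM_GPUS = 16
INFERENCE_HOURS = 6
PRETRAIN_HOURS = 48
PRICE_LOW, PRICE_HIGH = 1.0, 2.0  # USD per GPU-hour, the quoted cloud range

def cost_range(num_gpus, hours, low=PRICE_LOW, high=PRICE_HIGH):
    """Return (low, high) cost in USD for num_gpus running for the given hours."""
    gpu_hours = num_gpus * hours
    return gpu_hours * low, gpu_hours * high

inference_cost = cost_range(NUM_GPUS, INFERENCE_HOURS)
pretrain_cost = cost_range(NUM_GPUS, PRETRAIN_HOURS)

print(f"inference:    ${inference_cost[0]:.0f} to ${inference_cost[1]:.0f}")
print(f"pre-training: ${pretrain_cost[0]:.0f} to ${pretrain_cost[1]:.0f}")
```

That is 96 GPU-hours for inference and 768 GPU-hours for pre-training, which rounds to the ~$100-200 and ~$800-1600 quoted.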
In an industry context, these costs are completely dwarfed by the rest of the chip design process. (For context, the licensing costs alone for most commercial EDA software are in the millions of dollars.)
You are correct. For commercial use, the GPUs used for training and fine-tuning aren't a problem financially. However, if we wanted to rigorously benchmark AlphaChip against simulated annealing or other floorplanning algorithms, we have to afford the same compute and runtime budget to each algorithm. With 16 GPUs running for 6 hours, you could explore a huge placement space using any algorithm, and it isn't clear if RL will outperform the other ones. Furthermore, the runtime of AlphaChip as shown in the Nature paper and ISPD was still significantly greater than Cadence's concurrent macro placer (even after pre-training, RL requires several hours of fine-tuning on the target problem instance). Arguably, the runtime could go down with more GPUs, but at this point, it is unclear how much value is coming from the policy network / problem embedding vs the ability to explore many potential placements.
> However, if we wanted to rigorously benchmark AlphaChip against simulated annealing or other floorplanning algorithms, we have to afford the same compute and runtime budget to each algorithm.
The Nature authors already presented such a study in their appendix:
"To make comparisons fair, we ran 80 SA experiments sweeping different hyperparameters, including maximum temperature (10^−5, 3 × 10^−5, 5 × 10^−5, 7 × 10^-5, 10^−4, 2 × 10^−4, 5 × 10^−4, 10^−3), maximum SA episode length (5 × 10^4, 10^5) and seed (five different random seeds), and report the best results in terms of proxy wirelength and congestion costs in Extended Data Table 6"
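For reference, the 80 runs in that quote are the full grid of the listed values. A small sketch that enumerates it; the seed values are placeholders, since the quote only says "five different random seeds", and the dictionary keys are illustrative names rather than anything from the Nature code.

```python
from itertools import product

# Values copied from the quoted passage.
max_temperatures = [1e-5, 3e-5, 5e-5, 7e-5, 1e-4, 2e-4, 5e-4, 1e-3]
max_episode_lengths = [5 * 10**4, 10**5]
seeds = [0, 1, 2, 3, 4]  # placeholder seeds; the actual ones are not given

sweep = [
    {"max_temperature": t, "max_episode_length": n, "seed": s}
    for t, n, s in product(max_temperatures, max_episode_lengths, seeds)
]

print(len(sweep))  # 8 temperatures * 2 episode lengths * 5 seeds
```

So the sweep covers 16 distinct hyperparameter settings, each repeated over 5 seeds.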
You're saying that if the other methods were given the equivalent amount of compute they might be able to perform as well as AlphaChip? Or at least that the comparison would be fairer?
Existing mixed-placement algorithms depend on hyperparameters, heuristics, and initial states / randomness. If afforded more compute resources, they can explore a much wider space and in theory come up with better solutions. Some algorithms like simulated annealing are easy to modify to exploit arbitrarily more compute resources. Indeed, I believe the comparison of AlphaChip to alternatives would be fairer if compute resources and allowed runtime were matched.
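To make the "easy to exploit more compute" point concrete, here is a minimal multi-start simulated annealing sketch on a toy problem. Everything in it is an assumption made for illustration: the random two-pin netlist, the Manhattan wirelength cost, the cell-swap move, and the cooling schedule. It is not the annealer from the Nature paper, Cheng et al., or any commercial tool.

```python
import math
import random

def wirelength(placement, nets):
    """Sum of Manhattan distances between connected blocks (toy proxy cost)."""
    total = 0
    for a, b in nets:
        (xa, ya), (xb, yb) = placement[a], placement[b]
        total += abs(xa - xb) + abs(ya - yb)
    return total

def anneal(nets, num_blocks, grid, seed, steps=2000, t_start=2.0, t_end=0.01):
    """One annealing run; returns the best cost seen."""
    rng = random.Random(seed)
    # cells[i] is the location of block i for i < num_blocks; the rest are empty.
    cells = [(x, y) for x in range(grid) for y in range(grid)]
    rng.shuffle(cells)
    cost = best = wirelength(cells, nets)
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        i = rng.randrange(num_blocks)      # always move at least one real block
        j = rng.randrange(len(cells))      # swap with another block or an empty cell
        cells[i], cells[j] = cells[j], cells[i]
        new_cost = wirelength(cells, nets)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            best = min(best, cost)
        else:
            cells[i], cells[j] = cells[j], cells[i]  # reject: undo the swap
    return best

def multi_start(nets, num_blocks, grid, num_runs):
    """Best result over num_runs independent runs (seeds 0..num_runs-1)."""
    return min(anneal(nets, num_blocks, grid, seed) for seed in range(num_runs))

# Hypothetical toy instance: 12 blocks on a 4x4 grid with 30 random two-pin nets.
NUM_BLOCKS, GRID = 12, 4
_net_rng = random.Random(0)
NETS = [(_net_rng.randrange(NUM_BLOCKS), _net_rng.randrange(NUM_BLOCKS))
        for _ in range(30)]

for runs in (1, 2, 4, 8):
    print(runs, multi_start(NETS, NUM_BLOCKS, GRID, runs))
```

The runs are independent, so they could be farmed out to as many workers as are available; they are executed sequentially here only to keep the sketch short. Because each larger budget includes the seeds of the smaller one, the best cost can only stay the same or improve as `num_runs` grows.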
In fact, existing algorithms such as naive simulated annealing can be easily augmented with ML (e.g. using state embeddings to optimize hyperparameters for a given problem instance, or using a regression model to fine-tune proxy costs to better correlate with final QoR). Indeed, I strongly suspect commercial CAD software is already applying ML in many ways for mixed-placement and other CAD algorithms. The criticism against AlphaChip isn't about rejecting any application of ML to EDA CAD algorithms, but rather the particular formulation they used and objections to their reported results / comparisons.
That sounds like future work for simulated annealing fans to engage in, quite honestly, rather than something that needs to be addressed immediately in a paper proposing an alternative method. The proposed method accomplished what it set out to do, surpassing current methods; others are free to explore different hyperparameters to surpass the quality again... This is, ultimately, why we build benchmark tasks: if you want to prove you know how to do it better, one is free to just go do it better instead of whining about what the competition did or didn't try on one's behalf.
Yes, they are. The other approaches usually look like simulated annealing, which has several hyperparameters that control how much computing is used and improve results with more compute usage.
I understand and have read the article. Running 80 experiments with a crude form of simulated annealing is at most 0.0000000001% of the effort that has been spent on making that kind of hill climb work well by traditional EDA vendors. That is also an in-sample comparison, where I would believe the Google thing pre-trained on Google chips would do well, while it might have a harder time with a chip designed by a third party (further from its pre-training).
The modern versions of that hill climb also use some RL (placing and routing chips is sort of like a game), but not in the way Jeff Dean wants it to be done.
The comparison in that paper was very much not fair to Google's method. Google's original published comparison to simulated annealing is not fair to simulated annealing methods. That is, unfortunately, part of the game of publication when you want to publish a marginal result.
It is possible that the pre-training step may overfit to a particular class of chips or may fail to converge given a general sample of chip designs. That would make the pre-training step unable to be used in the setting of a commercial EDA tool. The people who do know this are the people at EDA companies who are smart and not arrogant and who benchmarked this stuff before deciding not to adopt it.
If you want to make a good-faith assumption (that IMO is unwarranted given the rest of the paper), the people trying to replicate Google's paper may have done a pre-training step that failed to converge, and then didn't report it. That failure to converge could be due to ineptitude, but it could be due to data quality, too.
The Google internal paper by Chatterjee and the Cheng et al paper from UCSD made such comparisons with Simulated Annealing. The annealer in the Nature paper was easy to improve. When given the same time budget, the improved annealer produced better solutions than AlphaChip. When you give both more time, SA remains ahead. Just read published papers.
The UCSD paper didn't run the Nature method correctly, so I don't see how you can draw this conclusion.
From Jeff's tweet:
"In particular the authors did no pre-training (despite pre-training being mentioned 37 times in our Nature article), robbing our learning-based method of its ability to learn from other chip designs, then used 20X less compute and did not train to convergence, preventing our method from fully learning even on the chip design being placed."
As for Chatterjee's paper, "We provided the committee with one-line scripts that generated significantly better RL results than those reported in Markov et al., outperforming their “stronger” simulated annealing baseline. We still do not know how Markov and his collaborators produced the numbers in their paper."
Yes, they even do at $1/GPU/hr. However, an 8xH100 cluster at full utilization draws ~8 kW of electricity and costs almost $0.5M. A 16xH100 cluster is probably 2x that. How many years before you break even at ~$24/GPU/day income?
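A rough answer to that question, as a sketch. It takes the figures from the comment (about $0.5M for 8 GPUs, $1/GPU/hr of rental income) and treats the quoted power figure as a continuous draw of about 8 kW. The electricity price is a made-up placeholder, and cooling, hosting, staff, idle time and depreciation are all ignored.

```python
CLUSTER_COST_USD = 500_000       # purchase price quoted for an 8xH100 cluster
NUM_GPUS = 8
RENTAL_USD_PER_GPU_HOUR = 1.0    # rental income at the quoted cloud price
POWER_KW = 8.0                   # assumed continuous draw at full utilization
ELECTRICITY_USD_PER_KWH = 0.10   # hypothetical price, not from the thread

def break_even_years(cluster_cost, num_gpus, rental_per_gpu_hour,
                     power_kw=0.0, electricity_per_kwh=0.0):
    """Years of 24/7 rental needed to recover the purchase price."""
    daily_income = num_gpus * rental_per_gpu_hour * 24
    daily_power_cost = power_kw * 24 * electricity_per_kwh
    net_per_day = daily_income - daily_power_cost
    if net_per_day <= 0:
        return float("inf")  # never breaks even
    return cluster_cost / net_per_day / 365

hardware_only = break_even_years(CLUSTER_COST_USD, NUM_GPUS, RENTAL_USD_PER_GPU_HOUR)
with_power = break_even_years(CLUSTER_COST_USD, NUM_GPUS, RENTAL_USD_PER_GPU_HOUR,
                              POWER_KW, ELECTRICITY_USD_PER_KWH)

print(f"ignoring electricity: {hardware_only:.1f} years")
print(f"with electricity:     {with_power:.1f} years")
```

Under these assumptions the payback period is on the order of seven to eight years at 100% utilization.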
You should care about counterparty risks. If your business model depends on unsustainable 3rd party prices powered by VC largesse and unrealizable dreams of dominance, the very least you can do is plan for the impending reckoning, after which GPU prices will be determined by costs.
Look, I understand that some people are short-sighted and can hardly think out of the box and that is totally fine by me. I don't judge you for being that so I kindly ask you not to judge my question. Learn to give some benefit of the doubt.
At this point in time, why wouldn't we give at least benefit of the doubt to Jeff Dean immediately? His track record is second to none, and he's still going strong. Has something happened that cast a shadow on him? Sometimes it is the messenger that brings in the weight.
Looks like he aligned himself with the wrong folks here. He is a system builder at heart but not an expert in chip design or EDA. And also not really an ML researcher. Some would say he got taken for a ride by a young charismatic grifter and is now in too deep to back out. His focus on this project didn’t help with his case at Google. They moved all the important stuff away from him and gave it to Demis last year and left him with an honorary title. Quite sad really for someone of his accomplishments.
Not a ML researcher, too? He was working on neural networks in 1990. Last year he was under Research and now reports directly to Sundar. What do you know that we don't?
I don't think he got taken for a ride. Rather, he also wanted to believe that AlphaChip would be as revolutionary as it claimed to be and chose to ignore Chatterjee's reservations. Understandable, given all the AlphaX models coming out around that timeframe.
I mean Jeff Dean is probably more of an ML researcher than 90% of the ML researchers out there. Sure, he may not be working on state of the art stuff himself; but he's too far up the chain to do that.
What are you even talking about? Jeff had a hand in TPU, which is so successful that all other AI companies are trying to clone this project and spin up their own efforts to make custom AI chips.
> Some would say he got taken for a ride by a young charismatic grifter and is now in too deep to back out.
Was the TPU physical design team also taken in? And also MediaTek? And also TF-Agents, which publicly said they re-produced the AlphaChip method and results exactly?
What did the TPU physical design team say about this publicly? Can you also point to a statement from MediaTek? (I've seen a quote in Google blog, but was unable to confirm it). Who in the TF-agents team has serious physical design background?
Are you really suggesting that the TPU team does not stand behind the graphs in Google's own blog post? And that MediaTek does not stand behind their quoted statement?
That made sense when Jeff Dean gave talks in 2020, 2021, and 2022. He is now responding to skepticism from the EDA community by unscholarly personal attacks and vague references to "many companies" using the work. He is beyond benefit of the doubt, and into the realm of probable cause.
Curious why there's so much emotion and unpleasantness in this dispute? How did it evolve from the boring academic argument about benchmarks, significance, etc to a battle of personal attacks?
This is a big part of the reason. But it behooves us to ask why a key innovation in a field (and I trust Jeff Dean that this is one, I’ve never seen any reason to doubt either his integrity or ability) should produce such a reaction. What could make people act not just chagrined that their approach wasn’t the end state, but as though it was existential to discredit such an innovation?
Surely all of the people who did the work that the innovation rests on should be confident they will be relevant, involved, comfortable, and safe in the post-innovation world?
And yet it’s not clear they should feel this way. Luddism seems an unfounded ideology over the scope of history since the origin of the term. But over the period since “AI” entered the public discussion at the current level? Almost two years exactly? Making the Luddite agenda credible has seemed a ubiquitous talking point.
Over that time frame technical people have been laid off in staggering numbers, a steadily-shrinking number of employers have been slashing headcount and posting EPS beats, and “AI” has been mentioned in every breath. It’s so extreme that even sophisticated knowledge of the kinds of subject matter that goes into AlphaChip is (allegedly) useless without access to the Hopper FLOPs.
If the AI Cartel was a little less rapacious, people might be a little more open to embracing the AI revolution.
Making extraordinary claims without a way to replicate them. And then running to the press, which will swallow anything. Because "AI designs AI... umm... I mean chips" sounds futuristic to liberal-arts majors (and apparently to programmers too, who I'd expect to know better and question everything "AI").
The whole publication process seems dishonest, starting from publishing in Nature (why not ISSCC or something similar?)
The issue is that Big Tech commercial incentives around AI have polluted the “boring academic” waters with dishonest infomercials masquerading as journal articles or arXiv preprints[1], and as a direct result contemporary AI research has a much worse “replication crisis” than the social sciences, yet with far fewer legitimate excuses.
Assuming Google isn’t lying, a lot of controversy would go away if they actually released their benchmark data for independent people to look at. They are still refusing to do so: https://cacm.acm.org/news/updates-spark-uproar/ Google thinks we should simply accept their conclusions by fiat. And don’t forget about this:
Madden further pointed out that the “30 to 35%” advantage of RePlAce was consistent with findings reported in a leaked paper by internal Google whistleblower Satrajit Chatterjee, an engineer who Google fired in 2022 when he first tried to publish the paper that discredited the “superhuman” claims Google was making at the time for its AI approach to chip design.
It is entirely appropriate to make “personal attacks” against Jeff Dean, because the heart of the criticism is that his personality is dishonest and authoritarian: he publishes suspicious research and fires people who dissent.
[1] Jeff Dean hypocritically sneering about the critique being a conference paper is especially galling. What an unbelievable asshole.
In the tweet Jeff Dean says that Cheng et al. failed to follow the steps required to replicate the work of the Google researchers.
Specifically:
> In particular the authors did no pre-training (despite pre-training being mentioned 37 times in our Nature article), robbing our learning-based method of its ability to learn from other chip designs
But in the Circuit Training Google repo[1] they specifically say:
> Our results training from scratch are comparable or better than the reported results in the paper (on page 22) which used fine-tuning from a pre-trained model.
I may be misunderstanding something here, but which one is it? Did they mess up when they did not pre-train or they followed the "steps" described in the original repo and tried to get a fair reproduction?
Also, the UCSD group had to reverse-engineer several steps to reproduce the results so it seems like the paper's results weren't reproducible by themselves.
Markov’s paper also has links to Google papers from two different sets of authors that show minimal advantage of pretraining. And given the small number of benchmarks, using a pretrained model from Google whose provenance is not known would be counterproductive. Google likely trained it on all available benchmarks to regurgitate the best solutions of commercial tools.
Training from scratch could presumably mean including the new design attempts and old designs mixed in.
So no contradiction: pretrain on old designs then finetune on new design, vs train on everything mixed together throughout. Finetuning can cause catastrophic forgetting. Both could have better performance than not including old designs.
> Did they mess up when they did not pre-train or they followed the "steps" described in the original repo and tried to get a fair reproduction?
The Circuit Training repo was just going through an example. It is common for an open-source repo to describe simple examples for testing / validating your setup --- that does not mean this is how you should get optimal results in general. The confusion may stem from their statement that, in this example, they produced results that were comparable with the pre-trained results in the paper. This is clearly not a general repudiation of pre-training.
If Cheng et al. genuinely felt this was ambiguous, they should have reached out to the corresponding authors. If they ran into some part of the repo they felt they had to "reverse-engineer", they should have asked about that, too.
"These major methodological differences unfortunately invalidate Cheng et al.’s comparisons with and conclusions about our method. If Cheng et al. had reached out to the corresponding authors of the Nature paper[8], we would have gladly helped them to correct these issues prior to publication[9].
[8] Prior to publication of Cheng et al., our last correspondence with any of its authors was in August of 2022 when we reached out to share our new contact information.
[9] In contrast, prior to publishing in Nature, we corresponded extensively with Andrew Kahng, senior author of Cheng et al. and of the prior state of the art (RePlAce), to ensure that we were using the appropriate settings for RePlAce."
That is misleading. The first two authors left Google in August 2022 under unclear circumstances. The code and data were owned by Google; that's probably why Kahng continued discussing code and data with his Google contacts. He received clear answers from several Google employees, so if they were at fault, Google should apologize rather than blame Cheng and Kahng.
"Prior to publication of Cheng et al., our last correspondence with any of its authors was in August of 2022 when we reached out to share our new contact information."
You don't stop being the corresponding authors of a paper when you change companies, and whatever "unclear circumstances" you imagine took place when they left, they were also re-hired later, which a company would only do if they were in good standing.
In any case, those "Google contacts" also expressed concerns with how Cheng et al. were doing their study, which they ignored:
3.4 Cheng et al.’s Incorrect Claim of Validation by Google Engineers
Cheng et al. claimed that Google engineers confirmed its technical correctness, but this is untrue. Google engineers (who were not corresponding authors of the Nature paper) merely confirmed that they were able to train from scratch (i.e. no pre-training) on a single test case from the quick start guide in our open-source repository. The quick start guide is of course not a description of how to fully replicate the methodology described in our Nature paper, and is only intended as a first step to confirm that the needed software is installed, that the code has compiled, and that it can successfully run on a single simple test case (Ariane).
In fact, these Google engineers share our concerns and provided constructive feedback, which was not addressed. For example, prior to publication of Cheng et al., through written communication and in several meetings, they raised concerns about the study, including the use of drastically less compute, and failing to tune proxy cost weights to account for a drastically different technology node size.
The Acknowledgements section of Cheng et al. also lists the Nature corresponding authors and implies that they were consulted or even involved, but this is not the case. In fact, the corresponding authors only became aware of this paper after its publication.
It seems that Chatterjee - the bad guy in your linked article - is now suing Google because he thinks he got canned for pointing out that his boss - Jeff Dean mentioned in the article discussed here - was knowingly publishing fraudulent claims.
"To be clear, we do NOT have evidence to believe that RL outperforms academic state-of-art and strongest commercial macro placers. The comparisons for the latter were done so poorly that in many cases the commercial tool failed to run due to installation issues." and that's supposedly a screenshot from an internal presentation done by Jeff Dean.
As an outsider, I find it very difficult to judge if Chatterjee was a bad and expensive hire (because he suppressed good results by coworkers) or if he was a very valuable employee (because he tried to prevent publishing false statements).
You're linking to his amended complaint - his original complaint was thrown out because it alleged things like "Google's motto is don't be evil, but they were evil, thus defrauding me."
According to a Google investigator's sworn statement, he admitted that he didn't have evidence to suspect the AlphaChip authors of fraud: "he stated that he suspected that the research being conducted by Goldie and Mirhoseini was fraudulent, but also stated that he did not have evidence to support his suspicion of fraud".
I feel like if someone persistently makes unsupported allegations of fraud, they should not be surprised if they get shown the door.
"In May of 2020, we performed a blind internal study[12] comparing our method against the latest version of two leading commercial autoplacers. Our method outperformed both, beating one 13 to 4 (with 3 ties) and the other 15 to 1 (with 4 ties). Unfortunately, standard licensing agreements with commercial vendors prohibit public comparison with their offerings."
[12] - "Our blind study compared RL to human experts and commercial autoplacers on 20 TPU blocks. First, the physical design engineer responsible for placing a given block ranked anonymized placements from each of the competing methods, evaluating purely on final QoR metrics with no knowledge of which method was used to generate each placement. Next, a panel of seven physical design experts reviewed each of the rankings and ties. The comparisons were unblinded only after completing both rounds of evaluation. The result was that the best placement was produced most often by RL, followed by human experts, followed by commercial autoplacers."
Something is off with your source references. Chatterjee's amended complaint is a legal document filed under penalty of perjury, accepted by a US judge, available publicly, apparently not "thrown out". How does an earlier document figure into this? How do we know it was "thrown out" and for what reason? Obviously, a later document is what matters.
Also, you are using an unreviewed document from Google not published in any conference to counter published papers with specific results, primarily the Cheng et al paper. Jeff Dean did not like that paper, so he can take it up with the conference and convince them to unpublish it. If he can't, maybe he is wrong.
Perhaps you are biased toward Google, but why do you think we should trust a document that was neither peer-reviewed nor published at a conference?
His original complaint being dismissed matters because it suggests that he was fishing around for a complaint that was valid, and that perhaps his primary motivation was to get money out of Google.
Legal nitpick - you can get away with alleging pretty much whatever you want in a legal complaint. You can't even be sued for defamation if it turns out later you were lying.
Jeff Dean isn't saying that Cheng et al. should be unpublished; he's saying that they didn't run the method the same way. It is perfectly fine for someone to try changing the method and report what they found. What's not fine is to claim that this means that Google was lying in their study.
No, it doesn't suggest that. Complaints are often dismissed on technicalities or because they are written poorly.
Google claimed their new algorithm as a breakthrough. If this were so, the algorithm would have helped design chips in many different cases. Now, the defense is that it only works for some inputs, and those inputs cannot be shared. This is not a serious defense and looks like a coverup.
The court case provides more details. Looks like the junior researchers and Jeff Dean teamed up and bullied Chatterjee and his team to prevent the fraud from being exposed. IIRC the NYT reported at the time that Chatterjee was fired within an hour of disclosing that he was going to report Jeff Dean to the Alphabet Board for misconduct.
He even ran a study internally (with Markov), but, as the AlphaChip authors describe:
In 2022, it was reviewed by an independent committee at Google, which determined that “the claims and conclusions in the draft are not scientifically backed by the experiments” [33] and “as the [AlphaChip] results on their original datasets were independently reproduced, this brought the [Markov et al.] RL results into question” [33]. We provided the committee with one-line scripts that generated significantly better RL results than those reported in Markov et al., outperforming their “stronger” simulated annealing baseline. We still do not know how Markov and his collaborators produced the numbers in their paper.
(https://arxiv.org/pdf/2411.10053)
It is indeed a big deal to hire people who will commit or connive at fraud: academic, financial, or otherwise.
But the best (probably only) way to put downward pressure on that is via internal incentives, controls, and culture. You push hard enough for such percent per cadence with no upper bound and graduate the folks who reliably deliver it without checking if the win was there to begin with? This is scale-invariant: it could be in a pod, a department, a company, a hedge fund that owns much of those companies, a fund of those funds, the federal government.
Sooner or later your leadership is substantially penetrated by the unscrupulous. We see this in academia with the spate of scandals around publications. We see this in finance with, who can even count that high anymore. You see Holmes and SBF in prison but the folks they funded still at the apex of relevance and everyone from that clique? Everyone who didn’t just fall off a turnip truck knows has carried that ideology with them and has better lawyers now.
There’s an old saw that a “fish rots from the head”. We can’t look at every manner of shadiness and constant scandal from the iconic leaders of our STEM industry and say “good for them, they outsmarted the system” and expect any result other than a broad-spectrum attack on any honest, fair, equitable status quo.
We all voted with our feet (and I did my share of that too before I quit in disgust) for a “might makes right” quasi-religious system of ideals, known variously as Objectivism, Effective Altruism, and Capitalism (of which it is no kind). We shouldn’t be surprised that everything is kind of tarnished and sticky now.
The answer today? I don’t know. Work for the less bad as opposed to more bad companies, speak out at least anonymously about abuses, listen to the leaders speak in interviews and scrutinize it. I’m open to suggestions.
I get why Jeff would be pressed to comment on this, given he's credited on basically all of "Google Brain" research output. But saying "they couldn't replicate it because they're idiots, therefore it's replicable" is not a rebuttal, just bullying. Sounds like the critics struck a nerve and there's no good way for him to refute the replication problem his research apparently exhibits.
> But saying "they couldn't replicate it because they're idiots, therefore it's replicable" is not a rebuttal, just bullying
That's not an argument made in the linked tweet. His claim is "they couldn't replicate it because they didn't follow the steps", which seems like a very reasonable claim, regardless of the motivation behind making it.
At the end of the day my question is simply why does anyone care about the drama over this one way or another?
Either the research is as much of a breakthrough as is claimed and Google is about to pull way ahead of all these other "idiots" who can't replicate their method even when it is described to them in detail, or the research is flawed and overblown and not as effective as claimed. This seems like exactly the sort of question the market will quickly decide over the next couple of years and not worth arguing over.
Why do a non-zero amount of people have seemingly religious beliefs about this topic on one side or the other?
The reason Jeff Dean cares is that his team's improvement compared to standard EDA tools was marginal at best and may have overfitted to a certain class of chips. Thus, he is defending his research because it is not widely accepted. Open source code has been out for years and in that time the EDA companies have largely done their own ML-based approaches that do not match his. He attributes this not to failings in his own research but to the detractors at these companies not giving it a fair chance.
The guys at EDA companies care because Google's result makes them look like idiots when you take the paper at face value, and does advance the state of the art a bit. They have been working hard for marginal improvements, and that some team of ML people can come in and make a big splash with something like this is offensive to them. Furthermore, the result is not that impressive and does not generalize enough to be useful to them (and competent teams at these companies absolutely have checked).
The fact that the result is so minor is the reason that this is so contentious.
The result is minor AND Google spent a (relative) lot of money to achieve it (especially in the eyes of the new CFO). Jeff Dean is desperately trying to save the prestige of the research (in a very insular, Google-y way) because he wants to save the 2017-era economically-not-viable blue sky culture where Tensorflow & the TPU flourished and the transformer was born. But the reality is that Google’s core businesses are under attack (anti-trust, Jedi Blue etc), the TPU now has zero chance versus NVidia, and Google is literally no longer growing ads. His financing is about to pop in the next 1-2 years.
What makes you say TPU has zero chance against growing NVIDIA?
If anything, now is the best time for TPU to grow and I'd say investing in TPU gave Google an edge. There is no other large scale LLM that was trained on anything but NVIDIA GPUs. Gemini is the only exception. Every big company is scrambling to make their own hardware in the AI era while Google already has it.
Everyone I know who worked with TPUs loves how well they scale. Sure Jax has a learning curve but it's not a problem, especially given the performance advantages it gives.
Besides the many CAPEX-vs-OPEX tradeoffs that are completely unavailable due to not being able to buy physical TPU pods, there are inherent Google-y risks e.g. risk of the TPU product and/or support getting killed or fragmented / deprecated (very very common with Google), your data & traffic must also be locked in to Google’s pricing, and you must indefinitely put up with / negotiate with Google Cloud people (in my experience at multiple companies: worst customer support ever).
Google does indeed lock in their own ROI with deciding to not compete with AMD / Graphcore etc, but that also rooflines their total market. If they were to come up with a compelling Android-based Jetson-like edge product, and if demand for said product eclipses total GPU demand (robotics explosion?) then they might have a ramp to compete with NVidia. But the USB TPUs and phone accelerators today are just toys. And toys go to the Google graveyard, because Googlers don’t build gardens they treat everything like toys and throw them away when they get bored.
> Why do a non-zero amount of people have seemingly religious beliefs about this topic on one side or the other?
Because lots of engineers are being told by managers "Why aren't we using that tool?" and a bunch of engineers are stuck saying "Because it doesn't actually work." aka "Google is lying through their teeth." to which the response is "Oh, so you know better than Google?" to which the response is "Yeah, actually, I fucking do. Now piss off and let me finish timing closure on this goddamn block that is already 6 weeks late."
Now can you understand why this is a bit contentious?
Marketing "exaggerations" from authority can cause huge amounts of grief.
In my little corner of the world, I had to sit and defend against the lies that a startup with famous designers was putting out about power consumption while we were designing similar chips in the space. I had to go toe to toe with Senior VPs over it and I had to stand my ground and defend my team who analyzed things dead on. All this occurred in spite of the fact that they had no silicon. In addition, I knew the famous designers involved would happily lie straight to your face having worked with them before and having been lied straight to my face and having had to clean up the mess when they left the company.
To be fair, it is also the only time I have had a Senior VP remember the kerfuffle and apologize when said startup finally delivered silicon and not only were the real numbers not what they claimed they weren't even close to the ones we were getting.
And do you believe that that is what's happening in this case?
If you have personal experience with Jeff Dean et al that you're willing to share, I'd be interested in hearing about it.
From where I'm sitting it looks like, "Google spent a fortune on deep learning, and got a small but real win. People who don't like Google failed to follow Google's recipe and got a large and easily replicated loss."
It's not even clear that Google's approach is feasible right now for companies not named Google. It is not clear that it works on other classes of chip. It is not clear that the technique will grow beyond what Google already got. It is really not clear that anyone should be jumping on this.
But there is a world of difference between that, and concluding that Google is lying.
> From where I'm sitting it looks like, "Google spent a fortune on deep learning, and got a small but real win. People who don't like Google failed to follow Google's recipe and got a large and easily replicated loss."
From where I'm sitting it looks like Google cooked the books maximally, barely beat humans let alone state of the art algorithms, published a crappy article in Nature because it would never have passed editorial muster at something like DAC or an IEEE journal and now have to browbeat other people who are calling them out on it.
And that's the best interpretation we can cough up.
I'll go further, we don't even have any raw data that says that they actually did beat the humans. Some of the humans I know who run P&R are REALLY good at what they do. The data could be completely made up. Given how much scientific fraud has come out lately, I'm amazed at the number of people defending Google on this.
Where I'm from, we call what Google is doing both "lying" and "bullying".
Look, Google can easily defuse this in all manner of ways. Publish their raw data. Run things on testbenches and benchmarks that the EDA tools vendors have been running on for years. Run things on the open source VLSI designs that they sponsored.
What I suspect happened is that Google's AI group has gotten used to being able to make hyperbolic marketing claims which are difficult to verify. They then poked at place and route, failed, and published an article anyway because someone's promotion is tied to this. They expected that everybody would swallow their glop just like every other time, be mostly ignored and the people involved can get their promotions and move on.
Unfortunately, Google is shoveling bullshit around something that has objective answers; real money is at stake; and they're getting rightfully excoriated for it.
Look, either the follow-up article did pretraining or not. Jeff Dean is claiming that the importance of pretraining was mentioned 37 times and the follow-up didn't do it. That sounds easy to verify.
Likewise the importance of spending 20x as much money on the training portion seems easy to verify, and significant.
That they would fail to properly test against industry standard workbenches seems reasonable to me. This is a bunch of ML specialists who know nothing about chip design. Their background is beating everyone at Go and setting a new state of the art for protein folding, and not chip design. If you dismiss those particular past accomplishments as hyperbolic marketing, that's your decision. But you aren't going to find a lot of people in these parts who agree with you.
If you think that those were real, but that a bunch of more recent accomplishments are BS, I haven't been following closely enough to have an opinion. The stuff that crossed my radar since AlphaFold is mostly done at places like OpenAI, and not Google.
Regardless, the truth will out. And what Google is claiming for itself here really isn't all that impressive.
Reading those papers and looking at the code, it doesn't look easy. However, let's imagine that the Cheng et al team comes back with results for pretraining a few months from now, and they support the conclusions of their earlier paper. What should they do to help everyone reach a conclusion?
"If Cheng et al. had reached out to the corresponding authors of the Nature paper, we would have gladly helped them to correct these issues prior to publication" (https://arxiv.org/pdf/2411.10053)
That's how you actually do a reproduction study - you reach out to the corresponding authors and make sure you do everything exactly the same. But at this point, it's hard to imagine the AlphaChip folks having much patience with them.
> published a crappy article in Nature because it would never have passed editorial muster at something like DAC or an IEEE journal and now have to browbeat other people who are calling them out on it.
I don't think it's easier to get into DAC / an IEEE journal than Nature.
Their human baseline was the TPU physical design team, with access to the best available tools: rdcu.be/cmedX
and this is still the baseline to beat in order to get used in production, which has happened for multiple generations of TPU.
TPU is export controlled and super confidential -- multi-billion dollar IP! -- so I don't see raw data coming out anytime soon.
Nature papers get retracted every year. I have not heard of DAC papers retracted.
If the Nature paper made it clear that RL is not seriously expected to work on non-TPU chips, it would have probably been rejected. If RL works on many other chips, then evidence should be easy to publish.
When Google published the Nature article, Nature included a rosy intro article by a leading expert in chip design. His name was Andrew Kahng, and he apparently liked Google at the time. But when he dug into Google code (released way after publication), he retracted his intro and co-authored the Cheng et al article. You see how your theory breaks down here.
As Andrew Kahng was one of the co-authors of Cheng et al., all of the issues with his reproduction still matter here. The Nature paper went through an investigation and second round of peer review.
AlphaChip is used to make real chips in production. Google publicly announced its use in multiple generations of TPUs and Axion CPUs, and MediaTek said they've built on it as well.
Pulling way ahead sounds sufficient, not necessary. Can we prove it's not the case? Let's say someone says that's why Gemini inference is so cheap. Can we show that's wrong?
> they couldn't replicate it because they're idiots
If they did not follow the steps to replicate (pre-training, using less compute, etc.) and then failed, then what's wrong with calling out the flaws in their attempted "replication"?
It's not a value judgement, just doesn't help his case at all. He'd need to counter the replication problem, but apparently that's not an option. Instead, he's making people who were unable to replicate it look bad, which actually strengthens their criticism.
I don't know how you rebut a flawed paper without making its authors look bad? That would be a general-purpose argument against criticizing papers.
Actually, people should criticize flawed papers. That's how science works! When you publish scientific papers, you should expect criticism if there's something that doesn't look right.
The only way to avoid that is to get critical feedback before publishing the paper, and it's not always possible, so then the scientific debate happens in public.
The situation here is different though. If I'm making an existence claim by demonstrating a constructive argument and then being criticized for it, the most effective response to that critique would be a second, alternative construction, not attacking the critic's argument. After all, I'm the one claiming existence; the burden of proof is on me, not my critics.
I don't know which argument is more constructive, though? Both teams reported what they did. They got different results. Figuring out why is the next step, and pointing out that they did different things seems useful.
Though, the broader question is how useful the results of the original paper are to other people who might do the same thing.
> But saying "they couldn't replicate it because they're idiots, therefore it's replicable" is not a rebuttal, just bullying.
> It's not a value judgement, just doesn't help his case at all.
Calling it "bullying" looks like a value judgment to me. Am I missing something?
To me, Dean's response is quite sensible, particularly given his claims the other papers made serious mistakes and have potential conflicts of interest.
I'm not saying "Bullying is bad and bullies are bad people", that would be a value judgement. I'm saying bullying is the strictly worse strategy for strengthening his paper's claims in this scenario. The better strategy would be to foster an environment in which people can easily replicate your claims.
Are you suggesting Dean take a different approach in his response? Are you saying it was already too late given the environment? (I’m also not sure I know what you mean by environment here.)
The tweet just says that the reproduction attempt didn't actually follow the original methodology. There is no claim that the authors of the replication attempt were "idiots" or anything similar, you just made that up. The obviously fallacious logic in "they couldn't replicate it ..., therefore it's replicable" is also a total fabrication on your part.
A Google Nature Paper has not been replicated for over 3 years, but I'm the one fabricating stuff :D
Making a novel claim implies its *claimed* replicability.
"You did not follow the steps" is calling them idiots.
The only inference I made is that he's pressed to comment. He could have said nothing. Instead, he's lashing out publicly, because other people were unable to replicate it. If there's no problem replicating the work, why hasn't that happened? Any other author would be worried if a publication about their work were saying "it's not replicable" and trying their best to help replicate it, but somehow that doesn't apply to him.
We're actually talking about the difference between Cheng using 8 GPUs and 2 CPUs while Google used 16 GPUs and 40 CPUs. These are under-your-desk levels of resources. Cheng et al authors are all affiliated with UCSD which owns the Expanse supercomputer which is orders of magnitude larger than what you would need to reproduce the original work. Cheng et al does not explain why they used fewer resources.
But that was explicitly limited to 8 hours for all setups. Do they have another paper that shows that you can't increase the number of hours of a smaller GPU setup to compensate?
They also changed the ratio of RL experience collectors to GPU workers (~1/20th the RL experience collectors, 1/2 the GPUs). I don't know what impact that has --- maybe each GPU episode has less experience? Maybe that makes for an effectively small batch size and therefore more chaotic training? But either way, why change things when you can just match them exactly?
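To put numbers on that, here is a quick back-of-the-envelope sketch using only the figures quoted in this thread (not independently verified). It assumes the CPU counts stand in for the number of RL experience collectors; the dictionary keys and the helper name are made up for illustration.

```python
from fractions import Fraction

# Figures as quoted in this thread; treating CPU count as the number of
# RL experience collectors is an assumption made for illustration.
google = {"gpus": 16, "collectors": 40}
cheng = {"gpus": 8, "collectors": 2}

# How much of each resource the reproduction used relative to the original.
gpu_ratio = Fraction(cheng["gpus"], google["gpus"])
collector_ratio = Fraction(cheng["collectors"], google["collectors"])


def collectors_per_gpu(setup):
    """Experience collectors feeding each GPU worker (hypothetical helper)."""
    return Fraction(setup["collectors"], setup["gpus"])


# Change in the collector-to-GPU balance, independent of total scale.
balance_shift = collectors_per_gpu(cheng) / collectors_per_gpu(google)

print(gpu_ratio, collector_ratio, balance_shift)  # prints: 1/2 1/20 1/10
```

So under these assumptions it is not just half the compute: each GPU is fed by a tenth as many collectors, which is a different training setup rather than a scaled-down copy of the same one.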
I love how he’s claiming bias due to his critic’s employer. As though working for Google has no conflicts? A company that is desperately hyping each and every “me too” AI development to juice the stock price?
Jeff drank so much kool aid he forgot what water is.
He's criticizing Markov for not disclosing the conflict, not for the conflict itself. Hiding your affiliation in a scientific publication is far outside the norms of science, and they should be criticized for that. The publication we are discussing, "That Chip Has Sailed", dismisses Markov in a few paragraphs and spends the bulk of its arguments on Cheng.
Markov's article has been on arxiv since 2023 before the perceived conflict.
https://arxiv.org/abs/2306.09633
All his affiliations have been disclosed and likely known to Jeff Dean. Jeff is simply unable to respond to Markov's article in the technical dimension and is resorting to dirty tricks instead.
"they couldn't replicate it because they're idiots, therefore it's replicable"
That's literally not what he says though. He says, "they didn't replicate it so their conclusions are invalid", which is a completely different thing than what you're accusing him of, and is valid.
"You didn't use enough compute in your reproduction of our methods" is kind of a funny criticism today. Well yeah, you're Google. Sorry guys, no reviewing of our methodology unless you own a cloud. It wouldn't surprise me if it's true that more compute, more pre-training etc. provides a lot of utility, but that does make it difficult to verify the work.
One interesting aspect of this though is vice-versa, whilst Google has oodles of compute, Synopsys has oodles of data to train on (if, and this is a massive if, they can get away with training on customer IP).
Chip designers run EDA tools on premises. How would EDA companies have access to customer data? Maybe for debugging purposes under NDA, but that won't allow training on customer IP.
I don't get it. Why isn't the model open if it works? If it isn't, this is just a fart in the wind. If it is, the findings should be straightforward to replicate.
All these papers doing "research" on how to better prompt ChatGPT would be unpublishable then, given that API access to older models gets retired, so the findings of these papers can no longer be reproduced.
(I agree with you in principle; my example above is meant to show that standards for things such as reproducibility aren't easily defined. There are so many factors to consider.)
Well since you put "research" in quotes, I think you also agree that this type of work does not really belong in a quality journal with a high impact factor ;)
As far as I understand it, only kind of? It's open source, but in their paper they did a tonne of pre-training and whilst they've released a small pre-training checkpoint they haven't released the results of the pre-training they've done for their paper. So anyone reproducing this will inevitably be accused of failing to pretrain the model correctly?
I think the pre-trained checkpoint uses the same 20 TPU blocks as the original paper, but it probably isn't the exact-same checkpoint, as the paper itself is from 2020/2021.
"Settled" does not mean "Dean did nothing wrong". It means "Google paid the plaintiffs a lot of money so they'd stop saying publicly that Dean did something wrong", which is very different.
How the hell would you verify an AI-generated silicon design?
Like, for a CPU, you want to be sure it behaves properly for the given inputs. Anyone remember that floating point error in, was it Pentium IIs or Pentium IIIs?
I mean, I guess if the chip is designed for AI, and AIs are inherently nonguaranteed output/responses, then the AI chip design being nonguaranteed isn't any difference in nonguarantees.
> How the hell would you verify an AI-generated silicon design?
I think you're asking a different question, but in the context of the OP researchers are exploring AI for solving deterministic but intractable problems in the field of chip design and not generating designs end to end.
Here's an excerpt from the paper.
"The objective is to place a netlist graph of macros (e.g., SRAMs) and standard cells (logic gates, such as NAND, NOR, and XOR) onto a chip canvas, such that power, performance, and area (PPA) are optimized, while adhering to constraints on placement density and routing congestion (described in Sections 3.3.6 and 3.3.5). Despite decades of research on this problem, it is still necessary for human experts to iterate for weeks with the existing placement tools, in order to produce solutions that meet multi-faceted design criteria."
The hope is that Reinforcement Learning can find solutions to such complex optimization problems.
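For intuition on why the "proxy cost weights" mentioned elsewhere in this thread matter, here is a toy sketch. It assumes the proxy cost is a weighted sum of wirelength, congestion, and density, which is my understanding of the general shape and not a quote of the actual implementation. The class, the function, the default weights, and the metric values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlacementMetrics:
    """Normalized proxy metrics for one placement (hypothetical values, lower is better)."""
    wirelength: float
    congestion: float
    density: float


def proxy_cost(m, congestion_weight=0.5, density_weight=0.5):
    """Weighted-sum proxy cost; the form and default weights are assumptions."""
    return m.wirelength + congestion_weight * m.congestion + density_weight * m.density


# Two made-up candidate placements: A has shorter wires, B is less congested.
a = PlacementMetrics(wirelength=0.10, congestion=0.90, density=0.50)
b = PlacementMetrics(wirelength=0.14, congestion=0.70, density=0.50)

for w in (0.1, 0.5):
    best = min((a, b), key=lambda m: proxy_cost(m, congestion_weight=w))
    print(w, "A" if best is a else "B")
```

With a low congestion weight the optimizer prefers A, and with a higher one it prefers B. That is the sense in which weights tuned for one technology node can steer the search toward the wrong placements on another.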
> Despite decades of research on this problem, it is still necessary for human experts to iterate for weeks with the existing placement tools, in order to produce solutions that meet multi-faceted design criteria.
Ironically, this sounds a lot like building a bot to play StarCraft, which is exactly what AlphaStar did. I had no idea that EDA layout is still so difficult and manual in 2024. This seems like a very worthy area of research.
I am not an expert in AI/ML, but is the ultimate goal: Train on as many open source circuit designs as possible to build a base, then try to solve IC layout problems via reinforcement learning, similar to AlphaStar. Finally, use the trained model to do inference during IC layout?
This is not a board where you put resistors, capacitors and ICs on a substrate.
These are chip layouts used for fabbing chips. I don't think you will find many open source designs.
EDA vendors work closely with foundries (TSMC, Samsung, GlobalFoundries). This is the bleeding-edge stuff needed to get the best performance for NVIDIA or AMD or Intel.
As an individual, it's very hard and expensive to fab your chip (though there are companies that pool multiple designs).
A well-working CPU is probably beside the point. What's important now is for researchers to publish papers using or speaking about AI. Then executives and managers to deploy AI in their companies. Then selling AI PCs (somehow, we are already at this step). Whatever the results are. Customer issues will be solved by using more AI (think chatbots) until morale improves.
Sure, we want individuals to act in a way to mitigate collective action problems. But the collective action problem exists (by definition) because individuals are trapped in some variation of a prisoner's dilemma.
So, collective action problems are nearly a statistical certainty across a wide variety of situations. And yet we still "blame" individuals? We should know better.
> So you're saying Head of AI of Google of Jeff can't choose a better venue?
Phrasing it this way isn't useful. Talking about choice in the abstract doesn't help with a game-theoretic analysis. You need costs and benefits too.
There are many people who face something like a prisoner's dilemma (on Twitter, for example). We could assess the cost-benefit of a particular person leaving Twitter. We could even judge them according to some standards (ethical, rational, and so on). But why bother?...
...Think about major collective action failures. How often are they the result of just one person's decisions? How does "blaming" or "judging" an individual help make a situation better? This effort on blaming could be better spent elsewhere; such as understanding the system and finding leverage points.
There are cases where blaming/guilt can help, but only in the prospective sense: if a person knows they will be blamed and face consequences for an action, it will make that action more costly. This might be enough to deter that decision. But do you think this applies in the context of the "do I leave Twitter?" decision? I'd say very little, if at all.
Yes, but the game matrix is not that simple. There's a whole gamut of possible actions between defect and sleep with Elon.
Cross-posting to a Mastodon account is not that hard.
I look at this from two viewpoints. One is that it's good that he spends most of his time and energy doing research/management and not getting bogged down in culture war stuff. The other is that those who have all this power ought to wield it a tiny tiny bit more responsibly. (IMHO the social influence of the elites/leaders/cool-kids is also among those leverage points you speak of.)
Also, I'm not blaming him. I don't think it's morally wrong to use X. (I think it's mentally harmful, but X is not unique in this. Though character limit does select for "no u" type messages.) I'm at best cynically musing about the claimed helplessness of Jeff Dean with regards to finding a forum.
The fact that the EDA companies are garbage in no way mitigates the fact that Google continues to peddle unsubstantiated snake oil.
This is easy to debunk from the Google side: release a tool. If you don't want to release a tool, then it's unsubstantiated and you don't get to publish. Simple.
That having been said:
1) None of these "AI" tools have yet demonstrated the ability to classify "This is datapath", "This is array logic", "This is random logic". This is the BIG win. And it won't just be a couple of percentage points in area or a couple of days saved when it works--it will be 25%+ in area and months in time.
2) Saving a couple of percentage points in random logic isn't impressive. If I have the compute power to run EDA tools with a couple of different random seeds, at least one run will likely be a couple percentage points better.
3) I really don't understand why they don't do stuff on analog/RF. The patterns are smaller and much better matches to the kind of reinforcement learning that current "AI" is suited for.
I put this snake oil in the same category as "financial advice"--if it worked, they wouldn't be sharing it and would simply be printing money by taking advantage of it.
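Point 2 is easy to sanity-check with a toy simulation. This is a minimal sketch, assuming run-to-run variation in area is roughly Gaussian with a standard deviation of 1.5% (a number picked for illustration, not measured from any EDA tool); `best_of_n_improvement` is a hypothetical helper.

```python
import random


def best_of_n_improvement(n_seeds, sigma_pct=1.5, trials=20_000, rng_seed=0):
    """Average margin (in percent) by which the best of n_seeds runs beats nominal.

    Each run's area deviation is modeled as Gaussian noise around zero
    (assumed model). Smaller area is better, so the best run is the minimum.
    """
    rng = random.Random(rng_seed)
    total = 0.0
    for _ in range(trials):
        runs = [rng.gauss(0.0, sigma_pct) for _ in range(n_seeds)]
        total += -min(runs)  # how far below nominal the best run lands
    return total / trials


# More seeds means a better expected best run, with diminishing returns.
for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} seeds: best run is about {best_of_n_improvement(n):.2f}% better than nominal")
```

Under that assumed noise model, the expected gain from picking the best run grows with the number of seeds but flattens quickly, so a learned placer that claims a couple of percent needs to be compared against a baseline given the same compute budget.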
> Google continues to peddle unsubstantiated snake oil
I read your comment, but I'm not following -- or maybe I disagree with it -- I'm not sure yet.
"Snake oil" is an emotionally loaded term that raises the temperature of the conversation. That usually makes having a conversation harder.
From my point of view, AlphaGo, AlphaZero, AlphaFold were significant achievements. Agree? Are you claiming that AlphaChip is not? Are you claiming they are perpetrating some kind of deception or exaggeration? Your numbered points seem like valid criticisms (I haven't evaluated them closely), but even if true, I don't see how they support your "snake oil" claim.
Really not sure how you’re conflating product demos which are known to be pie in the sky across the industry (not just Google) with peer reviewed research published in journals. Super basic distinction imho.
All of the highest impact papers authored by DeepMind and Google Brain have appeared in Nature, which is the gold standard for peer-reviewed natural science research. What exactly are you trying to claim about Google's peer-reviewed papers?
Nature is just as susceptible to the perverse incentives at play in the academic publishing market as anyone else, and has had their share of controversies over the years including having to retract papers after they were found to be bogus.
In and of itself, "Being published in a peer reviewed journal" does not place the contents of a paper beyond reproach or criticism.
From personal experience: in Nature Communications the handling editor and editor in chief absolutely do intervene, in my example to suppress a proper lit review that would have revealed the paper under review as much less innovative than claimed.
Well, here’s one exaggeration that was pretty obvious to me straight away as a somewhat disinterested observer. In her status on X, Anna Goldie says [1] “AlphaChip was one of the first RL methods deployed to solve a real-world engineering problem”. This seems very clearly untrue. For example, here’s a real-world engineering use of reinforcement learning by Google AI themselves from 6 years ago [2], which, if you use Anna Goldie’s own timeline, is 2 years before AlphaChip.
That is definitely a cool project, but I don't see how it contradicts "one of the first RL methods deployed to solve a real-world engineering problem". "One of the first" does not mean literally the first ever.
Agreed, but if someone at your own company did it two years before you, in the context of something that recent, it’s stretching credibility to say you were one of the first.
I mean, I think second is still "one of the first?" And, no offense to this project, but I don't know of it being used in a real industrial setting, whereas AlphaChip was used in TPU.
However it's hard to see how being provably 2 years behind the first even in your own company in an incredibly hot area that people are doing tons of work in makes you suddenly second. By that logic I might still be in time to claim the silver for the 100m at the Paris olympics if I pop over there in the next 18 months or so.
I can see you created this account just to comment on this thread so I'm sure you have more inside information than I do given that I'm really not connected to this in any way. Enjoy your work at Google Research. I think you guys do cool stuff. It's a shame in my opinion that you choose to damage your credibility by making (and defending) such obviously false claims rather than concentrating on the genuinely innovative work you have done advancing the field.
I think the paper was probably done honestly, but also very poorly. They claimed synthesis of 36 new materials. When reviewed, for 24/36 "the predicted structure has ordered cations but there is no evidence for order, and a known, disordered version of the compound exists". In fact, with other errors, 36/36 claims were doubtful. This reflects badly on the authors and worse on Nature's peer review process.
/[01]{8,}/: I was hoping to have a conversation. This is why I asked questions. Any responses to them?
Looking up the thread, you can see the context. Many of us pushed back against vague claims that AlphaChip was "snake oil". Like good engineers, we split apart the problem into clearer concepts. The "snake oil" proponents did not offer compelling replies, did they? Instead, they retreated to irrelevant points that have no bearing on making sense of the "snake oil" claim.
Sometimes technical people forget to bring their "debugging" skills to bear on conversations. There is a metaphorical connection; good debuggers would disambiguate terms, decompose the problem, answer questions, find cruxes, synthesize, find clearer terms, generate alternative explanations, and so on.
> From my point of view, AlphaGo, AlphaZero, AlphaFold were significant achievements.
These things you mentioned had obvious benchmarks that were easily surpassed by the appropriate "AI". The evidence that they were better wasn't just significant, it was obvious.
This leaves the fact that with what appears to be maximal cooking of the books, the only thing AlphaChip seems to be able to beat is human, manual placement and not anything algorithmic--even from many, many generations ago.
Trying to pass that off as a significant "advance" in a "scientific publication" borders on scientific fraud and should definitely be called out.
The problem here is that I am certain that this is wired to the career trajectories of "Very Important People(tm)" and the fact that it essentially failed miserably is simply not politically allowed.
If they want to lie, they can do that in press releases. If they want to be published in something reputable, they should have to provide proper evidence for replication.
And, if they can't do that, well, that's an answer itself, no?
These air quotes suggest the commenter above doesn't think the paper qualifies as a scientific publication. Such a characterization is unfair.
When I read the Nature article titled "Addendum: A graph placement methodology for fast chip design" [1], I see writing that more than meets the bar for a scientific publication. For example:
> Since publication, we have open-sourced a software repository [21] to fully reproduce the methods described in our paper. External researchers can use this repository to pre-train on a variety of chip blocks and then apply the pre-trained model to new blocks, as was done in our original paper. As part of this addendum, we are also releasing a model checkpoint pre-trained on 20 TPU blocks [22]. For best results, however, we continue to recommend that developers pre-train on their own in-distribution blocks [18], and provide a tutorial on how to perform pre-training with our open-source repository [23].
[18]: Yue, S. et al. Scalability and generalization of circuit training for chip floorplanning. In Proc. 2022 International Symposium on Physical Design 65–70 (2022).
[21]: Guadarrama, S. et al. Circuit Training: an open-source framework for generating chip floor plans with distributed deep reinforcement learning. GitHub https://github.com/google-research/circuit_training (2021).
> Trying to pass that off as a significant "advance" in a "scientific publication" borders on scientific fraud and should definitely be called out.
If true, your stated concerns with the AlphaChip paper -- selective benchmarking and potential overselling of results - reflect poor scientific practice and possible intellectual dishonesty. This does not constitute scientific fraud, which occurs when the underlying method/experiment/rules are faked.
If the paper has issues with how it positions and contextualizes its contribution, criticism is warranted, sure. But don't confuse this with "scientific fraud".
Some context: for as long as benchmark suites have existed, people rightly comment on which benchmarks should be included and how they should be weighted.
There are benchmarks in this space. You can also bring your chip designs into the open and show what happens with different tools. You can run the algorithm on the placed designs that you sponsor for open source VLSI to show how much better they are.
None of this has been done. This is table stakes if you want to talk about your EDA algorithm advancement. If this weren't coming out of Google, everybody would laugh it out of the room (see what happened to a similar publication with similar claims from a Chinese source--everybody dismissed it out of hand--rightfully so even though that paper was MUCH better than anything Google has promulgated).
Extraordinary claims require extraordinary evidence. Nothing about AlphaChip even reaches ordinary evidence.
If they hadn't gotten a publication in Nature for effectively a failure, this would be way less contentious.
Can you stop with this pure appeal to authority? Publishing in Nature is not proof it works. It's only proof the paper has packaged the claim that it works semi-well.
As Markov claims, Nature did not follow their own policy. Since Google’s results are only on their designs, no one can replicate them. Nature is single blind, so they probably didn’t want to turn down Jeff Dean so that they wouldn’t lose future business from Google.
> if it worked, they wouldn't be sharing it and would simply be printing money by taking advantage of it.
Sure, there are some techniques in financial markets that are only valuable when they are not widely known. But claiming this pattern applies universally is incorrect.
Publishing a technique doesn't prove it doesn't work. (Stating it this way makes it fairly obvious.)
DeepMind, like many AI research labs, publishes important and useful research. One might ask "is a lab leaving money on the table by publishing?". Perhaps a better question is "What 'game' is the lab playing and over what time scale?".
EDA companies are gatekeeping monopolies. They absolutely abuse their monopoly position to extract huge chunks of money out of companies, and are pretty much single-handedly responsible for the fact that the hardware startup ecosystem is moribund compared to that of the software startup ecosystem.
They have been horrible liars about performance and benchmarketing for decades. They dragged their feet miserably over releasing Linux versions of their software because they were extracting money based upon number of CPU licenses (everything was on Sparc which was vastly inferior). Their software hasn't really improved all that much over decades--mostly they benefited from Moore's Law. They have made a point of stifling attempts at interoperability and open data exchange. They have bought lots of competitors mostly to just shut them down. I can go on and on.
The EDA companies aren't quite Oracle--but they're not far off.
This is one of the reasons why Google is getting pounded over this--maybe even unfairly. People in the field are super sensitive about bullshit claims from EDA vendors--we've heard them all and been on the receiving end of the stick far too many times.
> The EDA companies aren't quite Oracle--but they're not far off.
I agree with most of what you mentioned, but not with the claim that the EDA companies are no worse than Oracle. At least Oracle still supports popular and useful open source projects such as MySQL and VirtualBox.
What open-source design software do these EDA companies currently support, even though most of their software originated from open source EDA software from UC Berkeley and elsewhere?
> and are pretty much single-handedly responsible for the fact that the hardware startup ecosystem is moribund
Yes but not single-handedly -- it's them and the foundries, hand-in-hand.
No startup can compete with Synopsys because TSMC doesn't give out the true design rules to anybody smaller than Apple for finfet processes. Essentially their DRC+LVS software has become a DRM-encoded version of the design rule manual.
> pretty much single-handedly responsible for the fact that the hardware startup ecosystem is moribund compared to that of the software startup ecosystem.
This was the case before EDA companies even appeared. Hardware is hard because it's manufacturing. You can't "iterate quickly", every iteration costs millions of dollars and so does every mistake.
Given infinite time and compute - maybe the approach is significantly better. But that’s just not practical. So unless you see dramatic shifts - no one is going to throw away proven results on your new approach because of the TTM penalty if it goes wrong.
The EDA industry is (has to be) ultra conservative.
Taping out a chip is an incredibly expensive (7-8 figure) fixed cost. If the chips that come out have too many bugs (say because your PD tools messed up some wiring for 1 in 10,000 blocks) then that money is gone. If you're Intel this is enough to make people doubt the health of your firm; if you're a startup, you're just done.
> if it worked, they wouldn't be sharing it and would simply be printing money by taking advantage of it.
This is a fallacious argument. A better chip design process does not eliminate all other risks like product-market fit or the upfront cost of making masks or chronic mismanagement.
> None of these "AI" tools have yet demonstrated the ability to classify "This is datapath", "This is array logic", "This is random logic".
Sounds like a good objective, one that could be added to training parameters. Or maybe it isn't needed (AI can 'understand' some concepts without explicitly tagging)
> If I have the compute power to run EDA tools with a couple of different random seeds, at least one run will likely be a couple percentage points better.
Then do it?! How long does it actually take to run? I know EDA tools creators are bad at some kinds of code optimization (and yes, it's hard) but let's say for a company like Intel, if it takes 10 days to rerun a chip to get 1% better, that sounds like a worthy tradeoff.
> I put this snake oil in the same category as "financial advice"--if it worked, they wouldn't be sharing it and would simply be printing money by taking advantage of it.
Yeah I don't think you understood the problem here. Good financial advice is about balancing risks and returns.
I’ve not followed this story at all, and have no idea what is true or not, but generally when people use a boatload of adjectives which serve no purpose but to skew opinion, I assume they are not being honest. Using certain words to describe a situation does not make the situation what the author is saying, and if it is as they say, then the actual content should speak for itself.
For instance:
> Much of this unfounded skepticism is driven by a deeply flawed non-peer-reviewed publication by Cheng et al. that claimed to replicate our approach but failed to follow our methodology in major ways. In particular the authors did no pre-training (despite pre-training being mentioned 37 times in our Nature article),
This could easily be written more succinctly, and with less bias, as:
> Much of this skepticism is driven by a publication by Cheng et al. that claimed to replicate our approach but failed to follow our methodology in major ways. In particular the authors did no pre-training,
Calling the skepticism unfounded or deeply flawed does not make it so, and pointing out that a particular publication is not peer reviewed does not make its contents false. The authors would be better served by maintaining a more neutral tone rather than coming off accusatory and heavily biased.
If someone tries to run your method but messes it up, and then accuses you of fraud when the results don't match their expectations, I'm not sure they're entitled to a neutral tone in response.
Maybe you're right, and a more neutral tone would have been effective! I think it's just that Jeff is just really done with this.
In short, I think the Nature authors have made some reasonable criticisms regarding the training methodology employed by the ISPD authors, but the extreme compute cost and runtime of AlphaChip still make it non-competitive with commercial autofloorplanners and AutoDMP. Regardless, I think the ISPD authors owe the Nature authors an even more rigorous study that addresses all their criticisms. Even if they just try to evaluate the pre-trained checkpoint that Google published, that would be a useful piece of data to add to the debate.