
I don't know if I've missed the big thing here (I was supposed to do exactly the same flowery thing described there for crystals around 2019-2020), but the graphic with the autoencoder is roughly what people did in 2018 (Gomez-Bombarelli, https://pubs.acs.org/doi/10.1021/acscentsci.7b00572); I think the cited review reproduced it. Also note that it has the MLP in the middle, whose performance was/is not really helping either, especially if your model should actually produce novel stuff, i.e. extrapolate.

Finally: every kid can draw up novel structures. Then: how do you actually fabricate these (in the case of real novel chemistry and not some building-block stuff). No one has a clue!

I for one have decided that for now (with the data at hand and non-AlphaFold budgets) the 2 key areas where you can actually help computational chemistry are:

- creating really robust and generally applicable ML-MD-potentials, potentially using graphs https://arxiv.org/abs/2106.08903 (or a traditional approach: https://www.nature.com/articles/s41467-020-20427-2). Facebook is also working in this area: https://pubs.acs.org/doi/10.1021/acscatal.0c04525

- and approximating exchange-correlation functionals (... Google and some guys at Oxford, who got stomped on by the DeepMind PR machine https://arxiv.org/pdf/2102.04229.pdf): https://www.science.org/doi/10.1126/science.abj6511

If anyone can tell me how those generative models spit out graphs which look like reality (actually this is imho part of AlphaFold), wake me up.



> every kid can draw up novel structures. Then: how do you actually fabricate these (in the case of real novel chemistry and not some building-block stuff). No one has a clue!

Yep. I worked at a biotech startup in the early/mid 2000s.

We had a 2-pronged approach to finding small molecule drugs: 1) traditional medicinal chemistry based on simple SAR (structure-activity relationships) and 2) predictive modeling (before ML was hot).

The traditional med chemists were, in my opinion, rightfully skeptical of the suggestions coming out of the predictive modeling group ("That's a great suggestion, but can you tell me how to synthesize it?").

As one of my co-workers said to me: "The predictions made by the modeling group range from pretty bad to ... completely worthless."

It's possible that things have gotten better, though, as I haven't done that type of work since about 2008.


It's gotten better. The future looks like generative or screening models that feed structures to ADMET/retrosynthesis models, which close a feedback loop by penalizing/rewarding the first-pass models. Then there are machine-human pair design flows that are really promising: basically ChemDraw with the aforementioned models providing real-time feedback and suggestions. It wouldn't surprise me if in a decade you could seed generative models with natural language, receive a few structures, alter them in a GUI with feedback, and then publish a retrosynthetic proposal + ADMET estimates with the candidate structure.
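
As a rough illustration of that penalize/reward loop, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the "generator" is just a set of sampling weights over fragment labels, and `admet_score` and `synth_score` are toy stand-ins for trained ADMET and retrosynthesis models, not real APIs.

```python
import random

# Toy stand-ins (assumptions): a real pipeline would call trained ADMET and
# retrosynthesis models. Here a candidate is just a tuple of fragment labels.
def admet_score(candidate):
    """Pretend fragment 'X' is a liability; score is the fraction without it."""
    return 1.0 - candidate.count("X") / len(candidate)

def synth_score(candidate):
    """Pretend fragment 'C' is easy to source; score is its fraction."""
    return candidate.count("C") / len(candidate)

def reward(candidate):
    """Combine the two downstream scores into one feedback signal."""
    return 0.5 * admet_score(candidate) + 0.5 * synth_score(candidate)

def run_feedback_loop(fragments, rounds=50, batch=20, length=5, lr=0.5, seed=0):
    """Sample candidates, score them, and nudge the generator's fragment weights."""
    rng = random.Random(seed)
    weights = {f: 1.0 for f in fragments}
    for _ in range(rounds):
        names = list(weights)
        probs = [weights[n] for n in names]
        # First-pass model: sample a batch of candidate "structures".
        candidates = [
            tuple(rng.choices(names, weights=probs, k=length))
            for _ in range(batch)
        ]
        # Second-pass models: score each candidate.
        rewards = [reward(c) for c in candidates]
        baseline = sum(rewards) / len(rewards)
        # Feedback: fragments in above-average candidates gain weight,
        # fragments in below-average candidates lose weight (floored at 0.05).
        for cand, r in zip(candidates, rewards):
            step = lr * (r - baseline) / length
            for frag in cand:
                weights[frag] = max(0.05, weights[frag] + step)
    return weights

if __name__ == "__main__":
    print(run_feedback_loop(["C", "N", "O", "X"]))
```

After a few dozen rounds the weight of the rewarded fragment should grow and the weight of the penalized one should shrink. A real system would replace the weight table with a trainable generative model and the toy scorers with actual predictors.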

With high-throughput screening and automation, even small/medium-sized players can start building internal databanks for multi-objective models.


> The traditional med chemists were, in my opinion, rightfully skeptical of the suggestions coming out of the predictive modeling group ("That's a great suggestion, but can you tell me how to synthesize it?").

Start by plugging it into askcos.mit.edu/retro/, then do your job?

> As one of my co-workers said to me: "The predictions made by the modeling group range from pretty bad to ... completely worthless."

Workers feeling threatened by technology think the technology is bad or worthless, news at 11.


Chemical synthesis is much more than just retrosynthesis. Even if it weren't, you cannot just plug in a novel, previously unsynthesized molecule and expect any reasonable results. Furthermore, the tool is based on Reaxys, which in turn uses already established reaction routes and conditions from literature and patents. Good luck optimizing yields for something you don't even know how to synthesize, let alone what conditions to use.


> Chemical synthesis is much more than just retrosynthesis.

Of course, but we are talking about chemical discovery, after which you want to test whether the compound's theoretical capabilities work on cells. Yields are not yet a concern!

> expect any reasonable results

No, it won't do all the work, but it will give direction and suggestions for which pathways could be used.


> Workers feeling threatened by technology think the technology is bad or worthless, news at 11.

I appreciate the sentiment and I think it's understandable to think that. In this particular case, however, my co-worker was one of the smartest / most talented people I've worked with. I can assure you that he did not feel threatened in any way. His comment was sardonic, but not born of insecurity.

To be fair, the members of the modeling group were also quite talented. They were largely derived from one of the more famous physical/chemical modeling groups at one of the HYPS schools. But even they acknowledged that on a good day, the best they could do was offer suggestions / ideas to the medicinal chemists.

In fact, one of the members of the modeling group said this to me once (paraphrasing): The medicinal chemists are the high-priests of drug discovery. We can help, but they run the show.

As mentioned by someone who responded to my original comment, the usefulness of ML/modeling has likely gotten much better over the past 10 - 15 years.


> Finally: every kid can draw up novel structures. Then: how do you actually fabricate these (in the case of real novel chemistry and not some building-block stuff). No one has a clue!

I personally have a clue, and so does the entire field of organic chemistry: given enough time and money, most reasonable structures can be synthesized (and QED + SAScore etc. followed by a human filter is often enough to weed out the problem compounds that will be unstable or hard to make). In fact, some of the state-of-the-art synthesis prediction models can predict decent routes if the compounds are relatively simple.[0] The issue is that in silico activity/property prediction is often not reliable enough to justify the effort of designing and executing a synthesis, especially because the predictions typically become less reliable as the molecules get more dissimilar to known compounds with the given activity. In the end, you just spend 3 months of your master's student's time on a pharmacological dead end. Conversely, some of the "novel predictions" of ML pipelines, including de novo structure generation, can be very close to known molecules, which makes the measured activity something of a triviality.[1] For this reason, it makes sense to spend the budget on building-block-based "make on demand" structures, which have 90% fulfillment, take 1-2 months from placed order to compound in hand, and are significantly cheaper per compound, because you can iterate faster. Recent work on large-scale docking has shown that this approach seems to work decently for well-behaved systems.[2] On the other hand, some truly novel frameworks are not available via the building-block approach, which can also be important for IP.
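
To make the QED + SAScore pre-filter concrete, here is a minimal Python sketch. It assumes the two scores have already been computed elsewhere (QED runs from 0 to 1, higher being more drug-like; the SA score runs from 1 to 10, lower being easier to make). The `Candidate` type, the cutoffs `min_qed=0.5` and `max_sa=4.0`, and the example values are illustrative assumptions, not recommendations or real data.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str        # hypothetical identifier for a generated structure
    qed: float       # drug-likeness estimate, 0 (poor) to 1 (good)
    sa_score: float  # synthetic accessibility, 1 (easy) to 10 (hard)

def prefilter(candidates, min_qed=0.5, max_sa=4.0):
    """Keep candidates that pass both cutoffs (cutoffs are assumptions)."""
    kept = [c for c in candidates if c.qed >= min_qed and c.sa_score <= max_sa]
    # Sort best-first for the human reviewer: high QED, then low SA score.
    return sorted(kept, key=lambda c: (-c.qed, c.sa_score))

if __name__ == "__main__":
    # Placeholder scores, made up purely to exercise the filter.
    pool = [
        Candidate("mol_a", qed=0.8, sa_score=2.5),
        Candidate("mol_b", qed=0.3, sa_score=2.0),  # fails the QED cutoff
        Candidate("mol_c", qed=0.9, sa_score=6.0),  # fails the SA cutoff
        Candidate("mol_d", qed=0.6, sa_score=3.0),
    ]
    for c in prefilter(pool):
        print(c.name, c.qed, c.sa_score)
```

The survivors would then go to the human filter. The automated step only removes the obvious problem compounds.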

More fundamentally, of course you are correct, and I agree with you: having a lot of structures is in itself not that useful. Getting closer to physically more meaningful and fundamental processes, and speeding them up to the extent possible, can generate way more transparent and reliable activity and novelty.

[0] https://www.sciencedirect.com/science/article/pii/S245192941...
[1] http://www.drugdiscovery.net/2019/09/03/so-did-ai-just-disco...
[2] https://www.nature.com/articles/s41586-021-04175-x.pdf


There's a lot that can be learned with building-block-based experiments. If you run a building-block-based experiment, train a model, and then predict new compounds, the models do generalize meaningfully outside the original set of building blocks into other sets of building blocks (including variations on different ways of linking the building blocks). Granted, that's not the "fully novel scaffold" test; however, it suggests that there should be some positive predictive value on novel scaffolds.
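
One way to probe that kind of generalization is to hold out whole building blocks rather than random compounds. The Python sketch below is only an assumed evaluation setup, not a description of the work mentioned here; the function name and the data layout (compound ID plus the set of building-block IDs it was made from) are hypothetical.

```python
def split_by_building_blocks(compounds, held_out_blocks):
    """Split compounds by whether they use held-out building blocks.

    compounds: iterable of (compound_id, iterable_of_block_ids)
    held_out_blocks: block IDs the model must never see in training
    Returns (train, test, mixed) lists of compound IDs.
    """
    held_out = set(held_out_blocks)
    train, test, mixed = [], [], []
    for compound_id, blocks in compounds:
        blocks = set(blocks)
        if blocks.isdisjoint(held_out):
            train.append(compound_id)   # built only from seen blocks
        elif blocks <= held_out:
            test.append(compound_id)    # built only from unseen blocks
        else:
            mixed.append(compound_id)   # partly seen; report separately
    return train, test, mixed

if __name__ == "__main__":
    library = [
        ("cpd1", {"A1", "B1"}),
        ("cpd2", {"A1", "B2"}),
        ("cpd3", {"A2", "B2"}),
    ]
    print(split_by_building_blocks(library, held_out_blocks={"A2", "B2"}))
```

Scoring the `test` and `mixed` groups separately shows how much of the predictive value survives when some or all of the building blocks are new to the model.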

We've done work in this area and will be publishing some results later in the year.



