Introduction to Program Synthesis (csail.mit.edu)
79 points by squircle 2 days ago | 33 comments





https://evanthebouncy.github.io/program-synthesis-minimal/

Here's my take on it; you can view it as a modern extension to Armando's work (he's my PhD advisor).


For practical systems, to improve synthesis/codegen, do you think working on the compile/eval toolchain is a good direction?

For example, could we cache/re-use the compiler runs on previous codegen candidates to speed up compilation of the next candidate snippet?


Eval is crucial, as it's the only way to agree with the computer that what it is doing matches what you want it to do.

I wouldn't worry about speed. We should expect compute and inference to get faster, to the point where we can easily sample 10k programs in a second and check all of them against test cases.
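
Concretely, that regime is plain generate-and-test; a minimal sketch (sample_programs, the "solution" entry-point name, and the toy tests are made-up stand-ins, not any particular API):

    def passes_tests(src, tests):
        # Load one candidate into a scratch namespace and check it against the examples.
        env = {}
        try:
            exec(src, env)                              # the "compile" step
            f = env["solution"]                         # assumed entry-point name
            return all(f(*args) == out for args, out in tests)
        except Exception:
            return False

    def synthesize(sample_programs, tests, n=10_000):
        # Generate-and-test: keep every sampled program that satisfies the examples.
        return [c for c in sample_programs(n) if passes_tests(c, tests)]

    # e.g. tests for a hypothetical two-argument adder:
    tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]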

I'd worry about communicating precisely to computers when test cases are awkward to write.


This is a very classic view, and while interesting, IMO the modern take doesn't start until lecture 22. I would have expected a reformulation of the lecture series by now. As is, it reads as if an AI course did 21 lectures on Bayesian methods and then ended with a couple on neural networks: Bayesian tricks are neat, but that's not how I'd structure a general course. Cool, but... I wonder what an update would look like.

(I overlapped with the author in the same synthesis research lab by a few years, and currently do LLMs for code synthesis at louie.ai... much of the coursework is from around those Berkeley years and some MIT ones 10-20 years ago, afaict. A lot of foundational work has happened since then.)


The classical synthesis approaches seem to have much more emphasis on correctness and specification than modern LLM-based synthesis though — things like deriving provably correct lock-free data structures.

The LLM-based synthesis work I've seen, in contrast, maybe uses a set of unit tests for correctness testing.

It doesn't feel like modern LLM-based synthesis supplants the classical approaches.


If you think about classical synthesis as fancy tricks around optimizing SAT encodings & search, neural synthesis becomes about how to solve np-hard problems there instantly. There are other perspectives as well, but that's already a dual LLM paper for a bunch of individual classical papers, without losing safety. Ex: leapfrog the concolic execution papers via a fast oracle.

Likewise, a lot of cool work is about automating the proof theory much better. Terence Tao has been going down that rabbit hole, which is amazing, and I hope the security community does too. I know Dawn Song, one of the big names here, is.

Another big area is making programming more accessible by assuming synthesizers, and the language design decisions around that. A lot of that work has been shifting because of LLMs too. Ex: Historically, PLs had zero domain understanding, just logic; that has really flipped since GPT-4, so the concept of a DSL is now also flipping.

Finally... a lot of neural synthesis can and does entirely ignore the SAT aspect. There is a lot of depth to it on its own, and amazing results.


> about how to solve np-hard problems there instantly. ... without losing safety. Ex: leapfrog the concolic execution papers via a fast oracle

Just want to point out to unknowing readers: none of this is true and the author of these comments is "talking his book".

Source: do I really need to explain how LLMs do not solve NP-hard problems "instantly"?


Most of the papers in the syllabus are about the SAT/SMT-based solver era of program synthesis, and neural synthesis is popularly - with open benchmarks - succeeding on problems those failed to gain traction on.

The research question is shifting to how to either bridge the worlds (better together) or leave the old one. That doesn't make SMT-era papers wrong, but, as with Bayesian vs NN, it does make them relatively impractical or otherwise irrelevant for most people until a major breakthrough happens (if ever). Ex: Agentic loops are interesting from a direct-reuse-of-classical-synthesis-results perspective, and I know big labs are experimenting with different methods.

I'm transparent that I was part of the old world, and while synthesis was a design hope for our startup (I was publicly demoing SOTA here 10 years ago!), that school of methods never reached the quality where we could invest in commercializing it. When GPT-4 hit, we could finally ship synthesis-based methods. I get the impression you are part of the older SMT world and reject modern advances, and I'd wonder the reverse about you. It's a bit galling to have voting with my feet - and shipping - after patiently waiting 10 years treated as an invalidating thing, instead of as a proof point that maybe there is something going on.


Have a few papers on hand to start reading about these?

Edit: I'd be curious which other papers/authors folks would add who are doing interesting work here.

One starting point is to look at great synthesis, mechanized proofs, and PL folks who have been exploring neural synthesis, and the papers they write + cite:

* Arjun Guha: StarCoder LLM, LLM type inference, etc

* Sumit Gulwani & Rishabh Singh (whose pre-LLM work is on the syllabus)

* Nikhil Swamy (F*), Sorin Lerner (Coq)

* A practical area with a well-funded intersection, because of urgency, is security: niches like smart contracts, plus broader bug finding & repair. E.g., the simpler applied methods by Trail of Bits, and any academics they cite. Same for DARPA challenge participants: they likely have relevant non-DARPA papers.

* I've been curious about the assisted proof work community Terence Tao has fallen into here as well

* Edit: There are a bunch of Devin-like teams (ex: the Princeton team markets a lot, ...), and while interesting empirically, they generally are not as principled, which is the topic here, so I'm not listing them. A lot of OOPSLA, MSR, etc. papers are doing incremental work here afaict, so there's work to sort through.

An important distinction for me is whether it is an agentic system (which is much of it, and can more easily use heavier & classic methods), training an existing architecture (often a stepping stone to more interesting work), or making a novel architecture (via traditional NLP+NN tricks vs synthesis-informed ones). Most practical results today have been clever agentic systems, better training sets, and basic NLP-flavored architecture tweaks. I suspect that is because those are easiest and it's early days, so right now I look more for diversity of explorations than for depth in any one track.


Hi! I was the TA for this course in 2020, and did my Ph. D. under the professor. Ask me anything.

Thanks for offering your time!

1. How do you/y'all feel about the ethics of publishing this kind of detailed instructional material for self-replicating machines on the open internet, given the potential risk of behavioral singularities from as-of-yet-uncharted intelligence explosion dynamics? I mean this question in the most friendly, least “gotcha” way possible — feel free to answer flippantly!

2. What would you say was the “frame problem” of this work that blocked progress, either recently or in your historical readings? I’m just starting to examine this literature after intentionally not engaging while designing my own systems, and it seems pretty damn robust in a post-frame-problem world. What might I be missing out of naïveté?


1. I've been a doomer since before the term was invented, and I never had any worry about synthesis research based on symbolic techniques. Gödel is our friend here: they're not powerful enough to model themselves. The best you get is stuff like https://ieeexplore.ieee.org/abstract/document/7886678?casa_t... . An important component of the Sketch synthesizer is its simplifier, and this paper used Sketch to synthesize a better simplifier, which speeds up synthesis times by 15%-60%. But that's a one-time improvement. The exponential takeoff comes from somewhere else.

2. Let me see if I understand your question: I think you're saying "The frame problem of AI and philosophy, https://plato.stanford.edu/entries/frame-problem/ , was the major blocker in AI systems, and now it's been solved by deep learning. What is the corresponding problem in program synthesis?" I think your "pretty damn robust" comment is important, but haven't yet figured out what you mean.

Part of the reason I found this confusing is that the literal frame problem also appears in PL and synthesis, primarily in the verification of heap-based imperative programs. See the "frame rule" of separation logic.

So anyway, what do I see as the big challenges stopping synthesis from reaching its dream of being able to generate well-designed large programs with little input?

My two answers:

First, there's the problem of finding good abstractions. You see this in very old work: trying to prove an imperative program works is easy to automate EXCEPT for the problem of finding loop invariants. This has remained a challenging problem in the 50 years since it was first described. Craig interpolation is the classic solution, but it only goes so far. The synthesis view is: figuring out how to write loops is not straightforward because you need to summarize infinitely many possible states. And that's just loops; designing data structures and finding good invariants for stateful applications is harder still.
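
To make the loop-invariant point concrete, here's a toy illustration (the assert is the invariant; the hard part is inventing it, not checking it):

    def sum_array(xs):
        total, i = 0, 0
        while i < len(xs):
            # Loop invariant: total == sum(xs[:i]) and 0 <= i <= len(xs).
            # Inventing this summary of every state the loop can reach is the
            # creative step; once you have it, checking it is mechanical.
            assert total == sum(xs[:i]) and 0 <= i <= len(xs)
            total += xs[i]
            i += 1
        return total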

Second is what I like to call the "semantic veil." Generally, for any given chunk of code, there are multiple goals and abstractions that yield the same chunk of code. See my analogy to dairy-free chocolate chips in https://www.pathsensitive.com/2023/03/modules-matter-most-fo... . So deducing the intentions of a piece of code from just the code itself is literally impossible. Unfortunately, the way humans design programs is much better understood as being about these higher-level organizational structures, and not about the code itself. This is an area where I'm excited about the potential of LLMs: they can make excellent use of the information in priors and in natural language to guess said intentions.


Hi @Darmani, what have you been working on in the meantime? Given the current state of programming and compsci research, what interests you these days? (What do you see as most important 5 to 10 years down the road?) As a layperson I don't have time to grok most of this, but I revisit from time to time.

Hi Squircle!

I mostly left academia after graduating, and am not actively following current research, although I still live in Cambridge and attend a fair number of research events. I can say that deep learning has taken over PL research, just like everything else. A good number of PL and synthesis people have shifted heavily into pure deep learning, and a large proportion of those that haven't are working either on applying deep learning and LLMs to solve PL problems, fusing PL and ML techniques (aka "neurosymbolic programming" -- the Scallop project from Mayur Naik's group is particularly exciting to me), or finding ways to use PL techniques to solve ML problems. For an example of the latter, I just flipped through this year's PLDI papers, and one caught my eye that sounds like it has nothing to do with AI, "Hashing Modulo Context-Sensitive Alpha Equivalence." (Decoding the jargon, that means "How to deal with an enormous set of programs that contain lambdas" -- something that comes up when doing search-based synthesis and superoptimization.) Its abstract ends: "We have employed the algorithm to obtain a large-scale, densely packed, interconnected graph of mathematical knowledge from the Coq proof assistant for machine learning purposes."
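
(To unpack the jargon one step further: the textbook trick for hashing terms modulo plain alpha equivalence is to rename bound variables into de Bruijn indices first, so \x. x and \y. y become the same nameless term and hash identically. A toy sketch of that idea, not the paper's context-sensitive algorithm:)

    def de_bruijn(term, env=()):
        # term is ('var', name) | ('lam', name, body) | ('app', fun, arg)
        kind = term[0]
        if kind == 'var':
            return ('var', env.index(term[1]))      # name -> distance to its binder
        if kind == 'lam':
            return ('lam', de_bruijn(term[2], (term[1],) + env))
        return ('app', de_bruijn(term[1], env), de_bruijn(term[2], env))

    t1 = de_bruijn(('lam', 'x', ('var', 'x')))
    t2 = de_bruijn(('lam', 'y', ('var', 'y')))
    assert t1 == t2 and hash(t1) == hash(t2)        # alpha-equivalent terms collide on purpose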

I think PL techniques do provide the key to overcoming a lot of the problems with LLMs. You want your LLM to have better correctness, reasoning, and goal-directed search -- researchers in programming languages and formal methods are experts at this. I am here to some extent conflating PL/FM techniques with traditional automated reasoning, but I think that's a fine conflation to make, because such techniques have been incubated by the PL/FM communities for the past 30 years, since they were sidelined by AI. Case in point: a very large fraction of computational logic and automated reasoning papers are motivated by problems in program analysis, verification, and synthesis: https://easychair.org/smart-program/FLoC2022/index.html

As for what I have been up to: Trained a few hundred more software engineers, mirdin.com . While I mostly stayed away from AI stuff during my Ph. D., I've given in to the tides, and am now building a startup using my AI, PL, and pedagogy expertise to solve all problems related to codebase-learning (onboarding new hires, changing teams faster, etc).

At the very end of my Ph. D., I discovered a new program synthesis technique based on constrained tree automata, and used it to build a synthesizer which is 8x more performant on one benchmark than the previous SOTA while using 10x less code. https://www.jameskoppel.com/files/papers/ecta.pdf . So the research I've done since graduating has largely been follow-ups to that. See https://www.computer.org/csdl/proceedings-article/icst/2023/... , https://pldi22.sigplan.org/details/egraphs-2022-papers/4/E-G... . I'm currently on two collaborations. One is continuing to develop algorithms for new kinds of constrained tree automata that can synthesize more kinds of programs. The other is an outgrowth of my startup: an empirical study on existing tools for codebase learning.

Anyway, that's not a comprehensive answer on what to watch out for in the field, which I am not presently qualified to give if I ever was, but it's the things that have my attention.

Oh, but: watch Isil Dillig. Everything that comes out of her lab is good.


Thank you for your thoughtful reply @Darmani. Best of luck with your current and future endeavors!

For practical systems, do you think improving the compiler toolchain is a good direction?

For example, speeding up compilation of 100s of similar code candidates, doing the same for eval, etc


That's very relevant to generate-and-test program search.

The SBSE (search-based software engineering) and APR (automated program repair) communities do that kind of stuff. E.g.: There's a paper from that world, which I can't presently find, on "super mutants": basically, use environment variables as feature flags so that you can compile 100 variants at once. Back when I worked in this space, one of the best things I did was run the Java compiler from within nailgun (a program that saves JVM startup time) to compile mutants faster.
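
(If that trick isn't familiar: the idea is to bake every mutant into one build and pick the active one at run time, so you pay the compile cost once instead of 100 times. A hypothetical sketch of the pattern, with made-up mutants, shown in Python purely for brevity since the real win is in compiled languages:)

    import os

    # One "super mutant" build: every candidate edit lives in the same artifact,
    # and an environment variable selects which one is active for a test run.
    MUTANT = os.environ.get("MUTANT_ID", "0")

    def abs_value(x):
        if MUTANT == "1":
            return x if x > 0 else -x     # mutant 1: >= weakened to >
        if MUTANT == "2":
            return -x                     # mutant 2: branch deleted
        return x if x >= 0 else -x        # original code

    # run the full test suite once per mutant, e.g.:  MUTANT_ID=1 python run_tests.py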

Symbolic program synthesis doesn't work like this. It effectively has its own compiler toolchain that transforms programs into constraint systems and/or does partial evaluation of programs with holes. Most such systems never run a traditional compiler; they just guarantee that they output correct code.
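
For a flavor of what "programs with holes become constraint systems" means, here's a tiny Sketch-flavored example using z3's Python bindings (the program, spec, and examples are made up for illustration):

    from z3 import Int, Solver, sat

    # Program with a hole:  f(x) = x * ?? + x,  where ?? is an unknown constant.
    # Spec, given as input/output examples:  f should behave like 3 * x.
    hole = Int('hole')
    examples = [(1, 3), (2, 6), (5, 15)]
    s = Solver()
    for x, want in examples:
        # The concrete inputs get partially evaluated away; only the hole stays symbolic.
        s.add(x * hole + x == want)
    if s.check() == sat:
        print("hole =", s.model()[hole])    # hole = 2

Real systems like Sketch quantify over all inputs (or use CEGIS) rather than a fixed example set, but the shape is the same: the surface program is compiled down to constraints, and the solver fills the hole.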


OK!

I work in financial services; a lot of customers want tools to support their creaking code bases.

What are the state of the art options, both commercial and open source?

What are the prospects for things like translating COBOL to Python or Java?


Supporting an old codebase is a pretty broad question. There are quite a few companies supporting COBOL.

But for migrations, and really anything interesting or unique, I recommend Semantic Designs. http://www.semdesigns.com/ . Past customers include ADP, the Social Security Administration, the Bank of Australia/New Zealand, and Goldman Sachs Australia.

They do everything, but COBOL migrations are their bread-and-butter. There are hundreds of COBOL dialects out there, and they have no trouble supporting all of them.

I interned for them in 2016, and the CEO served on my thesis committee. But I'm not saying this because I worked there; I worked there because I thought they had the best tech.


This is super interesting, thank you.

But, looking at semdesigns' site and info, I have a question: if this tech has existed since 2010 or so... why are all the big banks not using it? Is it just bad marketing, or is there a gotcha? They are all whining about their COBOL problem; this could be a fix...

Does this mean that the market is not what folks imagine?


Oh, some of the big banks are.

But: Au contraire. It's existed much longer than that.

So you've heard of products called vitamins and painkillers?

The CEO of Semantic Designs, Ira Baxter, calls the kinds of migrations and mass changes they sell "heart surgery."

As in: "How many people go to the doctor and say 'Doctor, I think I need some heart surgery?' No one."

A very common thing that happens is that a company starts a project to upgrade some dying system and gets cold feet in the middle and then scraps the project. "We survived last year, we're surviving this year, we'll survive next year." Sometimes it takes extreme pressure, like the compiler for the original language no longer existing, before they migrate.

(And it's not just the really big changes. A friend of mine, who founded another program-transformation startup, discovered this when he landed a deal using his tool to upgrade some company from PHP 4 to PHP 5. They decided that they needed way more code review than the kinds of trivial changes being made actually should require, decided not to do it, and then wound up not doing the upgrade.)

When Ira begins talking to a new customer, he asks two questions:

1. How long have you been in your job? 2. Have you tried to do this kind of thing before?

If the answer to #1 is too long, then he knows that either the manager he's talking to is risk-averse and complacent, or that he'll be promoted to his next job in the middle of the project. To make a good deal, he needs his counterpart to be someone who wants this kind of upgrade to be their big initiative that will help them rise.

If the answer to #2 is no, then he knows that eventually talks will break down and they'll try to do it manually or in-house, underestimating how hard it is. For their most famous project, the B-2 Stealth Bomber migration, Northrop Grumman did exactly that -- came to Semantic Designs after a failed manual migration and a failed in-house tool development effort.

So, yeah -- it's a very tough market. The buyers don't behave like you think they should.

I wrote about this some in an appendix to pathsensitive.com/2023/09/its-time-for-painkillers-vitamins-die.html


Thanks - this is extremely insightful and helpful.

(2018) Discussion on HN in 2021 (132 points) https://news.ycombinator.com/item?id=28099928

Nice. Thanks @gnabgib

Agree with the other commenter that the new approaches are mostly neurosymbolic and based on DNNs, vs the approaches given here. FWIW, I think this is one of the critical pieces to AGI, and I know many of the large AI labs are working on integrating these approaches in an agentic way with the LLMs they've trained. I expect this will dominate at least the next 12-24 months of research. There are also a few AI startups out there working on prototyping these ideas.

It warms my heart to see program generation getting some light.

If you are into this, then follow me here. I'm working on a program synthesis project in my spare time, a fusion of program generation and LLMs.

https://youtu.be/sqvHjXfbI8o?si=rg6EnqkHtUjs1Cki&t=423


Everything relevant in "program synthesis" moved to the new buzzword "codegen"

Depends on the extent to which you understand the Silicon Valley venture capitalism era as a permanent evolution, as opposed to an unsustainable trend!

In my nonacademic opinion, codegen is way, way less meaningful and powerful of a perspective to take on the problem than Program Synthesis. That said, I’m curious - are you mostly referencing commercial and OSS work with this comment, or does this also include institutional textbooks and tutorials and such?

FWIW, Program Synthesis is still much more popular in print according to Google Ngram - tho it’s losing ground fast!

https://i.imgur.com/bhDPzho.jpeg

Is there a similar tool that would accurately measure usage in blogs, manuals, and github pages, I wonder?


Have LLMs shaken up this field?

yes

Could you give an example of an apples-to-apples comparison between (symbolic) program synthesis and LLMs that the latter wins?

The reason I am asking is that, in my experience, LLMs never match the 100% precision (relative to the specification being synthesised against) that program synthesis effortlessly achieves.

On apples-to-apples comparisons, LLMs often only obtain 70%-80% precision, which means they are effectively solving a simpler synthesis problem. (Yes, they have other benefits, like being able to extract knowledge from natural language, which conventional synthesis can't do, but that's often not needed.)


Do you have a link to a literature review?


