Hacker News

Is that a fair reading of my comment? I used "2%" as shorthand for "an increase of two percentage points in accuracy score".

When the starting accuracy is 5%, saying that it was improved by 0.4 is not very illuminating. The point is that the approach is still very bad at generating correct programs, and any improvement to the already abysmal state of the art is tiny and possibly insignificant.
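For what it's worth, reading the "0.4" as a relative improvement (my assumption about the arithmetic in play here), the two framings are just different views of the same numbers. A quick sketch with the figures from this thread:

```python
# Illustrative numbers from the thread: a 5% baseline accuracy
# improved by two percentage points.
baseline = 5.0   # accuracy score, in percent
improved = 7.0   # two percentage points higher

# Absolute gain, in percentage points.
point_gain = improved - baseline

# Relative gain, as a fraction of the baseline.
relative_gain = (improved - baseline) / baseline

print(point_gain)     # 2.0 (percentage points)
print(relative_gain)  # 0.4 (i.e. a 40% relative improvement)
```

So "2 points" and "0.4 relative" describe the same change; neither framing alters the fact that 7% accuracy is still very low.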

It's like the old joke: the US President challenges the General Secretary of the USSR to a 100m race. The US President finishes first. The next day, Pravda runs the headline: "100m race: General Secretary finishes second. American President finishes next to last".



> The point is that the approach is still very bad at generating correct programs

It is the current state-of-the-art approach. Nothing better has been developed, according to that benchmark. What exactly are you complaining about, and what are you proposing? Unpublishing the paper and results?


>> It is the current state-of-the-art approach.

That's not right. The approach described in the article and the linked paper is not "state of the art". It beats one benchmark, which has only been used to compare one kind of system: neural program synthesis systems (Codex, AlphaCode, GPT-J, and CodeRL, the one in the article). The paper almost completely fails to acknowledge the wider field of program synthesis, which existed long before that benchmark. The benchmark is ad hoc and arbitrary, and it tells us nothing about the general capability of the compared systems, except that they are still very bad at that particular benchmark.

I linked the Gulwani report above because it is a good, recent introduction to the field of program synthesis research, which is still dominated by non-neural approaches, and for very good reasons. I linked to another paper that shows an example of a non-neural system of the kind that is common in program synthesis.

I pointed out that, regardless of the one and only benchmark on which the LLM-based systems listed in the article above have been tested, the approach of code generation by LLM is primitive compared to the sophisticated search and testing strategies developed by the wider program synthesis community over the years. Again, see Gulwani et al. for examples.

The proposed approach is also primitive compared to existing, earlier neural program synthesis approaches, i.e. program synthesis approaches that do use a neural network, but not an LLM, for example DreamCoder [https://arxiv.org/abs/2006.08381] or everything from Dawn Song's group [https://sunblaze-ucb.github.io/program-synthesis/index.html] - none of which I'm affiliated with in any way, btw.

If you're looking for the state of the art in program synthesis, look elsewhere than LLM-based systems like the one in the article above. Even if you're looking for the state of the art in neural program synthesis, look elsewhere. What is described in the article above is a first, faltering step in a direction that may still yield some good results in the future. Or it may not. But it's nothing impressive for the time being.

>> Unpublishing the paper and results?

Gulwani et al. is a technical report, but it's a staple reference in the field and an easy introduction for outsiders. As far as I can tell, neither AlphaCode nor CodeRL (the Salesforce model described in the article) has been the subject of published work. The article above links to an arXiv preprint.


> It beats one benchmark, which has only been used to compare one kind of system: neural program synthesis systems (Codex, AlphaCode, GPT-J, and CodeRL, the one in the article).

Yes, that's what they call "state of the art": winning one benchmark is enough to call a result SOTA.

> The benchmark is ad hoc and arbitrary and it tells us nothing about the general capability of the compared systems, except that they are still very bad at that particular benchmark.

So, what benchmark is better in your opinion, why, and which systems demonstrate strong results on it?

> Gulwani et al. is a technical report, but it's a staple reference in the field and an easy introduction for outsiders

Sorry, I don't understand why you keep referencing that report. It looks five years old, and thus outdated. What value does it have, in your opinion, exactly?



