This is dangerous. Instead of acquiring understanding, a tool like this lets the programmer just write code that passes the tests. Without a deep understanding of the algorithm, you can't write sufficiently thorough tests to assess the program's correctness. TDD wouldn't save you here, because instead of being a tool of discipline it becomes a list of instructions for codegen.
Would anyone want to ride in a car or fly in a plane whose software was written from a set of sporadic tests? I wouldn't risk it.
Of course nobody should write production software with this, that's why the project loudly says that this is a PROTOTYPE. It is experimental software.
These kinds of tools are the future IMO. It's a higher layer of abstraction. Most programmers have no idea how their programs are actually executed: we write in a high-level language and rely on voodoo to execute it, without a "deep understanding" of how it's really run, and without writing instruction-level tests. And it works great. I don't see how this is theoretically different; it's just a higher level of abstraction. Of course, its utility greatly depends on the practicality of its implementation (which includes performance)...
> it becomes a list of instructions for codegen
That describes exactly the everyday programming languages that we all use.
Fun fact: when you board a plane, you have no idea how good its hardware, software, and test suite are beyond the common "it didn't crash n thousand times before" knowledge.
Given how large those systems are, how many layers of code (OS included) there are, and the turnover in software companies, I am pretty confident that no single engineer knows everything about them.
"engineers" - it's been decades since anyone on the planet could know everything about a whole system, but each engineer knows everything about their subsystem
Which is why I won't fly on certain airlines :-).
As a Software Engineer who worked at an avionics company, I know who our big customers were and what bugs are in the products they bought, so I avoid them whenever possible.
most hardware, and plenty of software for that hardware, is sourced from all around the world... the engineers involved make assumptions about how that stuff should behave, perhaps even run it through tests of their own... but they hardly know _everything_ about it
You might be surprised by how much safety critical software is built by code generation tools like SCADE. It's not entirely the same thing, but the programmer is pretty removed from the final product and it is very spec driven.
As a TDD practitioner, I really don't care too much about the implementation as long as my tests pass. Performance, however, is another issue, but that's something that can be optimized and rewritten into something like this depending on the language.
I've felt for a while that it's a matter of time before we end up with a development environment that - given a collection of tests - will automatically generate a program which passes all the tests. Barliman looks like a really cool step in that direction, even with its stated limitations.
For some reason the first thing that popped into my head when I read this was fuzzers like American Fuzzy Lop. I wonder if you could use the same sort of stochastic exploration to explore the space of possible programs satisfying a set of tests, rather than the space of possible inputs given a program?
That's called program synthesis, which is a hot research area. Specifically, what you mention is called "programming by example". I can't remember many examples off the top of my head, but some people from Microsoft Research are working on it, for example: https://microsoft.github.io/prose/
I can't Google the name right now, but this does exist as an optimization technique. Just start with any program, create random permutations, score them for correctness and performance, keep the best ones, repeat.
The problem is that this is incredibly computationally intensive, since the number of possible programs is huge. Right now it's viable for improving small pieces of code with a known-correct starting point. Maybe some day computers will become fast enough to make it more viable.
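Just to make the loop concrete, here's a toy sketch of the idea (a hypothetical illustration, not any named tool; it assumes Chez Scheme for random, filter, and eval, and names like random-prog and search are made up): programs are s-expressions in one variable x, scored by how many input/output test pairs they satisfy.

    ;; Toy stochastic program search: mutate, score, keep the best. Illustrative only.
    (define tests '((0 . 1) (1 . 3) (2 . 5)))          ; pairs (input . expected), i.e. 2x+1

    (define (run-prog prog x)                          ; evaluate prog with x bound
      (eval `(let ((x ,x)) ,prog) (interaction-environment)))

    (define (score prog)                               ; number of passing tests
      (length (filter (lambda (t) (equal? (run-prog prog (car t)) (cdr t)))
                      tests)))

    (define (pick lst) (list-ref lst (random (length lst))))

    (define (random-prog depth)                        ; random expression tree over + - *
      (if (or (zero? depth) (zero? (random 2)))
          (pick '(x 0 1 2))
          (list (pick '(+ - *)) (random-prog (- depth 1)) (random-prog (- depth 1)))))

    (define (mutate prog)                              ; random "permutation": swap in a fresh subtree
      (if (or (not (pair? prog)) (zero? (random 3)))
          (random-prog 2)
          (cons (car prog)
                (map (lambda (s) (if (zero? (random 2)) (mutate s) s))
                     (cdr prog)))))

    (define (search iterations)                        ; hill-climb, keeping the best seen
      (let loop ((best (random-prog 3)) (i 0))
        (if (or (= (score best) (length tests)) (= i iterations))
            best
            (let ((cand (mutate best)))
              (loop (if (>= (score cand) (score best)) cand best)
                    (+ i 1))))))

(search 10000) may stumble onto something equivalent to (+ (* 2 x) 1) for this toy target, but the candidate space blows up immediately for anything realistic, which is exactly the computational cost described above.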
Genetic Programming is a generalization of that. The "collection of tests" is effectively the fitness function.
The challenge is that 1) doing it for non-trivial functions, and getting results faster than much simpler generative methods that depend on heuristics, is a really hard problem (though it can work better when we don't have reasonable heuristics); 2) writing exhaustive tests for a lot of the problems we care about is likely to take more effort than writing the code in the first place.
I think if you want something like this the effort is best expended on tools that help you create a consistent, concise and exhaustive model / test-suite rather than code to implement it, with a focus on making it possible for a human to read and sanity check the generated model.
In this case, the tool is basically trying to create a model that matches the tests, it's just that it never makes the model explicit other than in the form of finished code, which prevents us from verifying that the model is correct other than by inspecting the code and/or expanding the tests.
For small functions that might be helpful, but for larger pieces of code, I think generating the code directly is likely to lead to code that is near impenetrable and impossible to validate beyond expanding the test suite. E.g. a recurring problem in genetic programming research has been that a lot of the resulting solutions are hard to understand even for very small/simple algorithms. And for bigger problems it's not unusual to end up exploiting weaknesses of the fitness function rather than solving the intended problem.
It's still an interesting project. I just think we're really far from having something with wider appeal.
More generally it's just an optimisation (the mathematical kind) where your parameters are essentially a function body. In that article the function body is a neural network, but there's no reason it has to be.
Prolog can (sometimes) "reverse execute" a specification, not generalize from examples. It's like saying "a cat has a round face and pointy ears" versus "look at these cat pictures". (Versus "draw an oval and then draw two triangles at the top...")
I wonder how far they went with Prolog. I remember examples in books about solving a problem and outputting a plan. But it wasn't a program, not in the sense of nested/modular systems; more like a linear walk.
That said, webyrd has shown a miniKanren-embedded lambda calculus (the evalo relation) used to find which programs reduce to a given value...
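For a flavor of that, a hedged sketch of the classic queries (this assumes an evalo relation, i.e. a relational interpreter for a Scheme subset, is already defined, as in Byrd's demos):

    ;; Find a program that evaluates to the list (I love you):
    (run 1 (q) (evalo q '(I love you)))
    ;; Find a quine: a program that evaluates to itself.
    (run 1 (q) (evalo q q))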
No, Prolog is nothing like that. Prolog is perhaps the sort of paradigm that could be used as one part of an implementation of something like that (as we see here with miniKanren), but Prolog itself isn't even close.
Barliman is written in miniKanren[0], a logic/relational programming system built by Daniel Friedman[1], Will Byrd[2] & Oleg Kiselyov[3]. There are implementations of miniKanren in languages other than Scheme, one of the most prominent being the one for Clojure[4].
To oversimplify, in the miniKanren world programs are written using relational logic: there are "variables" and certain "relationships" between the variables. That is the program specification. Now we can run the specification and let miniKanren generate one or more values for the variables that satisfy the relations; thus a miniKanren program can have more than one answer. One interesting side effect of this kind of abstraction is that programs can also be run backwards to generate more programs that satisfy certain relations. That's pretty much what's happening with Barliman.
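A minimal sketch of running a relation backwards, using the standard appendo relation from The Reasoned Schemer (assuming a Scheme miniKanren where run*, fresh, and == are available):

    ;; Forward: concatenate two known lists.
    (run* (q) (appendo '(1 2) '(3 4) q))
    ;; => ((1 2 3 4))

    ;; Backward: enumerate every pair of lists that concatenates to (1 2 3 4).
    (run* (q)
      (fresh (x y)
        (== q (list x y))
        (appendo x y '(1 2 3 4))))
    ;; => ((() (1 2 3 4)) ((1) (2 3 4)) ((1 2) (3 4))
    ;;     ((1 2 3) (4)) ((1 2 3 4) ()))

Barliman pushes the same trick further: the relation is an interpreter, and the "unknown" is the program itself.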
Some of you might recognize Daniel Friedman as the author of The Little Schemer. If you liked that book, you might check out The Reasoned Schemer. Short, accessible, and a bit mindbending, it offers a compelling introduction to logic programming that culminates in the "invention" of a Prolog-like DSL from basic Scheme primitives. Terrific little book. https://mitpress.mit.edu/books/reasoned-schemer
ILP + IFP (inductive logic/functional programming) are nice subjects to read about for this kind of thing. At university, a number of people believed ILP and/or neural networks and/or genetic programming would replace programmers shortly. That didn't happen (this was around 25 years ago), but it's still interesting material and great to learn from.
This combined with a robust type system should be nice. I thought it was a solid direction that would take off, but it didn't. One of my uni courses was on ILP (a long time ago indeed), and it seemed to have merit given faster computers, more advanced algorithms, and better heuristics. After that I saw Hoogle when I learned Haskell; it seems combining these things could make something powerful. I was going to post MagicHaskeller here as well, but gergoerdi did that already.
This looks pretty cool! However, I think it would be annoying to have the tests automatically running as you're typing. What happens if you accidentally create an infinite loop or run something dangerous?
Fair enough; indeed I would advise using it with caution around data that needs to be kept safe from deletion. I think of this as a nice idea for experimental forms of coding like spikes, but I wouldn't run something like this against a production database or my root file system.