Not to sidetrack things, but there is a bunch of cool stuff in the source for Lua. Whether or not it gets used directly, it is going to have a huge influence on the implementation of new languages.
It has a bunch of novel ideas* (e.g. the register-based VM, the table optimizations) that have paid off big, it's very portable (due to being written in strict ANSI C), and it has a surprisingly small code base. It's pretty readable (http://www.lua.org/source/5.1/), full of examples of simple-but-efficient garbage collectors, string handling optimizations, etc. The designers have worked hard to keep it small and simple.
I agree that it has a bunch of novel ideas, but I wouldn't list the register-based VM and table optimizations among them; register-based VMs go back to the 1950s, depending on how you want to count, and using different representations for tables used as arrays and tables used as hashes (that's what you mean by "table optimizations", right?) is pretty unsurprising. I'm not saying it isn't a great engineering achievement; I never would have predicted that those things would have been worth the cost in Lua's context, and they were, brilliantly so.
I would instead list their one-pass-compiler implementation of flat closures and their approach to continuations, and especially integrating continuations with C code calling back into Lua.
I second the recommendation of the paper. I also think their paper on LPEG is worth reading for anyone who is considering using regular expressions or yacc or ANTLR for anything: http://www.inf.puc-rio.br/~roberto/docs/peg.pdf
To my understanding, stack-based VMs are more common than register-based VMs, and the Lua VM is noteworthy for being both well-implemented and small enough to read. Then again, it looks like you're quite a bit ahead of me in reading about VMs. I just found this in your "blog thing" yesterday: http://www.bentwookie.org/blog/kragen-tol/2007-September/000... You've got a lot of good links. :)
By "table optimizations", I meant the way it behaves as both a dict and a table, depending on how it is used, and how much they get out of using those throughout the language. (I wrote the above in a hurry before catching the bus, and knew I had missed better examples, but it came to mind from the _Implementation_ paper.)
Yes, I agree that stack-based VMs are more common than register-based VMs, and it was an unusual decision on the part of the Lua folks to choose a register-based VM, particularly given their emphasis on implementation simplicity. And it appears to have paid off, which is pretty cool!
I'm glad you've enjoyed the various links.
> I meant the way it behaves as both a dict and a table
You mean, a dict and an array?
I'm not sure I agree with their decision to unify those structures, but it does seem not to have cost them much implementation complexity or performance.
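For anyone who hasn't looked at how one structure can serve as both, here is a minimal sketch in C of a table with an array part and a hash part. This is not Lua's actual code: the names (Table, table_set, table_get), the fixed sizes, the integer-only keys and the hash function are all assumptions made up for illustration, and the real implementation resizes both parts dynamically.

```c
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_CAP 8   /* hypothetical fixed size of the array part */
#define HASH_CAP  16  /* hypothetical fixed size of the hash part (power of two) */

typedef struct {
    bool   used;
    long   key;
    double value;
} HashSlot;

typedef struct {
    double   array[ARRAY_CAP];       /* values for keys 1..ARRAY_CAP */
    bool     array_used[ARRAY_CAP];
    HashSlot hash[HASH_CAP];         /* every other key lands here */
} Table;                             /* start from: Table t = {0}; */

/* Illustrative multiplicative hash, masked to the hash part's size. */
static size_t hash_index(long key) {
    return (size_t)((unsigned long)key * 2654435761UL) & (HASH_CAP - 1);
}

/* Stores value under key. Returns false only if the hash part is full. */
bool table_set(Table *t, long key, double value) {
    if (key >= 1 && key <= ARRAY_CAP) {          /* dense key: plain indexing */
        t->array[key - 1] = value;
        t->array_used[key - 1] = true;
        return true;
    }
    size_t start = hash_index(key);              /* sparse key: linear probing */
    for (size_t probes = 0; probes < HASH_CAP; probes++) {
        HashSlot *s = &t->hash[(start + probes) & (HASH_CAP - 1)];
        if (!s->used || s->key == key) {
            s->used = true;
            s->key = key;
            s->value = value;
            return true;
        }
    }
    return false;
}

/* Looks key up. Returns true and writes *out if the key is present. */
bool table_get(const Table *t, long key, double *out) {
    if (key >= 1 && key <= ARRAY_CAP) {
        if (!t->array_used[key - 1]) return false;
        *out = t->array[key - 1];
        return true;
    }
    size_t start = hash_index(key);
    for (size_t probes = 0; probes < HASH_CAP; probes++) {
        const HashSlot *s = &t->hash[(start + probes) & (HASH_CAP - 1)];
        if (!s->used) return false;              /* empty slot ends the probe */
        if (s->key == key) { *out = s->value; return true; }
    }
    return false;
}
```

The caller never sees which part a key ended up in, which is the whole point: small positive integer keys get array-speed access, and everything else still works.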
Lua has been cited as one of the major influences on WebKit's JavaScript interpreter SquirrelFish, which has seen about a 10x performance boost in the past 18 months.
concerning register-based vm architectures, you might find the following a good read: "Virtual Machine Showdown: Stack Versus Registers" (https://www.usenix.org/events/vee05/full_papers/p153-yunhe.p...). aside from lua and parrot, google's dalvik also uses a register architecture (afaik).
i also had a look at the lua implementation a month or so ago, and i found some other details very interesting: it uses very few instructions (<40), and it uses a very neat trick to "unify" floating point and integer numbers: using double as the default numerical type. (compare for example to the jvm, which has {i, f} x {add, mul, div, etc.})
Thank you. Excellent article. I had always suspected register VMs to be faster from my days as a compiler writer. All of this is straightforward code optimization and generation. If anyone out there is contemplating a new language, I suggest looking at this.
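To make the "one numeric type" point concrete, here is a tiny sketch in C. The names Number and vm_add are made up for illustration (Lua's own typedef is lua_Number, which defaults to double in 5.1); the only claim is that a single add on doubles covers both the integer and the fractional case, and stays exact for integers up to 2^53.

```c
/* Hypothetical stand-in for a VM's single numeric type. */
typedef double Number;

/* One ADD handler instead of separate integer and floating-point variants. */
Number vm_add(Number a, Number b) {
    return a + b;
}

/* Largest magnitude below which every integer is exactly representable
   in an IEEE 754 double: 2^53. */
static const Number MAX_EXACT_INT = 9007199254740992.0;
```

The trade-off is that integer-only operations (bitwise ops, exact 64-bit arithmetic) don't fall out of this for free.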
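A rough sketch in C of the difference being discussed, for the statement a = b + c with all three in local slots. Both instruction sets here are toy ones invented for this example (they are not Lua's or the JVM's encodings); the point is only the number of instructions the interpreter loop has to dispatch.

```c
#include <stddef.h>

enum { S_PUSH, S_ADD, S_POP, S_HALT };  /* toy stack-VM opcodes */
enum { R_ADD, R_HALT };                 /* toy register-VM opcodes */

typedef struct { int op; int arg; } StackInstr;
typedef struct { int op; int dst, src1, src2; } RegInstr;

/* Runs stack code over locals[]; returns how many instructions were
   dispatched (the final HALT is not counted). */
int run_stack(const StackInstr *code, double *locals) {
    double stack[16];
    int sp = 0, dispatched = 0;
    for (size_t pc = 0; ; pc++) {
        StackInstr in = code[pc];
        if (in.op == S_HALT) return dispatched;
        dispatched++;
        switch (in.op) {
        case S_PUSH: stack[sp++] = locals[in.arg]; break;
        case S_POP:  locals[in.arg] = stack[--sp]; break;
        case S_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        }
    }
}

/* Same idea for the register VM: operands are named in the instruction. */
int run_reg(const RegInstr *code, double *regs) {
    int dispatched = 0;
    for (size_t pc = 0; ; pc++) {
        RegInstr in = code[pc];
        if (in.op == R_HALT) return dispatched;
        dispatched++;
        if (in.op == R_ADD) regs[in.dst] = regs[in.src1] + regs[in.src2];
    }
}

/* a = b + c, with a, b, c living in slots 0, 1, 2. */
static const StackInstr stack_prog[] = {
    {S_PUSH, 1}, {S_PUSH, 2}, {S_ADD, 0}, {S_POP, 0}, {S_HALT, 0}
};
static const RegInstr reg_prog[] = {
    {R_ADD, 0, 1, 2}, {R_HALT, 0, 0, 0}
};
```

The stack version dispatches four instructions where the register version dispatches one, at the price of a wider instruction that has to carry three operand indices.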
Almost forgot: There's also a reading guide for the Lua source (http://www.reddit.com/comments/63hth/ask_reddit_which_oss_co...) by Mike Pall, the author of LuaJIT (http://luajit.org/). The source for that is almost certainly a goldmine too (it's a cutting-edge JIT compiler, after all), but I haven't really looked at its internals yet.
(Also, should be "simple-but-efficient garbage collection" above.)
IoL4 (http://www.iol4.com/) is an L4-based microkernel that boots into Io (http://www.iolanguage.com/), which is another language that looks rather interesting. I've never got the Io VM to build on OpenBSD/amd64, which rules it out for me, though.
I'd like to do that with a Forth or a Lisp one of these days, time permitting.
Agreed, _why really makes me think of the days when hackers weren't doing things in the hope of getting a paycheck in the end, but because they had the know-how and wanted to see if they could pull it off.
It's infectious, and it works rather brilliantly with the presentation and quality of the work this guy puts together.
Wow, his technical and artistic output is amazing. I'd love it if he wrote a bit about how he operates, what his interests are, etc., but I somehow feel that would be out of character for him.
The first feature that really jumped out at me was the scoped mixins. From the docs:
You can swap out mixins for the span of a single source file. Example: you could give all strings a "backwards" method. But just for the code inside your test.pn script.
This would let you monkey patch in a safer fashion.
> I don't like significant whitespace, personally ... (one of the ideas is an) elimination of line noise
Yay to that.
If you think about it, very few programming languages had legibility (readability) of the code as one of their design goals. Not through natural-language verbosity, but by reducing or eliminating code that exists only to please the compiler and is otherwise unrelated to the actual application logic.
D makes a good step forward in this direction (compared to C++), and so does Python. Java is close, but still pretty "noisy". Petrovich is the best there is, but it's not very practical.
This is an interesting subject, so if anyone has any thoughts on the matter, please post below.
I think at the end of the day, what is noise for one application domain is signal for another.
Have you heard of Concept programming? http://mozart-dev.sourceforge.net/cp.html I think the basic idea is that it somehow separates the domain-specific code from the code that pleases the compiler.
Python was designed with readability as a goal. That was the reason for significant white space rather than {}. On some European keyboards the braces had to be typed as triples. I don't find Java particularly readable even though I code in it every day.
a.equals(b) vs a == b
The antithesis of readability is APL, which is nothing but line noise. When it came out, you had to get a special printhead just to type it. You could make some unbelievable one-liners though.
It's interesting that there are no classes, and that there are only Objects, where you just mix everything in.
As I understand it, JavaScript objects are like this, except you add methods one at a time instead of mixing modules in and out.
It doesn't look like there are inheritance trees. I'm not sure what you'd do to prevent namespace clashes when mixing in different modules or if/how modules share and work on object attribute data, but so far in Ruby, that hasn't been too much of a problem.
I think you'll have a "well, duh" reaction if you implement an OO system in pure C. It is basically just a bunch of structs (data) with a bunch of functions which take a struct pointer as their first argument. Mixins just let you randomly add or remove these functions without any extra shenanigans, which you can also easily do in our pure C OO system.
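Roughly what I mean, as a minimal sketch in C. Everything here (Object, mix_in, mix_out, call_method, the per-object slot array) is a made-up illustration, not how Potion or any other language actually lays things out.

```c
#include <stddef.h>
#include <string.h>

#define MAX_METHODS 8  /* hypothetical cap on methods per object */

typedef struct Object Object;
typedef int (*Method)(Object *self);   /* self is always the first argument */

typedef struct { const char *name; Method fn; } Slot;

struct Object {
    int    value;                 /* the "data" half of the object */
    Slot   methods[MAX_METHODS];  /* the "functions" half, editable at runtime */
    size_t count;
};

/* Adds (or replaces) a named method. Returns 0 if the slot array is full. */
int mix_in(Object *o, const char *name, Method fn) {
    for (size_t i = 0; i < o->count; i++) {
        if (strcmp(o->methods[i].name, name) == 0) {
            o->methods[i].fn = fn;
            return 1;
        }
    }
    if (o->count == MAX_METHODS) return 0;
    o->methods[o->count].name = name;
    o->methods[o->count].fn = fn;
    o->count++;
    return 1;
}

/* Removes a named method. Returns 0 if it was not there. */
int mix_out(Object *o, const char *name) {
    for (size_t i = 0; i < o->count; i++) {
        if (strcmp(o->methods[i].name, name) == 0) {
            o->count--;
            o->methods[i] = o->methods[o->count];  /* fill the hole */
            return 1;
        }
    }
    return 0;
}

/* Dispatches by name. Returns 1 and writes *result if the method exists. */
int call_method(Object *o, const char *name, int *result) {
    for (size_t i = 0; i < o->count; i++) {
        if (strcmp(o->methods[i].name, name) == 0) {
            *result = o->methods[i].fn(o);
            return 1;
        }
    }
    return 0;
}

/* Two example methods to mix in. */
int doubled(Object *self) { return self->value * 2; }
int negated(Object *self) { return -self->value; }
```

No classes and no inheritance tree: an object is its data plus whatever functions happen to be attached at the moment.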
ok, so _why did it again. from time to time, this guy comes from nowhere and drops a little something of awesomeness. _why makes me feel sad and be ashamed to even touch a computer keyboard :/
this guy rocks.
* See "The Implementation of Lua 5.0" (http://www.tecgraf.puc-rio.br/~lhf/ftp/doc/jucs05.pdf). It's 16 pages, and is a quick - but quite thought-provoking - read.