> The 5.0 VM is a register machine, which operates on a set of virtual registers that can store and act on the local variables of a function, in addition to the traditional runtime stack.
This is a common source of confusion, because the name "register machine" makes people think about CPU registers. However, the registers in a register VM are merely slots in the traditional runtime stack. The difference between a stack and register machine has to do with how the bytecode instructions are encoded. In a stack machine, most instructions implicitly pop arguments from and push their results to the top of the stack. The instructions are byte-sized, encoding just the operation name. For example, to add 10+10:
LOADK 10
DUP
ADD
Meanwhile, in a register machine the instructions can read and write to any slot in the stack. Instructions are larger, because in addition to the operation name they also encode the indexes of the input slots and the index of the output slot. But it's worth it because you need fewer instructions in total and do less stack shuffling.
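In a Lua-style register encoding, the same 10+10 needs only something like this (mnemonics and operand layout are illustrative, not the exact Lua 5.0 bytecode):
LOADK 0 10   ; slot 0 <- constant 10
ADD 0 0 0    ; slot 0 <- slot 0 + slot 0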
This. Some CPU architectures even blur the line between hardware registers and stack slots: register windows! Indeed, what matters is whether each instruction addresses operands explicitly (registers) or implicitly (top of the stack).
The main advantage is that you can keep the size of each instruction small. The more "architectural" registers, the more bits you need to encode each operand in all instructions. The larger the code, the more time is spent fetching the code from RAM and more i-cache is required.
So how does having a larger "microarchitectural" register file help?
Register renaming is used to eliminate false data dependencies arising from the reuse of registers by successive instructions that do not have any real data dependencies between them.
This allows independent instructions to be executed in parallel even if it seems that you'd have to wait until a register is freed up.
Lua's simplicity is sometimes its real selling point. I was just today searching for a small scripting language to implement in a mobile app in .net, where app size is at a premium, and it turns out that the smallest useful JavaScript interpreter is at least 3x the size of a Lua interpreter.
I do believe that an un-bloated JavaScript language from when it was just invented would be simpler than Lua (as both were designed as "scripting" languages, not as main ones), but history didn't go that route :)
But... Lua is WEIRD! Weird nomenclature, weird string concatenation operator, 1-based arrays, too-clever "tables" and "metatables" stuff.
Tables + metatables are awesome. I've seen full OOP, composition-based design, and various forms of dynamic dispatch all built on those simple mechanisms.
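A minimal sketch of the class-like pattern built on nothing but a table and a metatable (the names here are mine, just for illustration):
local Point = {}
Point.__index = Point              -- failed lookups on instances fall back to Point

function Point.new(x, y)
  return setmetatable({x = x, y = y}, Point)
end

function Point:translate(dx, dy)
  self.x, self.y = self.x + dx, self.y + dy
end

local p = Point.new(1, 2)
p:translate(3, 4)
print(p.x, p.y)                    --> 4   6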
One fun thing we did with co-routines was build literate-style scripting for our designers to define AI routines. They could write them in a linear/semi-linear way with yield(<number of frames to wait to re-execute>), which allowed them to script all sorts of interesting behaviors without extensive programming knowledge.
We also ran the whole thing inside a 400kb pre-allocated block; it's an incredibly powerful, embeddable language.
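A rough sketch of what that yield pattern looks like (the behavior names and the frame-driver below are hypothetical, not our actual system):
-- designer-authored routine: reads top to bottom, pausing between steps
local function patrol(npc)
  while true do
    npc:walk_to("door")
    coroutine.yield(60)       -- wait 60 frames before resuming
    npc:walk_to("window")
    coroutine.yield(120)      -- wait 120 frames
  end
end

-- engine side: resume the coroutine once its wait counter runs out
local co = coroutine.create(patrol)
local wait = 0
function on_frame(npc)
  if wait > 0 then wait = wait - 1; return end
  local ok, frames = coroutine.resume(co, npc)
  if ok then wait = frames or 0 end
end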
Weird? :) Surely all languages have their idiosyncrasies otherwise there'd be so very few to choose from? I mean, 1-based arrays can be found in several popular languages like R, Julia, & MATLAB. Taking the opposite perspective: there's so much in Lua that is familiar. I'd argue that coders who have never used it before could easily make sense of a Lua program. (I'll give you metatables, though — they're a bit unusual!)
The concatenation operator makes sense though... String concatenation is hardly addition, and for weakly typed languages it's easy for the + operator to end up doing the wrong thing instead of failing.
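For instance, in standard Lua:
print("10" + 1)    --> 11    (the string is coerced to a number for arithmetic)
print("10" .. 1)   --> 101   (the number is coerced to a string for concatenation)
-- whereas in JavaScript, "10" + 1 quietly yields "101"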
Out of curiosity: what Lua interpreter did you pick for your .net project?
I'd love to see upvalues diagrammed as they are represented in memory.
It sounds like the stack is perhaps a Stack<Frame pointer¹>, where each Frame contains the locals for that stack frame; then a coroutine just needs to keep a pointer to the stack frame it is closing over. (And then, Frame does too, recursively, in case there is more than one scope being closed over.)
This would be extremely similar to … most any other language … and makes me wonder why Lua gives them such a unique name. It has been hard to really comprehend the Lua spec, when I've tried to understand that facet of it.
(I'd also argue that Lua isn't as simple as it is made out to be: primitives behave wildly differently from objects, there's the 1-based indexing, and multiple-return is pretty much unique to Lua (vs. returning a tuple, which other languages such as Python, Rust, and sort-of JS go for; I think that's conceptually simpler).)
¹and note "pointer" here might really be "GC ref", to permit these to be GC'd as necessary, as closures can keep stuff alive far longer than normal.
> It sounds like the stack is perhaps a Stack<Frame pointer¹>,
No, the stack isn't a list of frames where each frame is a separate heap allocated list of slots for values. You're right that that's a very common (but not super fast) implementation technique.
The Lua stack is a single contiguous array of value slots that is used by all call frames. In fact, call frames use overlapping stack windows so that function call arguments don't need to be moved during a call.
That makes implementing closures particularly tricky since when a function returns, its region of the stack is immediately reused by later code.
Upvalues address that by moving just the local variables that are closed over off that contiguous stack array and into the heap. And they do that only when the function where the variable is declared returns, so that during that function's lifetime, it can be accessed directly from the stack. Also, the compiler is able to manage all this with just single-pass compilation.
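A tiny sketch of that lifecycle (my own example, not from the article):
function make_counter()
  local n = 0               -- n lives in make_counter's stack window while it runs
  return function()         -- the inner function captures n as an upvalue
    n = n + 1
    return n
  end
end                         -- when make_counter returns, n is "closed": moved to the heap

local c = make_counter()
print(c(), c())             --> 1   2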
> multiple-return is pretty much unique to Lua (vs. returning a tuple)
Multiple returns are from Common Lisp (I don't know if the history goes back further, though), and they aren't tuples. They're the return-position analogue of what default arguments are in argument-position; just as default args allow callers to omit arguments they don't care about, multiple returns allow callers to ignore the return values they don't care about. You can add a secondary return value to a function without breaking existing callers, just like you can add a new default arg.
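Lua's standard library works exactly this way; for example, string.find returns the start and end indices of a match, and callers take only as many values as they want:
local s = string.find("hello world", "world")        -- keep just the start index
local s2, e = string.find("hello world", "world")    -- or take both
print(s, s2, e)                                      --> 7   7   11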
Barbara Liskov's CLU language had multiple returns in 1973, ten years before Common Lisp. I was a student of hers back then.
I’m pretty sure I saw multiple returns used in pseudo-code on a class handout the semester before I started working on CLU. The class was “Structure and Interpretation of Computer Programs” that would lead to the writing of SICP.
You're right, CLU is actually where Lua took them from. From "The Evolution of Lua":
> From CLU we took multiple assignment and multiple returns from function calls. We regarded multiple returns as a simpler alternative to reference parameters used in Pascal and Modula and to in-out parameters used in Ada; we also wanted to avoid explicit pointers (used in C).
Edit: And yeah, I don't think it's anything unique to Lua; it's an implementation detail of closures and lexical scope generally. It's a non-stack wrapper for closed-over variables; the end of the introduction to chapter 25 explains:
> For locals that aren’t used in closures, we’ll keep them just as they are on the stack. When a local is captured by a closure, we’ll adopt another solution that lifts them onto the heap where they can live as long as needed.
> For locals that aren’t used in closures, we’ll keep them just as they are on the stack. When a local is captured by a closure, we’ll adopt another solution that lifts them onto the heap where they can live as long as needed.
I sort of thought about mentioning something like that; that falls into what I categorize as an optimization. (A good one, since the common case is that there probably isn't a closure at all in the scope, and the entire scope can thus be moved to the stack, kept out of the heap, and not GC'd, reducing GC pressure.)
Since all upvalues can be determined at "compile" time (because they all are locals up the lexical scope), I think they don't pin stack frames^ for their lifetime; instead, they create separate upvalue blocks in advance when the corresponding activation records get created. One can also create C closures with upvalues from the stack, e.g. push a string and "close" it for a few C functions (lua_pushcclosure, lua_upvaluejoin, luaL_setfuncs). But this string gets popped from the stack, and these functions live on their own and have a shared upvalue, which is not in any stack frame. It would be reasonable to assume from the C API that lua_CFunctions and Lua functions have the same upvalue mechanism, as they are completely interchangeable.
^ I mean internal stack structures, not a public stack, which is fully dynamic and may be cleared without affecting anything, e.g. in lua_CFunction. There’s nothing you can refer to on the stack, the same way you don’t want to refer to a temporary technical `push` in Assembly. Consider this:
function f()
  local arr = {}
  for i = 0, 100000000 do
    table.insert(arr, function ()
      return i
    end)
  end
end
If `i` was created on the stack on every iteration, it would quickly overflow it, because you can't just erase the previous `i`, since it has been closed over.
(For those unaware: `for i …` “creates” a new variable/upvalue at each iteration. Every arr[?]() returns a different value in this example.)
At first, I was hoping to omit obvious optimizations like splitting unclosed locals onto a stack, for the simplicity of the conversation, to arrive at a conceptual model for upvalues. That's clearly going to be impossible. That covers most of your first paragraph. The Lua C API … almost serves to muddle the issue. (The stack there is … an artifact of the API, really? from a point of view trying to model the language.)
(A corrected version of your counter-example:
function f()
  local arr = {}
  for i = 0, 5 do
    table.insert(arr, function ()
      return i
    end)
  end
  return arr
end
I've also changed the constant to make it reasonable to run, without creating an enormous table.)
That's a good counter-example, and it does run counter to my intuition. (The behavior I describe, e.g., in Python, is different; the closures in Python all close over the same variable.) One might be able to boil this down to instead using "scope" where I was saying "stack frame". (The difference being that for loops in Lua create scopes, vs. Python, where they do not. IMO Lua's is the better of the two options.)
To combine the optimization in the sibling thread (which I was attempting to avoid for simplicity's sake), the implementation could then be a stack of scope-ishes, consisting of the unclosed variables only, and then optionally some structure under GC, containing the closed-over variables (none, if nothing is closed over). The scopes on the stack would need to know of the (potential) GC object holding the closed-over variables; a (normal) use of `i` here would have to look up the same `i` as what the closure holds, and, as you note, cannot live on the stack.
… this is where, again, I think a diagram would help. I don't know that terms like "upvalue" and "activation record" help build understanding of Lua to new practitioners, and IMO, do more to obscure it. (This isn't your fault, ofc.; these are the terms Lua itself uses for these concepts.)
Lua 5.0 doesn't have the incremental garbage collector, only the original mark-and-sweep collector.
The pdf they based the blog post off of even says the incremental GC is upcoming in 5.1, and you can read through https://www.lua.org/source/5.0/lgc.c.html yourself and see that, unlike the 5.1 version, it only has a single mark list and doesn't use the tri-color scheme.
Timing on this is confusing. Article was written in 2020 about a paper published in 2003 or so about the design of lua 5.0.
It's still a really insightful and approachable paper that's worth reading, and it does still help understand the constraints and approach of lua. But the current "old" version of the language is 5.2 released in 2011, and there have been a couple major versions after that as well.
So some of it may not actually be applicable to using lua now, depending on what your "now" looks like.
For the most part, the info is still valid today. The main thing that changed since then is that Lua 5.3 introduced integers and several new bytecode instructions for the common case where one of the operands is a small integer constant.
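For illustration, with Lua 5.3 or later:
print(math.type(1), math.type(1.0))   --> integer   float
print(7 // 2, 7 / 2)                  --> 3   3.5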
I love that they went from a single-pass interpreter to a byte-code virtual machine, and now, like PHP, it also has direct access to C & system libraries! Lua has come a long way since its advent.
The paper and article are about Lua 5.0, which came out in 2003. I don't believe it had any prominence at all before that, so this stuff has "always" been there in Lua unless you were in the CS dept at PUC-Rio in the 90s.
I think Lua started catching on outside of PUC-Rio before 5.0 in 2003. Remember, LucasArts used it in Grim Fandango, and that became widely known in the game industry in 1999 or 2000. My own first exposure to Lua was in a Linux distro package repository sometime in 2000. That's why, when I was looking for a lightweight scripting language implementation in late 2004, I remembered Lua. So yes, my actual first exposure to the Lua language was Lua 5.0, but I consider myself a relatively late adopter.
As with nearly everything in Lua, if you want bells-and-whistles then you need to find a library. I've been writing a lot of extensions in Rust using mlua [1] which has been fantastic.
I've written so much FFI code for Lua it's ridiculous. Rust mlua has been a total game-changer for me.
I wish there was a Lua implementation in pure Rust (I understand there would likely be performance degradation compared to LuaJIT). There's the incomplete Luster[1], but it's still unusable.
mlua is pretty much the standard for Rust at this point. There are a few others but they're not nearly as comprehensive and they all suffer from various safety issues that require unsafe{}. mlua doesn't require any unsafe usage in its API.
I thought rlua was safer than mlua? Am I mistaken there and has the situation perhaps changed, or are you claiming rlua requires you to use unsafe{} in various places?
I made some seemingly similar design choices in TXR Lisp (not knowing anything about Lua or its internals).
- register-based VM with 32-bit instruction words holding 6-bit opcodes.
- closures that start on the stack and are moved to the heap.
- single pass compiler with no SSA, Lisp straight to code
- but with additional optimization, informed by control and data flow analysis, done on the VM assembly code.
I don't think there's any deep connection. "upvar" in Tcl is a language feature. Upvalues in Lua are purely an implementation detail of the interpreter.
If you want all the gritty details, a chapter of Crafting Interpreters walks through a complete implementation of Lua's approach to closures:
I'm not sure about that, though; I haven't used Tcl in a while and was never an expert. I think Tcl's upvar allows you to explicitly do what Lua's closure implementation does automatically.
They aren't the same thing but they are closely related concepts and similar solutions to the same potential problem.
Nowadays the upvalues in Lua are bona fide lexical scoping à la Scheme or JavaScript. However, the name comes from Lua 4.0, where upvalues worked in a more unusual way. Not in the same way as Tcl, though, but just as surprising for the uninitiated.