For optimal speed, you should move as much code as possible outside the closures.
In particular, you should do the `switch op` at https://github.com/skx/simple-vm/blob/b3917aef0bd6c4178eed0c...
outside the closure, and create a different, specialised closure for each case. Otherwise the "fast interpreter" may be almost as slow as a vanilla AST walker.
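To make the difference concrete, here is a minimal Go sketch. It is not the simple-vm code: `Op`, `compileSlow` and `compileFast` are hypothetical names invented for this illustration. The first version still runs the `switch` on every call, while the second runs it once and returns a closure that does nothing but the operation itself.

```go
package main

import "fmt"

// Op is a hypothetical opcode type, used only for this illustration.
type Op int

const (
	OpAdd Op = iota
	OpSub
	OpMul
)

// compileSlow returns a single generic closure that re-examines op
// every time it is called: the dispatch cost is paid at run time.
func compileSlow(op Op) func(a, b int) int {
	return func(a, b int) int {
		switch op {
		case OpAdd:
			return a + b
		case OpSub:
			return a - b
		case OpMul:
			return a * b
		}
		panic("unknown op")
	}
}

// compileFast dispatches on op once, at "compile" time, and returns
// a closure specialised for that one operation.
func compileFast(op Op) func(a, b int) int {
	switch op {
	case OpAdd:
		return func(a, b int) int { return a + b }
	case OpSub:
		return func(a, b int) int { return a - b }
	case OpMul:
		return func(a, b int) int { return a * b }
	}
	panic("unknown op")
}

func main() {
	mul := compileFast(OpMul)
	fmt.Println(mul(6, 7)) // prints 42
}
```

Both functions return closures with the same signature, so the caller does not change; only the work done per call does.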
The core idea is simple: do a type analysis on each expression you want to "compile" to a closure, and instantiate the correct closure for each type combination.
Here is a pseudocode example, adapted from gomacro sources:
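As a concrete illustration in Go, this is a simplified sketch in the same spirit, not gomacro's actual code. `Env`, `Kind`, `Expr`, `Add` and the variable/constant helpers are all hypothetical names, and only two types are handled.

```go
package main

import "fmt"

// Env holds variable slots; a hypothetical stand-in for an
// interpreter's runtime environment.
type Env struct {
	Ints   []int
	Floats []float64
}

// Kind is the static type discovered by type analysis.
type Kind int

const (
	Int Kind = iota
	Float
)

// Expr is a compiled expression: its static type plus one typed
// closure (only the field matching Kind is set).
type Expr struct {
	Kind  Kind
	Int   func(*Env) int
	Float func(*Env) float64
}

func IntConst(v int) Expr {
	return Expr{Kind: Int, Int: func(*Env) int { return v }}
}

func IntVar(i int) Expr {
	return Expr{Kind: Int, Int: func(env *Env) int { return env.Ints[i] }}
}

func FloatVar(i int) Expr {
	return Expr{Kind: Float, Float: func(env *Env) float64 { return env.Floats[i] }}
}

// Add checks the operand types once, at "compile" time, and
// instantiates the closure for that type combination. The returned
// closure contains no type checks at all.
func Add(x, y Expr) Expr {
	switch {
	case x.Kind == Int && y.Kind == Int:
		xf, yf := x.Int, y.Int
		return Expr{Kind: Int, Int: func(env *Env) int { return xf(env) + yf(env) }}
	case x.Kind == Float && y.Kind == Float:
		xf, yf := x.Float, y.Float
		return Expr{Kind: Float, Float: func(env *Env) float64 { return xf(env) + yf(env) }}
	}
	panic("type mismatch: a static type checker would reject this earlier")
}

func main() {
	env := &Env{Ints: []int{40}, Floats: []float64{1.5}}
	sum := Add(IntVar(0), IntConst(2)) // compiled once
	fmt.Println(sum.Int(env))          // executed as a tree of closures; prints 42
	fsum := Add(FloatVar(0), FloatVar(0))
	fmt.Println(fsum.Float(env))
}
```

Because every closure has a concrete return type, no value is boxed in an interface and no type is inspected while the compiled expression runs.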
This works best for "compiling" statically typed languages. While much faster than an AST interpreter, the "tree of closures" above is still ~10 times slower than natively compiled code, and it's usually also slower than JIT-compiled code.
I'd love to see if it's possible to create a libc-free, dependency-free executable without Nim (https://nim-lang.org/).