Very cool. For someone with moderate compilers & POSIX experience but no scheme ...

znpy · on April 23, 2022

I’m no compiler guy but I’ve written a fair amount of shell script and I’m going to take an educated guess.

My guess is that in order to do proper text processing and transformation in shell-script you have to call many external tools (from the posix-compliant toolchest) many times, meaning that you’ll be paying a lot for fork+exec.

feeley · on April 23, 2022

An important design goal for the POSIX shell version of the RVM is to depend on the fewest external tools possible. Each dependency brings with it a portability risk because unix tools can be subtly different between environments (for example macOS vs linux, and even between linuxes, and even between versions of the same distribution of linux).

So the core of the RVM has no dependencies with external tools (i.e. it is in "pure" shell script). The only dependencies are for I/O, namely for the "putchar" RVM primitive and the "getchar" RVM primitive (both of which do byte-at-a-time binary I/O on stdin and stdout). For putchar the shell "printf" (which is usually builtin) command is used, specifically

    printf \\$((byte/64))$((byte/8%8))$((byte%8))

For getchar is is not possible to use the shell "read" command because it does not handle null bytes on stdin (and we want Ribbit to support binary I/O). This is admittedly somewhat of an artificial constraint given that Ribbit will probably be mostly used for text processing, like to write compilers that map source code text to target code text. So getchar is implemented with

    set `sed q | od -v -A n -t u1`

and then the next byte on stdin is available using $1 and "shift" is used to move to the next byte. The call to sed is to read each line separately (useful to implement a REPL where you enter a command which is executed before having to enter the next command). Both "sed" and "od" are standard unix tools. Moreover we checked that those set of options have existed for a long time (at least the early 1990's). It would be interesting to test many POSIX shells to verify Ribbit's portability (we used bash, dash, zsh and ksh for our testing).

The low performance of the POSIX shell RVM is due to two things. First, the shell's interpreter is not fast. The simple loop

    n=10000000; while [ $n -gt 0 ]; do n=$((n-1)); done

runs about 5000 times slower with bash than the equivalent JavaScript code run with nodejs, which is implemented with a JIT. Secondly, the POSIX shell does not have arrays so it is necessary to use the shell "eval" command to do indexing. For example a[i]=0 becomes

    eval a$i=0

As you can imagine the RVM implementation uses arrays all over the place for implementing access to the RVM's stack and Scheme objects (both of which are implemented using a garbage collected heap, which is conceptually an array of 3 field records and implemented in the shell as 3 "arrays").

The POSIX shell version of Ribbit was designed to be as portable as possible, only relying on a POSIX shell to run Scheme code. This can be useful in a context where you only have a POSIX shell at hand and you want to bootstrap a more efficient set of development tools (C compiler, editor, etc) and you don't mind waiting a day or two for the build to complete. It is also useful when you want a reproducible build pipeline and the only tool you trust is the shell.