I completed the second interpreter in the craftinginterpreters book in rust. One thing I noticed was all the places the author was explicitly keeping track of lifetimes, I could do it via the rust compiler. For example, there is a function which should only be called with strings which will live for the life of the program. Easy to enforce with rust.
The only place where I couldn't find a zero cost abstraction was enum encoding, where enums are of different sizes.
It would be great if the compiler could use only 1 byte for the rest of the enums other than OpConstant, and more bytes for OpConstant, and use maybe the first 2 bits to store the number of bytes used, kind of like sds library by antirez. This is the only place where I felt my rust implementation was less efficient than the book's c implementation.
Off course, I could go the same route as the c code, and not use type safe enums by just keeping an array of bytes and doing the decoding myself, but I wanted to have full compile time type safety.
> The only place where I couldn't find a zero cost abstraction was enum encoding, where enums are of different sizes.
Unfortunately, variable-sized types interact poorly with arrays -- you can no longer access any element in constant time. UTF-8 has essentially the same issue.
Did you end up abstracting over the byte array at all? I would imagine that some kind of `BytecodeBuffer` could give pretty reasonable ergonomics over a raw buffer without sacrificing efficiency or correctness.
If you want random access into an array of enums, you need them all to be the same size (there are also alignment concerns, which is why it's tough to use only 1 bit of overhead to discriminate an enum with two variants). I guess you could add indirection and have an enum with variants A(Box<AType>), B(Box<BType>), etc. This is good practice if you care about space and you have a variant with a type that's a few hundred bytes (not heap-allocated like with Vec). But it's not worth adding the indirection and potential cache miss to save a byte, when most cache lines can fit at least 32 bytes.
But you don't always need random access; often you only need sequential access, in which case you can pack differently-sized values into a single buffer. However, Rust enums don't support that use case. A macro could probably do it without too much loss of ergonomics, but I'm not sure if there's a good crate for it.
That's a good point! I guess you'd still have to make sure that variants with alignment requirements don't get unaligned. One way to do that would be to add leading padding if they follow an unaligned variant. Adds some complexity to the representation, I guess, and I suspect there's other ways of doing it. Neat problem.
The only place where I couldn't find a zero cost abstraction was enum encoding, where enums are of different sizes.
For example
pub enum OpCode { OpReturn, OpConstant(u8), OpNegate, OpAdd, ... }
It would be great if the compiler could use only 1 byte for the rest of the enums other than OpConstant, and more bytes for OpConstant, and use maybe the first 2 bits to store the number of bytes used, kind of like sds library by antirez. This is the only place where I felt my rust implementation was less efficient than the book's c implementation.
Off course, I could go the same route as the c code, and not use type safe enums by just keeping an array of bytes and doing the decoding myself, but I wanted to have full compile time type safety.