Crunch – a Scheme compiler with a minimal runtime (more-magic.net)
190 points by sjamaan 11 months ago | 71 comments


For Common Lispers such as myself, who are vaguely aware of developments in the Scheme space: the most important difference between CRUNCH and Chicken appears to be that, while both compile down to C/object code, CRUNCH is additionally targeting a statically-typed subset of Scheme.

Opinion: this is great. The aversion of Lispers to static types is historical rather than intrinsic; it reflects the gap in expressiveness between program semantics and type semantics (and between runtime and tooling) for much of computing's history. Now that types and tools are advancing, static Lisps are feasible, and I love that.


I don't believe it isn't intrinsic. A lot of the reason Lispers may be averse to static types is the perceived inflexibility they can introduce into a system; Lisp programmers don't want to be told what to do, especially by the compiler. Some CLs like SBCL have allowed some form of inference through the language's standard type declarations. This leads me to believe that the 'right thing' in the case of Lisp is a combination of dynamicity and some stronger typing features that can be applied during the optimization stage.

The dynamic nature and ease of use of Lisp relative to its performance is one of its greatest assets, and it would be nearsighted to try to sacrifice that: a good Lisp programmer can optimize the important parts of a program so that they're comparable to, or even outperform, their equivalents in other, more "high-performance" languages. That said, these developments might bring us closer to a "sufficiently smart compiler" that could make that latter stage mostly unnecessary.
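To make the "dynamic by default, typed where it matters" idea concrete, here is a rough sketch using CHICKEN's optional type declarations (from the (chicken type) module, if I remember the syntax right; Common Lisp would use declaim/declare/the instead):

    ;; A declared hot spot: the compiler can specialize the arithmetic and
    ;; drop runtime checks here, while the rest of the program stays dynamic.
    (import (chicken type))
    (: squared-distance (float float -> float))
    (define (squared-distance dx dy)
      (+ (* dx dx) (* dy dy)))
    ;; Untyped code elsewhere keeps its usual Lisp flexibility.
    (define (describe x)
      (if (number? x) 'a-number 'something-else))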


> A lot of the reason why Lispers may be averse to static types is because of the perceived inflexibility it can induce into the system.

This perceived inflexibility is what my comment was getting at - with the primitive type systems available back in the '80s, yes, the types significantly constrained the programs you could write. With today's type systems, however, you have far more flexibility, especially in systems with "Any" types that let you "punch a hole in the type system", so to speak.

When I tried typed Python a few years ago, I found out that, to my surprise, 99% of the code that I naturally wrote could have static types attached (or inferred) without modification because of the flexibility of Python's type system.

I also learned that types are a property of programs, more than just languages. If a program is ill-typed, then having a dynamically-typed language will not save you - it will just crash at runtime. Static types are limiting when either (1) they prevent you from writing/expressing well-typed programs because of the inexpressiveness of the type system or (2) it's burdensome to actually express the type to the compiler.

Modern languages and tools present massive advances in both of those areas. Type systems are massively more expressive, so the "false negative" area of valid programs that can't be expressed is much, much smaller. And, with type inference and more expressive types, not only do you sometimes not have to express the type in your source code at all (when it's inferred), but when you do, it's often easier.

The "Any" type is really what steals the show. I don't think that there's a lot of value in a fully statically-typed Lisp where you can't have dynamic values at all - but I think there's a lot of value in a Lisp with a Python-like type system where you start out static and can use "unknown", "any", and "object" to selectively add dynamic types when needed.

Because, being a Lisper, you probably think like me, I'll give you the idea that really convinced me that types have positive value (as opposed to "only" having a small negative value): they enable you to build large, complex, and alive systems.

Types are a force-multiplier for our limited human brains. With types, you can more easily build large systems, you can more easily refactor, you can start with a live REPL and more easily transition your code into source on disk. Types help you design and build things - which is why we use Lisps, after all!


I don't disagree with you there. The only thing CL really misses out on is that its type system isn't specced out enough to be as powerful as it could be. Since type definitions are essentially macros, you can write all kinds of crazy types, but non-terminating ones just might not work if your implementation doesn't handle them a certain way. This was actually a disputed issue: https://www.lispworks.com/documentation/HyperSpec/Issues/iss....

Someone had proposed reifying types to be first-class objects, which might be a good thing. I haven't thought about it enough to decide. https://gist.github.com/Bike/e405cc49a64fed0752b524c292bd715...


Being able to declare types is the reason why I switched from Scheme to Common Lisp. It's just a shame that there's basically no concept of generic types and 'satisfies' isn't quite good enough to make up for it.


I don't think it makes sense to conflate Lispers with Schemers. I've programmed in both languages but have a stronger affinity for Scheme, partially because semantically it is less flexible and more "staticy" than Lisp. Philosophically, the languages tend to attract different personalities (to the extent that the highly fragmented Scheme world can be characterized).


I want static types in a higher level assembly language for systems programming. That's because I want to work with machine-level representations, in which there are no spare bits for indicating type at run-time (moreover, using such a language, we can design a type system with such bits, in any way we please).

I don't want static types in a high level language.

It's just counterproductive.

We only have to look at numbers to feel how it sucks. If we divide two integers in Common Lisp and the division is exact, the object that comes out is an integer; otherwise we get a ratio object. Or if we take the square root of a real, we get a complex number if the input is negative, and a real otherwise.

This cannot be modeled effectively in a static system. You can use a sum type, but that's just a greenspunned ad hoc dynamic type.
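To make it concrete (Scheme behaves the same way here; the results below assume an implementation with a full numeric tower, and the exact printed forms vary):

    (/ 6 3)     ; => 2      integer, because the division is exact
    (/ 6 4)     ; => 3/2    ratio, because it isn't
    (sqrt 4.0)  ; => 2.0    real
    (sqrt -4.0) ; => +2.0i  complex, because the input is negative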


> This cannot be modeled effectively in a static system. You can use a sum type, but that's just a greenspunned ad hoc dynamic type.

This can easily be modeled in a static type system. All you really need is subtyping.


Oh come on, sum types are so much more useful than a value which can have literally any type (including undefined aka nil).


> Now that types and tools are advancing, static Lisps are feasible, and I love that.

Hasn't that been feasible for a pretty long time already? Judging by how well-received (or not) they've been, it seems there isn't much demand for it. Things like clojure.spec and the like (compile-time + run-time typing) seem much more popular, but they aren't static.


There isn’t much demand for Lisps in general.


I think, given the sheer number of them, that is demonstrably false.


Many have been created, but how many have significant usage?


People make a lot of them as side projects because the parsing is easy, but what software is being made with them? (Besides the usual "Hacker News backend" response.)


Have you used one at work? I would surely love to but haven’t had the chance yet.


Apropos, this crossed my feedreader today: https://lispjobs.wordpress.com/2024/12/19/mid-senior-clojure...


Clojure is fairly popular (I'm using it at work, though I'd prefer Scheme of course)


What are the main differences between OCaml and a statically typed Lisp?


Type inference is probably the biggest thing. You would need explicit "phases" to expand macros, disallow macro expansion at runtime, and implement bi-directional type inference HM-style to get even close to what OCaml has.

To be honest, I'd kill for a Lisp that had the same type system as OCaml, but I suspect the closest we'll get is basically Rust (whose macro system is quite good).


Racket has Typed Racket, which, while not Hindley-Milner, can do some type inference.

There's also the plait language which says its type system is similar to ML: https://docs.racket-lang.org/plait/index.html

And Hackett, inspired by Haskell: https://lexi-lambda.github.io/hackett/

And Common Lisp has coalton: https://coalton-lang.github.io/


Most of those aren't really ready for production use except maybe Typed Racket, which I consider to be too "weak" and took a route with annotations that I'm not a fan of. Coalton is very interesting, I've been following it for a bit. Carp [0] is another one that I've been following.

[0]: https://github.com/carp-lang/Carp


Coalton is used in production for quantum computing systems and soft real-time control system automation. There are also (a small number of) Coalton jobs.


Quantum computing control systems are exactly the domain I've spent about half a decade working in, and it's really not the production-like environment you think it is. Speed of iteration and the flexibility to accommodate hardware changes are paramount to success. It's also a lot easier to accept the risk of breakages in the language when its author works at your company too.


> I suspect the closes we'll get is basically Rust

Rust is categorically different from all of these other things. Lisps (and OCaml) all have interactivity as a core language feature - Rust is as non-interactive as it gets.

> whose macro system is quite good

Compared to C, perhaps.


There is a simpler solution than type inference for removing type annotations while retaining types: remove the distinction between variable and type names. The way you handle multiple instances of a type is to define an alias for the type. In pseudocode it might look like:

  type Divisor Number
  def divide(Number, Divisor) = Number / Divisor

As compared to:

  def divide(number: Number, divisor: Number) = number / divisor

I have implemented a compiler for a language that does this with different (less familiar) syntax. It's a bit more complicated than described above to handle local variables, but it works very well for me.


> There is a simpler solution than type inference to removing type annotations while retaining types

This doesn't remove them, it just moves them.


Is that language a Forth dialect by any chance?


I'm not super familiar with the details but I've heard that Shen has a really good type system. Aditya Siram addresses types at 12:00 in this video: https://youtu.be/lMcRBdSdO_U

Shen homepage: https://shenlanguage.org/


It's also pretty straightforward to use multiple coding styles in the lisps in question, regardless of the typing being static or not.


https://github.com/LuxLang/lux

    The language is mostly inspired by the following 3 languages:

    Clojure (syntax)
    Haskell (functional programming)
    Standard ML (polymorphism)


The module and functor system. Also, macros are much more of a Lisp-family thing.


The interesting part is how few trade-offs had to be made to make GC-less (ref-counting) Scheme-to-C compilation attainable, and how closely CRUNCH adheres to R7RS.

Notable is the lack of call/cc, but in my "armchair expert" opinion, it's the ugliest part of the language. Yes, continuations as a first-class object are elegant and succinct, but in order to do anything actually useful with them (like implementing exceptions), you need library code, which makes them much less composable.
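Even something as basic as an early escape needs a wrapper before call/cc is pleasant to use - a rough sketch (a real exception system additionally needs dynamic-wind and a handler stack, which is the library code I mean):

    ;; Escape-only "throw" built on call/cc.
    (define (call-with-escape proc)
      (call-with-current-continuation
        (lambda (return)
          (proc return))))
    ;; Usage: returns 'too-big without finishing the loop.
    (call-with-escape
      (lambda (throw)
        (let loop ((i 0))
          (if (> i 1000)
              (throw 'too-big)
              (loop (+ i 1))))))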

I think there's a much more pleasant and practical language lurking somewhere between R6RS "big" and traditional "small" Scheme, but I feel it would take a BDFL to flesh it out.

(Meanwhile, back to fixing my init.el.)


Delineated continuations might help


delimited


This just made my day. I'm the author of Scheme for Max, an extension to the Max music programming environment that puts the s7 Scheme interpreter in it for scripting note-event-level code (not DSP). It would be fantastic to also be able to use a subset of Scheme for generating DSP-level code, where speed would be more important than dynamic typing or continuations/TCO/etc.

I will be watching this closely!


Fun fact: the DOS version of the Cakewalk music software in the late 80's had a scripting language (CAL) that was very loosely based on Lisp. The author (Greg Hendershott) later went on to write a lot about Lisp.


Yes! This was the first "lisp" I ever used, back in the very early 90's. It was still part of Cakewalk in the windows releases at that time and was really Cakewalk's secret weapon.

With the Ableton Live Notes API, enabling something similar is one of the things I plan on adding to Scheme for Max this year.

Thanks for the tip about who wrote it, I will look him up!


What do you think about PreScheme?


I haven't tried it, but I'm curious and interested!


I'm glad to see more projects exploring the statically compiled Scheme space. One difference from PreScheme listed is R7RS conformance, but the PreScheme restoration project is also targeting R7RS so that the compiler is portable to many Scheme implementations. The biggest difference I can see is manual memory management vs. reference counting. Though, one of the TODO items for the PreScheme restoration is to revisit memory management. Maybe PreScheme will end up with reference counting, as well.


The very first rudimentary standalone example:

     (define (main) (display "Hello world\n"))   
Doesn't compile, but instead gives this error:

     Error: main: global variable `scheme#display' has unknown type


The popcount example does compile, if you drop the "assume" form,

    (import (chicken bitwise) (chicken fixnum))
    (define (popcount32 x) ; From "Hacker's Delight"                                                                                                                                       
    ;  (assume                                                                                                                                                                             
       (let ((x (fx- x (bitwise-and (arithmetic-shift x -1) #x55555555))))
         (let ((x (fx+ (bitwise-and x #x33333333)
                       (bitwise-and (arithmetic-shift x -2) #x33333333))))
           (let ((x (bitwise-and (fx+ x (arithmetic-shift x -4)) #x0F0F0F0F)))
             (let ((x (fx+ x (arithmetic-shift x -8))))
               (let ((x (fx+ x (arithmetic-shift x -16))))
                 (bitwise-and x #x0000003F)))))))
Turns into somewhat redundant C that looks much like transliterated bytecode, e.g.

    long v17 = 0;
    // ...
    // popcnt.scm:7                                                                                                                                                                            
    v17 = 0;
    // ...
    // popcnt.scm:7                                                                                                                                                                            
    v17 = (t37 & 252645135);
    // popcount32                                                                                                                                                                              
    // popcnt.scm:8                                                                                                                                                                            
    t40 = crunch_arithmetic_shift(v17, -8);
Interesting find in the (small) crunch.h is

    #ifdef CRUNCH_FREESTANDING
    #include <inttypes.h>
    #include <stdarg.h>
    #include <complex.h>
    #include "crunch-environment.h"
    #else
    // loads of stuff
though I haven't found crunch-environment.h in the source yet


The nice thing is that you can emit pretty sloppy C and the C compiler will optimize it pretty well.


I believe the domain name is a reference to this anecdote https://www.catb.org/jargon/html/magic-story.html


You believe correctly


> Other targets are possible, like GPUs. I don't know anything about that, so if you are interested and think you can contribute, please don't hesitate to contact me.

The freestanding macro suggests most of the heavy lifting is done. Stuff GPU targets will struggle with, in no particular order:

- setjmp / longjmp

- signals (if used?)

- threads

- fast alloc/free

- stack will be smaller than chicken expects

I don't know how to do signals. The rest are merely difficult.


Crunch doesn't have any of these. That's CHICKEN you're thinking of, but CRUNCH explicitly does not require CHICKEN.


> - setjmp / longjmp

Isn't that mostly needed by call/cc, which is not supported by Crunch?


I thought CHICKEN used it to treat the C stack as an arena for the GC, with longjmp acting as part of garbage collection. I don't see how it would help for call/cc, which needs a reified stack to handle repeat invocations.


So if I understand this right, it could be a way to run Scheme on ESP32 and similar microcontrollers, couldn't it?

Also, its small size would make it a perfect target for compiling to TypeScript/Deno/WASM without giving up the s-exp power and the runtime possibilities of full CHICKEN code on the backend...


There are specialized Scheme implementations that aim to be small.

A recent paper (see also the presentation on YouTube):

Leonard Oest O'Leary, Mathis Laroche, and Marc Feeley. A R4RS Compliant REPL in 7 KB.

For systems with a C compiler:

Vincent St-Amour and Marc Feeley. PICOBIT: A Compact Scheme System for Microcontrollers. (This one got a best paper award).

Finally, there is also PICBIT available for PIC microcontrollers:

PICBIT: A Scheme System for the PIC Microcontroller


Not a Scheme, but for running a Lisp on microcontrollers, uLisp is pretty amazing. http://www.ulisp.com/

You even get a REPL and everything WHILE running on the hardware. Super easy to set up and get going. Though, as it is interpreted, you will of course not get native performance. Still very useful for prototyping and hobbyist projects.


As a rule, interpreted code has a smaller memory footprint than natively compiled programs.

Even interpreted on a several-hundred-MHz classic RISC microcontroller, Lisp will chew through a million cons cells a second or more - an order of magnitude or so faster than the old Lisp machines of the '80s, and they used to run giant CAD systems on those to design aircraft and stuff. (Albeit slowly...)


OTOH if you want something that gives you a REPL and everything directly on the hardware but is also not as slow as your typical interpreter, look at Forth.


Not in the way I would like: you cannot have a REPL running. It seems to be only a transpiler for R7RS-small.


It's compiling a subset of Scheme that can be statically typed. It's not for full-blown live-hackable Scheme programs. You'd use Crunch or PreScheme to implement things that you can't implement in Scheme, like a garbage collector. I don't know if Crunch has this feature, but with PreScheme you can run your PreScheme programs at the Scheme REPL so you can have your comfy REPL-driven dev environment and then create a static executable when you're ready.


The documentation for Crunch is here:

https://wiki.call-cc.org/eggref/6/crunch


> No support for first class continuations

I'm not sure about how people would feel about this. I have mixed feelings. It feels like a loss of many things. What are the gains from ditching continuations?


First-class continuations remain the hardest nut to crack when implementing any full specification of Scheme, especially if performance and/or compactness and/or simplicity of implementation and/or integration with other languages (C, C++, Java, etc.) is a priority. Scheme--, a subset with only downward continuations, i.e. continuations that could be implemented using only C setjmp and longjmp or equivalent, would still be an extremely useful language, but it is much harder to gather a community for such a project.


From the article:

> What is needed is a small, portable compiler that generates more or less "natural" C code with minimal dependencies and runtime system that supports at least the basic constructs of the language and that puts an emphasis on producing efficient code, even if some of the more powerful features of Scheme are not available.

Since C doesn't support continuations, it is not possible to have continuations and at the same time generate "natural" C code.

Consider how you would implement first-class continuations. It's not possible to do a CPS transformation (given the goal of natural code generation), since that's a whole-program transformation.
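Roughly, the difference looks like this (a sketch only, with g standing in for some other procedure; this is not Crunch or CHICKEN output):

    ;; Direct style: maps naturally onto C calls and returns.
    (define (add1-after-g x)
      (+ 1 (g x)))
    ;; CPS: every call takes an explicit continuation, which is what you
    ;; need if (g x) may capture its continuation and re-enter it later.
    (define (add1-after-g/k x k)
      (g/k x (lambda (r) (k (+ 1 r)))))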


> Since C doesn't support continuations, it is not possible to have continuations and at the same time generate "natural" C code.

Since CHICKEN actually does this, I'll add the proviso of "without a GC or costly runtime".


I would describe the code CHICKEN generates as not being "direct style", rather than not being "natural". (I know - it is a cop-out to use "natural" in quotes.)

Regardless of what we call the style, it isn't simple any more.

For people interested in the internals of Chicken Scheme (I suspect sctb already knows):

https://www.more-magic.net/posts/internals-gc.html


Given the intent to have relatively straightforward C code and a tiny runtime, you pretty much have to ditch them or you're not going to get anywhere at all.

Plus, honestly, most of the uses of continuations that would make sense in the sort of relatively low-level code this is aimed at can be built out of a few special-purpose macros (and possibly a small amount of crying) - Scheme code tends to lean on first-class continuations for a lot of things that don't really need them, Because It Can.

e.g. Paul Graham's On Lisp shows how to build (a subset of) continuations out of macros, and getting something like Clojure's 'recur' out of that would probably give you simpler macros than pg's code uses.
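For instance, the recur-shaped cases already fall out of plain named let, with no first-class continuations involved - a sketch of the kind of pattern I mean:

    ;; Early exit from a search via ordinary tail calls.
    (define (find-index pred vec)
      (let loop ((i 0))
        (cond ((= i (vector-length vec)) #f)
              ((pred (vector-ref vec i)) i)
              (else (loop (+ i 1))))))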


It's fine because this is a static subset of Scheme that you could use to, say, implement a runtime for a full Scheme. As a Scheme programmer, it feels nice to be able to implement your Scheme in something very close to Scheme instead of having to use C.


The make && make install doesn't create the chicken-crunch executable which is used in the examples, nor do I see 'crunch' in any of the build scripts. Any guesses?


The installation bit mentions doing:

> <install location>/bin/chicken-install -test crunch

Edit: That's because this is an "egg", which is an external library/package, not part of the core CHICKEN install.


Ah yes, the test runner leaves the new binary under bin. Thanks


Interesting. Since this can compile down to C code, I was thinking maybe we could convert it to Cosmopolitan C code as well?


You would just need to (configure this tool to) compile the resulting C with cosmocc, no?


That is, to make it so that a single executable can run on everything.


This looks useful for sure.



