Some very interesting features of Nimrod which I would like to emphasize:
- clean, Pythonic syntax (whitespace is significant, but tabs are forbidden)
- native syntax for Perl-compatible regular expressions (slide 42)
- subrange types as in Ada
- set types as in Pascal
- strings as case (switch) selectors
- easy C interface with automatic type conversion (only functions as parameters need additional compiler pragmas)
- typed macros (templates)
- object code: native, C, JavaScript (probably more to come)
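A few of the items above fit in one small sketch (illustrative names only; written in current Nim syntax, which may differ slightly from the 0.9.x releases discussed here):

```nim
type
  Age = range[0..130]        # subrange type, as in Ada
  Letters = set[char]        # set type, as in Pascal

proc classify(cmd: string): string =
  case cmd                   # strings as case (switch) selectors
  of "start", "run": result = "action"
  of "stop": result = "halt"
  else: result = "unknown"

let a: Age = 42
let vowels: Letters = {'a', 'e', 'i', 'o', 'u'}
doAssert 'e' in vowels
doAssert classify("stop") == "halt"
doAssert a < 130
```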
I am working with Nimrod right now, and it is really one of the most productive languages I have ever encountered. It seems to mix the best features of other popular languages; Nimrod is actually where Python should be. I also wonder whether it is suitable for small embedded systems.
By the way, one important difference from Perl regular expressions is that re"..." always matches the whole input, whereas Perl matches partially. Fortunately, Perl's behavior can easily be simulated with a template.
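A sketch of the difference, assuming Nimrod's re module (match is anchored, find scans); the partialMatch template is a hypothetical helper, not part of the stdlib:

```nim
import re

# match() is anchored at the start of the string;
# find() scans the string, which is what Perl's =~ does by default.
doAssert match("2014-01-02", re"\d{4}-\d{2}-\d{2}")
doAssert not match("date: 2014-01-02", re"\d{4}")

# Hypothetical template restoring Perl-style partial matching:
template partialMatch(s: string, pattern: Regex): bool =
  find(s, pattern) >= 0

doAssert partialMatch("date: 2014-01-02", re"\d{4}")
```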
For embedded use it's probably OK anywhere C is OK. It statically links the parts of the stdlib you use, and dead code elimination seems to work pretty well.
A simple word-counting script I just wrote compiles down to about 52k on 64-bit OS X.
I also wonder about Nimrod on embedded. The dealbreaker is dynamic memory allocation: if Nimrod requires it as a non-negotiable part of its runtime, it can't be used.
I don't see how Nimrod is a natural evolution of Python. Static types and metaprogramming via templates are far from anything Python proposes. I don't believe even the syntax comparison holds up much: its syntax is more reminiscent of Pascal than of Python.
As someone who has spent much of the last decade or so writing Python, I mostly disagree. The pain points I've had with Python (speed, mainly) are well catered to by Nimrod. I agree that it's not "pythonic" in some ways, but the feeling is very similar. The wtfs/line ratio is very low.
Not yet. I am working on a replacement for the Asciidoc/Docbook toolchain, which I am not satisfied with. It handles an extended version of Asciidoc and aims to produce output in a few formats (HTML, TeX).
Nimrod includes rst parsing and HTML/TeX generation (http://build.nimrod-lang.org/docs/rstgen.html). While orthogonal to your quest, you may want to look into its implementation for inspiration (or adopt rst instead of Asciidoc for your publishing needs).
Nimrod provides the features I wanted to have when I developed in Python: a type system, Perl regexes, range subtypes, an easy C interface, native compiler(s). I know PyPy, but Nimrod needs much less space to compile.
A GC with a deadline is what caught my attention: seemingly you can specify a max pause time (in ms) and the GC won't take longer than that. That is very good for soft-realtime stuff -- games, audio processing.
Also, Nimrod has a very nice library. Definitely has enough "batteries" in it to get started.
Just a few impressive ones: http client, server, json parsing, actor support, redis db driver, zmq, cairo graphics, OpenGL wrappers, even Lua and Python support.
That looks very appealing. Is anyone using it? What is your experience with it?
I have been using Nimrod for some personal projects and internal tools since about version 0.9.0. It is quite usable -- especially the current GitHub version -- but as you may expect from a 0.x project, it still has some rough edges. These haven't stopped me from being productive with it, but you should be prepared for the occasional "huh?" situation. The most common problem I had while experimenting with and learning the language was that malformed code I inadvertently wrote would trigger a compiler error, e.g. an illegal AST exception.
If you want to do some serious experimentation with Nimrod, I recommend using 0.9.3 (the GitHub version). It is considerably more mature and stable than 0.9.2.
The one oddity that I'm still getting used to is Nimrod's preference for value types: the string and seq[T] types use copying for assignment [1] by default, whereas most other languages assume reference semantics (you can use reference assignment instead, but you have to say so explicitly). This may create inadvertent overhead if you are unaware of it, but it also avoids some pitfalls you may otherwise run into with mutable strings.
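A sketch of both behaviors (illustrative names; current Nim syntax):

```nim
# Assignment of a seq copies the whole sequence (value semantics):
var a = @[1, 2, 3]
var b = a            # deep copy
b[0] = 99
doAssert a[0] == 1   # the original is untouched

# Opting into reference semantics explicitly with ref:
type Buf = ref seq[int]
var p: Buf
new(p)
p[] = @[1, 2, 3]
var q = p            # copies only the reference
q[][0] = 99
doAssert p[][0] == 99   # both names see the same data
```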
As for the default GC, it's not that you can actually guarantee a maximum pause time [2]. The GC uses deferred reference counting (i.e., eliding most RC operations and batching the ones that are needed). As long as your data structures are acyclic, that allows for a very low upper bound on GC pause times. Pause times for dealing with cyclic structures depend on the size of the cycles involved (and the size of data structures attached to those cycles). A practical workaround for when you have to deal extensively with cyclic data structures is to make use of the fact that each thread has its own heap: thus, putting all time-critical stuff in one thread where cyclic data structures aren't used, and the rest in another thread.
[1] Copying is not used for passing arguments to a procedure or for returning a result, only actual assignment.
[2] You can influence it to some extent by changing the value of ZctThreshold in system/gc.nim, which specifies the size of the zero count table. Each GC iteration will still have to scan the stack, of course.
Edit: Oops, to correct myself, Nimrod does actually have a GC_setMaxPause() procedure, though I still don't think it can guarantee hard upper bounds on pause times.
Wait, if the RC scheme has a cycle collector, how do you ensure there are no CC pauses for acyclic data? Every CC algorithm I know of uses heuristics to determine when to scan the heap for cycles (usually when an RC drops from >2 to 1) and those can result in pauses proportional to the size of the live set on the heap, even when there are no cycles.
I am not sufficiently familiar with the inner details of Nimrod's GC implementation to give you a full answer, but, no, you don't have to scan the entire live heap to reclaim cycles.
The technique is called "trial deletion" and only has to traverse potential cycles. Strictly speaking, a type-agnostic implementation may have to traverse all objects reachable from an object whose reference count was decremented since the previous pass, but in a strongly typed language you can skip that for all objects that can't be part of a cycle. Nimrod at least makes use of that information in asgnRefNoCycle() in system/gc.nim.
Obviously, if everything on the heap is part of one big cycle, then, yes, you'll have to scan the entire heap.
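In Nimrod the programmer can also hand that type information to the GC directly, via the acyclic pragma; a minimal sketch (illustrative type names, current syntax):

```nim
type
  Node {.acyclic.} = ref object   # promise to the GC: no cycles through Node
    left, right: Node
    value: int

# Because Node is declared acyclic, assignments of Node references can take
# the cheap non-cycle path, and the cycle collector never has to traverse
# these objects.
var tree = Node(value: 1)
tree.left = Node(value: 2)
tree.right = Node(value: 3)
doAssert tree.left.value == 2
```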
It's still proportional to the live set in the general case. In practice cycle collectors tend to have to scan a lot, leading to some severe pause times (30ms in Firefox used to be common until the ad-hoc ForgetSkippable was added). Heuristics that cycle collectors use tend to fall down a lot in practice, unfortunately.
Thank you for replying. That was a very good overview.
Could you elaborate a bit, if possible, on thread-private heaps? I know Erlang's actors and Dart's isolates allow that (which makes it easy to implement a concurrent GC).
What is the mechanism for creating private heaps (if there is one)? Or is it just by convention, as in "just know that from this one thread I only create objects and no other thread will access them"?
When a thread is created, it automatically comes with a new thread-local heap. You can send objects to other threads via channels (which will be deep copied to the other heap). Each thread will allocate objects within its own heap by default.
Note that you can also fall back to GC-free allocation with untraced references (using ptr instead of ref) but will then have to manage that part of the memory yourself (i.e., deallocate untraced objects yourself).
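A sketch of both mechanisms (current syntax; threads require compiling with --threads:on):

```nim
# Thread-local heaps: a value sent over a channel is deep-copied
# into the receiving thread's heap.
var chan: Channel[string]

proc worker() {.thread.} =
  chan.send("hello")        # allocated on the worker's heap, copied on send

open(chan)
var t: Thread[void]
createThread(t, worker)
joinThread(t)
doAssert chan.recv() == "hello"
close(chan)

# Untraced memory: ptr instead of ref, managed manually.
var p = cast[ptr int](alloc0(sizeof(int)))
p[] = 42
doAssert p[] == 42
dealloc(p)                  # our responsibility; the GC never sees this
```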
Very interesting work. By the way, I'm planning to work on a project with similar goals, but I have started with a lower-level tool first. My first piece of work is a library for constructing incremental parsers: https://github.com/Eliah-Lakhin/papa-carlo The idea is that a language compiler based on this library will be easier to integrate into existing code editors and IDEs. Maybe we should somehow join efforts with the author of Nimrod?
Looks like a cross-platform language that's an even more acceptable Lisp than Ruby (hello, macros!), with multiple compilation targets including C, C++, and JavaScript, and a fast garbage collector? Like Go, but even cooler?
As a rule, I avoid any project where the developers have too much of the it-doesn't-work-and-I-don't-care attitude, mainly because you can't build on top of a foundation that isn't maintained fastidiously--you'll waste all your time fixing the foundation and won't get productive work done. Culture matters.
Cool project. How does it accomplish the realtime GC max pauses of 1-2 ms after compiling to javascript? Wouldn't the js implementation's GC have the final say on GC pauses? Thanks.
Just being optimistic, probably. You could write a fairly trivial program that builds and discards massive trees, forcing the GC to take more than 1-2 ms simply to free the memory. Maybe they mean in regular operation or something - but then what is regular? I guess they have to claim impossible feats to get interest, given just how many languages there are already.
It's deferred reference counting, so the 1-2ms claim is perfectly reasonable because all it has to scan is the stack (though if they want to break cycles they will have to do something costlier). The downsides are all of the downsides of reference counting, plus the fact that last I looked Nimrod's GC is not thread safe. That last one is a huge downside IMHO. I think DRC is interesting but I'm not much of a fan of it, honestly: it doesn't solve the big downsides of RC (cycles and thread safety being super expensive) while giving up RC's biggest advantage (prompt reclamation).
Deferred means you eventually have to do the work. Same for reference counting - when you free that last element it will have to actually do the freeing. If you've filled up ram and are allocating objects fast then you can either crash or do some garbage collection and if the tree is large it will have to take more than 1-2ms.
Yes, you can always overrun an incremental garbage collector, forcing it to start collecting more often to hit the deadline. Each collection may take only a couple of milliseconds, but there's no garbage collector that can say "no more than 1-2 ms per frame" in the case of a game, for example. If you bound max pause times, you may have to pause more often. There's no free lunch.
> Deferred means you eventually have to do the work.
Not quite. What you save with deferred reference counting is what is generally the biggest cost of reference counting: assigning references to local variables and having local variables go out of scope. Deferred reference counting only requires an RC update if you store a reference in a heap object or in a global variable.
You will, of course, necessarily incur the overhead for allocation and deallocation, which one can make cheaper (bump allocation in generational and copying GC, memory regions) but can't entirely get rid of, especially in non-copying/compacting schemes.
But you lose the promptness of deallocation. Furthermore, I think the biggest overhead of reference counting is not the count manipulation (especially if you aren't making it thread safe), it's the inability to handle cycles, which DRC does not address.
You are correct that you won't get the immediacy of reference counting; that's a problem of all memory schemes that aren't naive reference counting. But no, the biggest constant time overhead for naive reference counting comes from the RC updates for literally any pointer manipulation. Not just the overhead of having increments, decrements, and possibly failed branch predictions for every single pointer assignment, but also what it does to cache locality. Deferred reference counting essentially only needs a write barrier not very different from generational or incremental forms of garbage collection. This is why tuned versions of deferred reference counting come pretty close to other tuned GC schemes, but naive reference counting falls short, even in the absence of cycles.
Cycle detection primarily affects pause time, not overhead.
Pause time is a form of overhead. It hurts both latency and throughput.
Furthermore, the RC updates needed for DRC have to be thread-safe in a concurrent scenario. I understand that Nimrod has chosen not to have a thread-safe GC, but I don't think that'll scale for systems programming: too many algorithms want shared memory, and manual memory management is much more difficult in a concurrent setting. Once you go thread-safe, RC updates become extremely expensive, while concurrent GC is well-studied and performant.
> Once you go thread-safe, RC updates become extremely expensive, while concurrent GC is well-studied and performant.
This is simply false. In fact, concurrent versions of deferred reference counting have been known and studied for years [1]. The simple solution is to store the RC updates in a buffer and execute them only during a collection phase.
> Pause time is a form of overhead. It hurts both latency and throughput.
Yes, but for many applications (e.g., HPC) only amortized cost is relevant.
Note also that you can do cycle detection concurrently with the mutator's operation (again, see [1]). Concurrent cycle detection makes the implementation more complicated, but no more so than a concurrent tracing scheme. Moreover, if your program is indeed performance-critical, RC schemes allow you to minimize the impact of cycle detection by designing your code so as to avoid cycles (either by using acyclic data structures exclusively or by using weak pointers). Conversely, tracing schemes make it difficult to programmatically influence worst case pause time.
> Note also that you can do cycle detection concurrently with the mutator's operation (again, see [1]). Concurrent cycle detection makes the implementation more complicated, but no more so than a concurrent tracing scheme.
I disagree—it's much more complicated. Concurrent GC is not trivial, but it doesn't have seven different colors and two different tests (sigma and delta test).
A good concurrent reference counting system is just as complex as a good tracing garbage collector, because it requires the same runtime machinery—precise stack maps, write barriers, etc., to perform backup tracing. But it also has the reference counts to deal with. It ends up being more complex overall.
> Moreover, if your program is indeed performance-critical, RC schemes allow you to minimize the impact on cycle detection by designing your code so as to avoid cycles (either by using acyclic data structures exclusively or by using weak pointers).
No, that's my point—even if you use acyclic data structures, there is no guarantee that you won't have the CC run because you had multiple references to an object and you dropped one, making the cycle collector suspect it was part of a cycle. This is the entire reason why ForgetSkippable was added to Firefox: this happened a lot in practice. The solution was to add a bunch of ad-hoc code to make the cycle collector drop objects from the purple buffer, much like the "acyclic" pragma does in Nimrod—but now you have the possibility of leaks unless this is done very carefully.
> Conversely, tracing schemes make it difficult to programmatically influence worst case pause time.
In collectors like Metronome, it's extremely easy: set the pause time to what you want it to be. That's the only solution that's really scalable in my view: working around the collector by manually telling the CC not to scan certain types feels too error-prone.
I think the slides are saying you have the ability to pass that option to the runtime. You can specify an upper bound for GC time; see http://nimrod-lang.org/gc.html
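That page documents GC_setMaxPause and GC_step; a sketch of how a soft-realtime loop might use them (the time values are illustrative, and the bound is a target rather than a hard guarantee, per the discussion above):

```nim
# Soft-realtime GC tuning: bound pause times and drive collection
# steps from the application's own main loop.
GC_setMaxPause(2_000)      # target max pause, in microseconds

for frame in 0 ..< 100:
  # ... per-frame work would go here ...
  GC_step(1_000)           # offer the GC a 1 ms time slice per frame
```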
I have no idea what Nimrod is doing, but I suppose one could output JS where a heap is simulated through a TypedArray, such that the JS GC only has a single memory reference to keep track of. Then you could give guarantees on GC performance, given reasonable assumptions about the JS runtime.
There's no way to scan the stack in that case to find the root set, though, unless you spill all pointers to the typed array at safe points (which costs performance).
It has a lot of cool stuff, and I'm intrigued by its approach to threading. (Per-thread heaps and message passing, with an unsafe shared heap if you need it.)
How does dynamic typing work in Nimrod, if it has it? What's the equivalent of being able to call Write() on any io.Writer in Go, for example? [Edit: just found the explanation at http://nimrod-lang.org/tut2.html#dynamic-dispatch but have not yet read and digested.]
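For what it's worth, that tutorial section describes method-based dynamic dispatch; a minimal sketch of the io.Writer-style scenario (hypothetical Writer types, current syntax):

```nim
type
  Writer = ref object of RootObj
  FileWriter = ref object of Writer
  BufWriter = ref object of Writer

method write(w: Writer, s: string): string {.base.} =
  "?"                      # base case; overridden below

method write(w: FileWriter, s: string): string = "file: " & s
method write(w: BufWriter, s: string): string = "buf: " & s

var ws: seq[Writer]
ws.add FileWriter()
ws.add BufWriter()
# write() is chosen by the runtime type, not the static Writer type:
doAssert ws[0].write("x") == "file: x"
doAssert ws[1].write("x") == "buf: x"
```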
I'd love to see some open-source projects that've been written in it--the Nimrod tools and stdlib are obviously a good start. I think one of the things that really helped Go was having lots of n00b-friendly content (the tour, a short "spec", Effective Go doc, blog posts, talks); you could call it "marketing", but it really helped me, at least.
My only complaint is that my internal Python (awesome) vs. JavaScript (not so awesome but becoming standard) war gets tackled from a third side now. Another language to care about - not exactly what I needed.
Having said that: Nimrod looks beautiful and maybe it's time to hop on the bandwagon.
Am I in a time warp? Nimrod is over 5 years old, yet most comments here treat it as new and groundbreaking, with statements like "Nimrod sounds even better than Google Go". Nimrod is a cool language, but am I missing something here?
Why is it confusing that an obscure programming language, on any given day, will be seen for the first time by a lot of people when it gets onto the front page of HN?
That’s normal. Languages often take five or ten years to attain any kind of popularity. It’s probably still new to many people, and even if they’ve heard of it before, they probably haven’t tried it yet.