Direct Clojure to C Compiler - Based on ClojureScript (github.com/schani)
89 points by espeed on July 9, 2012 | 89 comments



Clojure is one of the cooler languages out there. There has been one major blocker for me, though: the JVM. I don't want to have anything to do with the JVM. Not because it's bad. It's just that native code is better, especially for the stuff I'm doing.

I hope these developments eventually turn into a proper Lisp that you can interface with native C code efficiently.

Racket is pretty cool too.


I wonder what would be the advantages of using Clojure without the JVM over, say, Common Lisp. To me, it seems the great advantage of Clojure is its ease of using the large number of java libraries out there. However, if you take those libraries out, wouldn't a mature Lisp implementation like SBCL be much better?


There are still some nice things such as destructuring and vectors & maps as base types. The syntax of Clojure is generally a bit cleaner and more concise, though I'm sure some lispers will disagree.


There are a few differences that might still be worthwhile.

Clojure is a Lisp-1, and provides better data structures than Common Lisp as far as I am aware.

But the JVM benefit is a big one indeed.


I thought that, besides the JVM itself, the advantages of Clojure would be its concurrency model - as the CL standard doesn't define anything related to this - and its stronger functional approach.

I don't fully grasp the advantages of being a Lisp-1 or Lisp-2 :)


> I don't fully grasp the advantages of being a Lisp-1 or Lisp-2 :)

It has to do with variable scoping. Lisp-1 only accepts lexical scoping, while Lisp-2 also allows for dynamic scoping.

http://www.nhplace.com/kent/Papers/Technical-Issues.html


Good grief.

No.

Lisp-N has to do with the number of namespaces in the language. A Lisp-1 has only ONE namespace; its variables range over all values of the language: integers, functions, vectors, symbols, strings. In a Lisp-N, there are N namespaces, one for each "kind" of name.

Common Lisp is called a "Lisp-2" because it separates functions from other values. (It's actually closer to a Lisp-5 or even a Lisp-7, I forget which, because it allows that many kinds of "names" without collision.)

Also above you said Clojure "probably" has better data structures than Common Lisp. That is unsubstantiated.


For what it's worth, Clojure has implementations of Phil Bagwell's HAMT-based hash-maps, vectors, and sets. These are absolutely amazing data structures: they offer functional updates and a list-like API, with O(log_32 n) cost for most operations, which works out to fewer than seven steps in practice, close enough to O(1) for most programmers.

I'm not much of a common lisp programmer (mostly sticking with scheme/racket and Clojure), but I don't think they've been ported to Common Lisp (yet), so the claim that Clojure has better data structures than Common Lisp, while somewhat a matter of preference, isn't entirely unsubstantiated.


Common Lisp has hash-tables, vectors, and sets ... and has had them, in an ANSI-specified document, since 1984. The implementation details of these are, well, implementation dependent, be they HAMT or whatever.

The language provides a wealth of tools out of the box, but it's the job of the programmer to choose more appropriate tools for the problems at hand.

The following is the Common Lisp specification. 12+ implementations, half of them industrial strength, more than half from commercial vendors, and every one of them has to adhere to this spec:

http://www.franz.com/support/documentation/8.2/ansicl/ansicl...


But does Common Lisp allow multiple implementations of those data structures to exist in the same program, actually allowing programmers to use the best tool for the job?


Are you seriously asking me if a programming language allows for the implementation of data-structures?

If you mean can users extend the core CL data-types, no, but you have a wealth of tools to do that yourself; use generic functions to implement whatever interface you wish, then provide for specific implementations.


> but you have a wealth of tools to do that yourself

That's the problem. With Clojure you don't have to invent interfaces and then convince other libraries to use them. All the data structures are built upon well-considered language-level protocols that anyone can implement, preventing pointless API proliferation.


Clojure has the benefit of hindsight; it is one of the newest programming languages and it is not constrained by a published standard, nor does it need to cater to tens of vendors or any existing code-bases (nearly every user of Clojure is on GitHub ;-)


I think wise people prefer programming in good languages that enjoy the benefit of hindsight. Common Lisp does not have the luxury of hindsight, and can hardly change its specification to reflect modern theoretical and practical PL results.

Personally, I slightly prefer Clojure over SBCL.


Note that CL's data structures are mutable, whereas Clojure's are persistent.


Google search saver: immutable = persistent


yoklov, for some reason I can't reply directly. There are a few Common Lisp libraries of persistent, immutable data structures that perform well. This one is implemented most closely to Clojure's: https://github.com/ks/X.FDATATYPES . This one has been around for a long time and is quite stable: http://common-lisp.net/project/fset/Site/index.html .


> Also above you said Clojure "probably" has better data structures than Common Lisp. That is unsubstantiated.

It is my understanding that Common Lisp lacks the vectors and hash maps that Clojure has.

Sorry if I got it wrong. I am more a ML guy than Lisp.



I was wrong, it seems.

From the documentation I have not seen the simplified syntax Clojure uses for vectors, hashes and sets.

I imagine that the syntax is Clojure specific.


Your understanding continues to be wrong.


edited bc mahmud said it better, but

Also, Clojure supports (optional) dynamically scoped variables.


Prototype on the JVM but compile to C for production or performance-critical tests? Ideally, no refactoring required at all.

I guess it also helps to work with a dialect with more momentum. E.g. it's easier to find support, on- and offline.

Since I haven't really used Clojure or any other Lisp for more than a tutorial, I might be very wrong, though.


Clojure -> C will very probably be slower than Clojure -> JVM, at least if the compiler is naive.


There are JVMs with native code generation like you would expect from a normal C compiler; they are just a bit too expensive.

Here is one example, http://www.atego.com/products/aonix-perc/


Yes. Java is getting closer and closer to native code performance. In some rare cases it can even beat C. But this is after billions of dollars have been thrown at trying to make Java fast.

If you intend to distribute your software (as opposed to running on your own servers), it would require all the customers to have a fancy JVM. Or buying an expensive license to be able to distribute the runtime with your software.


> Yes. Java is getting closer and closer to native code performance. In some rare cases it can even beat C. But this is after billions of dollars have been thrown at trying to make Java fast.

Why is it a bad thing that someone was willing to pay money to make Java fast? Do you want to say that there haven't been billions of dollars in both money and man-hours thrown at trying to make C/C++ fast? There's no free performance.


> Why is it a bad thing that someone was willing to pay money to make Java fast?

I guess there's someone who wants to pay for that.

But I don't think it's money well spent. Java bytecode is not designed for or suited to fast execution, and it's a bad intermediate representation for compiling programs into native code. It was designed to be easy to interpret, at a time when a memory access was faster to execute than an ALU instruction (on the cheap hardware Java was originally intended to run on). Those days have passed, and we should move on to better intermediate representations for computer programs. Like LLVM IR.


You mean VMKit then, with AOT compilation for Java: just like Clang, but for Java.

http://vmkit.llvm.org/


Except that I don't want the Java part of it. I'd much rather just use Clang and C.

VMKit is a nice effort, though. I hope it will some day power fancy new languages that have dynamic features which are not well suited for AOT compilation.


What part did you not understand from my post?

These are native code compilers with no need for an extra runtime, exactly like C.

Just because most people only know Oracle(Sun)'s JVM, it does not mean it is the only way to execute Java code.


Why do you prefer native code? What do you even mean by native code? The JVM does produce native code, it just does it at runtime (JIT compilation).

So maybe you mean static compilation, but then, why is it better for the stuff you're doing?


> Why do you prefer native code? What do you even mean by native code? The JVM does produce native code, it just does it at runtime (JIT compilation).

By native code I mean binaries emitted by my compiler in a format that my operating system can link and load. I usually write the code in C with some CPU-specific assembly intrinsics for the critical stuff.

I tend to write computer graphics and physics simulation stuff which has to be quite fast. A lot of stuff I deal with is moving blobs of binary data from one buffer to another (disk, memory, dma, gpu). Or computing a SIMD fused multiply-and-add for a few hundred megabytes of floating point data. Or inverting and transposing a hundred thousand 4x4 matrices. Everything the JVM does is acting against me, so better to just write native.

Sure there are games and physics and whatnot written with Java and other systems, but they are more commonly "low perf" stuff. 2d games and that kind of stuff. And even then they suffer from JIT & GC pauses and other undesirable effects. It always ruins my day when in the middle of a Minecraft session the GC kicks in while running away from a Creeper.


> I tend to write computer graphics and physics simulation stuff which has to be quite fast. A lot of stuff I deal with is moving blobs of binary data from one buffer to another (disk, memory, dma, gpu). Or computing a SIMD fused multiply-and-add for a few hundred megabytes of floating point data. Or inverting and transposing a hundred thousand 4x4 matrices. Everything the JVM does is acting against me, so better to just write native.

The JVM is very fast for these things. The remaining delta is the result of more investment into optimizations in GCC/LLVM than any limitation of the JVM model itself.


What makes you think that code compiled through C suddenly loses the need for GC?

GC, allocation, thread management, OS interfacing and compilation (JIT or AOT) together make up a VM. Removing the compilation part doesn't make the rest of your VM requirements disappear.

Having used a number of various VMs in the Common Lisp world (yes, including one implementation that went through C) I value the JVM greatly. It is an impressive engineering effort.

Also, I wrote code like yours (hand-written x86-64 SIMD assembly). I then used it from Clojure through JNA and was quite happy with the results.


> What makes you think that code compiled through C suddenly loses the need for GC?

Yes, with C you might need to free some heap memory, and malloc/free can cause GC-like performance hits. That's why I have zero mallocs in the fast path. The bad part of GC is the unpredictability, which makes it unsuitable for games, simulation, and multimedia stuff. But for most allocs I can get away with using the stack, which is essentially free (a possibility of a page fault exists).

> Also, I wrote code like yours (hand-written x86-64 SIMD assembly). I then used it from Clojure through JNA and was quite happy with the results.

Yes that works quite well as long as the foreign function is big enough.

What I have is a bunch of code written with C intrinsics, mostly small functions, a handful of which I use over a big bunch of data. What I want is for that inner loop to be efficient yet readable.

With C and intrinsics I get interprocedural optimization between the small functions. Everything gets inlined and all the values are stored in registers. There are a few hundred SSE math instructions without a single memory operation in between. I get the compiler to do register allocation and instruction scheduling for me, which it is great at (and I suck at), and that leaves me with instruction selection, which it is not so great at.
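
Something like this contrived sketch (not my actual code; names are invented for illustration): two tiny static inline SSE helpers that, within one translation unit, the compiler fuses into a single register-resident loop:

    #include <stddef.h>
    #include <xmmintrin.h>   /* SSE intrinsics */

    /* Tiny helpers: with everything in one translation unit the compiler
       inlines them, keeps the values in XMM registers, and schedules the
       instructions itself. */
    static inline __m128 scale(__m128 v, __m128 s)  { return _mm_mul_ps(v, s); }
    static inline __m128 offset(__m128 v, __m128 o) { return _mm_add_ps(v, o); }

    void scale_offset(float *dst, const float *src, size_t n, float s, float o)
    {
        __m128 vs = _mm_set1_ps(s), vo = _mm_set1_ps(o);
        for (size_t i = 0; i + 4 <= n; i += 4) {               /* tail elements omitted */
            __m128 v = _mm_loadu_ps(src + i);
            _mm_storeu_ps(dst + i, offset(scale(v, vs), vo));  /* no spills between the helpers */
        }
    }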

Now had this been written in any higher-level language, or even C across translation units (without link-time optimization), my resulting code would be a lot worse. In particular, live values get spilled out of registers to memory when there's a foreign function call (as per ABI requirements).

The non-fast path of my project could very well be written in any language as long as the beef would be written in C or similar. I would have to avoid allocating new heap objects to avoid GC unpredictability.


Misread the part about the GC. Sure, you do need a GC when using C as an intermediate language. If done well, you can get long stretches of code with no need to touch the heap, and hopefully your backend (the C compiler in this case) will be able to optimize it, at least a little. I'm sure LLVM would do a much better job there.


That's all well and good, but wouldn't any LISP be way insufficient for your needs then? I mean, Java's going to be closer to C's performance than any LISP dialect until we have a "sufficiently smart compiler", right?


This is the sort of thing I'm talking about, when I talk about the insufficiency of Java and other VM languages for high performance programming.

It'd be nice if somebody made a new systems language to succeed C or C++. Rust is looking less likely these days. Maybe Clay or Deca?


Why is Rust less likely these days?


A thousand papercuts and a preoccupation with things that don't make it nicer to use. Supremely practical in a sense though.

It's like the inverse of golang. golang is unimpressive, and totally impractical for systems programming, but is nice to use for general purpose programming.

Rust is impressive and could'ish be used for systems programming (I'm not sure how much uphill fighting is involved with the current memory management model, it seems ARC is the default which makes me intensely uncomfortable about its prospects for 'hard' systems programming) but it's not very pleasant at all. The syntax and semantics make me nauseous.

Rust can be fixed, golang cannot. Unlikely though.

I'm not sure it's more realistic to make a practical implementation of region-based memory management than it is to ameliorate the pains of day-to-day systems programming in C.

There is a lot of low-hanging fruit when it comes to systems programming if you're allowed to make a language from scratch and they've not really capitalized on much of that due to ambition.


I can't answer for the GP, but native code is generally better for me because of the startup time.


>Why do you prefer native code?

For me, because I don't want to install the JVM, or force users of my stuff to install the JVM. I also don't care about JVM libs.

>What do you even mean by native code? The JVM does produce native code, it just does it at runtime (JIT compilation).

I, for one, mean non-JIT, no virtual machine needed, native code.

>So maybe you mean static compilation, but then, why is it better for the stuff you're doing?

Less startup overhead. Small binaries. Easily redistributable. No stuff I don't need.


Does GCJ qualify? No JIT, no VM, native code, less startup overhead, easily redistributable.


I'm asking because I'm in the process of writing my own lisp->c compiler:

Are you forced to use an environment like:

  typedef struct environment {
    struct environment *up;
    int num_bindings;
    value_t *bindings[0];
  } environment_t;
Or would it be possible to have a dynamic vector as your environment (so that you get O(1) lookup), at the cost of making closure construction more expensive (as you'd have to copy the bindings)?


Sure, you can do that. Besides the closure allocation cost you have to be careful if your Lisp supports setting variables directly:

    (define x 2)
    (define y (lambda (z) (+ x z)))
    (set! x 3)
Now you want the `x` referenced inside the lambda to refer to the new value of `x`. So if closure creation copies bindings in the parent scope then you have to introduce an extra indirection for mutable variables. This is called "assignment conversion".

    (define x (make-ref 2))
    (define y (lambda (z) (+ (read-ref x) z)))
    (set-ref! x 3)
Note also that it is unnecessary to store the number of bindings in the environment, because this number is implied by the lambda associated with the closure.
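
A minimal C sketch of that box indirection (value_t is just a stand-in here; your compiler's actual value representation and allocator will differ):

    #include <stdlib.h>

    typedef void *value_t;   /* stand-in for whatever tagged value type you use */

    /* A heap-allocated box holding one mutable binding.  Closures capture a
       pointer to the box rather than the value itself, so a later (set! x ...)
       is visible to every closure that captured x. */
    typedef struct box {
        value_t value;
    } box_t;

    box_t *box_make(value_t v) {
        box_t *b = malloc(sizeof *b);   /* a GC'd implementation would use its own allocator */
        b->value = v;
        return b;
    }

    value_t box_read(box_t *b)             { return b->value; }
    void    box_write(box_t *b, value_t v) { b->value = v; }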


Awesome, that's exactly the answer I was hoping for. Is this a common way of compiling Lisp, or does it actually make things slower in the general case?


Both methods have pros and cons, and the most advanced implementations use a hybrid method. The copying method is probably the best to start with. It may make things slower in special circumstances, but in the common case it is faster because lambdas tend to have a small number of free variables. It is also way simpler, because linked environments have a tendency to retain garbage if you are not careful. For example:

    (let ((x 1) (y 2)) 
       (lambda (z) (+ x z)))
If you use linked closures naively then the closure will keep `y` in memory even though it's garbage.

However, keep in mind that things you usually do not think of as free variables are free variables. For example, if you have this:

    (let ((y 4))
       (map (lambda (x) (factorial (+ x y))) xs))
Then not only `y` but also `factorial` and `+` are free variables. With some work you can avoid storing those references in the closure. I believe that the state of the art is described in this paper: ftp://ftp.ecn.purdue.edu/qobi/99-105.pdf


All good points. I will surely read that paper, thanks for all your help.


Depending on what type of Lisp you're compiling, you may be able to resolve some locations at compile time. See section "lexical addressing" in SICP chapter 5, here: http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-35.html...


Thanks, I appreciate the link.


A dynamic vector will not work in the general case because environments can "fork". You can, however, have shallow environments, where making a new environment based on an old one involves copying some or all of the old environment's fields. That can become a problem, however, if your language supports set!ting local variables - the local variable might be represented in more than one environment, so you'd have to keep track of that and do more than one store.

See Andrew Appel's "Compiling with Continuations", for example.


Could you give an example of where an environment can fork? Do you mean when you create a closure?


This example is completely contrived, of course:

  (let [x 1]
    (defn make-adder [y]
      (fn [] (+ x y)))
    (defn make-multiplier [z]
      (fn [] (* x z)))
    (defn mutate! [new-x]
      (set! x new-x)))
Now for

  (make-adder 2)
we get a function whose environment includes x and y, whereas the environment for

  (make-multiplier 2)
includes x and z. Assuming you have shallow environments, you'll have one environment [x' y] and another one [x'' z], whereas with deep environments (like in ClojureC) you'd have [[x] y] and [[x] z] where the [x] is shared between them. Now, if you call

  (mutate! 3)
with deep environments you only need to store the 3 once (to x), whereas with shallow environments you'll need to store to x' and x''.
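
Roughly, in C terms (types and field names invented for illustration; ClojureC's real layout is the environment struct quoted upthread):

    typedef void *value_t;

    /* Deep (linked) environments: each frame points at its parent, so the
       frame holding x is shared by both closures and (mutate! 3) stores once. */
    typedef struct deep_env {
        struct deep_env *up;     /* e.g. the shared frame holding x */
        value_t bindings[1];     /* this frame's own bindings, e.g. [y] or [z] */
    } deep_env_t;

    /* Shallow (flat, copied) environments: each closure owns a flat vector of
       copies of its free variables, giving O(1) lookup, but (mutate! 3) has to
       update x' in [x' y] and x'' in [x'' z] separately. */
    typedef struct shallow_env {
        int num_bindings;
        value_t bindings[2];     /* e.g. [x, y] for the make-adder closure */
    } shallow_env_t;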


Sure, you can come up with pathological cases for either option. For example, with deep environments, the following is inefficient:

  (lambda (a) (lambda (b) (lambda (c) (lambda (d) (lambda (e) a)))))


The technique used in most Lisp compilers (SBCL, etc) is as follows:

Say you have an outer lambda and an inner lambda. All variables (either arguments or local variables introduced by "let") defined in the outer lambda and used in the inner lambda are called the "free variables" of the inner lambda and the "closed over variables" of the outer lambda. Of the closed over variables of the outer lambda, all the ones that are assigned-to in the inner lambda are "assigned to variables." You can perform this classification in one pass before code generation.

Once you have annotated each lexical variable with its "closed over" and "assigned to" status, you can generate flat C functions in one code generation pass. The idea is to generate each inner lambda as a C function that accepts a hidden "environment vector" that contains the closed-over variables in the outer lambda. This environment vector is indexed simply by an integer index. Just handle the following AST constructs as follows:

LET: if the lexical variable introduced by the LET form is closed-over and assigned-to, it must be demoted to a box because the inner lambda can't modify the outer lambda's stack frame. Whereas you can generate a regular lexical variable directly as a C local variable, you have to generate a closed-over + assigned-to variable as a C variable that points to a single heap-allocated box that references a Lisp value.

VARIABLE-REF: for every reference to a variable, you have to look at its status. If the variable is defined in the current lambda and not assigned-to, you can just emit a use of the C variable directly. If it is defined in the current lambda and assigned-to, you have to emit a load of the value from its box, a pointer to which is in a C variable in the current function. If the variable is defined in an outer lambda (it's a free variable of the current lambda), but not assigned-to, you have to emit the reference as a load (with a known constant index) from the environment vector. If it's a free variable and assigned-to, you have to load a reference to a box from the environment vector, then load the value from the box.

LAMBDA: assign each free variable of the lambda an index starting at 0 (which you will use when emitting references to free variables in the body of the lambda). Generate the LAMBDA expression as the creation of an object with two slots: a pointer to the C function of the inner lambda, and a pointer to an environment vector. Emit code to copy each free variable referenced in the inner lambda into the environment vector. Remember you might have multiple levels of nested lambdas, so follow the rules for VARIABLE-REF above when emitting each copy.

Then, just structure your calling convention so you can FUNCALL lambda objects and pass the environment vector as a hidden first argument.
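
A rough C sketch of that closure representation and calling convention (names invented; this is just the shape described above, not any particular compiler's output):

    #include <stdlib.h>

    typedef void *value_t;

    /* A closure is a code pointer plus the environment vector of the free
       variables it closed over; env[i] holds the free variable with index i. */
    typedef struct closure {
        value_t (*fn)(struct closure *self, int argc, value_t *argv);
        int      env_size;
        value_t  env[];
    } closure_t;

    closure_t *closure_make(value_t (*fn)(struct closure *, int, value_t *),
                            int env_size) {
        closure_t *c = malloc(sizeof *c + env_size * sizeof(value_t));
        c->fn = fn;
        c->env_size = env_size;
        return c;            /* the LAMBDA site then copies each free variable into c->env */
    }

    /* FUNCALL: every call goes through the code pointer, passing the closure
       itself so the body can index into its environment vector. */
    value_t funcall(closure_t *c, int argc, value_t *argv) {
        return c->fn(c, argc, argv);
    }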

Note that this requires a bit more environment book-keeping in your compiler than you might otherwise have. You really want each inner lambda to have a complete environment, including free variables. This is so you can keep track of which variable has which index in the environment vector. Say you have:

    (defun foo ()
        (let ((x 0) (y 1) (z 2))
            (labels ((bar () (+ x y))
                     (baz () (- z x)))
                ...)))
In "bar" your free variable indices probably look like {x: 0, y: 1} while in "baz" your free variable indices probably looks like {z: 0, x: 1}. You need to be able to keep track of these mappings separately, which you can't do if you attach these mappings in the environment structure you use for the LET form. Instead, you need to have a separate environment structure with entries in BAR and BAZ for the free variables of each lambda.

In practice this generates very good code, because most lambdas simply don't have very many free variables (at least so long as you don't naively generate LET forms as nested lambdas).


That's interesting. I was planning on having a pass that eliminated let expressions, turning them into immediately applied lambdas, but having read your comment, it sounds like this isn't such a good idea. I'll have to think about this.


I personally don't think it's worth the minor conceptual simplification to treat LET forms as lambdas, but YMMV. It's only slightly more complicated to handle both LET forms and LAMBDA forms as things that can introduce values into the compile-time environment, but substantially more complicated to do the unboxing and inlining you'd have to do to ensure that the common case of defining a bunch of local variables that are not closed over does not end up consing or performing a function call.


At first one should expect it to be at most as fast as CPython. If you don't do type inference and a bunch of other advanced functional-language-friendly optimizations, you will get quite poor performance compared to the JVM. Even Haskell, which does a lot of those, isn't outperforming the JVM yet. (Memory usage is much better, though.)

http://shootout.alioth.debian.org/u64/benchmark.php?test=all...


>>Memory usage is much better though.<<

"Don't confuse differences in default memory allocation with differences in Memory-used when the task requires programs to allocate more than the default memory."

Look at memory use for reverse-complement, k-nucleotide, regex-dna, binary-trees, mandelbrot.


And now we can look forward to 20 years of research on very fast runtimes for executing Clojure ...

http://research.microsoft.com/apps/pubs/default.aspx?id=7985...


ClojureScript is an interesting language, with great potential for easy porting. I wish there were a direct JVM port of ClojureScript.

I would be able to live with ClojureScript's limitations compared to Clojure if it provided smaller jar files and better performance of compiled code. That would make it really usable on constrained targets like Android.


The obvious question here is: comparative benchmarks, anyone?


It's too early to do benchmarks. Nothing in ClojureC is optimized at this point, so it would do very badly. I'm focused on making it work first, then on making it fast. We'll get there.


That brings up another interesting question: how far along are you in making it work completely?


I'd say - optimistic estimates (for pessimistic, multiply by 2) - core language 1 more weekend, core library 2 more weekends after that. So far it's been about 3 weekends of work.

Contributors are always welcome ;-)


I'll finish up cljs/lua and I might give a hand :)


I'm looking forward to it! :-)


Cool, next step directly to native code skipping C. :)


Or LLVM


Great, now you can compile Clojure to C, then back to JavaScript with Emscripten! :P


Sounds like a nice obfuscator.

I'm curious what the end result would look like! :-)


Well, to do the full obfuscation, we would need to compile Clojure to C, then to JavaScript, then run it through Google Closure compiler. And then try to comprehend the mess that results.


Interesting but ultimately useless. I see something like this come around a couple of times per year. It always fades into oblivion within a year or two.

The C code will not be optimized C (like a seasoned C developer would write). Java is compiled to bytecode, which is compiled to machine language. This ugly, unoptimized, generated C is compiled into machine language too. The performance (both memory and CPU usage) of the JVM product will be far superior, though.

Don't think that you will be writing for embedded devices or device drivers with something like this. It will be a mess. Use C for things that should be done in C. For desktop apps, web apps, or services use Clojure, etc.

I don't like seeing a square peg rammed into a round hole. It does not work well, so don't get too excited, Lisp programmers. This is an interesting hobby - that is about it.


Your post adds nothing of value to the discussion. You say this project will fade into oblivion -- how do you know this? Do you know, or are you familiar with, the developer's work?

How do you know that the C code, when passed through an optimizing compiler, will not be able to compete with the JVM? Have you done these tests? The author already mentioned on this page that the goal is to get it working, and then focus on performance. As far as I can tell, he is a "seasoned C developer" and should be able to construct the generated C so that it compiles efficiently.

Without more than mere pessimistic speculation, your failure to contribute any sort of primary or objective information does little more than to make you appear curiously angry at a project in which you have no interest or affiliation.

EDIT: Just read through your comment history. A lot of your posts are written in this sort of negative tone. Why do you do that? If you would provide some links or alternative viewpoints, substantiated with a paper or even a blog post, that would be a lot more insightful.


Poorly written C code run through an optimizing compiler will only produce slightly better machine code (at best).

Why do this? This should compile to native machine language directly, if anything. Would you transform it to C first so you could edit the C? Then what if you want to make a big update using the Clojure source? It will overwrite your changes.

This approach is not practical; perhaps I am failing to understand the logic of why you would go with it. Portability (between processor architectures) is all I can think of, but you will never achieve competitive performance if that is the case.

Good C coders write their code differently for different processors and platforms because all machines have different capabilities and resources (strengths and weaknesses). The whole point these days of writing something in C is to achieve the second-highest performance (behind only pure assembly language) across gpu/cpu/other co-processors, while at the same time using as little memory as possible (both RAM and NV storage).


Okay, now we're getting somewhere. I'm still not sure about the assumptions you make in the first line (1. that it will be poor-quality C code, 2. that its compilation will result in inefficient machine code) -- we would actually need to test it out in order to find out whether that is true.

I can understand your point in the second paragraph, though. Indeed, C is a language that is nowadays mostly suited to performance-oriented applications. The reason the author translates Clojure code into C, and then into machine code, is that this is vastly easier than going straight to native code. As you say, there are so many platforms with different sets of instructions that it would be almost pointless to write an optimized compiler directly from Clojure code to all these platforms, particularly since Clojure is still a somewhat obscure language.

The value in doing this is that Clojure promotes good programming practices. In turn, this means that someone who has little experience with embedded systems or hand-optimizing assembly can write better performing code in Clojure -> C -> assembly than if they tried to use C without any experience. Sure, it won't be as fast as if they were an experienced assembly guru, but the difference will only be a factor of 1-2 -- the algorithm is much more important than the compilation, and this is where Clojure shines.

One other use is when a project is not performance-oriented, but development-oriented. A fast development cycle is in a lot of situations much more important than the speed at which the code runs. As Clojure is currently limited to the JVM (and Javascript maybe??), extending this support to C now extends the domain of Clojure to almost anything.


A language from an entirely different paradigm (functional language, garbage collection, tons of abstractions and a huge API) cannot ever be a replacement for, or even compete on the same playing field as, C. We are talking apples to oranges here.

These languages are not comparable equivalents. Their specialties are completely opposite. To make their output equivalent, you would have to construct an AI system more advanced than anything that exists today.


"To make their output equivalent, you would have to construct an AI system more advanced than anything that exists today."

Umm......

http://en.wikipedia.org/wiki/Universal_Turing_machine


Name a place (besides iOS) where this will enable Clojure to run that does not already have a JVM. Now tell me if it would be practical to run it there (printer firmware, etc.) vs. hand-coded C.

Speaking of iOS - this IS an area where this may be practical, but that is only because of politics (Apple's tyrannical restrictions). But there are already Clojure compilers for iOS.


> printer firmware, etc.

Ricoh printers have a JVM builtin, http://ricoh-ridp.com/sdkj


Any embedded device programmer will tell you that a VM in a printer is an expensive novelty. Putting a VM on an embedded device will incur extra overhead in both memory and processor requirements. This will likely only raise the processor cost by a couple of dollars, but quantity is always the problem here. If HP ships 2 million LaserJet X's in a given year, the use of the chip just cost them 4 million dollars extra. Now which is cheaper? Hiring 2 C gurus to program the firmware at 150k per year, or 1 Clojure programmer off the street at 90k per year (don't forget the 4 million in hardware cost differences)?

Simple math.


While I agree with you, all the Ricoh printers in our company have a VM installed.

And I can tell you that a well-known mobile company originally from the UK, available in most European countries, makes use of Java for Smartcards on their SIM cards.

You got your math wrong. Nowadays, especially if we put outsourcing into the equation, hiring Java developers is way cheaper than hiring C developers, especially since many universities don't teach C any longer.


Compiling to machine language doesn't automatically make code fast, nor does compiling to C automatically make it slow(er).

There is very little about C that makes this inherently slow, and almost all of the stuff one has to do to make it fast has to be done when targeting machine code as well.

On the other hand the C compiler does lots of things very well that would otherwise have to be coded by hand, like register allocation, interfacing with existing code, and, yes, portability.


This IMO is an example of using the wrong tool for the job.

Clojure is awesome and has its place; C is the industry standard for performance and also has its place. The lines will never be crossed between these two languages.

You are trying to fit a square peg in a round hole. If you improve your skills from this project - great, but please don't try to mislead anyone that this is a viable replacement for C (be up front about this).

A language from an entirely different paradigm (functional language, garbage collection, tons of abstractions and a huge API) cannot ever be a replacement for, or even compete on the same playing field as, C. We are talking apples to oranges here.


Why do this? Compile to native code directly, if anything. Would you transform it to C first so you could edit the C? Then what if you want to make a big update using the Clojure source? It will overwrite your changes.

This approach is not practical; perhaps I am failing to understand the logic of why you would go with it. Portability (between processor architectures) is all I can think of, but you will never achieve high performance if that is the case.

C coders write their code differently for different processors because all machines have different capabilities and resources (strengths and weaknesses).


Compiling to C is not at all an uncommon strategy; there's at least one pretty serious Scheme implementation [1] that does this. Portability between architectures is a pretty big advantage, but in this case another one is leveraging the Boehm GC rather than rolling your own. Could you use the Boehm GC if you emitted assembly? I would guess maybe it's possible, but you'd have to emit assembly that followed C calling conventions, I would imagine. Emitting C might be easier. Lisp in Small Pieces has a pretty big chapter that goes into detail on how to compile Lisp to C.

[1] http://dynamo.iro.umontreal.ca/~gambit/wiki/index.php/Main_P...

see also:

http://www.iro.umontreal.ca/~boucherd/mslug/meetings/2004102...

http://www.iro.umontreal.ca/~boucherd/mslug/meetings/2004102...
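
For a flavor of what "emit C and lean on the Boehm GC" looks like, here's a minimal sketch (GC_INIT and GC_MALLOC are the real Boehm collector API from <gc.h>; the cons cell layout is just an illustration):

    #include <gc.h>      /* Boehm-Demers-Weiser collector; link with -lgc */
    #include <stdio.h>

    /* An illustrative cons cell; the emitted C just allocates and never frees. */
    typedef struct cons {
        void *car;
        struct cons *cdr;
    } cons_t;

    static cons_t *cons(void *car, cons_t *cdr) {
        cons_t *c = GC_MALLOC(sizeof(cons_t));   /* collected automatically */
        c->car = car;
        c->cdr = cdr;
        return c;
    }

    int main(void) {
        GC_INIT();
        cons_t *list = NULL;
        for (long i = 0; i < 1000000; i++)
            list = cons((void *)i, list);        /* garbage is reclaimed, never freed by hand */
        printf("head: %ld\n", (long)list->car);
        return 0;
    }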


In clojure-scheme I actually compile Clojure to Gambit to make use of Gambit's awesome runtime. It's a very good match for Clojure.

https://github.com/takeoutweight/clojure-scheme


C has been widely used as a "portable" target language. LLVM is increasingly used as a more semantically-sane alternative.



