
The new module organization avoids collisions with existing libraries like Extlib while allowing you to link Batteries' modules selectively, thus decreasing the size of the generated executables.

For convenience, all the modules are also available under the Batteries namespace. In this case, however, all of Batteries' code will be included in the executable --- that's how OCaml's linker works at the moment.

For instance, if you only use BatList in your code (or do module List = BatList to refer to it with that name), only that module will be included in the binary; if you do

    open Batteries
and then refer to the module simply as List, all the other modules in the Batteries hierarchy will also be included in the binary.
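
A tiny (untested) sketch of the selective approach, assuming Batteries is installed and the BatList module is on the build path:

    (* alias just the modules you need instead of opening Batteries;
       only what you actually reference (plus its dependencies) gets
       linked into the executable *)
    module List = BatList

    let () =
      let xs = List.filter_map (fun x -> x) [Some 1; None; Some 3] in
      List.iter (fun n -> Printf.printf "%d\n" n) xs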


What is needed to make something a "real" web framework? Ocsigen/Eliom does handle sessions, routing, page parameters, forms, continuation-based sites, templating and other things you'd expect from a web framework. The main thing Rails includes but Ocsigen/Eliom doesn't is an ORM.


IO was only (barely) disk-bound when you gave a full core to the reader (i.e., you have to be careful not to use HW threads on the same core). This is not a problem specific to OCaml --- I reproduced it with a standalone program written in C that simply read the file: as soon as you have more stuff running on the same core (in different hardware threads), the IO performance drops. See http://groups.google.com/group/wide-finder/browse_thread/thr...

Also, and this came as quite a surprise, it turns out that mmap is slower than read(2) on the T2K.


Wow, interesting. Thanks for the info and the link. This is a surprising and disappointing aspect of this CPU.

Do you know if there's something about this (seemingly trivial) workload that is pathologically bad for this processor, or do you believe that the CPU is just wimpy? I've only ever had a glossy spec-sheet-level introduction to these at work. Given this load I don't see the value of "4 threads per core" that they proclaim on http://www.sun.com/processors/UltraSPARC-T1/specs.xml


It's just that the hardware threads are slow, I think --- compiling stuff on the T2K also took forever. It also seems to me that there's little value in having 4 threads per core: it forces you to parallelize programs that ran fine on normal cores just to match the performance you'd get without hardware threads...


"The ruby version is single threaded and the test is on a 32 core workstation (counting CPU time Python is only 4x faster and OCaml is only 17x faster)"

I mentioned that on my blog. I also have an OCaml version that is 25x faster in CPU time (i.e., as fast as the top C++ entries), but barely faster regarding wall clock time. It takes a couple dozen extra lines.

Keep in mind that the Wide Finder 2 benchmark was about parallelism from the beginning; I said the Ruby version was naïve precisely because it wasn't parallel. The fact that the language did matter to this extent came as a relative surprise, because the most expensive operations in the Ruby version actually take place in its core classes, written in C. It's just that it's so slow everywhere else that the overall performance is still an order of magnitude worse.

"The author admits that there were multiple stabs at the OCaml version. What savings could come from optimizing the code in other languages?"

There are three OCaml versions, listed on the result table http://wikis.sun.com/display/WideFinder/Results

AFAIK other entries received considerable optimization effort (I'd even go further and say that most involved more than the OCaml ones did) --- several went through half a dozen revisions, even if the wiki doesn't reflect it.

You can take a look at the wide-finder mailing list to see how often each participant was using the T2K (we used the ML to reserve time slots): http://groups.google.com/group/wide-finder/topics?hl=en&...

wf2_multicore.ml was the first version I ran against the full dataset on the T2K, and it did quite well (8 minutes). The first 2(?) runs crashed because I exhausted the memory space of the T2K, but the 3rd one completed successfully.

wf2_multicore2_block.ml took considerably more time to write because I switched from line-oriented to block-based IO --- basically the technique all the fast implementations used.
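
For anyone curious, this is roughly what block-based reading looks like in (modern) OCaml. It's an illustrative sketch, not the actual wf2 code; the real version read much larger blocks and handed them off to worker processes:

    (* read big chunks and carry the trailing partial line over to the
       next block, instead of calling input_line once per line *)
    let process_blocks ?(block_size = 4 * 1024 * 1024) ic handle_line =
      let buf = Bytes.create block_size in
      let leftover = ref "" in
      let rec loop () =
        let n = input ic buf 0 block_size in
        if n > 0 then begin
          let chunk = !leftover ^ Bytes.sub_string buf 0 n in
          (* the last piece may be an incomplete line; keep it for later *)
          match List.rev (String.split_on_char '\n' chunk) with
          | last :: complete ->
              List.iter handle_line (List.rev complete);
              leftover := last;
              loop ()
          | [] -> loop ()
        end else if !leftover <> "" then
          handle_line !leftover
      in
      loop ()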


OT: According to wikipedia, the PAVE PAWS phased array operates in the 420-450 MHz UHF range, with a corresponding wavelength of 66cm; if the error margin is really 0.1*lambda, the system is built to a precision of 6cm, significantly easier to meet than 0.1mm.
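
Back-of-the-envelope, taking the 450 MHz end of the band (written as OCaml for concreteness):

    let c = 3.0e8                  (* speed of light, m/s *)
    let f = 450.0e6                (* upper end of the PAVE PAWS band, Hz *)
    let lambda = c /. f            (* ~0.67 m, i.e. roughly 66 cm *)
    let tolerance = 0.1 *. lambda  (* ~6.6 cm *)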

Also, it would be easy to compensate for any error by controlling the phase of the fed signals.

See http://en.wikipedia.org/wiki/Ballistic_Missile_Early_Warning...


Thanks!

I was thinking of that very link when I wrote this. Dunno why I didn't look more carefully for it.

The number L = lambda/10 is actually part of the standard definition for a microwave system. Any circuit that has a physical dimension greater than L is too "big" to use standard circuit analysis, because the voltage and current values you measure at different points will be different due to propagation delays, even when those points are connected by a wire with no reactance and negligible resistance at lower frequencies.

The world's biggest microwave system (biggest machine period, actually) is the US power grid, even though it oscillates at a measly 60 Hz. Lambda is about (.6c / 60 Hz) = 1800 miles.

This is proof that, electrically as in so many other things, the Midwest is 180 degrees out of phase with the rest of the country.


The code has grown and become more complex over the last few versions because I optimized it until directory traversal + glob matching got noticeably faster than Git's own (git-ls-files --exclude-standard doesn't do exactly the same thing as find-git-files, but meaningful performance comparisons can be done with the -d -o and -m -o options). It's still much smaller than metastore (about a third of the size), even though it does much more now.

You can take a look at the initial version of ometastore here: http://eigenclass.org/repos/gitweb?p=gibak.git;a=blob;f=omet... The functionality from metastore takes ~270 lines (vs. ~1500 lines of C); the support for .gitignore takes another ~70 lines.

It's got one or two bugs in the .gitignore support which I fixed later (and a silly bug in do_finally), but this code is simpler if you want to see what OCaml looks like in actual use (for a system tool, in this case). It's almost (if not actually, I don't remember) the "first version that typed", by the way.


Good find. I don't see how this keeps OCaml "from being a first choice for server-side development" while making it an acceptable language for client-side development, though.

There are several hackish polymorphic print implementations, but the best solution so far seems to be the "deriving" camlp4 extension (http://code.google.com/p/deriving/wiki/Introduction). This looks pretty good:

    type 'a tree = Leaf of 'a | Branch of 'a tree * 'a * 'a tree
        deriving (Show)

    type point = { x : float; y : float }
        deriving (Show)

    let points = Branch (Leaf {x=0.0; y=0.0},
                         {x=2.0; y=2.0},
                         Branch (Leaf {x=1.0; y=1.0},
                                 {x=1.0; y=0.0},
                                 Leaf {x=0.0; y=1.0}))

    Show.show<point tree> points
    =>
    "Branch
       (Leaf {x =0.; y =0.}, {x =2.; y =2.},
        Branch
          (Leaf {x =1.; y =1.}, {x =1.; y =0.}, Leaf {x =0.; y =1.}))"


Maybe the lack of parallelism in OCaml's threads at the time he wrote that?

There are now at least two solutions for obtaining speedups on multi-core and multi-processor machines, plus scalability via seamless distributed processing: the JoCaml extension, which integrates the join calculus (http://jocaml.inria.fr/), and coThreads (http://cothreads.sourceforge.net/), which supports both shared memory (with extensions like STM) and message passing while staying backwards-compatible with the original Threads library.
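
To give a flavour of the join-calculus style, here's a from-memory sketch along the lines of the counter example in the JoCaml manual, so treat the exact syntax as approximate:

    (* two join patterns over the shared channel [count]: one bumps the
       counter, the other reads it back synchronously *)
    def count(n) & inc() = count(n+1) & reply to inc
     or count(n) & get() = count(n) & reply n to get

    let () =
      spawn count(0);
      inc (); inc ();
      Printf.printf "count = %d\n" (get ())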


I've written between 50 and 100 KLoC of OCaml code over the last year (before that, I did mostly C and Ruby, which I've been using since 2002; I've touched many other languages but never did anything significant, over 5-10,000 lines, with them). If I had to describe the language in two words, I'd say that it's practical and loyal.

It's a loyal language because it doesn't bite you in the ass when you don't expect it. It's practical because there's a decent number of (high-quality, in general) libraries available, there are several concessions to serviceability in the language (mainly the ability to combine imperative and functional styles) and the implementation is solid and stable.

I haven't experienced the problems with the type system I've seen some people complain about. On the contrary, I've found it to be immensely helpful both when exploring new ground and when refactoring code. Deliberately breaking the code by changing a type or a function and letting the compiler guide you is a joy. In addition to other well-known benefits (Caml riders often feel that "it works as soon as it types" for a reason...) I won't repeat here, the type system (in particular the module system) sometimes makes me realize that I'm following the wrong track (I've learned to love functors after the early troubles).

Another thing I appreciate very much is the excellent performance and its predictability (other people might not care about this). The compiler doesn't do (nor does it need to) the kind of deep magic GHC does to yield good results, so you can easily predict the performance (speed & memory usage) of your code --- and improve it when needed. Joel Reymont captures how this feels perfectly: "I would describe working with OCaml to guiding a scalpel: you move it and things happen. Right now, right here, in real-time. Compilation time is almost unnoticeable, the tool is powerful but reasonably simple. I have no problem expressing the most complex things and moving the project forward at a fast clip. I'm enjoying myself tremendously at the same time!" (Joel has switched to Erlang^H ^H^Hfactor^H^H^H K since he wrote that, though).

Expanding a bit on the Objective Caml toolset, I haven't really used ocamldebug (even though it knows some fine tricks like allowing you to go back in time...), but I often use the profiler and I've come to love camlp4, a tool that allows you to extend OCaml's grammar (I'll just say that it's very powerful, this post is already getting too long). I use the REPL mainly to explore libraries (just do "include Themodule" to see all its types & functions) or to check the type of a function (the type almost always tells you all you need to know without reading the documentation). I don't find it worse than irb --- but I rarely code inside the REPL anyway.
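
For instance, a quick toplevel session (output abridged) shows what I mean:

    # include List;;      (* dumps every binding in the module with its type *)
    val length : 'a list -> int = <fun>
    val rev : 'a list -> 'a list = <fun>
    ...
    # List.fold_left;;    (* or just ask for one function's type *)
    - : ('a -> 'b -> 'a) -> 'a -> 'b list -> 'a = <fun>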

Now for the cons... as much as I like the language, some things could be improved:

* the standard library is a bit meager. Several third-party libs exist to complement/extend it, but there's still work left (there's an effort under way to create a Community Distribution with richer libs).

* sometimes you feel some kind of ad-hoc polymorphism would be nice (there's a small example of what I mean after this list)

* I've also wished a few times that the compiler were a bit smarter (inlining in higher-order functions, other classical optimizations)

* the community is very quiet: the code-to-blogging/discussion ratio is much higher than in other communities. INRIA isn't very talkative regarding its future plans for the language, and the ML has seen moderate activity historically (it's been revitalized as of late after the first OCaml Meeting)
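
To illustrate the ad-hoc polymorphism point: without overloading, each numeric type gets its own operators and printing functions, e.g.:

    (* ints and floats don't share arithmetic operators or print functions *)
    let a = 1 + 2
    let b = 1.0 +. 2.0
    let () =
      print_int a;   print_newline ();
      print_float b; print_newline ()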


The back-in-time ability sounds pretty cool. I've wondered whether any debuggers implement that idea. I also like the idea of an extensible grammar (and syntax too, right?). I think pg should allow Arc's syntax to be extended, which I remember him saying he'd do somewhere, but I haven't been able to find the quote since.


" I also like the idea of an extensible grammar (and syntax too, right?)"

Yes, that's what I meant (you change the grammar, resulting in new syntax).

Some examples of what you can do with camlp4:

http://martin.jambon.free.fr/pa_memo.ml lets you define memoized functions very conveniently:

    (* normal *)
    let rec fib = function 0 | 1 -> 1 | n -> fib (n-1) + fib (n-2)
    (* memoized *)
    let fib = memo 0 | 1 -> 1 | n -> fib (n-1) + fib (n-2)
Automatic generation of

* typed JSON marshallers (http://martin.jambon.free.fr/json-static.html)

    type json mytype = Foo | Bar of int * int
    (* just add "json" to the type declaration to create the
       json_of_mytype and mytype_of_json functions *)
* serialization with S-expressions (http://www.janestcapital.com/ocaml/)

* pretty-printing, type-safe marshalling with structure-sharing, dynamic typing, equality... (http://code.google.com/p/deriving/)

* list comprehensions, heredocs, string interpolation, lazy pattern matching, "do syntax" for monads (very much like Haskell's)...

Here's some OCaml code that relies on a rather large syntax extension of mine which allows you to generate (or verify) SQL schemas automatically and build composable queries using a typed relational algebra (the type system ensures that all queries are valid; if you change the schema and break some queries, the compiler will tell you what's wrong --- broken queries just don't compile):

   TABLE user users
     COLUMN id SERIAL AUTO PRIMARY KEY
     COLUMN name VARCHAR(64) UNIQUE
     COLUMN age INT NULLABLE INDEXED
     COLUMN password VARCHAR(64)
   END
   
   TABLE comment comments
     COLUMN id SERIAL AUTO PRIMARY KEY
     COLUMN title TEXT
     COLUMN text TEXT
     COLUMN created_at TIMESTAMPZ
     COLUMN author SERIAL FOREIGN(users, id)
   END
   
   let minors x = SELECT [User_age < (Some 18)] x
   let pauls = SELECT [User_name LIKE "%Paul%"] users
   let young_pauls = minors pauls
You can read more about this extension at http://eigenclass.org/hiki/typed-relational-algebra-in-OCaml


Thanks for your excellent post! I'm interested in OCaml, but I am very much a momentum-based PL-switcher.


... but they use OCaml to prove that there are no run-time errors (RTE) in their code :-)

http://www.astree.ens.fr/

"In Nov. 2003, ASTRÉE was able to prove completely automatically the absence of any RTE in the primary flight control software of the Airbus A340 fly-by-wire system, a program of 132,000 lines of C. [...] From Jan. 2004 on, ASTRÉE was extended to analyze the electric flight control codes then in development and test for the A380 series."

