Hacker News new | past | comments | ask | show | jobs | submit login
A fast HTTP request/response parser for Common Lisp (github.com/fukamachi)
131 points by fukamachi on Oct 30, 2014 | hide | past | favorite | 73 comments



For context, Eitaro Fukamachi has been working on a whole bunch of things relating to the low-level parts of Common Lisp web development: A fast HTTP parser, a fast URI parser[0], a libuv-based web server[1] (That outperforms Node!), and he's also written Clack[2], which is Common Lisp's Rack/WSGI.

[0]: https://github.com/fukamachi/quri

[1]: https://github.com/fukamachi/woo

[2]: https://github.com/fukamachi/clack


Fukamachi has done some stellar work. I've used his code and have generally been very happy with it.


> (That outperforms Node!)

In what way? All of the benchmarks I'm seeing on that page indicate otherwise.


When he first posted it, it did, it looks like recent changes have brought that down:

https://github.com/fukamachi/woo/commit/be4a0db688af05ee204e...


Right. When I added default headers, the score went way down.

Probably libevent2 is the bottleneck and no way to make it faster without switching its backend (the cl-async's author is working on rewriting it with libuv now).


Out of curiosity did you try basic-binary-ipc? I would expect it would lose out to cl-async, but have never benchmarked the two against each other.


It would be quite surprising if it did, since Node's HTTP parser isn't Node/JavaScript, but highly optimized C.


That's quite interesting! I tried to learn Lisp a while back and found the string handling to be horrible, and it's such a huge part of modern development.

It sounds like there's someone with a lot of skill working to fix exactly this.

Now, do I have time to look into Lisp again...


out of curiosity, what did you found horrible about string handling?


So I just looked up Common Lisp string handling and none of it looks familiar. It's possible what I was trying to use before wasn't Common Lisp. It's also possible I was trying to learn from a site that was just doing a horrible job of it. It's possible things have gotten much much better since I last tried to learn it.

Whatever the cause, I was under the impression that you essentially had to treat strings as lists of characters and that there really weren't any built-in functions to handle strings as strings. It seems that's wrong at least in modern day.


Strings are sequences and functions that work on sequences work on strings as well. So getting substrings etc. is all possible.

I've had very little problems operating on strings in Common Lisp (all I missed was starts-with and ends-with functions), could you give some concrete examples so that I might help you?


I can't now because it was years ago I was having problems. I'm starting to think that it was a different Lisp and not Common Lisp I was looking at, too. Too long ago to remember for sure.


Strings are a simple-array of characters and functions that work sequences work on strings (ej. remove). They are complemented with a some string specific functions like string-equal, string-trim, string-upcase, etc. It has been that way for at least 20 years. It might have been that the site was doing a horrible job at explaining it.


Try Clojure


I've always assumed Clojure was the right choice for a Lisp to do web applications. Perhaps I should revaluate my assumptions.


If you're assuming this based on some microbenchmarks, you should probably instead reevaluate your evaluation techniques.

(This isn't meant as a personal jab, but too many people focus on silly things like microbenchmarks when choosing tools, and not on other more important things like support avenues, tooling, libraries, etc.)


I don't think it's fair to assume that TheMagicHorsey was basing his evaluation on the microbenchmarks themselves.

The real story here is that people are working on a number of new libraries/modules/tools for doing web development in CL — the benchmarks (as you correctly point out) are a minor detail.


Perhaps you should read more carefully. "If you're assuming ..."

It's just not clear from the context we have. But, caution! Microbenchmarks should only be Microtrusted.


I think this is truly sagely advice.

My grand takeaway from Aphyr's Jepsen series (http://aphyr.com/tags/jepsen) is that the more sophisticated software becomes in the never-ending quest for features and performance, the more likely the fundamentals underlying it have been overlooked, glossed over, or ignored completely.


When you compare the time fukamachi's spent on these libraries to the time spent on Clack, Caveman, etc., it's pretty clear there's obsession with performance.

As aroman said, "The real story here is that people are working on a number of new libraries/modules/tools for doing web development in CL".


I didn't mean to imply that the original post was not worthy in anyway whatsoever, in fact I think all of the work he is doing in the CL ecosystem is awesome and I look forward to playing around with it soon in my next CL project.

But apgwoz's general point about how we evaluate software is spot on. In general, I think benchmarks and the like have a sort of "woo"ing effect on people (in the James Randi sense), and we should probably take a few moments to reflect on that before making any major architectural decisions. :)


I agree entirely, but at the same time, I think too often people make social rather than technical decisions. Lots of stars on GitHub and a well-designed website are usually indicative of a good project -- but people put too much faith on CSS, the same way they put too much faith on microbenchmarks taken in laboratory conditions.


This -> people make social rather than technical decisions

Leo Meyerovich did a study on the social factors that affect language adoption[0]. Vivek Haldar has a highlights the salient points[1]. More info at his site: [2]

[0]: http://lmeyerov.github.io/projects/socioplt/papers/oopsla201...

[1]: http://blog.vivekhaldar.com/post/56972066006/empirical-analy...

[2]: http://lmeyerov.github.io/


I think too often people make social rather than technical decisions

Indeed. It's sad that as engineers (who theoretically espouse logic, reason, and evidence) we still end up making so many decisions based on herd behavior.


I would posit that engineers are human creatures rather than robots, and basing decisions on social factors is a natural part of being human.

For my anecdotal experience, apart from JS libs (where style over substance is frequently the norm) I find a well-designed website is usually a sign the project cares about its users, and is more likely to provide good documentation & a space to ask questions.


I've generally been happy with my use of Common Lisp in the web environment - it's had a long-standing presence there. I've personally not found deployment the most happy story, but I expect that with the rise of Docker, shipping a Common Lisp image in a Docker image will be a very popular method of deployment.


I'm curious, why would you assume that?

The primary benefit and cost, depending on your point of view, of clojure is that it runs on the JVM. I don't see how that would be a strong argument for or against using this tooling for any problem domain unless you were already committed to or against the JVM for other reasons.


Being able to run on the JVM is not a differentiating factor among lisps. Kawa Scheme and ABCL give you access to Java Libraries.

If you ask me the strengths of Clojure that it is built upon abstraction/interfaces (ie. conj instead of cons) and how it manages hygiene while retaining quasiquotes templates leveraging clj's namespaces. But then again you should probably ask a clojurian instead.


An advantage of Clojure over ABCL is the syntactically nicer Java interoperation and the mechanism to avoid reflection by using "warn-on-reflection" and introducing type hints.


Can somebody give arguments for using Racket or CL for web dev? Whenever I evaluate Lisps I end up choosing Clojure because to me it seems to be the most practical to develop web apps in.


I don't think you should look for arguments. Just try it. Write something in every one of the environments you are considering and you'll learn more than you ever could from reading online discussions.

As a data point, I moved from Common Lisp to Clojure and never regretted it. But the reason I switched was not because I read something online, but because of Clojure's concurrency support, which means I can create concurrent software that is robust.


In the HTTP benchmark game, there is also Bryan O'Sullivan's attoparsec-based parser: http://www.serpentine.com/blog/2014/05/31/attoparsec/ The important point is that the parser is a few dozen lines of code (i.e. it relies on a general parsing library instead of being custom hand-written code).


Is parsing HTTP headers a major bottleneck? HTML parsing, sure, especially with the horrors of HTML 5 error handling per the spec and the need to back up after guessing the character set. But HTTP headers are not that big or complex.


HTTP protocol grammar is pretty simple, but parsing correctly considering the server side and desired performance may be hard.

By nature parsing strings is expensive (not sure why they designed HTTP in string format, binary would save a ton of power in processing). So imaging you are receiving a request, most of the time due to Network latency (client or server side), you don't get the full Request at once, most of the time you need to perform a couple of read(2) operations.

Now, what do you do on each read ?, did you manage full state ?, how many parsing rounds ?, how to catch the end of the protocol block , etc etc etc.

So answering your questions I would say "YES", the HTTP protocol in a Server "may be" a bottleneck if its not well implemented. In our HTTP Server project[0] we are working in a new parser to improve performance, so for real, this is a very important topic.

[0] http://monkey-project.com

best


From the Reddit thread: http://www.reddit.com/r/lisp/comments/2klhpn/quri_a_uri_libr...

>Right. When I was profiling Wookie, I found URL parsing was the third bottleneck (the first is HTTP request parsing, and the second is libevent2).


Yes, that's really important for compatibility with Google Wave.


This reminds me of John Fremlin's work back in 2009 http://john.freml.in/teepeedee2-c10k . (Hello, John!)


Just curious what about this parser makes it faster? Can anyone give a basic explanation?


Well if you implemented the c parser with precisely the same algorithms the c one would become faster, so one can assume they have totoally different underlying implementations


That is not necessarily true (and I say this as someone who heavily favors C for parsers). An implementation with a JIT will likely be able to optimize away work by noticing that the table of callbacks is empty.

I would be more impressed if the callback table was not empty (and the callbacks actually did something, like increment an integer) and Lisp was still winning.

EDIT: I see others saying that SBCL is AOT, not JIT. If so I'd be very interested to look at the disassembly side-by-side and see an explanation for why SBCL is winning.


SBCL is a compiled Common LISP (even in interpreted mode, it compiles every eval-ed lambda before calling it).

On the other hand there is nothing inherently stopping the optimiser from optimising the resulting code as well or better than GCC or any C compiler. Theoretically it could produce optimal machine code.


Correct me if I'm wrong but I was under the impression that Common Lisp is memory-safe (implying guards like bounds-checks), dynamically typed (implying type checks) and garbage collected (implying GC overhead).


Yep but all these checks don't necessarily need to happen at runtime. A clever enough compiler can optimise all of it (or most of it) at compile time. In general this is a very hard problem but for something static enough (like a simple parser) the compiler can produce code and determine its safety without needing to carry checks on to the runtime.

Note that C code or machine code would also have to perform input checks at runtime when doing http header parsing. A Common Lisp compiler wouldn't theoretically need to produce code performing any more runtime checks than these.


You can also manually turn some checks off. In some Lisp systems, declaring an optimization setting of (safety 0) will turn off a bunch of checks, even ones that the compiler can't statically determine are safe to skip. Of course this should be used very carefully if at all (especially in code that's parsing arbitrary data from the internet!).


SBCL (And other compilers) implement a type inference engine to reduce type checks, and you can also provide your own type declarations, i.e. gradual typing before it had that name.


CL type declarations are cool, but they aren't gradual typing. It doesn't either reject your program if you get them wrong, or dynamically enforce that everyone else lives up to them.


That depends on the optimization levels you select. At it's highest optimization level, there is no type checking and you'd better have your code right.

http://www.sbcl.org/manual/#Declarations-as-Assertions

4.2.1 Declarations as Assertions

The SBCL compiler treats type declarations differently from most other Lisp compilers. Under default compilation policy the compiler doesn’t blindly believe type declarations, but considers them assertions about the program that should be checked: all type declarations that have not been proven to always hold are asserted at runtime.

Remaining bugs in the compiler’s handling of types unfortunately provide some exceptions to this rule, see Implementation Limitations.

CLOS slot types form a notable exception. Types declared using the :type slot option in defclass are asserted if and only if the class was defined in safe code and the slot access location is in safe code as well. This laxness does not pose any internal consistency issues, as the CLOS slot types are not available for the type inferencer, nor do CLOS slot types provide any efficiency benefits.

There are three type checking policies available in SBCL, selectable via optimize declarations.

Full Type Checks

All declarations are considered assertions to be checked at runtime, and all type checks are precise. The default compilation policy provides full type checks.

Used when (or (>= safety 2) (>= safety speed 1)).

Weak Type Checks

Declared types may be simplified into faster to check supertypes: for example, (or (integer -17 -7) (integer 7 17)) is simplified into (integer -17 17).

Note: it is relatively easy to corrupt the heap when weak type checks are used if the program contains type-errors.

Used when (and (< safety 2) (< safety speed))

No Type Checks

All declarations are believed without assertions. Also disables argument count and array bounds checking.

Note: any type errors in code where type checks are not performed are liable to corrupt the heap.

Used when (= safety 0).


I'm not familiar with gradual typing, but, SBCL with any safety level above 0 does both of those things. This isn't specified by the standard, of course.


Not necessarily true, although C compilers are usually very good.


Why do you think that? Is it because most languages compile to C? That is not the case with most CL implementations. (ej. CL & CCL).

Although I agree that it is generally the case, you can read here[0] for an example where CL would gain an advantage over C compilers. Albeit in an implementation specific way.

[0]: http://www.pvk.ca/Blog/2014/08/16/how-to-define-new-intrinsi...


Possibly, but you might not be able to do that without implementing enough of Lisp to allow the algorithms to be implemented, as per Greenspan's Tenth: http://c2.com/cgi/wiki?GreenspunsTenthRuleOfProgramming


parsed = http_parser_execute(parser, &settings_null, buf, strlen(buf));

assert(parsed == strlen(buf));

He's calling strlen(buf) twice here. There's no guarantee that this is optimized at compile time. Usually len of buf is determined from read/recv system calls and for a fair comparison strlen should be removed here.


No difference even though I ran the benchmark again after removing the line.


Interesting. I was expecting at least cpu branch improvements given the lack of assert.


Also use gettimeofday() instead of clock(), because you don't know how far into the second you are at the time it's called.


Any good compiler will optimize that out. (Actually, even a bad compiler will optimize that out... it's a really easy optimization).


assert only runs in debug mode anyway.


It runs unless NDEBUG is defined, which the author did not do according to his methodology. Anyway I looked at the assembly output and strlen() is optimized away.


I really do not like when bad habits are encouraged by the optimizer.



"Forcing a definition of the name NDEBUG, either from the compiler command line or with the preprocessor control statement #define NDEBUG ahead of the #include <assert.h> statement, shall stop assertions from being compiled into the program"


Hmm, I can't really write lisp, but, how maintainable is this?? https://github.com/fukamachi/fast-http/blob/master/src/parse...


It could do with some comments and doc-strings certainly, but I'd give it a pass for now as it's a new project. As for the code, I imagine that in any language, highly optimised programs can start looking pretty odd after a while. Try understanding "grep" for example:

http://git.savannah.gnu.org/cgit/grep.git/tree/src/grep.c

As a guy not particularly proficient in C, that looks crazy to me. Although it does at least have decent comments, which I'd expect for such a mature project!


I was not able to run the benchmark for Lisp. What Lisp implementation is expected? I tried CLISP and got:

    *** - READ from #<INPUT BUFFERED FILE-STREAM CHARACTER #P"benchmark.lisp" @1>: there is no package with name "SYNTAX"


I'd recommend SBCL, since a high-performance JIT compiler is likely to give more representative results than a bytecode VM like CLISP. Plus, I'd expect some minor portability errors to arise when you push the envelope like this.

By the way, your error seems to be that the cl-syntax library is not being found, as opposed to a portability problem, so maybe you don't have Quicklisp installed and loaded?


Pedantic correction: Unless something's changed very radically since I last looked, SBCL's compiler is AOT not JIT. (Though in interactive use it compiles things as you enter them, which I suppose is kinda JIT if you look at it from just the right angle and squint.)


You're right, and I always make this mistake for the reason that AOT/JIT look the same from the outside.


> By the way, your error seems to be that the cl-syntax library is not being found, as opposed to a portability problem, so maybe you don't have Quicklisp installed and loaded?

Indeed I don't, I didn't know to install it since it wasn't listed as a dependency. Thanks for the info!


Quicklisp is CL's package manager, so that's necessary to install dependencies. You can install it with:

    $ curl -O http://beta.quicklisp.org/quicklisp.lisp
    $ sbcl --load quicklisp.lisp \
           --eval '(quicklisp-quickstart:install)' \
           --eval '(ql:add-to-init-file)' \
           --quit


Is Lisp a write-only language?


I'll assume this is an honest question - not a troll (and hence don't deserve the downvotes), but no, Lisp isn't write-only.

It's an immensely flexible language that lets you code in a variety of different styles, so you can end up with code spaghetti. But that doesn't mean that you have to, or even that it's the most likely thing to happen.

If you compare it to idiomatic Perl, where a lot of the community seem to get obsessed with the famous one-liners, Lisp seems a lot better to me. And if you follow a decent style guide like:

https://google-styleguide.googlecode.com/svn/trunk/lispguide...

Then there's no particular reason why you should end up with disaster code anymore than in any another language. I'd rather read a single page of Lisp, than the equivalent numerous pages of Java or some other verbose language.


The old book _Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp_ contains a number of nontrivial programs (like an Othello player and a little symbolic math program). AFAIK, most people think the code is unusually clear, both a good example of how to write clear CL programs and a good example of clear programming period (independent of language). If you are seriously interested in clarity of CL programs, I recommend taking a look at that book.


Coming from Common Lisp and having tried out Haskell recently I can ask the same about the latter while I find Lisp to be one of the most readable languages out there.

It's basically what you're used to and for Lisp it does not take long to get used to.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: