
There's a really good VM-based PEG parsing system called LPEG (http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html). Paper: http://www.inf.puc-rio.br/~roberto/docs/peg.pdf

It's for Lua, but there isn't anything in the VM design that requires Lua per se, so porting it to another language's C API wouldn't be difficult. In my experience, LPEG has been quite easy to use for the sort of tasks one would normally apply regular expressions to (and then some, as it handles recursive structures far better), it's quite fast, and it's easy to tune incrementally. In my (not yet ready for release) Lua/LPEG webserver, I'm handling 1000+ requests per second in about 2MB of RAM total*, so LPEG's design has tamed the usual space issues caused by PEG memoization.

* Only a tiny amount of that time is due to request parsing; it's mostly because I'm still working the kinks out of the event loop.
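
To make the "recursive structures" point concrete, here is a minimal sketch that matches balanced parentheses, something a classical regular expression can't do. It assumes the lpeg module is installed and loadable via require; the pattern follows the balanced-parentheses example in the LPEG reference, and the variable name is just for illustration.

```lua
local lpeg = require "lpeg"
local P, S, V = lpeg.P, lpeg.S, lpeg.V

-- A one-rule grammar that refers to itself through V(1):
-- "(" then any mix of non-paren characters or nested groups, then ")".
local balanced = P{ P"(" * ((1 - S"()") + V(1))^0 * P")" }

-- match returns the position just past the match, or nil on failure.
print(balanced:match("(a(b)c)"))  --> 8
print(balanced:match("(a(b c"))   --> nil
```

The nesting depth isn't bounded by the pattern itself; the grammar simply recurses into V(1) whenever it sees another opening paren.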

One strike against it is that there isn't a whole lot of casual documentation, though the documentation that does exist (the research paper and a library reference) is quite thorough. Also, it takes advantage of Lua's operator overloading to make the grammars concise, which takes getting used to. Still, having it so thoroughly integrated into the language is extremely convenient, and it's quite a bit more expressive than regular expressions.
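
For a taste of the operator overloading, here is a small sketch that parses an HTTP request line. This is not code from my server; the pattern names and the set of accepted methods are made up for the example. In LPEG, `*` is sequence, `+` is ordered choice, `^1` means "one or more", binary `-` is "matches this but not that", and a trailing `-1` only succeeds at end of input.

```lua
local lpeg = require "lpeg"
local P, R, C = lpeg.P, lpeg.R, lpeg.C

-- Hypothetical pieces of a request line; C(...) captures the matched text.
local method  = C(P"GET" + "HEAD" + "POST")      -- ordered choice
local target  = C((1 - P" ")^1)                  -- one or more non-space chars
local version = P"HTTP/" * C(R"09" * "." * R"09")

-- Sequence the pieces, separated by single spaces, and require end of input.
local request_line = method * " " * target * " " * version * -1

local m, t, v = request_line:match("GET /index.html HTTP/1.1")
print(m, t, v)  --> GET	/index.html	1.1
```

Since patterns are ordinary Lua values, you can build them up piece by piece and test each piece on its own, which is a big part of what makes tuning a grammar incrementally so painless.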

Also, here's a link to the PEG paper on the author's site, rather than behind the ACM's paywall: http://www.bford.info/pub/lang/peg.pdf . Gotta love the ACM.


