Some of the cool new features include a relatively lightweight compiler plugin facility and the ability to write down types that are parameterized over typeclass constraints. Some of the ongoing work on Data Parallel Haskell has also been integrated into this release.
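If you haven't seen the constraint-parameterization feature, here's a minimal sketch of the kind of thing it enables, assuming the ConstraintKinds extension (Stringy and roundTrip are made-up names for illustration):

    {-# LANGUAGE ConstraintKinds #-}

    -- Constraints are now first-class type-level things, so a pair of
    -- class constraints can be abbreviated behind a type synonym.
    type Stringy a = (Show a, Read a)

    -- Uses both constraints packed into the synonym.
    roundTrip :: Stringy a => a -> a
    roundTrip = read . show

The same mechanism lets you abstract over constraints in data types and type families, not just abbreviate them.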
Does anyone have any tips for web scraping with Haskell? I've used tagsoup but the memory usage is really bad (or maybe I'm using the wrong string type?) and the interface is really clunky compared to something like pyquery or hpricot.
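(In case the string type really is the culprit: tagsoup's parseTags is polymorphic over StringLike, so switching to strict Text should be a drop-in change. A toy sketch of what I mean; textChunks is just a made-up extractor:)

    import qualified Data.Text as T
    import Text.HTML.TagSoup

    -- Pull all text nodes out of a page; parseTags accepts Text directly
    -- because Text has a StringLike instance, avoiding String's overhead.
    textChunks :: T.Text -> [T.Text]
    textChunks html = [ t | TagText t <- parseTags html ]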
* Use tagsoup only for small projects or for sites that are so broken that other packages fail to parse/load the DOM properly.
* The Shpider[1] package on Hackage (which I maintain these days) makes the crawling bit somewhat easier. It has an intuitive API and we are always open to suggestions/new functionality there.
* Instead of tagsoup, learn hxt (and arrows along the way). It is really, really hard to get used to, but once you're there it is amazing for extracting information from the DOM with combinators. Perhaps you could do it as a back-burner learning project. Make sure to look into the arrow proc/do notation, as that's pretty much the key to scraping with it (see the sketch after this list).
* Alternatively, you can use one of the XML parsing libraries and their combinators. Some that come to mind: haxml, hexpat, xml, xml-basic, xml-conduit. I'm sure these would be great too, although I haven't used any of them extensively.
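To give a flavor of the proc notation mentioned above, here's a minimal, untested sketch that pulls (href, link text) pairs out of an HTML page with hxt. extractLinks is a made-up name; the readString options are just the usual lenient-HTML settings:

    {-# LANGUAGE Arrows #-}

    import Text.XML.HXT.Core

    -- Parse HTML leniently, then walk the tree for <a> elements,
    -- pairing each href attribute with the element's collected text.
    extractLinks :: String -> IO [(String, String)]
    extractLinks html =
      runX $ readString [withParseHTML yes, withWarnings no] html
        >>> deep (isElem >>> hasName "a")
        >>> proc a -> do
              href <- getAttrValue "href"  -< a
              txts <- listA (deep getText) -< a
              returnA -< (href, concat txts)

runX returns every match in IO, so this gives you one pair per link on the page.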
Thanks! I've heard of Shpider, but haven't had a need for it yet since I'm not doing programmatic web browsing, just downloading pages and extracting info.
I'll see if I can clean up the markup enough to get it to parse with hxt; failing that, Shpider provides a good reference for how to use tagsoup correctly. The Shpider codebase is also really clean and well documented.
* Support for registerised compilation on the ARM platform, using LLVM.
First-class REPL!
* It is now possible to give any top-level declaration at the GHCi prompt (see the example session below).
* Safe Haskell
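To illustrate the GHCi change: you can now define data types, classes, and instances straight at the prompt. A hypothetical session (Point is a made-up example):

    ghci> data Point = Point Int Int deriving Show
    ghci> let origin = Point 0 0
    ghci> origin
    Point 0 0

And Safe Haskell is opt-in per module via a pragma, roughly like this:

    {-# LANGUAGE Safe #-}
    module Example where
    -- With Safe on, importing unsafe modules (e.g. System.IO.Unsafe)
    -- is rejected at compile time.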