
Here are a couple of suggestions:

* Use tagsoup only for small projects, or for sites so broken that other packages fail to parse the DOM properly.

* The Shpider[1] package on Hackage (which I maintain these days) makes the crawling bit somewhat easier. It has an intuitive API, and we are always open to suggestions/new functionality there.

* Instead of tagsoup, learn hxt (and arrows along the way). It is really, really hard to get used to, but once you're there it's amazing for extracting information from the DOM with combinators. Perhaps you could make it a back-burner learning project. Make sure to look into the arrow proc/do notation, as that's pretty much the key to scraping with it.

* Alternatively, you can use one of the XML parsing libraries and their combinators. Some that come to mind: haxml, hexpat, xml, xml-basic, xml-conduit. I'm sure these would be great too, although I haven't used any of them in great capacity.

[1] http://hackage.haskell.org/package/shpider-0.2.1.1
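For reference, a minimal sketch of the tagsoup style suggested above (the function name and sample HTML are just illustrative):

```haskell
import Text.HTML.TagSoup

-- Collect the href attribute of every <a> tag in a page.
-- tagsoup happily tokenizes even badly broken markup.
links :: String -> [String]
links = map (fromAttrib "href")
      . filter (isTagOpenName "a")
      . parseTags
```

Note that parseTags never fails: it produces a flat [Tag String] for any input, which is why tagsoup is a reasonable fallback when stricter parsers choke on broken markup.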




Thanks! I've heard of Shpider but haven't had a need for it yet, since I haven't been doing programmatic web browsing (just downloading the page and extracting info).

I'll see if I can clean up the markup enough to get it to parse with hxt; otherwise, Shpider provides a good reference for how to use Tagsoup correctly. The Shpider codebase is also really clean and well documented.
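If the markup does parse, a small hxt sketch of the arrow proc/do notation mentioned above might look like this (the function name and sample input are assumptions, not from any particular codebase):

```haskell
{-# LANGUAGE Arrows #-}
import Text.XML.HXT.Core

-- Extract (link text, href) pairs from an HTML string.
-- withParseHTML tells hxt to use its lenient HTML parser.
linkPairs :: String -> IO [(String, String)]
linkPairs doc = runX $
  readString [withParseHTML yes, withWarnings no] doc
    >>> deep (isElem >>> hasName "a")
    >>> proc a -> do
          href <- getAttrValue "href" -< a
          txt  <- deep getText        -< a
          returnA -< (txt, href)
```

The proc/do block binds intermediate arrow results to names (href, txt), which is much more readable than threading everything through &&& and >>> once a scrape pulls several fields per element.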



