The challenges of crawling at large scale still persist, as is evident from Bloomreach and many other companies building custom solutions because the available open source tools cannot handle the scale such products require. SQLBot aims to solve this problem.
The product is a few weeks from launch. If anyone is interested: http://www.amisalabs.com/AmisaSQLBot.html