I didn't grab the Flash content, but if I remember correctly, it was a Flash movie wrapping a PDF, which Dan then OCRed and cleaned up with Refine. The coolest part was that the PDF was in grid form, so Dan wrote an ImageMagick script that split it into individual cells and then OCRed each cell separately, which gave better results.
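I don't have Dan's script handy, so here is only a rough Python sketch of the split-then-OCR idea, not what he actually wrote. It assumes the grid is evenly spaced and that the page has already been rendered to an image of known size. The function names and the `cell_NNN.png` naming are made up for illustration. It just builds the ImageMagick `convert -crop` and `tesseract` command lines, which you could then run with `subprocess.run`.

```python
from typing import List, Tuple


def cell_boxes(width: int, height: int, rows: int, cols: int) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, w, h) for each cell of an evenly spaced grid, row by row."""
    boxes = []
    for r in range(rows):
        y0 = r * height // rows
        y1 = (r + 1) * height // rows
        for c in range(cols):
            x0 = c * width // cols
            x1 = (c + 1) * width // cols
            boxes.append((x0, y0, x1 - x0, y1 - y0))
    return boxes


def crop_commands(page: str, width: int, height: int, rows: int, cols: int) -> List[List[str]]:
    """Build one ImageMagick crop command per cell (hypothetical output names)."""
    cmds = []
    for i, (x, y, w, h) in enumerate(cell_boxes(width, height, rows, cols)):
        out = f"cell_{i:03d}.png"
        # -crop WxH+X+Y cuts out the cell; +repage resets the canvas offset.
        cmds.append(["convert", page, "-crop", f"{w}x{h}+{x}+{y}", "+repage", out])
    return cmds


def ocr_commands(n_cells: int) -> List[List[str]]:
    """Build one tesseract call per cell; tesseract writes cell_NNN.txt."""
    return [["tesseract", f"cell_{i:03d}.png", f"cell_{i:03d}"] for i in range(n_cells)]


if __name__ == "__main__":
    # Example: a 100x60 page image holding a 2x2 grid.
    for cmd in crop_commands("page.png", 100, 60, 2, 2) + ocr_commands(4):
        print(" ".join(cmd))
```

OCRing one cell at a time means the engine never has to guess where the column and row boundaries are, which is presumably why the per-cell results came out cleaner.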
EDIT: We haven't had any contact with Wolfram|Alpha, but maybe we should reach out.
How did you deal with the Flash content? Decompile the source code? Did you encounter tabular PDF data? If so, did you find a good solution?
Also, have you or your colleagues had any contact with the Wolfram Alpha team? It seems like your organizations have similar data curation goals.
http://blog.stephenwolfram.com/2010/10/the-emerging-computat...