VCdelta author here. I'm too lazy to do it by hand.
It's a generic scraper in Python with a "DSL" for the type of scraping it does. The DSL is, at its base, CSS selectors with certain assumptions about how the sites are structured (and then a way to make exceptions). I can code a site's scraper in about 3 or 4 minutes using this. There are some sites that are impossible to scrape this way because they don't have the information in a machine-readable form (i.e. just images), or they block bots, or they have no data on portfolio companies at all.
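To make the "selectors plus default assumptions plus exceptions" idea concrete, here is a minimal sketch of what such a per-site spec could look like. This is not the actual VCdelta code: `SiteSpec`, `scrape`, the default selector `li.company`, and the example URLs are all hypothetical, and the matcher only understands the simple `tag.class` form using the standard library. A real scraper would presumably use a full CSS selector engine.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class SiteSpec:
    """Hypothetical per-site spec: a URL plus a selector that can be overridden."""
    url: str
    selector: str = "li.company"  # assumed default layout; override per site


class _Collector(HTMLParser):
    """Collects the text of every element matching a simple 'tag.class' selector."""

    def __init__(self, selector):
        super().__init__()
        self.tag, _, self.cls = selector.partition(".")
        self.depth = 0      # nesting depth inside a matched element
        self.names = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Only count nested tags of the same name, so void tags like <img> are harmless.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and (not self.cls or self.cls in classes):
            self.depth = 1
            self._buf = []

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                name = " ".join("".join(self._buf).split())  # collapse whitespace
                if name:
                    self.names.append(name)


def scrape(spec, html):
    """Return the company names found in already-fetched HTML for one site."""
    parser = _Collector(spec.selector)
    parser.feed(html)
    parser.close()
    return parser.names


# A site that fits the default assumption, and one that needs an exception.
default_site = SiteSpec(url="https://example.com/portfolio")
odd_site = SiteSpec(url="https://example.org/companies", selector="div.portfolio-item")
```

With a spec like this, adding a new site is mostly a matter of writing one line and, for sites that break the assumptions, overriding the selector.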
And a couple of scrapers break every week, as you'd imagine, and get fixed a couple of weeks later, so the data is not complete by a long shot. Also, since new VC funds seem to start up every week, it doesn't follow even a substantial subset of all the funds. Wish I had more time to spend on it.
Do keep in mind that the actual investment probably happened 2-5 months before the VC posted it on their site, especially for earlier rounds.
Deletions are potentially as interesting as additions, or more so. I think that, generally speaking, the more scrupulous investors do not hide their failed investments, but I am aware of some firms that delete failed companies from their portfolio page to inflate their 'batting average'.
It knows about deletions, of course, but because scrapers break, sites go down, or portfolio companies' names change, there are a lot of false positives, so I decided not to publish them. When a site breaks and a portfolio add doesn't get published, that's not a big deal (VCdelta doesn't purport to be complete), but saying something's been deleted when it hasn't seemed different.
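One way to picture the false-positive problem is a sketch like the following. It is only an illustration under stated assumptions, not how VCdelta actually works: `normalize`, `diff_snapshots`, `confirmed_deletions`, the suffix list, and the three-night threshold are all made up for the example. The idea is to normalize names so trivial renames don't look like a delete plus an add, and to treat a company as deleted only after it has been missing from several consecutive non-empty scrapes.

```python
import re

# Assumed list of suffixes to ignore; a real list would be longer.
_SUFFIXES = {"inc", "llc", "corp", "ltd"}


def normalize(name):
    """Lowercase, strip punctuation and common suffixes: 'Acme, Inc.' -> 'acme'."""
    words = re.sub(r"[^a-z0-9 ]", "", name.lower()).split()
    return " ".join(w for w in words if w not in _SUFFIXES)


def diff_snapshots(previous, current):
    """Return (added, removed) sets of normalized names between two scrapes."""
    prev = {normalize(n) for n in previous}
    curr = {normalize(n) for n in current}
    return curr - prev, prev - curr


def confirmed_deletions(history, min_missing=3):
    """history: nightly lists of names for one site, oldest first.

    A name counts as deleted only if it was listed just before the last
    `min_missing` scrapes and is absent from every one of them. If any of
    those scrapes came back empty, assume the scraper or site was broken
    and report nothing.
    """
    if len(history) <= min_missing:
        return set()
    recent = [{normalize(n) for n in snap} for snap in history[-min_missing:]]
    if any(not snap for snap in recent):
        return set()
    baseline = {normalize(n) for n in history[-min_missing - 1]}
    return baseline - set().union(*recent)


nights = [
    ["Acme, Inc.", "Widgets"],
    ["Acme", "Widgets"],
    ["Acme"],
    ["Acme Inc"],
    ["Acme"],
]
print(confirmed_deletions(nights))
```

Even with a filter like this, a long outage or a real rebrand would still slip through, which is the kind of case that makes publishing deletions riskier than publishing additions.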
If you, or anyone reading, are really interested in this kind of analysis, reach out to me at kevin@mattermark.com. We build all kinds of crawlers like this.
A noteworthy initiative to capture this valuable information. Another one that may be worth looking at is http://internetdealbook.com/category/top-deals/. It focuses on the more visible deals and does include deal size where public.
It looks like it's just scraping the portfolio pages of most of these, and most portfolio directories list nothing more than a company's name and website.
That said, doesn't AngelList or CrunchBase have something like this already? (or at least an API to extract more information from)
Now, it would be fun to run predictions on the portfolio and investment data. Predict what companies will get investment next, what domains are hot, and so on. Who knows, you could even make money from this.
A list of investor-investee relationships, each given an approximate establishment date.
The website "watches the portfolio pages of some 150+ venture capital sites. Every night I look at each page, note any additions, and report them here".