Hacker News
VC portfolio page changes (neuvc.com)
98 points by hansy on May 6, 2014 | 17 comments


I've been following their Twitter feed (https://twitter.com/vcdelta) for a long time; it's one of my favorite Twitter bots.


You've written scrapers for all of these VC pages (http://neuvc.com/labs/vcdelta/vcs.html)? I can almost feel the pain...


VCdelta author here. I'm too lazy to do it by hand.

It's a generic scraper in Python with a "DSL" for the type of scraping it does. At its base, the DSL is CSS selectors plus certain assumptions about how the sites are structured (and then a way to make exceptions). I can code a site's scraper in about 3 or 4 minutes using this. Some sites are impossible to scrape this way: they don't have the information in machine-readable form (e.g., the portfolio is just images), they block bots, or they have no data on portfolio companies at all.
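The "CSS selectors plus default assumptions plus per-site exceptions" idea can be sketched roughly like this. The actual VCdelta DSL isn't public, so everything here is hypothetical: the `DEFAULTS`/`SITES` config, the site keys, and the class names are made up for illustration, and the matcher understands only a single `tag.class` selector using the standard library's `HTMLParser`, not full CSS.

```python
from html.parser import HTMLParser

# Hypothetical default assumption about how portfolio pages are structured.
DEFAULTS = {"selector": "li.company"}

# Hypothetical per-site entries: an empty dict means "default assumptions hold";
# any key present overrides the default (the "exception" for that site).
SITES = {
    "example-vc": {},
    "other-vc": {"selector": "div.portfolio-item"},
}


class SelectorText(HTMLParser):
    """Collect the text of elements matching one simple `tag.class` selector."""

    def __init__(self, selector):
        super().__init__()
        self.tag, _, self.cls = selector.partition(".")
        self.depth = 0      # nesting depth of same-named tags inside a match
        self.buf = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self.depth:
            self.depth += 1  # nested element with the same tag name
            return
        classes = (dict(attrs).get("class") or "").split()
        if not self.cls or self.cls in classes:
            self.depth = 1
            self.buf = []

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                # Collapse whitespace so "Acme <b>Inc</b>" becomes "Acme Inc".
                text = " ".join("".join(self.buf).split())
                if text:
                    self.results.append(text)

    def handle_data(self, data):
        if self.depth:
            self.buf.append(data)


def scrape(site, html):
    """Return portfolio company names using the site's merged config."""
    cfg = {**DEFAULTS, **SITES[site]}
    parser = SelectorText(cfg["selector"])
    parser.feed(html)
    parser.close()
    return parser.results
```

Usage: `scrape("example-vc", '<ul><li class="company">Acme <b>Inc</b></li></ul>')` returns `["Acme Inc"]`. Adding a new site then mostly means writing one config entry, which is consistent with the "3 or 4 minutes" per site mentioned above.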

And a couple of scrapers break every week, as you'd imagine, and get fixed a couple of weeks later, so the data is not complete by a long shot. Also, since new VC funds seem to start up every week, it doesn't follow even a substantial subset of all the funds. Wish I had more time to spend on it.


It says he "looks" at them, so he probably does it manually.


That text is written from the point of view of 'neubot', so it's not 'man'-ual.


Keep in mind that the actual investment probably happened 2-5 months before the VCs posted it on their sites, especially for earlier rounds.


Often, yes. But a few times they've posted it before announcing it :)



Deletions are potentially as interesting as additions, or more so. Generally speaking, I think the more scrupulous investors don't hide their failed investments, but I am aware of some firms that delete failed companies from their portfolio pages to inflate their 'batting average'.


VCdelta author here.

It detects deletions, of course, but because scrapers break, sites go down, or portfolio companies change their names, there are a lot of false positives, so I decided not to publish them. When a site breaks and a portfolio addition doesn't get published, that's not a big deal (VCdelta doesn't purport to be complete), but saying something has been deleted when it hasn't seemed like a different matter.
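To illustrate why deletions are noisier than additions, here is a rough sketch of one way to filter the obvious false positives. This is not how VCdelta works (the author doesn't describe any filtering); the function name and both thresholds (`min_ratio`, `nights`) are hypothetical. It treats a sharply shrunk or empty scrape as a broken scraper or site outage, and only reports a name missing on several consecutive nights. Note that it still can't tell a rename from a real deletion.

```python
def confirmed_deletions(snapshots, min_ratio=0.8, nights=3):
    """Return names that look genuinely removed from a portfolio page.

    snapshots: nightly sets of company names, oldest first, tonight last.
    min_ratio, nights: hypothetical tuning knobs, not from VCdelta.
    """
    if len(snapshots) < nights + 1:
        return set()  # not enough history to confirm anything
    baseline = snapshots[-nights - 1]
    recent = snapshots[-nights:]
    # A scrape that shrank sharply (or came back empty) is treated as a
    # broken scraper or site outage rather than as real deletions.
    if any(len(s) < min_ratio * len(baseline) for s in recent):
        return set()
    # Only report names absent from every one of the last `nights` scrapes.
    missing = set(baseline)
    for s in recent:
        missing -= s
    return missing
```

A name that disappears for one night and then comes back is never reported, which trades some latency for far fewer false "deletions".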


If you, or anyone reading, is really interested in this kind of analysis, reach out to me at kevin@mattermark.com. We build all kinds of crawlers like this.


It's missing the money part.

Edit: the math isn't far away, but having the amounts helps put the deals in context. On that note, it's nice to have an updated list of most VC deals made in the past 3 years.


A noteworthy initiative to capture this valuable information. Another one that may be worth looking at is http://internetdealbook.com/category/top-deals/. It focuses on the more visible deals and includes deal size where public.


It looks like it's just scraping the portfolio pages of most of these, and most portfolio directories list nothing more than a company's name and website.

That said, don't AngelList or CrunchBase already have something like this (or at least an API to extract more information from)?


Now, it would be fun to run predictions on the portfolio and investment data: predict which companies will get investment next, which domains are hot, and so on. Who knows, you could even make money from this.


What are we looking at?


A list of investor-investee relationships, each given an approximate establishment date.

The website "watches the portfolio pages of some 150+ venture capital sites. Every night I look at each page, note any additions, and report them here".
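The nightly "note any additions" step, together with the approximate date attached to each investor-investee pair, might look something like the sketch below. It assumes, hypothetically, that the date shown is simply the first night a pair was observed; the function and variable names are made up. As pointed out earlier in the thread, that date can lag the actual investment by months.

```python
from datetime import date


def record_additions(first_seen, firm, tonight, today):
    """Record newly observed companies and return them, sorted.

    first_seen: dict mapping (firm, company) -> date first observed; updated
    in place. tonight: set of company names scraped from the firm's page.
    """
    new = sorted(c for c in tonight if (firm, c) not in first_seen)
    for company in new:
        first_seen[(firm, company)] = today
    return new
```

On the very first run every company counts as "new", so in practice you would probably seed `first_seen` once without reporting anything.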



