
Not an economist, but I have been mulling over a project surrounding scraping and pricing.

Without having access to the actual monetary transaction data, how does one know what was sold and for how much? Without this (or a mechanism by which the lister closes or updates the listing), how do you know anything was actually sold?




"how does one know what was sold and for how much?"

With domain name sale prices, for example, only a small number of transactions are public. I've been doing it for 16 or 17 years and have never made any of my data public, nor have the people I've consulted for.

Another example might be commercial rents. You can track asking rents but you can't really get a handle on actual rent paid since there are many deal factors (renovations, free rent, triple net etc.) that would change the numbers significantly.


Also an economist and a data scraper/consultant here -- depending on the data, sometimes all you need to figure out is correlation -- frequency of updates, listings being live for X time, clusters of listings around Y days, etc.

In terms of a few real-life examples, on the one hand you have eBay, which provides you with sold data (API through Terapeak). On the other hand you have Craigslist, which is kinda opaque and hates scraping, but you can monitor listings and their half-life. (Listings that disappear quickly presumably get sold quickly; listings that stick around for weeks, relisted over and over, presumably have lower liquidity and/or are priced high.)
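To make the half-life idea concrete, here is a rough Python sketch of measuring how long listings stay live across repeated scrapes. The snapshot format, the listing IDs, and the `listing_lifetimes` helper are all hypothetical, and it assumes a relisted item keeps the same ID, which often won't hold in practice.

```python
from datetime import datetime
from statistics import median

def listing_lifetimes(snapshots):
    """Return {listing_id: days live} for listings gone by the final snapshot.

    snapshots: list of (timestamp, set_of_listing_ids), sorted by time.
    This is a hypothetical format; adapt it to whatever your scraper stores.
    """
    first_seen, last_seen = {}, {}
    for ts, ids in snapshots:
        for lid in ids:
            first_seen.setdefault(lid, ts)  # keep the earliest sighting
            last_seen[lid] = ts             # overwrite with the latest sighting
    still_live = snapshots[-1][1]
    return {
        lid: (last_seen[lid] - first_seen[lid]).total_seconds() / 86400
        for lid in first_seen
        if lid not in still_live  # still-live listings have no end date yet
    }

# Made-up scrape results, for illustration only.
snapshots = [
    (datetime(2024, 1, 1), {"a", "b", "c"}),
    (datetime(2024, 1, 2), {"b", "c"}),
    (datetime(2024, 1, 8), {"c", "d"}),
    (datetime(2024, 1, 15), {"d"}),
]
lifetimes = listing_lifetimes(snapshots)
typical_days = median(lifetimes.values())
```

The measured lifetime is only as fine-grained as the scrape interval, and a listing that vanishes is not necessarily a sale, so treat the result as a liquidity proxy rather than sales data.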


eBay's completed listings are definitely one of the best sources of sales data on the Internet that I'm aware of. Besides that, in some cases there are ways to imperfectly estimate quantities when best seller rankings are available (e.g. at Amazon) -- Chevalier and Goolsbee were the first to suggest this approach back in 2003.[1]
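One way to sketch the rank-to-quantity idea in Python is to assume sales fall off as a power law of rank and fit that curve to a few products whose sales you happen to know. The power-law form is an assumption for illustration, not necessarily the exact specification in [1], and the calibration points and function names below are made up.

```python
import math

def fit_power_law(observations):
    """Least-squares fit of ln(sales) = a + b * ln(rank).

    observations: list of (rank, known_sales) pairs for calibration products.
    """
    xs = [math.log(rank) for rank, _ in observations]
    ys = [math.log(sales) for _, sales in observations]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope

def estimate_sales(rank, intercept, slope):
    """Predict sales for a rank from the fitted log-log line."""
    return math.exp(intercept + slope * math.log(rank))

# Made-up calibration data: (best seller rank, units sold per week).
calibration = [(10, 100.0), (100, 10.0), (1000, 1.0)]
intercept, slope = fit_power_law(calibration)
estimated = estimate_sales(50, intercept, slope)
```

The estimate is only as good as the calibration points, and ranks that update on a lag or mix time windows will add noise.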

As you mentioned, monitoring half-life is another imperfect approach, but it is of course plagued by false positives (a listing goes away but no sale was made). There was a Google Tech Talk many years ago where some economists took this approach[2], except they were looking at pricing power instead of measuring quantity sold.

[1] http://www.hss.caltech.edu/~mshum/ec106/chevaliergoolsbee.pd...

[2] https://www.youtube.com/watch?v=SfjAezl3-cU#t=27m20s


Although it would only be available for a fraction of prices, delta in "quantity available" between scrapes could provide some data.

Most websites don't relay this info to the end user but it could be used on those that do.
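A minimal Python sketch of the quantity-delta idea, assuming you already have the "quantity available" value for one listing from each scrape. The `estimate_units_sold` name and the sample numbers are hypothetical, and any increase is treated as a restock rather than a return.

```python
def estimate_units_sold(quantities):
    """Sum the decreases in available quantity between consecutive scrapes.

    quantities: stock counts for a single listing, in scrape order.
    Increases are assumed to be restocks and are ignored.
    """
    sold = 0
    for previous, current in zip(quantities, quantities[1:]):
        if current < previous:
            sold += previous - current
    return sold

# Made-up scrape history: the jump from 5 to 20 is read as a restock.
history = [10, 8, 8, 5, 20, 18]
units = estimate_units_sold(history)
```

This is a lower bound at best: a sale and a restock between two scrapes can cancel out, and a seller reducing stock for other reasons looks like a sale.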


I'm pursuing something even less structured -- forum posts. It may be a lost cause.



