This is my last reply in this thread, because you are either trolling or have no idea what you are talking about, and I'm just wasting my time.
> Why is it so hard to create an index of online prices, which is what BPP is.
Why is it so hard to create a searchable index of the internet, which is all Google Search Engine is?
An index of online prices is much, much smaller, say "just" 1% of the effort, so doing it dependably would cost "only" $5M/year or so to produce and maintain if you paid market prices (BPP has cheap student labor, I would guess, which makes their cost much lower; also, they can get away with not being robust). It's not just about crawling retailers: you have to be able to pull the prices, canonicalize products properly, and account for site layout changes. I actually did that in a former startup of mine. Doing it reliably is damn hard. It is embarrassingly parallel, so you could do it quickly by just throwing more money at it, but a ridiculously large amount of detail is involved.
Yes, yegg has been able to pull off DDG with much, much less, but the talented people at Cuil were unable to pull it off with much, much more. If you're as good as yegg at this, I bet this is more rewarding (both financially and otherwise) than your job now. I know I can't pull off a robust price index for less than a few $M.
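To give a feel for the smallest pieces of that work, here is a minimal sketch of price extraction and product canonicalization. It is an illustration only, not what we ran: the names `parse_price`, `canonical_key` and the `NOISE_WORDS` list are hypothetical, it assumes US-style price formatting, and real product matching needs far more than token sets (model numbers, bundles, sizes, per-site quirks).

```python
import re
from decimal import Decimal

# Assumption: US-style amounts such as "1,299.99", "45" or "12.5".
_PRICE_RE = re.compile(r"(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{1,2}))?")

# Hypothetical list of marketing words that should not affect matching.
NOISE_WORDS = {"new", "free", "shipping", "the", "with", "sale"}


def parse_price(text):
    """Return the first price-like amount in scraped text, or None."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None
    whole = match.group(1).replace(",", "")
    cents = (match.group(2) or "0").ljust(2, "0")
    return Decimal(f"{whole}.{cents}")


def canonical_key(title):
    """Reduce a listing title to an order-insensitive key of its tokens."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(sorted(set(tokens) - NOISE_WORDS))


if __name__ == "__main__":
    print(parse_price("Now only $1,299.99!"))
    print(canonical_key("NEW Canon Rebel T3i EOS"))
    print(canonical_key("Canon EOS Rebel T3i - Free Shipping!"))
```

Even this toy version breaks as soon as a site changes its layout, shows a "was/now" pair of prices, or lists the same product under a different model string, which is where the detail work goes.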
> That isn't hard to do in my mind. Why do you believe it is so hard?
Because I live in the real world, and not in your mind. I actually did it in 1999-2001 for a startup. Technology has made it easier since, but not much easier. We developed a web scraper back then that was on par with anything available today (and is still ahead of many modern web automators like Selenium).
Sites like PriceGrabber are worth tens of millions if not hundreds of millions, and all they do is create an index of online prices. (If you were a company wanting to do that, you would either build your own or buy one for such a sum, so it's a good estimate of the costs, to within an order of magnitude.) In 2000, MySimon was sold for $700m for being nothing more than a price index. It wasn't a good deal for the buyer, but $70m today is probably a steal for a good, up-to-date price index of the web.
> That isn't hard to do in my mind.
I'm done here. I don't know what your real world experience is like, but it apparently doesn't help you understand the statistics, economics, business and operational complexities involved. You demand details but you ignore them when given. I guess it's really a nice place in your mind.
> but you have to also provide some alternative data/math which is transparent.
BLS data is not transparent, and more than 30% of the value is officially based on speculation (a fact you conveniently ignore). Ben Bernanke says it's accurate so it must be so. In the same way Europe is contained (as he said in 2008, 2009 and 2010), the mortgage crisis is contained (as he said in 2008 up until 2011), there is no chance whatsoever the US will be downgraded (as he and Tim Geithner repeatedly said when asked, up until the point it happened), etc.
If you actually read the data in http://www.bloomberg.com/news/2012-04-19/cpi-conspiracy-theo... you'd note that (a) there actually are no broad checks involved despite the title, and (b) while the BLS is not the only one using said methodologies, they are not universally accepted as is claimed (not more than 40% of comparable countries use them). Hedonic adjustments are more recent than the "recent" review of 1989 -- a review which was conducted with the implied target of finding that the BLS overstates consumer price inflation (by virtue of being commissioned by a congressional quest to cut the budget -- and what do you know, that's exactly what they found out!).
> It seems rather easy to me to make a price index, make the data public, and make the code public. I'm happy to help out on the coding side if you're up for it.
Since you live in your mind, and I live in the real world, it would be hard to bridge the gap and work as a team.
I've written a web crawler for a Fortune 20 company, while I was in a research group at said company. I know exactly what would be involved. Storing prices with some small metadata is much, much cheaper than storing full pages in the form of a search index. Scraping a discrete list of sites is far, far easier and cheaper than writing a crawler robust enough to attempt to crawl arbitrary pages on the internet. Not only that, many retailers have APIs these days, so a scraper wouldn't be needed for many of them, and reading from an API is something a novice programmer could do in less than an hour per retailer.
One of my close friends worked at PriceGrabber for years, and the statement "all they do is create an index of online prices" is completely false.
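As a rough sketch of what "prices with some small metadata" could look like, assuming one record per retailer, product and day: the `PriceRecord` fields and the choice of an unweighted geometric mean of price relatives (a Jevons-style index) are my assumptions for illustration, not BPP's or the BLS's actual method.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PriceRecord:
    """One observed price; the field names here are hypothetical."""
    retailer: str
    product_key: str
    day: date
    price: float


def jevons_index(records, base_day, current_day):
    """Unweighted geometric mean of price relatives, scaled so base = 100.

    Only products observed on both days are used. Returns None when
    there is no overlap between the two days.
    """
    base = {(r.retailer, r.product_key): r.price
            for r in records if r.day == base_day}
    curr = {(r.retailer, r.product_key): r.price
            for r in records if r.day == current_day}
    matched = [k for k in base if k in curr and base[k] > 0 and curr[k] > 0]
    if not matched:
        return None
    log_sum = sum(math.log(curr[k] / base[k]) for k in matched)
    return 100.0 * math.exp(log_sum / len(matched))


if __name__ == "__main__":
    d0, d1 = date(2012, 4, 1), date(2012, 5, 1)
    sample = [
        PriceRecord("shop-a", "widget", d0, 10.0),
        PriceRecord("shop-a", "widget", d1, 11.0),
        PriceRecord("shop-b", "gadget", d0, 20.0),
        PriceRecord("shop-b", "gadget", d1, 20.0),
    ]
    print(jevons_index(sample, d0, d1))
```

Each record is a handful of bytes, which is the storage point; the sketch says nothing about expenditure weights, product substitution, or quality adjustment.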
Your continual use of ad hominem speaks volumes, and is your attempt to hide a lack of hard numbers.
> Your continual use of ad hominem speaks volumes, and is your attempt to hide a lack of hard numbers.
Questioning your reading skills when you ignore the hard numbers you requested is perfectly reasonable. Questioning your math or economics skills when you insist that BPP and CPI are independent, while admitting to not knowing how they are derived, is perfectly reasonable. You might want to check what ad hominem actually means (beyond the Latin translation); you might be enlightened. Alternatively, you can just ignore anything that contradicts your world view, as you have done before.