
This is literally why I created my company:

http://www.datastreamer.io/

We've been around for about a decade. IBM Watson used us as their social data provider during Jeopardy. We provide data to tons of companies and you're probably using our services; it's just not obvious where we're used since we're B2B SaaS and not B2C.

We're not free, but the primary reason we exist is that other vendors charge borderline extortionate prices and I fundamentally believe that the web MUST remain open.

We've also been providing data to researchers at very affordable prices for more than a decade.

Search for us on Google Scholar as Spinn3r (our previous name): hundreds and hundreds of PhDs have access to our data.

We do charge for research usage now but it's very very very affordable.

The entire point is that we're trying to enable innovation.



This doesn't make any sense. You talk about open data but yours is the opposite. You're just another commercial data hoarder, please don't act like you're not.


You are confusing free with open. You can be open without being free. Maintaining a web index is extremely expensive. Imagine storing most of the web on your own servers and serving it. Someone has to pay the bills for all that disk space and bandwidth. I don’t think a web index would ever be free (unless storage, compute and bandwidth were free), but having one that is reasonably priced is a very good thing. I would hope these indices become available on AWS, Azure, etc., where people can just use them with cloud compute and pay per use.


Easy to test, though. If they were open, you could download their entire data set under some permissive license. If you can't then they are not open.


> I don’t think web index would ever be free

Yet the company first mentioned does it for free, lol:

https://commoncrawl.org/

I've checked Datastreamer.io for 5 seconds, I don't see any link to their repo. If not "open source" then what does "open" mean?


Common Crawl is not a company, it's a non-profit. Open means you can access the data; there is no assumption about the data being free or not.


What? It's a nonprofit organization engaging in nonprofit business. Any organization that engages in business is a "company." Common Crawl is a company. Your comment isn't accurate and it doesn't address the parent's comment.


If your prices are so much more reasonable than competition, why are they not published publicly on your site? “Contact us and we’ll tell you the price” is shady for a service that claims to be “very very very affordable.”


Because they charge different rates to different people. Super common in b2b arrangements.


Cheaper than the competition? Maybe. Nothing that requires contact to get a price is "affordable" (if you have to ask, you can't afford it...)


Hacker News guidelines say:

> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize.

Your comment makes zero sense in this context, because it's just marketing.

> we're trying to enable innovation.

You're trying to make a profit, like every other company in the world, and that's OK.


Have you considered making a subset of your data open, cross-referenced from the paid data set? If other providers followed this approach, the open data set could grow and become more useful to all of the paid data providers, if only for lead generation and tool interoperability.


How exactly is it open if you have a paywall blocking people from accessing it though?



