I tried to find a company I know, and six searches later I still couldn't find it. For the first two attempts I had part of the name wrong (i.e. x resources instead of x energy), but I corrected that, continued, and still couldn't find it, even though it came back with dozens of results.
Google got it on the first attempt even using the wrong name.
Too much data with poor search algorithms can be worse than a phonebook.
I also work at OpenCorporates, alongside NGOs and investigators working on stories, as the data we have can be essential in tracking chains of corporate control and exposing links to corruption. We have a community of bot writers and researchers who specifically go after datasets that are valuable but hard to find or scrape, e.g. financial licenses, business licenses, etc. We're on Slack if you're interested in seeing what's happening in our community: http://slack.opencorporates.com/
The mission of this project is absolutely laudable and the effort you are putting into it is impressive. However, I don't see how this database is particularly "open" if one cannot download any dump and the only access is a rate-limited API - where signing up for a key needs detailed statements on how the data is supposed to be used.
But it looks like the bots aren't published (I checked github.com/openc and morph.io). As you noted, that's a very strange approach to open data, but I like their goal.
A great resource in that I suspect much of this data was otherwise buried deep in some other database...but is also an example of how essential filtering/ranking algorithms are for search, at least for general user friendliness.
For example, doing a search for Apple, and even limiting it to California, brings up a ton of junk that needs to be filtered out visually by the user:
And you have to have some knowledge of corporate conventions to get at what you think you want. Here's the entry for Facebook, Inc., registered in Menlo Park:
The listing of subsidiaries is nice...I don't know how complete it is, but it contains the companies I expect (Parse, Oculus, Whatsapp...no Instagram though?)
These kinds of products would be much more interesting if you could download the database and periodic updates. API only means a no go for a lot of use cases.
[from Chris at OpenCorporates]
Thanks for all the useful comments. I wondered if anyone tried the Advanced Search, and particularly the experimental 'search by relevance', which gives Apple Inc as the top score. It's experimental in part because two users may not have the same view of what is relevant (one reason why Google wants you to be logged in when searching, so it can return results based on what it thinks is relevant to you).
Re the number of results, with over 95 million companies from hundreds of official sources, there are a lot of companies with the same or similar names -- that's part of the power, and why OpenCorporates is so widely used by journalists, anti-corruption investigators, lawyers, law enforcement, etc. The more we loosen the search, however, the more results you will get -- so it's a balancing act, and the advanced search and the API are part of us trying to make it work for users, and we'd love feedback on both -- just email us at community @ opencorporates dot com.
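To make the balancing act concrete, here is a minimal Python sketch of ranking candidate names by similarity to the query and then applying a cutoff. This is not how the OpenCorporates search is implemented; the sample names, the list of legal suffixes, the `search` function and the `min_score` threshold are all assumptions made up for illustration, and the similarity measure is simply `difflib.SequenceMatcher` from the standard library.

```python
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "inc.", "llc", "ltd", "ltd.", "corp", "corp.", "co", "co."}

# Made-up sample data standing in for search candidates.
NAMES = [
    "Apple Valley Plumbing LLC",
    "Big Apple Bagels, Inc.",
    "Apple Inc.",
    "Applied Materials, Inc.",
    "Pineapple Express Couriers Ltd",
]


def normalize(name: str) -> str:
    """Lowercase, drop commas, and strip legal-form suffixes."""
    tokens = name.lower().replace(",", " ").split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def score(query: str, name: str) -> float:
    """Similarity in [0, 1] between the normalized query and name."""
    return SequenceMatcher(None, normalize(query), normalize(name)).ratio()


def search(query: str, names: list[str], min_score: float) -> list[str]:
    """Return names scoring at least min_score, best match first."""
    scored = [(score(query, n), n) for n in names]
    hits = [(s, n) for s, n in scored if s >= min_score]
    return [n for s, n in sorted(hits, key=lambda pair: pair[0], reverse=True)]


if __name__ == "__main__":
    # A strict cutoff keeps only near-exact matches; a loose one keeps more.
    print(search("Apple", NAMES, min_score=0.9))
    print(search("Apple", NAMES, min_score=0.0))
```

Lowering `min_score` can only add results, never remove them, which is the trade-off in miniature: a loose cutoff tolerates a partly wrong name but leaves more for the user to sift through, so the ordering matters more as the cutoff drops.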
For large listed companies such as Apple or Intel, it depends on what question you are trying to answer. If it's just an overview of the corporation, then Wikipedia or something like Yahoo Finance is the best route. The former gives a narrative overview, using the collected expertise of hundreds of contributors; the latter includes highly proprietary data (which was also for the most part collected by offshore humans) to build an overall picture.
However, for official, provenanced data under an open licence about legal entities, OpenCorporates is by far the best option -- the Facebook example is a good one. Contrary to many people's expectations, Facebook is not a California Corporation. However, because they operate in California they have to register as a branch (aka Foreign Corporation), and if you land on that page (https://opencorporates.com/companies/us_ca/C2711108) you'll see that we actually link to the home corporation, in Delaware (https://opencorporates.com/companies/us_de/3835815), and there we list subsidiaries and other branches. Why don't we have Instagram? You can check from the source of the subsidiaries we list (e.g. https://opencorporates.com/statements/37307675 which links to the SEC Exhibit 21 filing at http://www.sec.gov/Archives/edgar/data/1326801/0001326801150... ), and you can see there that they don't list Instagram (possibly because either there's no separate company for it, or because it doesn't count as a material subsidiary).
So the question often becomes (and news stories are problematic for multiple reasons): where can you find the linkages that can be parsed into structured, provenanced data in a reliable way? We're focusing on two areas: looking for sources of public data that can be combined to give insight to all, and building up a community of users to help us do that: http://impact.opencorporates.com/contribute/
Please do consider joining us in this important mission, or just ping us with suggestions for how we can do better.