Some good ideas there, but mostly it's a collection of high-level tools and visualizations...my main objection is that journalists have a tendency to see data and documents as "magic," and making a slick Investigative Dashboard doesn't really dispel that. The main problem of data and document collection is not much different than in data science, where research and data cleaning/collection are by far the most time-consuming parts of the process. Improving OCR (and let's give Google credit for its work on Tesseract) and creating a friendlier interface for Tesseract (such as a training GUI) would be much, much more useful to the average investigative reporter.
And in terms of collection/research: if Google took up the work of reverse-PACER (for court documents), or furthered its work in election data (https://developers.google.com/civic-information/)...those would also be hugely beneficial initiatives.
I think any worthwhile investigative journalist will use what they see as a starting point for a deeper investigation. Writing about some visualization a tool showed you doesn't win you a Pulitzer.
Except that making vague accusations about large companies without evidence is possibly more problematic than making them about people (which we get all the time now already). Companies can afford lawyers, and lawsuits for libel, even if not successful, will put a strain on a paper. If a paper is getting a double negative hit from an article, both in loss of reputation (if it has any left) and in legal fees in defending itself from libel, then it may be less willing to publish crap.
Of course, a well-researched and well-written article will defend itself, and the last thing a company that really has something to hide wants is to go to court when there are unflattering things that can be proven, not just reported.
https://www.google.com/ideas/products/investigative-dashboar...