Let's be clear here: this is an unauthorized use of Google servers.
Unless Google provides explicit Terms of Service for this API endpoint allowing its use, you can assume that it exists only to serve Google-owned software (like Chrome).
Just because it exists doesn't mean you can use it.
There's an obvious risk that Google can modify the API and a less obvious legal risk of misusing someone else's resources.
That doesn't mean you can't play with it and use it for experiments, but you probably shouldn't cross the line of actually using it in any production software.
If it's publicly accessible and not password-protected or rate-limited, then it's fair game for hackers to goof around with. Just don't base your startup on a (any) Google API.
I agree Google could easily change the API at any moment, but otherwise it seems a very grey area at best. Especially since they are including this within the open source parts of Chromium as the article implies.
I wouldn't rely on this for anything I wanted to support, but I don't see that it is incurring a legal risk to work with source code Google themselves have released.
This reminds me of the bad old days when Google Maps didn't have an API yet and people manually de-minified the source and hacked their way into the hidden API. The original mashups that were created as a result of this hacking resulted in the Google Maps API as it stands now. I've heard from Googlers that before the proliferation of these mashups there were no plans at Google to introduce a public Maps API.
The API is not reliable... I've invested a bit of time into this: https://github.com/taf2/speech2text and been underwhelmed so far... It might be interesting to use Google as a training tool for Sphinx... The issue I found is that the utterance length Google supports is so short that you then need to figure out how to break WAV files into small word chunks... but then you lose context, so you lose quality...
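For anyone who wants to try the chunking approach, here is a minimal sketch using only Python's standard `wave` module. It cuts fixed-length windows with a small overlap rather than finding real word boundaries, so it is a swapped-in simplification and not what speech2text does. The function name `split_wav` and the 5 second / 0.5 second defaults are assumptions for illustration; the actual utterance limit of the endpoint isn't stated here.

```python
import io
import wave


def split_wav(source, chunk_seconds=5.0, overlap_seconds=0.5):
    """Return a list of WAV-encoded byte strings covering `source`.

    `source` is a path or a binary file-like object. Consecutive chunks
    overlap by `overlap_seconds` so a word cut at one boundary appears
    whole in the neighbouring chunk. Both durations are assumed values.
    """
    with wave.open(source, "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        total = reader.getnframes()
        chunk = int(chunk_seconds * rate)           # frames per chunk
        step = chunk - int(overlap_seconds * rate)  # frames between chunk starts
        if chunk <= 0 or step <= 0:
            raise ValueError("chunk must be longer than the overlap")

        chunks = []
        for start in range(0, total, step):
            reader.setpos(start)
            frames = reader.readframes(chunk)
            out = io.BytesIO()
            with wave.open(out, "wb") as writer:
                writer.setparams(params)    # frame count is patched on close
                writer.writeframes(frames)
            chunks.append(out.getvalue())
            if start + chunk >= total:      # this chunk reached the end
                break
        return chunks
```

The overlap only softens the lost-context problem described above; it does not give the recognizer any more context within a single chunk.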
Is there any authorized service for small-scale speech recognition from Google or anyone else? A free or cheap one?
Or is there any kind of open-source project? I guess there probably is not since the training set for modern systems would be terabytes of audio data and probably more proprietary than the algorithms.
There are a few open-source projects, like the ones at http://www.speech.cs.cmu.edu/ -- as you say, they haven't gotten the love that Google's or Dragon's have, at least not last I looked.
Just one more option - Twilio has a voicemail transcription feature as part of its API, with automatic callback. So anything recorded could be available as text that way.
Personally, I wouldn't be interested in using anything besides the most accurate prediction. However, I would probably make the alternative hypotheses available to choose from in case the best prediction is incorrect. In that case I can only assume that the hypotheses are listed in decreasing order of confidence.
If you have a relevant text corpus (e.g. previous transcripts of the same person), you could use some Markov-type modeling/analysis to verify the transcription or find the most suitable alternative in the list.
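A rough sketch of that idea, assuming a simple bigram (first-order Markov) model with add-one smoothing. The function names, the tiny corpus, and the hypothesis list are all made up for illustration; nothing here reflects the response format of the Google endpoint.

```python
import math
from collections import Counter


def train_bigram(corpus_sentences):
    """Count context words and bigrams from earlier transcripts."""
    contexts = Counter()
    bigrams = Counter()
    vocab = {"</s>"}
    for sentence in corpus_sentences:
        words = sentence.lower().split()
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(words)
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a sentence."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )


def rerank(hypotheses, model):
    """Sort hypotheses by per-transition log-probability, best first."""
    def score(hypothesis):
        transitions = len(hypothesis.split()) + 1
        return log_prob(hypothesis, model) / transitions
    return sorted(hypotheses, key=score, reverse=True)


# Hypothetical data: previous transcripts and two candidate transcriptions.
corpus = ["recognize speech with a computer", "it is hard to recognize speech"]
model = train_bigram(corpus)
ranked = rerank(["wreck a nice beach", "recognize speech"], model)
print(ranked[0])
```

Dividing by the number of transitions keeps longer hypotheses from being penalized just for their length. With a real corpus you would probably want better smoothing than add-one, and you might combine this score with the recognizer's own confidence instead of replacing it.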