Fast and free Speech Recognition with Google's Voice API

kkowalczyk · on Aug 12, 2011

Let's be clear here: this is an unauthorized use of Google servers.

Unless Google provides explicit Terms of Service for this API endpoint allowing its use, you can assume that it exists only to serve Google-owned software (like Chrome).

Just because it exists doesn't mean you can use it.

There's an obvious risk that Google can modify the API, and a less obvious legal risk in misusing someone else's resources.

That doesn't mean you can't play with it and use it for experiments, but you probably shouldn't cross the line into actually using it in production software.

dstein · on Aug 12, 2011

If it's publicly accessible and not password-protected or rate-limited, then it's fair game for hackers to goof around with. Just don't base your startup on a (any) Google API.

aquark · on Aug 12, 2011

I agree Google could easily change the API at any moment, but otherwise it seems a very grey area at best. Especially since they are including this within the open source parts of Chromium as the article implies.

I wouldn't rely on this for anything I wanted to support, but I don't see that it is incurring a legal risk to work with source code Google themselves have released.

yellowbkpk · on Aug 12, 2011

This reminds me of the bad old days when Google Maps didn't have an API yet and people manually de-minified the source and hacked their way into the hidden API. The original mashups that were created as a result of this hacking resulted in the Google Maps API as it stands now. I've heard from Googlers that before the proliferation of these mashups there were no plans at Google to introduce a public Maps API.

taf2 · on Aug 12, 2011

The API is not reliable... I've invested a bit of time into this: https://github.com/taf2/speech2text and been underwhelmed so far... It might be interesting to use Google as a training tool for Sphinx... The issue I found is that, because the utterance length Google supports is so short, you then need to figure out how to break WAV files into small word chunks... but then you lose context, so you lose quality...
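To make the chunking problem concrete, here is a minimal sketch of the naive approach: cutting a WAV file into fixed-length pieces with Python's standard `wave` module. The function name `split_wav` and the 10-second default are assumptions for illustration only; the actual utterance limit isn't stated in this thread.

```python
import io
import wave


def split_wav(src, chunk_seconds=10.0):
    """Split a WAV file into fixed-length chunks.

    `src` is a file path or a binary file-like object. Returns a list of
    bytes objects, each one a complete WAV file. The 10-second default is
    an arbitrary placeholder, not a documented limit of any service.
    """
    chunks = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = max(1, int(params.framerate * chunk_seconds))
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as writer:
                # Copy the audio format so every chunk is playable on its own.
                writer.setnchannels(params.nchannels)
                writer.setsampwidth(params.sampwidth)
                writer.setframerate(params.framerate)
                writer.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```

Cutting at fixed offsets will often slice through the middle of a word, which is exactly the context loss described above. Splitting on detected silence would probably work better, but that is beyond this sketch.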

pronoiac · on Aug 13, 2011

There's a random, spammy-looking URL in the description, about call tracking metrics. Were you aware of that?

WildUtah · on Aug 12, 2011

Is there any authorized service for small-scale speech recognition from Google or anyone else? A free or cheap one?

Or is there any kind of open-source project? I guess there probably is not since the training set for modern systems would be terabytes of audio data and probably more proprietary than the algorithms.

abecedarius · on Aug 12, 2011

There are a few open-source projects, like at http://www.speech.cs.cmu.edu/ -- as you say, they haven't gotten the love that Google's or Dragon's have, not last I looked.

aquark · on Aug 12, 2011

You could probably mash something up using Voxeo or even the new Twilio client stuff.

I have this on my project list for some home automation ideas, but it is too far down to actually have been given any cycles.

balakk · on Aug 12, 2011

If you use Windows, there's built-in Speech recognition.

http://msdn.microsoft.com/en-us/library/ms723627(v=vs.85).as...

There's a server version of it with Microsoft Lync I think.

Microsoft also has a cloud service for the same:

http://www.microsoft.com/en-us/Tellme/technology/default.asp...

yread · on Aug 12, 2011

When you develop on WP7 you get it for free: http://research.microsoft.com/en-us/um/redmond/projects/hawa... along with OCR and other goodies. Well, for research purposes at least.

dsulli · on Aug 12, 2011

Just one more option - Twilio has a voicemail transcription feature as part of its API, with automatic call back. So anything recorded could be available as text that way.

kleiba · on Aug 12, 2011

Any ideas why only the first result comes with a confidence value?

hazexp · on Aug 12, 2011

I suspect it's because it's the most relevant.

Personally, I wouldn't be interested in using anything besides the most accurate prediction. However, I would probably make the alternative hypotheses available to choose from in case the best prediction is incorrect. In that case I can only assume that the hypotheses are listed in decreasing order of confidence.
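A rough sketch of that approach, assuming the response is JSON with a `hypotheses` list whose entries carry an `utterance` and, for the first entry only, a `confidence`. Those field names and the function `pick_transcript` are assumptions for illustration, and the server-side ordering is taken on trust:

```python
import json


def pick_transcript(response_text):
    """Return (best_utterance, confidence, alternatives).

    Assumes hypotheses arrive in decreasing order of confidence, so the
    first entry is treated as the best guess and the rest are kept, in
    order, as fallbacks the user can choose from.
    """
    data = json.loads(response_text)
    hypotheses = data.get("hypotheses", [])
    if not hypotheses:
        return None, None, []
    best = hypotheses[0]
    alternatives = [h["utterance"] for h in hypotheses[1:]]
    # Only the first hypothesis is expected to carry a confidence value.
    return best["utterance"], best.get("confidence"), alternatives
```

Keeping the alternatives in their original order means a UI can list them as "did you mean" choices without needing scores of their own.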

tincholio · on Aug 12, 2011

If you have a relevant text corpus (e.g. previous transcripts of the same person), you could use some Markov-type modeling/analysis to verify the transcription or find the most suitable alternative in the list.
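One way to sketch this idea is a first-order Markov (bigram) model with add-one smoothing, trained on the earlier transcripts and used to score each alternative. All names below are hypothetical, and a real system would likely want a larger corpus and better smoothing:

```python
import math
from collections import Counter


def train_bigram_model(sentences):
    """Count bigrams over the corpus; returns (contexts, bigrams, vocab_size)."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))
    # Reserve one extra vocabulary slot for words never seen in the corpus.
    return context_counts, bigram_counts, len(vocab) + 1


def log_prob(sentence, model):
    """Average per-bigram log-probability under add-one smoothing."""
    context_counts, bigram_counts, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(prev, word)] + 1
        denominator = context_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    # Normalise by length so longer alternatives aren't unfairly penalised.
    return total / (len(tokens) - 1)


def best_alternative(alternatives, model):
    """Pick the alternative the corpus-trained model finds most likely."""
    return max(alternatives, key=lambda s: log_prob(s, model))
```

For example, with a corpus of earlier commands such as "turn on the kitchen light", the model would prefer that phrasing over a sound-alike such as "tern on the kitchen lite", because the latter contains word pairs the corpus never saw.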