Fast and free Speech Recognition with Google's Voice API (fennb.com)
101 points by taybenlor on Aug 11, 2011 | 15 comments



Let's be clear here: this is an unauthorized use of Google servers.

Unless Google provides explicit Terms of Service for this API endpoint allowing its use, you can assume that it exists only to serve Google-owned software (like Chrome).

Just because it exists doesn't mean you can use it.

There's an obvious risk that Google can modify the API and a less obvious legal risk of misusing someone else's resources.

That doesn't mean you can't play with it and use it for experiments, but you probably shouldn't cross the line of actually using it in any production software.


If it's publicly accessible and not password-protected or rate-limited, then it's fair game for hackers to goof around with. Just don't base your startup on a (any) Google API.


I agree Google could easily change the API at any moment, but otherwise it seems a very grey area at best. Especially since they are including this within the open source parts of Chromium as the article implies.

I wouldn't rely on this for anything I wanted to support, but I don't see that it is incurring a legal risk to work with source code Google themselves have released.


This reminds me of the bad old days when Google Maps didn't have an API yet and people manually de-minified the source and hacked their way into the hidden API. The original mashups that were created as a result of this hacking resulted in the Google Maps API as it stands now. I've heard from Googlers that before the proliferation of these mashups there were no plans at Google to introduce a public Maps API.


The API is not reliable... I've invested a bit of time into this: https://github.com/taf2/speech2text and been underwhelmed so far... It might be interesting to use Google as a training tool for Sphinx... The issue I found is that, because the utterance length Google supports is so short, you then need to figure out how to break WAV files into small word chunks... but then you lose context, so you lose quality...
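To make the chunking trade-off concrete, here is a minimal sketch of one way to cut a long recording into short pieces. It is not taken from the speech2text repo: `chunk_ranges` and its parameters are hypothetical names, and it splits at fixed frame offsets rather than at word boundaries. The optional overlap only softens the lost-context problem described above; it does not solve it.

```python
def chunk_ranges(total_frames, chunk_frames, overlap_frames=0):
    """Return (start, end) frame ranges that cover a recording.

    Hypothetical helper: fixed-length windows, optionally overlapping,
    so each chunk shares a little audio with its neighbour.
    """
    if chunk_frames <= 0:
        raise ValueError("chunk_frames must be positive")
    if not 0 <= overlap_frames < chunk_frames:
        raise ValueError("overlap_frames must be in [0, chunk_frames)")

    step = chunk_frames - overlap_frames
    ranges = []
    start = 0
    while start < total_frames:
        end = min(start + chunk_frames, total_frames)
        ranges.append((start, end))
        if end == total_frames:
            break  # the last window already reaches the end of the audio
        start += step
    return ranges
```

Each range could then be read out of a WAV file with the standard library `wave` module (`setpos` followed by `readframes`) and submitted as its own short utterance.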


There's a random, spammy-looking URL in the description, about call tracking metrics. Were you aware of that?


Is there any authorized service for small-scale speech recognition from Google or anyone else? A free or cheap one?

Or is there any kind of open-source project? I guess there probably is not since the training set for modern systems would be terabytes of audio data and probably more proprietary than the algorithms.


There are a few open-source projects, like at http://www.speech.cs.cmu.edu/ -- as you say, they haven't gotten the love that Google's or Dragon's have, at least not last I looked.


You could probably mash something up using Voxeo or even the new Twilio client stuff.

I have this on my project list for some home automation ideas, but it is too far down to actually have been given any cycles.


If you use Windows, there's built-in Speech recognition.

http://msdn.microsoft.com/en-us/library/ms723627(v=vs.85).as...

There's a server version of it with Microsoft Lync I think.

Microsoft also has a cloud service for the same:

http://www.microsoft.com/en-us/Tellme/technology/default.asp...


When you develop on WP7 you get it for free: http://research.microsoft.com/en-us/um/redmond/projects/hawa... along with OCR and other goodies. Well, for research purposes at least.


Just one more option - Twilio has a voicemail transcription feature as part of its API, with automatic call back. So anything recorded could be available as text that way.


Any ideas why only the first result comes with a confidence value?


I suspect it's because it's the most relevant.

Personally, I wouldn't be interested in using anything besides the most accurate prediction. However, I would probably make the alternative hypotheses available to choose from in case the best prediction is incorrect. In that case I can only assume that the hypotheses are listed in decreasing order of confidence.
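A rough sketch of that approach: keep the first hypothesis as the answer and hold on to the rest as fallbacks. The JSON shape here (a `hypotheses` list whose entries have an `utterance` and, for the first one only, a `confidence`) is an assumption based on what this thread describes, not a documented format, and `split_hypotheses` is a made-up name.

```python
import json


def split_hypotheses(raw_json):
    """Return ((best_utterance, confidence_or_None), [alternative utterances]).

    Assumes the response lists hypotheses best-first, which the thread
    suspects but which is not confirmed anywhere.
    """
    data = json.loads(raw_json)
    hypotheses = data.get("hypotheses", [])
    if not hypotheses:
        return None, []

    best = hypotheses[0]
    # Only the first entry is expected to carry a confidence value.
    alternatives = [h["utterance"] for h in hypotheses[1:]]
    return (best["utterance"], best.get("confidence")), alternatives
```

The alternatives can then be shown to the user as a pick list when the top result looks wrong.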


If you have a relevant text corpus (e.g. previous transcripts of the same person), you could use some Markov-type modeling/analysis to verify the transcription or find the most suitable alternative in the list.
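One way to read "Markov-type modeling" is a simple bigram language model trained on the earlier transcripts, used to rescore the returned alternatives. The following is only a sketch under that assumption: the function names are hypothetical, the smoothing is plain add-one, and hypotheses of different lengths are not normalised, so shorter ones get a slight advantage.

```python
import math
from collections import Counter


def train_bigrams(sentences):
    """Count unigrams and bigrams over whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_score(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams) or 1
    tokens = ["<s>"] + sentence.lower().split()
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


def rerank(hypotheses, corpus):
    """Pick the hypothesis that best matches the corpus's word transitions."""
    unigrams, bigrams = train_bigrams(corpus)
    return max(hypotheses, key=lambda h: log_score(h, unigrams, bigrams))
```

For example, with a corpus of home-automation commands such as "turn on the lights", `rerank` would prefer "turn on the lights" over an acoustically similar alternative like "turn on the lice", because the transition "the lights" has been seen before.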



