Any plans for other languages and locales? I immediately noticed the temperature in F in the example about the weather in Lima. I think everybody there uses C, with the exception of American tourists :-) Seriously, it looks like a great product. Maybe it returns even too much data in the JSON. I wonder how to take advantage of all of that if I don't know what people are going to ask. They're going to ask silly questions just for fun even if I have a vertical app (example: a mortgage calculator), because this is not a web form with constrained input fields but free-form input. The numbers I get back in the answer could be unrelated to mortgages. Do you have examples of best practices? Maybe just write and speak the answer? Thanks.
Nice observation. Sadly, localization is an afterthought for a lot of developers. I am also curious to see how they handle other languages and locales, since I'm interested in learning how to use these kinds of systems.
To be fair, recognizing another spoken language is a much larger effort than localizing a web site. I was curious to know if they have plans to move beyond English, maybe next year or 2017.
The video is only 240p and quite shaky. Since it was published by SoundHound Inc. itself, is this a marketing technique to make it look more amateurish?
Such low latency suggests the demo was done over Wi-Fi in the SoundHound building, especially if the speech recognition runs on the server side. Or which speech recognition software does the demo app use? Nuance software running on the client? Android 5 voice recognition isn't that fast.
After owning Echo, Roku and Fire TV, I'm super-bullish on voice commands finally being ready for prime time. It's a terrific interface for home audio, TV and car audio.
I've gotta think Apple will open up Siri to app developers sooner than later.
Definitely. I've been using voice commands in Android for about 5 years now (since ~2010) and I've consistently been shocked at how incredibly efficient an interface it is. The number of capabilities hooked up to voice control has only been increasing since then and it's been great.
I have been trying to use Android voice for five years and have been shocked at how many words it completely mangles for me. Just completely wrong. This has surprised me since I am Canadian, live in the USA and have a generic American TV accent.
The failure rate is high enough that I rarely use the voice feature, despite the fact I have problems with my hands that make typos a constant irritation. I know I am just an anecdote - the strange thing is that my voice and lack of accent should be the easiest thing for Android to navigate.
Interesting. One of the things that immediately got me hooked was how well it recognized my voice; that excitement has long since faded, and now excellent speech recognition is something I just take for granted.
Maybe try going into Settings > Voice on the Google search app and downloading the voice pack for the version of English you think matches best? That's still pretty weird though.
I think voice with a screen is interesting, but voice alone can be difficult. What was the last voice-controlled IVR (phone system) that was awesome to interact with? I think it takes a combination of voice and something that can be confirmed with another "button" - something you can touch or push to confirm or cancel what you've "asked" it to do.
I think it can augment things well, but not be the prime time star.
It's already really easy to get fast, efficient access to large data sets, so I don't see much value in that. What is not fast, efficient, or easy is transforming natural language queries into computationally actionable ones.
I would find more value as a developer if, when given a natural language query, it returned a structured query. Then I could tweak the query to conform to whatever data retrieval API I wanted.
I don't think what I'm asking for has to be mutually exclusive with what they're currently offering. Give me the option to have Houndify do some or all of the work for me.
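To make the idea concrete, here's a toy sketch of what returning a "structured query" instead of a final answer could look like. This is not Houndify's API - the function, field names, and JSON shape are all invented for illustration; a real version would call the vendor's service instead of returning a canned parse.

```python
# Hypothetical sketch: a service that turns a natural language question
# into an intermediate, machine-readable query the developer can then
# adapt to their own data-retrieval API. All names here are invented.

def to_structured_query(natural_language: str) -> dict:
    """Toy stand-in for a natural-language parsing service.

    A real implementation would call the vendor's API; this just
    returns a canned parse for one example question.
    """
    if "mortgage" in natural_language.lower():
        return {
            "intent": "mortgage_payment",
            "parameters": {
                "principal_usd": 300_000,
                "term_years": 30,
                "annual_rate_pct": 4.5,
            },
        }
    # Anything we don't recognize is flagged so the app can fall back
    # to its own handling instead of showing unrelated numbers.
    return {"intent": "unknown", "parameters": {}}

query = to_structured_query(
    "What's the payment on a $300,000, 30-year mortgage at 4.5%?"
)
print(query["intent"])                    # mortgage_payment
print(query["parameters"]["term_years"])  # 30
```

With an intermediate form like this, a vertical app (say, the mortgage calculator above) could reject any intent outside its domain rather than trusting whatever numbers come back.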
I am one of the developers for houndify.com, so I can answer this question for you!
We actually have an API endpoint dedicated to doing this for you. At the moment we have a concept of "domains", where developers use a proprietary language to help Hound understand topics. Using our API, you could technically do this yourself and add functionality that doesn't currently exist on the platform.
You could use the hotel domain and get back a ton of pre-formatted data, or you could just get back speech-to-text, or you could specify hooks you want to take action on. I'm not a developer on the actual voice API itself, so I'm not the most informed, but perhaps that answers your question?
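The "use the structured domain data, or fall back to plain speech-to-text" pattern described above might be consumed roughly like this. The JSON shape below is invented for illustration; the real Houndify response format will differ, so treat this as pseudocode for the dispatch idea, not the actual API.

```python
import json

# Invented example response: a domain-tagged result plus the raw
# transcription. The real service's field names will differ.
sample_response = json.loads("""
{
  "transcription": "find me a hotel in Lima",
  "domain": "hotel",
  "data": {"city": "Lima", "results": 12}
}
""")

def handle(response: dict) -> str:
    # Act on domains the app knows about; otherwise fall back to the
    # plain speech-to-text transcription.
    if response.get("domain") == "hotel":
        data = response["data"]
        return f"{data['results']} hotels found in {data['city']}"
    return response["transcription"]

print(handle(sample_response))  # 12 hotels found in Lima
```

The nice property of this layout is that an app only opts into the domains it cares about, and everything else degrades gracefully to a transcription it can handle itself.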
I've been using pocketsphinx with this neat Ruby gem[1]. It's really easy to use but has low accuracy (understands me correctly maybe half the time). I'm curious to see if Houndify does any better!
There is clearly a knowledge graph coupled with this in addition to the speech recognition. Sorry, "meaning" recognition. I feel like there is an opportunity to connect the deep knowledge graph of Wolfram Alpha - or maybe Wolfram missed the boat by not connecting their graph in a more usable way.
I wonder if it is based on the Freebase.com knowledge graph, which Google discontinued last month: http://www.freebase.com/ (IBM recently bought the Blekko web search and knowledge graph engine as a replacement for Freebase to power IBM Watson).
Does this require a network connection? I'd love to start adding speech-to-text interfaces to my apps, but most of the stuff I work on needs to be able to work without the network, and most of the speech-to-text engines these days are SaaS products in some form or another.
It's the complexity of the queries and the contextual awareness that make it impressive. But yes, my immediate thought was that either Android speech-to-text or the Google speech API plugged into Wolfram Alpha might create a (much simpler, but also much easier) version of this.
Really impressive work. Would love to play with the beta cough invite cough ;-) Looking forward to seeing this being used on some inclusive design projects.
If you have an Android device, we can definitely get you an invite. Not sure if Hacker News has a PM system, though, or if you want to publish your email here.