Hacker News
Houndify: Add voice enabled, conversational interface to anything (houndify.com)
138 points by scott_o on June 4, 2015 | hide | past | favorite | 51 comments



Any plans for other languages and locales? I immediately noticed the temperature in °F in the example about the weather in Lima. I think everybody there uses °C, with the exception of American tourists :-) Seriously, it looks like a great product. Maybe it even returns too much data in the JSON. I wonder how to take advantage of all of that if I don't know what people are going to ask. They're going to ask silly questions just for fun even if I have a vertical app (example: a mortgage calculator), because this is not a web form with constrained input fields but free-form input. The numbers I get in the answer could be unrelated to mortgages. Do you have examples of best practices? Maybe just write and speak the answer? Thanks.
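To make the worry concrete, here is a minimal sketch of the guard I have in mind, assuming a made-up response shape (the `AllResults` / `CommandKind` / `WrittenResponse` field names are my guesses for illustration, not the documented schema):

```python
# Hypothetical sketch: the response shape and field names below are
# assumptions, not the documented Houndify schema.

def answer_or_fallback(response, expected_domain="MortgageCalculator"):
    """Return the written answer only when the query matched our vertical;
    otherwise fall back to a generic written/spoken reply."""
    for result in response.get("AllResults", []):
        if result.get("CommandKind") == expected_domain:
            return result.get("WrittenResponse", "")
    return "Sorry, I can only help with mortgage questions."

# A silly off-topic query falls through to the fallback.
silly = {"AllResults": [{"CommandKind": "WeatherCommand",
                         "WrittenResponse": "It is 21 C in Lima."}]}
print(answer_or_fallback(silly))
```

The point is just that a vertical app probably wants to whitelist the domains it understands and route everything else to a polite fallback, rather than trying to interpret unrelated numbers.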


Nice observation. Sadly, localization is an afterthought for a lot of developers. I am also curious to see how they handle other languages and locales, since I'm interested in learning how to use these kinds of systems.


To be fair, recognizing another spoken language is a much larger effort than localizing a web site. I was curious to know if they have plans to move beyond English, maybe next year or 2017.


You're probably skeptical (as I was), but watch this video demo of the Hound app: https://m.youtube.com/watch?v=M1ONXea0mXg

That's insanely fast, compound natural language queries. I'm impressed.


The video is only 240p and quite shaky. As it was published by SoundHound Inc. itself, is this a marketing technique to make it look more amateurish?

Such low latency suggests the demo was done over Wi-Fi in the SoundHound building, especially if the speech recognition runs on the server side. Or which speech recognition software does the demo app use? Nuance software running on the client? Android 5 voice recognition isn't that fast.


> As it is published by the SoundHound Inc. company, is this a marketing technique to make it look more amateurish?

Yes; if you don't scroll down, it looks like it's some user's demonstration of it.


I’ve tested the app on dial-up internet, and it answers almost immediately.

So the question of where the speech recognition happens isn't easily answered.


How did you get invited? I'd love to test the app. With natural language, we should be skeptical as long as all we have are canned demos.


You might be able to try signing up here:

http://www.soundhound.com/hound#!


Maybe the answers are spoken a bit too fast, but that speed makes the demo more impressive.


Just like I listen to all of my podcasts between 1.8x and 2.2x, I would want my phone's spoken responses to default to a high speed.


It feels like the video is sped up ever so slightly. Even the user is talking pretty quick.


It got the Space Needle question wrong. It gave the answer for DC rather than Seattle.


"the capital of the country in which the Space Needle is located"


ah, oops.


Guess it has better speech recognition than you do ;)


After owning Echo, Roku and Fire TV, I'm super-bullish on voice commands finally being ready for prime time. It's a terrific interface for home audio, TV and car audio.

I've gotta think Apple will open up Siri to app developers sooner than later.

Houndify looks interesting.


Definitely. I've been using voice commands in Android for about 5 years now (since ~2010) and I've consistently been shocked at how incredibly efficient an interface it is. The number of capabilities hooked up to voice control has only been increasing since then and it's been great.


I have been trying to use Android voice for five years and have been shocked at how many words it completely mangles for me. Just completely wrong. This has surprised me since I am Canadian, live in the USA and have a generic American TV accent.

The failure rate is high enough that I rarely use the voice feature, despite the fact I have problems with my hands that make typos a constant irritation. I know I am just an anecdote - the strange thing is that my voice and lack of accent should be the easiest thing for Android to navigate.


Interesting. One of the things that immediately got me hooked was how well it recognized my voice; that excitement has long since faded, and now excellent speech recognition is something I just take for granted.

Maybe try going into Settings > Voice on the Google search app and downloading the voice pack for the version of English you think matches best? That's still pretty weird though.


I think voice with a screen is interesting, but voice alone can be difficult. What is the last voice-controlled IVR (phone system) that was awesome to interact with? I think it takes a combination of voice and something that can be confirmed with another "button": something you can touch or push to confirm or cancel what you've "asked" it to do.

I think it can augment things well, but not be the prime time star.


You should try out Echo. It works fine without a screen.


Privacy policy points out that the system sends voice recordings to Houndify but is totally silent on how they will be treated.


And that's different from Google recording all of your searches how?


Google does say what they will do with those searches: http://www.google.com/policies/privacy/#infouse


It's already really easy to get fast, efficient access to large data sets, so I don't see much value in that. What is not fast, efficient, or easy is transforming natural language queries into computationally actionable ones.

I would find more value as a developer if, when given a natural language query, it returned a structured query. Then I could tweak the query to conform to whatever data retrieval API I wanted.

I don't think what I'm asking for has to be mutually exclusive with what they're currently offering. Give me the option to have Houndify do some or all of the work for me.


I am one of the developers for houndify.com, so I can answer this question for you!

We actually have an API endpoint dedicated to doing this for you. At the moment we have a concept of "domains", where developers use a proprietary language to help Hound understand topics. Using our API, you could technically do this yourself and add functionality that doesn't currently exist on the platform.

You could use the hotel domain and get back a ton of pre-formatted data, or you could just get back speech-to-text, or you could specify hooks you want to take action on. I'm not a developer on the actual voice API itself, so I'm not the most informed, but perhaps that answers your question?
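For what it's worth, here's a rough sketch of what the grandparent's "give me a structured query back" idea could look like on the client side. All of the field names here are invented for illustration; they are not the real Houndify schema:

```python
# Hypothetical sketch: "Transcription", "CommandKind", and "Entities"
# are invented field names, not Houndify's actual response format.

def to_backend_query(hound_response):
    """Turn a (hypothetical) parsed natural-language result into parameters
    for whatever data-retrieval API you actually use."""
    result = hound_response["AllResults"][0]
    return {
        "text": result["Transcription"],      # plain speech-to-text
        "intent": result["CommandKind"],      # which domain matched
        "slots": result.get("Entities", {}),  # structured values to remap
    }

sample = {"AllResults": [{"Transcription": "hotels in Lima under 100 dollars",
                          "CommandKind": "HotelCommand",
                          "Entities": {"city": "Lima", "max_price": 100}}]}
query = to_backend_query(sample)
print(query["slots"])
```

The appeal is that once you have intent plus slots, you can tweak the structured part and send it to any backend you like, which is exactly the flexibility the grandparent was asking for.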


Q1: So a domain developer gets to determine what is returned from the API for all calls matched to that domain?

Q2: As a domain developer, where does the query-able content reside?



I've been using pocketsphinx with this neat Ruby gem[1]. It's really easy to use but has low accuracy (understands me correctly maybe half the time). I'm curious to see if Houndify does any better!

[1] https://github.com/watsonbox/pocketsphinx-ruby


There is clearly a knowledge graph coupled with this, in addition to the speech recognition. Sorry, "meaning" recognition. I feel like there is an opportunity to connect the deep knowledge graph of Wolfram Alpha, or maybe Wolfram dropped the ball by not connecting their graph in a more usable way.


I wonder if it is based on the Freebase knowledge graph, which Google discontinued last month: http://www.freebase.com/ (and IBM recently bought the Blekko web search and knowledge graph engine as a replacement for Freebase to power IBM Watson).


Acquisition by <Google/Apple/Microsoft/Facebook> in 3... 2... 1...


Seems absofreakinglutely amazing. Congratulations to the team. Amazed that this isn't front page everywhere yet.


Does this require a network connection? I'd love to start adding speech-to-text interfaces to my apps, but most of the stuff I work on needs to be able to work without the network, and most of the speech-to-text engines these days are SaaS products in some form or another.


Even if the speech-to-text doesn't, the natural language understanding does.


I expect an acquisition announcement in about six days before the third Thursday in 2016.


Can't the Android Google voice keyboard be used as a speech-to-text interface, with the text then used to trigger a command in a similar fashion?


It's the complexity of the queries and the contextual awareness that make it impressive. But yes, my immediate thought was that either Android speech-to-text or the Google speech API plugged into Wolfram Alpha might create a (much simpler, but also much easier) version of this.
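A minimal sketch of that DIY pipeline, with the speech-to-text step stubbed out. The Wolfram|Alpha query endpoint is real, but WOLFRAM_APP_ID is a placeholder you'd obtain from Wolfram, and the stub obviously stands in for a real recognizer:

```python
# Sketch of the "stitch it together yourself" pipeline: a speech-to-text
# step (stubbed here) feeding the Wolfram|Alpha query API.
from urllib.parse import urlencode

WOLFRAM_APP_ID = "YOUR-APP-ID"  # placeholder; get one from Wolfram

def speech_to_text(audio_bytes):
    """Stand-in for Android's recognizer or the Google speech API."""
    return "what is the population of Lima"

def wolfram_query_url(text):
    """Build a Wolfram|Alpha query URL for the recognized text."""
    params = urlencode({"input": text, "appid": WOLFRAM_APP_ID})
    return "https://api.wolframalpha.com/v2/query?" + params

url = wolfram_query_url(speech_to_text(b""))
print(url)
```

What this simpler version lacks, of course, is exactly what the demo shows off: compound queries and conversational context carried across turns.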


This looks extremely cool and I can't wait to try it.

Bug: when scrolling down, the page is very sluggish, using Chrome on Xubuntu 15.04.


I'm one of the developers behind Houndify. Thanks for this feedback, I'll look into it.


Really impressive work. Would love to play with the beta cough invite cough ;-) Looking forward to seeing this being used on some inclusive design projects.


Just a follow up, it's fine until I click on the examples.


I'm one of the developers behind Houndify. Feel free to ask any questions.


Any chance for an invite, or should we just use https://www.houndify.com/verify-invite ?


If you have an Android, we can definitely get you an invite. Not sure if Hacker News has a PM system, though. Not sure if you want to publish your email in here.


Hey I made that page!


Do I know you?


The thing that least impresses me about this demo is the voice synthesis :)


It is surprising how that has lagged behind. I haven't noticed any significant improvement for several years in the voice synthesis on Android.


I thought this was Apple-funded?



