Show HN: Siri-as-a-Service Speech API (wit.ai)
242 points by ar7hur on Feb 12, 2014 | 60 comments



Nice demo! But someone has to play devil's advocate, so please allow me: why would I send commands about what I do in my house to your servers? Especially since you apparently even store them there (see Inbox)?

Would you consider offering a version of your program that I can download and run on my home server? That would be cool...

Finally, since the title is "Siri as a Service", where do you expect the microphone to be in a Home Automation setting? Do you envision people using their cell phones for that?

Thanks.


Yes, we plan to release an offline, lightweight runtime that allows you to run Wit locally.

Many home automation systems will have a built-in microphone (or, most probably, an array of microphones, which helps cope with background noise). But your smartphone might be useful if you are in the garden, for instance!


I'm very happy with this answer.


We just did a hack-week project using Wit and we were blown away by its accuracy. Somehow it figured out 'show me the stories assigned to me' meant our 'find-owned-stories' intent ... and that was without any training! Maybe it was luck, but after using it for hours and hours it really is remarkable.


Oh my. Cannot wait to give this a shot. I played with Wit a while back when they launched, and was seriously impressed. This will make it much easier to build technology on top of their NLP.

One step closer to Jarvis.


Here is a link to an open source project: http://cmusphinx.sourceforge.net/

Looks like they are using it for their project http://cmusphinx.sourceforge.net/2013/09/processing-speech-r...


Wit and CMU Sphinx are friends with benefits :)

Wit leverages several speech engines, including Sphinx.

And Sphinx speech-to-text users can send text to Wit to do Natural Language Understanding (turn text into actionable data).
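For anyone wiring that up, the text path is basically one HTTP call. A rough, untested sketch in TypeScript, assuming Wit's `message` endpoint and a placeholder access token (check the docs for the exact request and response shape):

    // Untested sketch: hand a Sphinx transcript to Wit's text endpoint for NLU.
    // "YOUR_SERVER_ACCESS_TOKEN" is a placeholder; the exact response shape
    // (intent, entities, confidence) may differ between API versions.
    const WIT_TOKEN = "YOUR_SERVER_ACCESS_TOKEN";

    async function understand(transcript: string): Promise<unknown> {
      const url = `https://api.wit.ai/message?q=${encodeURIComponent(transcript)}`;
      const res = await fetch(url, {
        headers: { Authorization: `Bearer ${WIT_TOKEN}` },
      });
      if (!res.ok) throw new Error(`Wit request failed: ${res.status}`);
      return res.json(); // actionable data (intent + entities) instead of raw text
    }

    understand("turn on the kitchen lights").then(console.log);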


Does Wit also use the fragments of speech that get sent over the API as a dataset for training Sphinx?


Yes!


This is the first startup in the last 6 months that has me genuinely excited. Congrats!

I know this is considered bad-mannered, but what does the NLP behind the scenes look like? I'm curious. :)


I'm very interested in using this for robotics, especially RoboCup. In that latter case, I can't use an internet connection; the robot must be fully autonomous and independent.

Can I run an instance of a wit server myself? And perhaps update its speech models regularly?


Yes, you'll be able to run a lightweight Wit runtime locally very soon. Learning will still happen on the server. The embedded client will upload its usage data to feed training, and download updated models (for both speech and natural language understanding).


Any chance for those of us with lots of spare CPU and memory resources lying around to get ahold of a standalone server package? This out of an interest in not relying on third-party services, a compulsion for DIY, and a slight, completely illogical unease with sending personal training datasets to a potentially untrusted source.


What does "lightweight" entail? Only that the learning stays online (server-side)? That would still be awesome.


Lightweight = pure C + runtime only (as opposed to the training side, which needs lots of CPU and memory resources)


Very cool! I will definitely use this.


Nice - does this work for unconstrained speech, as in use cases like transcription? What's the maximum audio length supported? Is accent an issue? Sensitivity to background noise? A FAQ would be nice to have.


FAQ seconded. This looks like a potentially amazing product; I especially love the potential home automation use cases.


Does anyone have any recommendations for the other side of this - translating text into speech?

I know there are a million APIs for this, but most sound awful. I'd love a service that sounds as good as Siri.


IVONA has some great voices: http://www.ivona.com/en/

Alas, all of their SaaS plans have the same price: "negotiable". I never felt like negotiating, so I have no idea if they're affordable; unfortunately, custom pricing usually means they're not.

http://www.ivona.com/en/saas/offer/


Yes. This is what I absolutely hate - I will never 'ask for prices' because I know it involves a harass-y phone call in my future. Such a shame people aren't more open.


The robot I work with (http://wiki.ros.org/Robots/AMIGO) uses a proprietary stand-alone package from Philips that works really well, especially for Dutch text-to-speech. Somehow our license expired, and now we use Festival, which works, but I really miss the old voice.

But for the life of me I can't find a link to where you could buy it.

Doing TTS correctly, with intonation etc., is really hard. It's not really my field, but I can imagine that getting intonation right is nearly impossible with just unannotated text.


I figured as much. I've been using Festival myself, which seems OK. But there appear to be a million voices and I have no idea which ones are supposed to be the best.



I've found this to be one of the better ones: http://www.ispeech.org


Yes, the Czech version only sounds like a person after a stroke...


Acapela Infovox has the best voices I've heard.

http://www.acapela-box.com/


Which speech recognition engine do you use? Kaldi? Sphinx? Good ol' HTK? Homemade? With deep learning acoustic models?


We use several speech recognition engines in parallel, including customized Sphinx. We don't use deep learning acoustic models yet, but that's in the works.


Off topic, but a question regarding text-to-intent object parsing:

Is there any date entity in the Wit directory? How do I parse a date, e.g. 17th Feb, feb 27, or 13/02?

Datetime only has contextual date selection, e.g. today or tomorrow.


`wit/datetime` does parse both absolute dates like "17th Feb", "feb 27" or "13/02", and relative dates like "tomorrow". If you find a date that is not parsed, that's a bug.
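A quick way to sanity-check this is to throw a few date phrasings at the message endpoint and look at what comes back. A hedged sketch (the token is a placeholder, and the exact JSON layout of the datetime entity may vary between API versions):

    // Hedged sketch: probe wit/datetime with absolute and relative dates.
    // The token is a placeholder; we just dump whatever JSON comes back,
    // since the exact entity layout may differ between API versions.
    const TOKEN = "YOUR_SERVER_ACCESS_TOKEN";
    const queries = ["meet me on 17th Feb", "the deadline is feb 27", "call me tomorrow"];

    for (const q of queries) {
      fetch(`https://api.wit.ai/message?q=${encodeURIComponent(q)}`, {
        headers: { Authorization: `Bearer ${TOKEN}` },
      })
        .then((res) => res.json())
        .then((json) => console.log(q, "->", JSON.stringify(json)));
    }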


I love seeing this service improve. I just wish I could come up with something really neat to build with it!


Jarvis!


This is pretty cool!

Google's Web Speech API can also be used to build something similar.

Here is the Web Speech API Demonstration: https://www.google.com/intl/en/chrome/demos/speech.html
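For reference, using it from a page takes only a few lines. A sketch (webkitSpeechRecognition is non-standard and Chrome-only, hence the loose typing):

    // Sketch of the Chrome-only Web Speech API mentioned above.
    // webkitSpeechRecognition is non-standard, hence the cast through `any`;
    // the page needs microphone permission.
    const Recognition = (window as any).webkitSpeechRecognition;
    const recognition = new Recognition();
    recognition.lang = "en-US";
    recognition.interimResults = false;

    recognition.onresult = (event: any) => {
      const transcript: string = event.results[0][0].transcript;
      console.log("heard:", transcript); // plain text only, no intents or entities
    };

    recognition.start();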


Google has achieved impressive accuracy for speech-to-text (especially for open-domain large vocabulary speech recognition).

If you use Google Web Speech, though, you receive text and still have to do NLP to "understand" the user intent. The other problem is that if Google does not know about specific words (like your company or product name), you have no way to customize the engine (there's no "Add to dictionary" entry point in the API!).
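One workaround is to use Google only as the speech-to-text front end and hand the transcript to an NLU service for the "understanding" part. A hedged sketch combining the two ideas (the endpoint and token are placeholders, as in the snippets above):

    // Hedged sketch: Web Speech for transcription, then Wit (or any NLU
    // service) for intents and entities. Token and endpoint are placeholders.
    const recognizer = new (window as any).webkitSpeechRecognition();
    recognizer.onresult = async (event: any) => {
      const text: string = event.results[0][0].transcript;
      const res = await fetch(
        `https://api.wit.ai/message?q=${encodeURIComponent(text)}`,
        { headers: { Authorization: "Bearer YOUR_SERVER_ACCESS_TOKEN" } }
      );
      console.log(await res.json()); // structured intent instead of raw text
    };
    recognizer.start();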


My friends and I did a similar thing back at HackMIT. (http://hackmit.challengepost.com/submissions/18093-jarvis) They did most of the work, so kudos to them mostly. If only we had had the time to push the idea forward, it could have been us today, lol.

The idea is exactly like Wit: it was supposed to integrate with all kinds of web services out there, to be as close to Siri as possible, but do a lot more than just opening a new app or visiting a weather forecast website. Basically, your virtual assistant operates like IFTTT. In the end, I think speech for home automation is the future gold mine.


Click through and play with it for a bit. This is actually a very impressive platform.


This is cool. I was just talking about doing something like this the other day to create a "Natural Unix": "copy the file to the desktop folder, then run the script".


Really cool service. I had to try it out today and built a little prototype to interact with maps using speech recognition. If anyone is interested: https://github.com/dwilhelm89/SpeechMap


Awesome, can't wait to try this out.

I guess I'll have to update my ROS wrappers at https://github.com/LoyVanBeek/wit_ros as well.


This looks great. What are the alternatives to wit.ai today? In general, what are the best natural language processing APIs / libraries / services out of the box today?


For more, see the previous discussion on HN: https://news.ycombinator.com/item?id=6373645


Very cool! On a side note, is the voice Drew Houston?


No -- almost. Try again!


Ah Paul Graham it is!


Coolest thing I've seen on HN in a while.


Why do you not support Firefox? WebRTC and WebAudio have hit Firefox stable. Do feature detection, not UA detection.
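For what it's worth, feature detection here is just a couple of property checks. A rough sketch (prefix handling kept deliberately broad):

    // Rough sketch of feature detection instead of UA sniffing: check for the
    // APIs the demo actually needs rather than parsing the user-agent string.
    const w = window as any;
    const nav = navigator as any;
    const hasSpeech = "webkitSpeechRecognition" in w || "SpeechRecognition" in w;
    const hasMic =
      !!(nav.mediaDevices && nav.mediaDevices.getUserMedia) ||
      !!(nav.getUserMedia || nav.webkitGetUserMedia || nav.mozGetUserMedia);

    if (hasSpeech || hasMic) {
      console.log("enable the voice demo");
    } else {
      console.log("fall back to text input");
    }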



Are you involved with the service? This seems like a very silly bug to completely drop Firefox support over.


http://www.maluuba.com does the same as wit.ai, I guess.


Maluuba is a nice Siri-like virtual agent app for Android, but Maluuba's API only offers predefined intents (across 23 categories). Developers cannot create intents for their own domain.

Also, to my knowledge they rely on Android's speech recognition and the API accepts only text, not audio streams.


I contacted them a while ago and they said they would let developers create their own intents soon.

http://www.ask-ziggy.com is also similar


Wait, what's Wit's (hard to Google...) connection with Siri and SRI?


I just tested the HTML demo tutorial thing. It works amazingly well.


This is awesome. Definitely using this soon.


Thanks! You can sign up on https://wit.ai


All cool, but who is behind this project?


Very interesting, I will check it out!


Too cool to be true :). Love it!


Awesome idea!



