Show HN: Siri-as-a-Service Speech API (wit.ai)
242 points by ar7hur on Feb 12, 2014 | 60 comments



Nice demo! But someone has to play devil's advocate, so please allow me: why would I send commands about what I do in my house to your servers? Especially since you apparently even store them there (see Inbox)?

Would you consider offering a version of your program that I can download and run on my home server? That would be cool...

Finally, since the title is "Siri as a Service", where do you expect the microphone to be in a Home Automation setting? Do you envision people using their cell phones for that?

Thanks.


Yes, we plan to release an offline, lightweight runtime that allows you to run Wit locally.

Many home automation systems will have a built-in microphone (or, most probably, an array of microphones, which helps cope with background noise). But your smartphone might be useful if you are in the garden, for instance!


I'm very happy with this answer.


We just did a hack-week project using Wit and we were blown away by its accuracy. Somehow it figured out 'show me the stories assigned to me' meant our 'find-owned-stories' intent ... and that was without any training! Maybe it was luck, but after using it for hours and hours it really is remarkable.


Oh my. Cannot wait to give this a shot. I played with Wit a while back when they launched, and was seriously impressed. This will make it much easier to build technology on top of their NLP.

One step closer to Jarvis.


Here is a link to an open source project: http://cmusphinx.sourceforge.net/

Looks like they are using it for their project http://cmusphinx.sourceforge.net/2013/09/processing-speech-r...


Wit and CMU Sphinx are friends with benefits :)

Wit leverages several speech engines, including Sphinx.

And Sphinx speech-to-text users can send text to Wit to do Natural Language Understanding (turn text into actionable data).
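For anyone wiring that up, the text path is basically one HTTP call. A rough, untested sketch in TypeScript, assuming Wit's `message` endpoint and a placeholder access token (check the docs for the exact request and response shape):

    // Untested sketch: hand a Sphinx transcript to Wit's text endpoint for NLU.
    // "YOUR_SERVER_ACCESS_TOKEN" is a placeholder; the exact response shape
    // (intent, entities, confidence) may differ between API versions.
    const WIT_TOKEN = "YOUR_SERVER_ACCESS_TOKEN";

    async function understand(transcript: string): Promise<unknown> {
      const url = `https://api.wit.ai/message?q=${encodeURIComponent(transcript)}`;
      const res = await fetch(url, {
        headers: { Authorization: `Bearer ${WIT_TOKEN}` },
      });
      if (!res.ok) throw new Error(`Wit request failed: ${res.status}`);
      return res.json(); // actionable data (intent + entities) instead of raw text
    }

    understand("turn on the kitchen lights").then(console.log);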


Does Wit also use the fragments of speech that get sent over the API as a dataset for training Sphinx?


Yes!


This is the first startup in the last 6 months that has me genuinely excited. Congrats!

I know this is considered bad-mannered, but what does the NLP behind the scenes look like? I'm curious. :)


I'm very interested in using this for robotics, especially RoboCup. In that latter case, I can't use an internet connection; the robot must be fully autonomous and independent.

Can I run an instance of a wit server myself? And perhaps update its speech models regularly?


Yes, you'll be able to run a lightweight Wit runtime locally very soon. Learning will still happen on the server. The embedded client will upload its usage data to feed training, and download updated models (for both speech and natural language understanding).


Any chance for those of us with lots of spare CPU and memory resources lying around to get ahold of a standalone server package? This out of an interest in not relying on third-party services, a compulsion for DIY, and a slight, completely illogical unease with sending personal training datasets to a potentially untrusted source.


What does "lightweight" entail? Only that the learning stays online (server-side)? That would still be awesome.


Lightweight = pure C + runtime only (as opposed to the training side, which needs lots of CPU and memory resources)


Very cool! I will definitely use this.


Nice - does this work for unconstrained speech, as in use cases like transcription? What's the maximum audio length supported? Is accent an issue? Sensitivity to background noise? A FAQ would be nice to have.


FAQ seconded. This looks like a potentially amazing product; I especially love the potential home automation use cases.


Does anyone have any recommendations for the other side of this - translating text into speech?

I know there are a million APIs for this, but most sound awful. I'd love a service that sounds as good as Siri.


IVONA has some great voices: http://www.ivona.com/en/

Alas, all of their SaaS plans have the same price: "negotiable". I never felt like negotiating, so I have no idea if they're affordable; unfortunately, custom pricing usually means they're not.

http://www.ivona.com/en/saas/offer/


Yes. This is what I absolutely hate - I will never 'ask for prices' because I know it involves a harass-y phone call in my future. Such a shame people aren't more open.


The robot I work with (http://wiki.ros.org/Robots/AMIGO) uses a proprietary stand-alone package from Philips that works really well, especially for Dutch text-to-speech. Somehow our license expired, and now we use Festival, which works, but I really miss the old voice.

But for the life of me I can't find a link to where you could buy it.

Doing TTS correctly, with intonation etc., is really hard. It's not really my field, but I can imagine that getting intonation right is nearly impossible with just unannotated text.


I figured as much. I've been using Festival myself, which seems OK. But there appear to be a million voices and I have no idea which ones are supposed to be the best.



I've found this to be one of the better ones: http://www.ispeech.org


Yes, the Czech version only sounds like a person after a stroke...


Acapela Infovox has the best voices I've heard.

http://www.acapela-box.com/


Which speech recognition engine do you use? Kaldi? Sphinx? Good ol' HTK? Homemade? With deep learning acoustic models?


We use several speech recognition engines in parallel, including customized Sphinx. We don't use deep learning acoustic models yet, but that's in the works.


Off topic, but a question regarding text-to-intent object parsing:

Is there any date entity in the Wit directory? How do I parse a date, e.g. 17th Feb, feb 27, or 13/02?

Datetime only has contextual date selection, e.g. today or tomorrow.


`wit/datetime` does parse both absolute dates like "17th Feb", "feb 27" or "13/02", and relative dates like "tomorrow". If you find a date that is not parsed, that's a bug.
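A quick way to sanity-check this is to throw a few date phrasings at the message endpoint and look at what comes back. A hedged sketch (the token is a placeholder, and the exact JSON layout of the datetime entity may vary between API versions):

    // Hedged sketch: probe wit/datetime with absolute and relative dates.
    // The token is a placeholder; we just dump whatever JSON comes back,
    // since the exact entity layout may differ between API versions.
    const TOKEN = "YOUR_SERVER_ACCESS_TOKEN";
    const queries = ["meet me on 17th Feb", "the deadline is feb 27", "call me tomorrow"];

    for (const q of queries) {
      fetch(`https://api.wit.ai/message?q=${encodeURIComponent(q)}`, {
        headers: { Authorization: `Bearer ${TOKEN}` },
      })
        .then((res) => res.json())
        .then((json) => console.log(q, "->", JSON.stringify(json)));
    }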


I love seeing this service improve. I just wish I could come up with something really neat to build with it!


Jarvis!


This is pretty cool!

Google's Web Speech API can also be used to build something similar.

Here is the Web Speech API Demonstration: https://www.google.com/intl/en/chrome/demos/speech.html
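For reference, using it from a page takes only a few lines. A sketch (webkitSpeechRecognition is non-standard and Chrome-only, hence the loose typing):

    // Sketch of the Chrome-only Web Speech API mentioned above.
    // webkitSpeechRecognition is non-standard, hence the cast through `any`;
    // the page needs microphone permission.
    const Recognition = (window as any).webkitSpeechRecognition;
    const recognition = new Recognition();
    recognition.lang = "en-US";
    recognition.interimResults = false;

    recognition.onresult = (event: any) => {
      const transcript: string = event.results[0][0].transcript;
      console.log("heard:", transcript); // plain text only, no intents or entities
    };

    recognition.start();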


Google has achieved impressive accuracy for speech-to-text (especially for open-domain large vocabulary speech recognition).

If you use Google Web Speech, though, you receive text and still have to do NLP to "understand" the user intent. The other problem is that if Google does not know about specific words (like your company or product name), you have no way to customize the engine (there's no "Add to dictionary" entry point in the API!).
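One workaround is to use Google only as the speech-to-text front end and hand the transcript to an NLU service for the "understanding" part. A hedged sketch combining the two ideas (the endpoint and token are placeholders, as in the snippets above):

    // Hedged sketch: Web Speech for transcription, then Wit (or any NLU
    // service) for intents and entities. Token and endpoint are placeholders.
    const recognizer = new (window as any).webkitSpeechRecognition();
    recognizer.onresult = async (event: any) => {
      const text: string = event.results[0][0].transcript;
      const res = await fetch(
        `https://api.wit.ai/message?q=${encodeURIComponent(text)}`,
        { headers: { Authorization: "Bearer YOUR_SERVER_ACCESS_TOKEN" } }
      );
      console.log(await res.json()); // structured intent instead of raw text
    };
    recognizer.start();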


My friends and I did a similar thing back at HackMIT. (http://hackmit.challengepost.com/submissions/18093-jarvis) They did most of the work, so kudos to them mostly. If only we had had the time to push the idea forward, it could have been us today, lol.

The idea is exactly like Wit: it was supposed to integrate with all kinds of web services out there, to be as close to Siri as possible, but do a lot more than just opening a new app or visiting a weather forecast website. Basically, your virtual assistant operates like IFTTT. In the end, I think speech for home automation is the future gold mine.


Click through and play with it for a bit. This is actually a very impressive platform.


This is cool. I was just talking about doing something like this the other day to create a "Natural Unix": "copy the file to the desktop folder, then run the script".


Really cool service. I had to try it out today and built a little prototype to interact with maps using speech recognition. If anyone is interested: https://github.com/dwilhelm89/SpeechMap


Awesome, can't wait to try this out.

I guess I'll have to update my ROS wrappers at https://github.com/LoyVanBeek/wit_ros as well.


This looks great. What are the alternatives to wit.ai today? In general, what are the best natural language processing APIs / libraries / services out of the box today?


For more, see the previous discussion on HN: https://news.ycombinator.com/item?id=6373645


Very cool! On a side note, is the voice Drew Houston?


No -- almost. Try again!


Ah Paul Graham it is!


Coolest thing I've seen on HN in a while.


Why do you not support Firefox? WebRTC and WebAudio have hit Firefox stable. Do feature detection, not UA detection.
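For what it's worth, feature detection here is just a couple of property checks. A rough sketch (prefix handling kept deliberately broad):

    // Rough sketch of feature detection instead of UA sniffing: check for the
    // APIs the demo actually needs rather than parsing the user-agent string.
    const w = window as any;
    const nav = navigator as any;
    const hasSpeech = "webkitSpeechRecognition" in w || "SpeechRecognition" in w;
    const hasMic =
      !!(nav.mediaDevices && nav.mediaDevices.getUserMedia) ||
      !!(nav.getUserMedia || nav.webkitGetUserMedia || nav.mozGetUserMedia);

    if (hasSpeech || hasMic) {
      console.log("enable the voice demo");
    } else {
      console.log("fall back to text input");
    }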



Are you involved with the service? This seems like a very silly bug to completely drop Firefox support over.


http://www.maluuba.com does the same as wit.ai, I guess.


Maluuba is a nice Siri-like virtual agent app for Android, but Maluuba's API only offers predefined intents (across 23 categories). Developers cannot create intents for their own domain.

Also, to my knowledge they rely on Android's speech recognition and the API accepts only text, not audio streams.


I contacted them a while ago and they said they would let developers create their own intents soon.

http://www.ask-ziggy.com is also similar


Wait, what's Wit's (hard to Google...) connection with Siri and SRI?


I just tested the HTML demo tutorial thing. It works amazingly well.


This is awesome. Definitely using this soon.


Thanks! You can sign up on https://wit.ai


All cool, but who is behind this project?


Very interesting, I will check it out!


Too cool to be true :). Love it!


Awesome idea!



