Nice demo! But someone has to play devil's advocate, so please allow me: why would I send commands about what I do in my house to your servers? Especially since you apparently even store them there (see Inbox)?
Would you consider offering a version of your program that I can download and run on my home server? That would be cool...
Finally, since the title is "Siri as a Service", where do you expect the microphone to be in a Home Automation setting? Do you envision people using their cell phones for that?
Yes we plan to release an offline lightweight runtime that allows you to run Wit locally.
Many home automation systems will have a built-in microphone (or more probably an array of microphones, which helps cope with background noise). But your smartphone might be useful in case you are in the garden, for instance!
We just did a hack-week project using wit and we were blown away at the accuracy of it. Somehow it figured out 'show me the stories assigned to me' meant our 'find-owned-stories' intent ... and that was without any training! Maybe it was luck, but after using it for hours and hours it really is remarkable.
Oh my. Cannot wait to give this a shot. I played with Wit awhile back when they launched, and was seriously impressed. This will make it much easier to build on their NLP technology.
I'm very interested in using this for robotics, especially RoboCup. In that latter case, I can't use an internet connection; the robot must be fully autonomous and independent.
Can I run an instance of a wit server myself? And perhaps update its speech models regularly?
Yes you'll be able to run a lightweight Wit runtime locally very soon. Learning will still happen on the server. The embedded client will upload its usage data to feed training, and download updated models (for both speech and natural language understanding).
Any chance for those of us with lots of spare CPU and memory resources lying around to get ahold of a standalone server package? This out of an interest in not relying on third-party services, a compulsion for DIY, and a slight, completely illogical unease with sending personal training datasets to a potentially untrusted source.
Nice - does this work for unconstrained speech - as in would this work for use-cases like transcription? What's the maximum audio length supported? Is accent an issue? Sensitivity to background noise? A FAQ would be nice to have.
Alas, all of their SaaS plans have the same price: "negotiable". I never felt like negotiating, so I have no idea if they're affordable; unfortunately custom pricing usually means it's not the case.
Yes. This is what I absolutely hate - I will never 'ask for prices' because I know it involves a harass-y phone call in my future. Such a shame people aren't more open.
The robot I work with (http://wiki.ros.org/Robots/AMIGO) uses a proprietary stand-alone package from Philips which works really well.
Somehow our license expired and now we use Festival, which works, but I really miss the old voice. Especially for Dutch text-to-speech the Philips package worked quite well.
But for the life of me I can't find a link of where you could buy it.
Doing TTS correctly, with intonation etc., is really hard. It's not really my field, but I can imagine that getting intonation right is near impossible with just unannotated text.
I figured as much. I've been using Festival myself, which seems OK. But there appear to be a million voices and I have no idea which ones are supposed to be the best.
We use several speech recognition engines in parallel, including customized Sphinx. We don't use deep learning acoustic models yet, but that's in the works.
`wit/datetime` does parse both absolute dates like "17th Feb", "feb 27" or "13/02", and relative dates like "tomorrow". If you find a date that is not parsed that's a bug.
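To make the absolute/relative distinction concrete, here is a toy sketch of what resolving those example phrases involves. This is not Wit's actual implementation (which is far more capable); the function, month table, and reference date below are invented for illustration only. Note that relative phrases like "tomorrow" can only be resolved against a reference day.

```python
import re
from datetime import date, timedelta

# Hypothetical month abbreviations table, for illustration only.
MONTHS = {m: i + 1 for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"])}

def parse_date(text, today=date(2014, 2, 10)):
    """Resolve a date phrase to a concrete date, relative to `today`."""
    t = text.strip().lower()
    # Relative dates: meaningless without a reference day.
    if t == "today":
        return today
    if t == "tomorrow":
        return today + timedelta(days=1)
    # Absolute: "17th feb"
    m = re.match(r"(\d{1,2})(?:st|nd|rd|th)?\s+([a-z]{3})", t)
    if m:
        return date(today.year, MONTHS[m.group(2)], int(m.group(1)))
    # Absolute: "feb 27"
    m = re.match(r"([a-z]{3})\w*\s+(\d{1,2})", t)
    if m:
        return date(today.year, MONTHS[m.group(1)], int(m.group(2)))
    # Absolute: "13/02" (day/month order assumed)
    m = re.match(r"(\d{1,2})/(\d{1,2})$", t)
    if m:
        return date(today.year, int(m.group(2)), int(m.group(1)))
    return None
```

Even this toy version shows why a date like "13/02" is ambiguous (day/month vs month/day) and why a real parser needs locale awareness.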
Google has achieved impressive accuracy for speech-to-text (especially for open-domain large vocabulary speech recognition).
If you use Google Web Speech though, you receive text and you still have to do NLP to "understand" the user intent. The other problem is, if Google does not know about specific words (like your company or product name), you have no way to customize the engine (no "Add to dictionary" entry point in the API!).
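To illustrate that extra step: once the speech engine hands you a raw transcript, mapping it to an intent is entirely your problem. A crude keyword-based sketch (the intent names and keywords here are made up, and real NLU is of course far more robust):

```python
# Toy intent classifier over a raw speech-to-text transcript.
# Intent labels and trigger words are invented for illustration.
def classify(transcript):
    t = transcript.lower()
    words = t.split()
    if "light" in t and ("on" in words or "off" in words):
        return "toggle_lights"
    if "temperature" in t or "thermostat" in t:
        return "set_temperature"
    return "unknown"
```

And this crude approach breaks immediately on out-of-vocabulary words: if the transcript mangles your product name, no amount of keyword matching downstream can recover it, which is exactly the customization gap described above.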
My friends and I did a similar thing back at HackMIT. (http://hackmit.challengepost.com/submissions/18093-jarvis) They did most of the work, so kudos to them mostly. If only we had had the time to push our idea forward, today it could have been us lol
The idea is exactly like Wit, and it was supposed to integrate with all kinds of web services out there, to be as close to Siri as possible but do a lot more than just opening a new app or visiting a weather forecast website. Basically, your virtual assistant operates like IFTTT. In the end, I think speech in home automation is the future gold mine.
This is cool, was just talking about doing something like this the other day to create a "Natural Unix".. "copy the file to the desktop folder, then run the script"..
Really cool service. Had to try it out today and built a little prototype to interact with maps using speech recognition. If someone is interested: https://github.com/dwilhelm89/SpeechMap
This looks great. What are the alternatives to wit.ai today? In general, what are the best natural language processing API / library / service out of the box today?
Maluuba is a nice Siri-like virtual agent app for Android, but Maluuba's API only offers predefined intents (across 23 categories). Developers cannot create intents for their own domain.
Also, to my knowledge they rely on Android's speech recognition and the API accepts only text, not audio streams.