I went through the code a little while ago when I first saw this on HN.
My thoughts:
It really doesn't scale. It is designed to run on a single piece of physical hardware (an RPi). In its current guise it is really difficult to separate out the parts, particularly the TTS component, which is tied directly to outputting to a speaker.
There is also no way to process different parts on different machines, wrap up the answer and post it back to a client device, for example a phone.
Command detection is really just regular expressions: simple enough for a lot of people to work with, but obviously inflexible. The priority setting is useful as long as you keep everything in the right order (imagine ordering 20 or more modules).
What is nice is that the processing component could very easily be replaced with one that does some NLP, and fortunately Python (which it is written in) has some excellent libraries for that (NLTK, for example). That would let you detect 'weather <city>' or 'forecast <country>' more naturally with some entity matching.
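Something along these lines (a rough sketch, not Jasper's actual code; it assumes NLTK's standard tokenizer, tagger and chunker models have been downloaded) could pull the place name out of an utterance like "what's the weather in San Francisco":

    # Illustrative only: swap the regex match for NLTK's named-entity chunker.
    # Needs: nltk.download('punkt'), 'averaged_perceptron_tagger',
    #        'maxent_ne_chunker' and 'words'.
    import nltk

    def extract_weather_place(text):
        """Return the place mentioned in a weather/forecast request, or None."""
        if not any(w in text.lower() for w in ("weather", "forecast")):
            return None
        chunks = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(text)))
        for subtree in chunks.subtrees():
            if subtree.label() == "GPE":   # NLTK's label for locations
                return " ".join(word for word, _ in subtree.leaves())
        return None

    print(extract_weather_place("What's the weather in San Francisco"))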
The other nice bit is that they have done a considerable amount of work abstracting the STT and TTS libraries. By itself that would be pretty useful. The inline documentation is also excellent.
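Roughly, that abstraction amounts to something like this (a sketch of the idea, not Jasper's actual class names), so a PocketSphinx backend and a Google backend become interchangeable:

    # Sketch only: the real engine base classes differ in naming and detail.
    from abc import ABC, abstractmethod

    class STTEngine(ABC):
        """Speech-to-text: turn a recorded audio clip into a transcription."""
        @abstractmethod
        def transcribe(self, audio_path: str) -> str: ...

    class TTSEngine(ABC):
        """Text-to-speech: say a phrase (ideally returning audio bytes, so the
        output isn't tied to a local speaker)."""
        @abstractmethod
        def say(self, phrase: str) -> None: ...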
Thanks for the thoughtful feedback--I think these are very fair comments.
When we first designed Jasper, it was just for us to hack around with, so regex matching, the priority system, the single-instance configuration, etc.--these all made a lot of sense for our use case (and the use cases that we foresaw w/r/t casual hackers). Our goal was just to make things simple and accessible (hence our focus on documentation). Since our initial release, Jan Holthuis has taken over much of the development, and he's put a big emphasis on abstracting out the STT and TTS libraries (as you mentioned) and improving the design more generally. My hope is that Jasper will continue to grow and mature, and that the suggestions and possibilities you mention become realities.
That's entirely fair. I realise my use case wasn't the same as your intended goals, so I was trying not to be too critical -- it's also half-remembered ;)
You've done some really great stuff in terms of documentation and platform support, and it's been designed perfectly for people to start hacking on, which I hope will bring in more interest and, in turn, faster development.
This is cute, and definitely a good educational resource about mature FOSS HCI applications, but it will not give the degree of quality that many have come to expect from assistants, even ones as simple as Amazon's Alexa. Sirius is much closer to what you want.
After a casual read: yes, I believe so, in that it does not require third-party APIs. You do still need to run it on a server that your client device can connect to. If your goal is an assistant that does not report back to Apple/Google/MSFT/Amazon, then Sirius looks like another good option.
Was interested in this but from my reading of the docs it looks like it just queries a local instance of Wikipedia? So I couldn't ask "What's the weather today" or "Do I have an appointment on Saturday" or "Who won last night's Redskins game?". Which means it's a neat toy but nothing I'd ever use.
Was just reading this about Mark Zuckerberg wanting to build his own Jarvis... and then I saw this. I realise one is technical and the other is not; just thought I would share this moment of coincidence that lightly tickled me.
I won't comment on the "best" here as they all have their respective differences, but consider signing up for the Nuance Mix developer platform Beta: https://developer.nuance.com/mix ; we're offering full-control custom ASR+NLU.
This looks pretty cool. I just recently got an Amazon Echo. Could something similar be done using Amazon's public dev tools or does anyone have experience with this [0]?
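For what it's worth, a custom Alexa skill is mostly a small web or Lambda handler that maps intents to a spoken response. A minimal hedged sketch (the intent name is made up; the request/response JSON shape is the Alexa Skills Kit format as I understand it):

    # Minimal AWS Lambda handler for a hypothetical "GetTimeIntent" custom skill.
    import datetime

    def lambda_handler(event, context):
        request = event.get("request", {})
        if request.get("type") == "IntentRequest" and \
                request.get("intent", {}).get("name") == "GetTimeIntent":
            text = "The time is " + datetime.datetime.now().strftime("%H:%M")
        else:
            text = "Sorry, I didn't catch that."
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {"type": "PlainText", "text": text},
                "shouldEndSession": True,
            },
        }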
Looks very simple to create your own module. It would be great to see it work independently of the Raspberry Pi, at least so you can run it in text mode on your laptop to give it a whirl. It would also be awesome to see some sort of jasper-cluster, where you could have it running on multiple devices and push updates to all of them whenever modules are added or configurations change.
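For reference, a module is roughly this shape per the docs (a from-memory sketch, so check the module documentation for the exact signatures):

    # A minimal Jasper-style module: WORDS feeds the recognizer's vocabulary,
    # isValid() decides whether the module fires, handle() produces the reply.
    import re
    import datetime

    WORDS = ["TIME"]
    PRIORITY = 1   # modules are tried in priority order (attribute may be optional)

    def isValid(text):
        return bool(re.search(r"\btime\b", text, re.IGNORECASE))

    def handle(text, mic, profile):
        now = datetime.datetime.now().strftime("%I:%M %p")
        mic.say("It is %s right now." % now)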
I think the next step in coding will be VR to visualise objects (classes, modules, methods, lambdas, basic language concepts) and speech to command the text editor: edit specific lines, swap lines, indent, change words, etc. Basically the things vim can do, but with voice.
It would be cool if you could make an HTTP request with the voice command as text, then have it speak the result.
That would make it easy to write services for it.
I would be very interested in a product like this with the above functionality.
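Something like this module sketch would get most of the way there (the service URL and JSON shape are made up; it assumes the usual isValid/handle module pattern and the requests library):

    # Forward the transcribed command to a hypothetical local HTTP service and
    # speak whatever it sends back.
    import re
    import requests

    WORDS = ["ASSISTANT"]
    SERVICE_URL = "http://192.168.1.50:8080/command"   # hypothetical endpoint

    def isValid(text):
        return bool(re.search(r"\bassistant\b", text, re.IGNORECASE))

    def handle(text, mic, profile):
        try:
            resp = requests.post(SERVICE_URL, json={"command": text}, timeout=5)
            resp.raise_for_status()
            mic.say(resp.json().get("speech", "The service returned no answer."))
        except requests.RequestException:
            mic.say("Sorry, I couldn't reach the command service.")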
The recommended mic is probably ok, but for far-field audio, or even slightly noisy environments, a microphone array enabling beamforming (like on the Amazon Echo) would be much better.
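The core idea (delay-and-sum) is simple: time-align each mic channel for a chosen look direction and average, so speech from that direction adds up coherently while off-axis noise partially cancels. A toy numpy sketch, ignoring fractional-sample delays and edge effects (real arrays do this adaptively and per frequency band):

    import numpy as np

    def delay_and_sum(channels, mic_positions, look_dir, fs, c=343.0):
        """channels: (n_mics, n_samples); mic_positions: (n_mics, 3) in metres;
        look_dir: unit vector from the array towards the talker; fs in Hz."""
        # Mics nearer the talker hear the wavefront earlier, by (p . look_dir) / c.
        advance = np.asarray(mic_positions) @ np.asarray(look_dir) / c
        shifts = np.round((advance - advance.min()) * fs).astype(int)
        # Delay the early channels so all copies line up, then average.
        aligned = [np.roll(ch, int(s)) for ch, s in zip(np.asarray(channels), shifts)]
        return np.mean(aligned, axis=0)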
Some site design advice: The Home link didn't tell me anything about what Jasper could do. I clicked About, and it stayed on the same page, jumped down, and also didn't tell me what it could do.
I had to get to Documentation, then Usage, then scroll down to find out what Jasper could do. This is critical information for an assistant; you've got to make it more accessible.
You should also avoid advertising features that aren't available yet. "Control your home" is mentioned, but no home automation devices are listed as modules. Anything that claims to control your home should have a readily available list of the hardware it works with.
It can be entirely self-hosted, though there are options that use voice services from AT&T and Google. Any plugins that use cloud services would of course send data to the cloud, but it can be configured to be fully self-contained.
Ok, rereading the FAQ, I found "None of this information ever leaves the Pi" regarding the information in my profile. I'm still curious which cloud services are being used, but that is probably decided on a per-module basis.
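From what I can tell, self-contained mostly means picking offline engines in the profile, along the lines of the snippet below (the path and key names are from memory of the docs, so treat them as illustrative):

    # ~/.jasper/profile.yml (illustrative; check the configuration docs)
    first_name: Jane
    stt_engine: sphinx       # offline PocketSphinx instead of Google/AT&T STT
    tts_engine: espeak-tts   # offline eSpeak instead of a cloud TTS voice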