Hacker News: Build your own Jarvis (jasperproject.github.io)
134 points by posharma on Jan 4, 2016 | 30 comments



I went through the code a little while ago when I first saw this on HN.

My thoughts:

It really doesn't scale. It is designed to run on a single piece of physical hardware, a Raspberry Pi. In its current guise it is really difficult to separate out the components, particularly the TTS part, which is tied directly to outputting to a speaker.

There is also no way to process different parts on different machines, wrap up the answer and post it back to a client device, for example a phone.

Command detection is really just regular expressions: simple enough for a lot of people to work with, but obviously inflexible. The priority setting is useful as long as you have everything in the right order (imagine ordering 20 or more modules).

What is nice is that the processing component could very easily be replaced with one that does some NLP, and fortunately Python (which it is written in) has some excellent libraries for that (NLTK). This means you could more naturally detect 'weather <city>' or 'forecast <country>' with some entity matching.
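For illustration, a minimal stand-in for that kind of slot-filling looks like this. It uses plain Python with a toy gazetteer rather than real NLTK chunking (which needs model downloads); the function name, patterns, and city list are all hypothetical, not part of Jasper:

```python
import re

# Toy gazetteer standing in for real entity recognition; NLTK's
# ne_chunk would do this properly, given its model data.
KNOWN_CITIES = {"london", "berlin", "tokyo"}

def parse_weather_query(text):
    """Extract an intent ('weather' or 'forecast') and a location
    slot from a free-form query, instead of exact regex commands."""
    m = re.search(r"\b(weather|forecast)\b(?:\s+(?:in|for))?\s+(\w+)", text, re.I)
    if not m:
        return None
    intent, entity = m.group(1).lower(), m.group(2).lower()
    # Only accept the slot if it names a place we actually know.
    city = entity if entity in KNOWN_CITIES else None
    return {"intent": intent, "city": city}
```

With this shape, "what's the weather in London" and "London weather forecast"-style phrasings can route to the same module, which a single fixed regex per command cannot do.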

The other nice bit is they have done a considerable amount of work abstracting STT and TTS libraries. By itself this would be pretty useful.

Inline documentation is excellent.


Thanks for the thoughtful feedback--I think these are very fair comments.

When we first designed Jasper, it was just for us to hack around with, so regex matching, the priority system, the single-instance configuration, etc. all made a lot of sense for our use case (and the use cases that we foresaw w/r/t casual hackers). Our goal was just to make things simple and accessible (hence our focus on documentation). Since our initial release, Jan Holthuis has taken over much of the development, and he's put a big emphasis on abstracting out the STT and TTS libraries (as you mentioned) and improving the design more generally. My hope is that Jasper will continue to grow and mature, and that the suggestions and possibilities you mention become realities.


That's entirely fair. I realise my use case wasn't the same as your intended goals, so I was trying not to be too critical -- it's also half-remembered ;)

You've done some really great stuff in terms of documentation and platform support, and it's been designed perfectly for people to start hacking on, which I hope will bring in more interest and, in turn, faster development.


This is cute, and definitely a good educational resource about mature FOSS HCI applications, but it will not give the degree of quality that many have come to expect from even a relatively simple assistant like Amazon's Alexa. Sirius is much closer to what you want.

http://sirius.clarity-lab.org/sirius-suite/


Is this a self-contained project that does not need an internet connection to do speech recognition/synthesis?


After a casual read: yes, I believe so, in that it does not require third-party APIs. You do still need to run it on a server that your client device can connect to. If your goal is an assistant that does not report back to Apple/Google/MSFT/Amazon, then Sirius looks like another good option.


Was interested in this but from my reading of the docs it looks like it just queries a local instance of Wikipedia? So I couldn't ask "What's the weather today" or "Do I have an appointment on Saturday" or "Who won last night's Redskins game?". Which means it's a neat toy but nothing I'd ever use.


Didn't know that Caffe can be used for speech recognition! Does it convert the audio to spectral images and do it that way?

I wonder how good the results are compared to Google's stack (http://googleresearch.blogspot.co.uk/2015/08/the-neural-netw... -- check arXiv for more info). I haven't seen any open source implementations yet.
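That is roughly how image-oriented CNN frameworks like Caffe get applied to speech: compute a spectrogram and treat it as a 2-D image. A minimal numpy sketch of that preprocessing step (the window and hop sizes here are arbitrary, and real pipelines usually go further, e.g. to log-mel features):

```python
import numpy as np

def log_spectrogram(samples, frame_len=256, hop=128):
    """Turn a 1-D audio signal into a 2-D log-magnitude spectrogram
    that an image-style CNN could consume as input."""
    window = np.hanning(frame_len)
    frames = [samples[i:i + frame_len] * window
              for i in range(0, len(samples) - frame_len + 1, hop)]
    # Magnitude of the one-sided FFT of each windowed frame.
    spectra = np.abs(np.fft.rfft(np.asarray(frames), axis=1))
    return np.log1p(spectra)  # shape: (num_frames, frame_len // 2 + 1)

# Example: one second of a 440 Hz tone sampled at 8 kHz.
sig = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
spec = log_spectrogram(sig)
```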


The law of good-enough will strike again.


Two older discussions (1.5-2 years ago). Has anyone followed the project and can summarize how it has evolved?

https://news.ycombinator.com/item?id=8206738

https://news.ycombinator.com/item?id=7546858


I was just reading this about Mark Zuckerberg wanting to build his own Jarvis... and then I saw this. I realise one is technical and the other is not; I just thought I would share this moment of coincidence, which lightly tickled me.

http://techcrunch.com/2016/01/03/iron-zuck/?ncid=rss


Does anyone know what's the best "method" to convert voice into commands? I previously ran into https://wit.ai/, which seems pretty good.

But I don't have any hands-on experience.

Edit: I just noticed that Jasper can use wit.ai under the hood ( http://jasperproject.github.io/documentation/configuration/#... ).


I won't comment on the "best" here, as they all have their respective differences, but consider signing up for the Nuance Mix developer platform beta: https://developer.nuance.com/mix ; we're offering full-control custom ASR+NLU.



The module priority definition is something that makes it a little smarter than normal voice-command options.

The example given in the documentation is - "What's on Hacker News?" and "What's on the news?"

Jasper can differentiate the responses for the two commands via the "PRIORITY" attribute.

Has anyone worked with it personally? I am curious.
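The mechanism can be sketched in a few lines. The module names, patterns, and priority values here are illustrative, not Jasper's actual module API:

```python
import re

# Each "module" declares a phrase pattern and a priority; when
# several modules match the same utterance, the highest wins.
MODULES = [
    {"name": "hackernews", "pattern": re.compile(r"\bhacker news\b", re.I), "priority": 4},
    {"name": "news",       "pattern": re.compile(r"\bnews\b", re.I),        "priority": 2},
    {"name": "weather",    "pattern": re.compile(r"\bweather\b", re.I),     "priority": 3},
]

def dispatch(text):
    """Return the name of the highest-priority module matching the text."""
    matches = [m for m in MODULES if m["pattern"].search(text)]
    if not matches:
        return None
    return max(matches, key=lambda m: m["priority"])["name"]
```

Note that "What's on Hacker News?" matches both the `hackernews` and `news` patterns, so without the priority tie-break it could be answered by the wrong module -- which is exactly the case the documentation's example is about.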


This looks pretty cool. I just recently got an Amazon Echo. Could something similar be done using Amazon's public dev tools or does anyone have experience with this [0]?

[0] - https://developer.amazon.com/public/solutions/alexa/alexa-sk...


Looks very simple to create your own module. Would be great to see it work independently of the Raspberry Pi -- at least so you could run it in text mode on your laptop to give it a whirl. Also would be awesome to see some sort of Jasper cluster, where you could have it running on multiple devices that push updates to all of them when modules are added or configurations change.


I think the next step in coding will be VR to visualise objects (classes, modules, methods, lambdas, basic language concepts) and speech to command the text editor to edit line numbers, swap lines, indent, change words, etc. -- basically the things vim can do, but with voice.


It would be cool if you could make an HTTP request with the voice command as text, then have it speak the result. That would make it easy to write services for it.

I would be very interested in a product like this with the above functionality.
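A minimal sketch of such a service using only the Python standard library; the JSON contract here (a "command" field in, a "speak" field out) and the example logic are invented for illustration, not part of Jasper:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_command(text):
    """Map recognized speech (already transcribed to text) to a reply
    string that the client would pass to its TTS engine."""
    if "time" in text.lower():
        return "It is time to check a clock."
    return "Sorry, I did not understand that."

class CommandHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect a JSON body like {"command": "what time is it"}.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        reply = handle_command(body.get("command", ""))
        payload = json.dumps({"speak": reply}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

# To serve: HTTPServer(("", 8080), CommandHandler).serve_forever()
```

Keeping speech capture on the client and command handling behind HTTP like this is also one way to address the single-machine limitation mentioned in the top comment.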


Did anyone try installing it on Linux? If so, I'd love some pointers; I tried unsuccessfully tonight.

It seems it was built solely for the Pi. I don't think there will be issues, though, since it is written in Python.


The recommended mic is probably OK, but for far-field audio, or even slightly noisy environments, a microphone array enabling beamforming (like on the Amazon Echo) would be much better.


Some site design advice: The Home link didn't tell me anything about what Jasper could do. I clicked About, and it stayed on the same page, jumped down, and also didn't tell me what it could do.

I had to get to Documentation, then Usage, then scroll down to find out what Jasper could do. This is critical information for an assistant; you've got to make it more visible.

You also should avoid advertising features not available yet. "Control your home" is mentioned, but no home automation devices are listed as modules. Anything that can control your home should have a list of the hardware it works with readily available.


Is this a self-hosted solution, or is anything being sent to the "cloud"? Strangely, the site's FAQ did not answer this.


It can be entirely self-hosted, though there are options that use voice services from AT&T and Google. And of course any plugins that use cloud services would send data to the cloud. But it can be configured to be self-contained.
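As a rough sketch of what a fully offline setup looks like, the profile would pin both engines to local backends. Treat the key names and engine values below as illustrative and check them against Jasper's configuration page:

```yaml
# profile.yml -- illustrative fragment; verify exact keys and
# supported engine values against the configuration docs.
stt_engine: sphinx      # offline speech-to-text (PocketSphinx)
tts_engine: espeak-tts  # offline text-to-speech (eSpeak)
```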


OK, rereading the FAQ I found "None of this information ever leaves the Pi" regarding the information in my profile. I'm still interested in which cloud services are used, but this is probably on a per-module basis.


It certainly seems to be per-module, for example: http://jasperproject.github.io/documentation/configuration/#...


This looks well timed with Mark Zuckerberg's New Year's resolution for 2016: to build an AI to assist around the house.


If you're interested in this stuff (AI, Cybernetics, ML, etc) come jump on Playa with me. http://getplaya.com/


Just a tip:

I do not understand what kind of service this is or what it is for. The presentation on your website is poor.


It's basically the platform for a "Jarvis" type interface.



