Leon: Open-source, self-hosted personal assistant (github.com/leon-ai)
402 points by thunderbong on Sept 4, 2022 | hide | past | favorite | 123 comments



I started building Leon in late 2017 in my free time. There is no big organization behind Leon AI, just me, with the community joining in later.

At the moment the whole core is being rewritten and many new features are coming up. Please do know that the main strength of Leon is the core, not the skills, yet. Once the official release ships, the community and I will focus on building new skills by extending the core and letting Leon do meaningful things. Among the upcoming changes, new offline STT/TTS solutions will be added.

It's going to take some time to build a respectable personal assistant that respects our privacy, but we are getting there.

If you have any questions or just wanna chat, feel free to join us on Discord.


Thanks for your work on this. Maybe it’s time to introduce a paid license to support the project and even offer consulting hours to get professional support?


Thanks for the suggestion. I have some ideas in mind, like creating some courses specific to Leon. In the meantime, it is possible to support the project via GitHub Sponsors: http://sponsor.getleon.ai


This probably was submitted due to their recent blog activity, in which they talk about their NLP improvements: https://blog.getleon.ai/a-much-better-nlp-and-future-1-0-0-b...


Yes it is. Thanks for pointing that out.


Why would you take this over Mycroft? Is there a difference?


Do you have a problem with that, Dave?


I don't know how well Leon works, but Mycroft does not run locally on your own server. It's still a cloud service. I believe it can be run locally, but it's very complex and not well supported. At least this was the state the last time I looked at it.

For this reason (and their failed Kickstarter) I've never really been into Mycroft. After all, it's kinda the same as Alexa/Siri, just with a slightly more trustworthy party when they say they're not using your data.

I'm just not interested in alternatives until they can run fully locally (not necessarily on-device; on a local server or in Docker is fine).

If this Leon does what it says on the tin it might be just the thing for me.


Mycroft does run on your own server.

It allows you to use the Google text-to-speech API for more natural voices, which is a cloud product.

It's a YAML config change to use on-device TTS instead.

Installation is just flashing an SD card.


I thought the detection engine was too complex and not fully open? I remember them looking at Mozilla DeepSpeech or something, but that being one of those never-ending projects.

I was asking about self hosting on their forum and they were like "just use our hosted service, you can trust us". That's not different enough from Apple/Amazon/Google/Microsoft so I dropped it.

But like I said this was a few years ago. Never looked again since.


Leon can fully run on your own device; it includes the NLP, STT, TTS and so on. More offline STT/TTS solutions are coming up, though.



What are some obvious use cases for something like this? I really have trouble imagining what you would use it for.


Most of my 30 something friends have ditched their Alexas and Google Assistant devices because of concerns around security and privacy. The ability to say "set a timer for 30 minutes" is nice, but not enough to invite always-on microphones into private spaces.


> my 30 something friends

That's a lot of friends! Good job.


I naturally parsed it as he has some friends who are around 30 years old. I am possibly wrong.


I think this is the right interpretation and you are replying to a joke taking advantage of this ambiguity. Your sibling nell is probably (jokingly) meaning that this is a joke their dad could have made.


dad, is that you?


Side-threads like this are why I read HN.


Try Reddit. I think you might like it.


Too bad they probably all still have phones then.


Wait, what? When I hold the main button on Android and say exactly that, it means I have an always-on microphone??


The "always-on" microphone feature is used to listen for "Okay Google", if you have that enabled.


The wake word is generally processed locally, but anything subsequent is usually sent to the cloud.


I see. So you can still use these assistants without the privacy concern, then. So why abandon them?


Disclaimer: I don't have one of these and don't particularly want one. The privacy concerns kinda creep me out. That said, I've been to friends homes and seen them use it.

As far as I can tell, the primary use case for these things is to be in <random place in your home> and just say out loud "Alexa, set a timer for ....". I've heard that you can also order stuff from Amazon using your voice. I think a third use case starts with "Alexa, tell me a joke".

I'm assuming that there's other things you can do with these (and would love to know, if anyone's willing to share).

So - if the solution to the privacy concern is to walk over to the device and push a button then that seems to remove most of the usefulness of the device. Speaking as someone who doesn't want one / doesn't have one of these things I can totally see how eliminating the "voice control from anywhere" feature leads to opting out of it.

(When I'm walking around my home I've always got my phone on me (which, to be fair, has a bunch of privacy concerns too) so I can more easily set a timer / buy something on Amazon / Google for jokes by fishing my phone out of my pocket and then using that, rather than walking over to push a button.)

(The "what else can you do with these" is a genuine question - if people are comfortable sharing I'd love to hear what you can do with these)


There's a lot of little use cases. Hands-free cooking stuff (set a timer, how many tablespoons in a pint). Device control: faster to turn off a TV with voice than to dig for the remote, or to play a playlist/skip songs. None of those really save that much more time than the old-fashioned way, so concerns about privacy mean things get done the old way.

I know some people like having them if they have frailty or mobility concerns, which is probably the only really new use case.


Definitely. We can think of connecting such basic skills with more advanced skills too. The fact that Leon has a modular architecture makes it very flexible. We just need to let our imagination drive us.


I have Alexa, although I'm going to remove her and replace with a locally-hosted thing.

I've tied mine in with home automation stuff. So I can turn on and off lights using voice, even if I'm not at home. I sometimes forget to turn off my workshop and I can do that from anywhere.

I'd like to figure out a way to reset my internet, because I access cameras and it goes out sometimes. I'm very sure this can be done.

I also use her for weather, although I'm annoyed about some of her limitations there, and I intend to get exactly what I want by coding. I want to be able to ask things like "when will it rain next", but Alexa can't do that.

She can also do reminders in a week or whatever, I use that some. And I ask very simple questions that she can query Google for, but honestly she's terrible at it.

I also think she's too verbose, even with verbosity turned down. She just goes on and on sometimes without being asked--like instructions on resetting the router if she can't contact Amazon.

I also tried Google Assistant and Bixby. I use my watch for a lot of the things you said you use your phone for.

Anyway I'm not happy with any of them. I plan to work a bunch on some skills as my next project, after the current one is done.


> I'd like to figure up a way to reset my internet, because I access cameras, and it goes out sometimes. I'm very sure this can be done.

Just get one of those smart outlets that can be controlled locally; it'll allow you to power cycle your modem/router/wifi AP. Shelly.cloud makes one, and it can be controlled through REST [0] with a call like http://192.168.0.40/relay/0?turn=on

[0] https://shelly-api-docs.shelly.cloud/gen2/ComponentsAndServi...
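As a minimal sketch of the idea (the IP address comes from the comment above; the endpoint shape matches the Shelly-style `/relay/{id}?turn=` call quoted there, but check the linked API docs for your device generation, and the function names here are my own):

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

def relay_url(host: str, relay: int = 0, turn: str = "on") -> str:
    """Build a Shelly-style relay endpoint URL for a given host."""
    return f"http://{host}/relay/{relay}?{urlencode({'turn': turn})}"

def power_cycle(host: str, relay: int = 0, off_seconds: int = 5) -> None:
    """Switch the outlet off, wait, then back on so the modem reboots."""
    urlopen(relay_url(host, relay, "off"))  # cut power
    time.sleep(off_seconds)                 # let the modem fully discharge
    urlopen(relay_url(host, relay, "on"))   # restore power

# Hypothetical local address, as in the comment above:
print(relay_url("192.168.0.40"))  # -> http://192.168.0.40/relay/0?turn=on
```

A skill could then expose `power_cycle` behind a voice command, with no cloud account involved.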


I noticed that you referred to an AI with a disembodied voice as "her". Not that unusual, as sailors (even today) refer to ships as "her", and Davy Crockett named his rifle "Old Betsy".

But, it does beg the question: will you feel bad when you 'remove' her? (Will she? Shouldn't you ask?)


I do like to use the "her" pronoun for prized possessions. Favorite car or whatever.

I will not feel bad for her. I don't feel like I should ask.


I have some skill ideas for Leon after the official release. The idea is to centralize all of them on your own hardware that you control. Then Leon can be seen as a "second brain".

Let's say we build a budget tracker; then we can ask Leon "How much did I spend last month on groceries?". Mini apps are coming up in Leon, feel free to read the latest blog post, where I'm sharing some thoughts on it.

We can also think of a tracker skill where Leon learns your location habits. Say you spend 10+ minutes at a specific place; you can then flag it as "gym". When you go to this place again, Leon will trigger a counter and count the time you are at the gym. The next week you can ask "Leon, how much time did I spend at the gym last week?". It can be a gym, an office, or anything you could think of.

Such use cases will be possible to make after the official release and the mobile client.
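A minimal sketch of that dwell-time idea, with all class and place names being hypothetical illustrations rather than any actual Leon API:

```python
from collections import defaultdict
from datetime import datetime, timedelta

class DwellTracker:
    """Accumulate time spent at flagged places (e.g. 'gym')."""

    def __init__(self) -> None:
        self.totals: dict[str, timedelta] = defaultdict(timedelta)
        self.current: tuple[str, datetime] | None = None

    def enter(self, place: str, at: datetime) -> None:
        # Called when the user arrives at a flagged place.
        self.current = (place, at)

    def leave(self, at: datetime) -> None:
        # Called when the user leaves; add the elapsed time to the total.
        place, entered = self.current
        self.totals[place] += at - entered
        self.current = None

    def time_at(self, place: str) -> timedelta:
        return self.totals[place]

tracker = DwellTracker()
tracker.enter("gym", datetime(2022, 9, 1, 18, 0))
tracker.leave(datetime(2022, 9, 1, 19, 30))
print(tracker.time_at("gym"))  # -> 1:30:00
```

Answering "how much time did I spend at the gym last week?" would then just be a sum of these totals filtered by date.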


I use a Google Home Mini.

Half the reason I originally bought it was to simply be able to turn the light and fan on and off without getting out of bed.

The other half was music. Google Play Music was a godsend for a long time before they killed it. I can't stand Youtube Music and don't pay for other services, so I just don't use voice to listen to music anymore. Actually pretty angry and haven't given Alphabet a single cent since.

I ask it the weather every day. It answers 'when will it rain' with an hour and/or day of the week it might next.

I used to use it for relaxing sounds like rain, but one day they replaced the realistic rain sound with one that sounds to me like generated white noise only somewhat resembling rain, so it kind of annoys me now.

I constantly set alarms and timers for various reasons. Reminders and calendar events also, which sync with google services of course so I get them on my phone.

It can make notes. Any time I think of something in the shower and can't write it down I consider buying another one for the bathroom.

I can ask it where my phone is and it'll make it ring.

General queries are no more or less as good as what searching Google gives you at the top. Still useful when wondering something. I can ask it to define words or look for synonyms when I'm writing, without taking my mind away from the text. Or random stuff like 'what day of the week will September 22nd fall on,' 'how many days until Easter,' etc.

I frequently use it as a calculator. Easier to just speak a lengthy list of numbers than to type them all.

The most important thing about all of this is I don't have to move a muscle, and don't have to avert my eyes from what I'm focused on. Whether I'm passing out in bed, have my hands full while late for an appointment, or working hard at my PC, that's invaluable to me. Maybe not as important to everyone though.

Do I recommend Google's assistant specifically? Not exactly, but I don't like the other options either. Alexa will constantly break my train of thought by advertising what it can do with suggestions and whatnot, which is a main reason I don't use it, but my housemate doesn't mind. Other assistants just don't seem as polished and useful. Google's interoperability with my phone is a big reason I use it.

For $25 just get one for your desk and/or bedroom. There's still a lot of room to grow in this space before there's a better option without privacy concerns.


This is a good list of thoughtful, interesting uses.

I can totally empathize with not wanting to get out of bed in the morning, and getting it to ring my phone would actually be really useful. Like, embarrassingly useful :)

Thank you for sharing!


A smartwatch can get you something like that without always on wake-word.


That's just one of several privacy concerns. It's possible to parse voice locally, as in TFA and e.g. Mycroft('s open source, self-hosted version, anyway), but for "some reason" mainstream assistants don't do it. Sure, you can hold a button, and Google will only hear about your timer request and nothing more, but some people find the idea of Google knowing when you're setting timers upsetting. Or at least worthy of avoiding.


They actually do on-device processing now…

Siri, Google Assistant, Alexa, Bixby, and Sonos all perform at least some processing locally. It seems the major issue is large dictionaries (e.g. music libraries) or complex queries. Most had an article about how basic features (timers, smart home) work entirely on device.

[1] https://www.theverge.com/2021/6/7/22523438/apple-iphone-siri...

[2] https://appleinsider.com/articles/19/05/07/google-assistant-...

[3] https://www.amazon.science/blog/on-device-speech-processing-...

[4] https://www.xda-developers.com/samsung-bixby-will-speed-up-r...

[5] https://www.engadget.com/sonos-voice-control-music-assistant...


When my internet is out, Alexa won't even listen to me (besides her name)


Same with Siri (HomePod mini). It won't even answer a question that doesn't need the internet, like the time.


Do any of them have a setting to enforce on device processing only?


Sonos claims to be on-device only. It's not a general-purpose assistant though.


If I have to hit a button to activate the voice assistant, that removes use-cases like "my hands are full but I want to turn on the smart lights" and "I'm cooking and want a timer, but my hands are too dirty." These are the use-cases where the tool really shines because it has no competition.

Without such a use-case, the tool gets put in the back-of-mind. Sure it might be marginally easier to use than swiping and poking, but my mental model of using the phone is already swiping and poking.


When I used to use Alexa there were a lot of unexpected things that came up, and I used it at least 15+ times a day. Things like what's the weather like, what time is it, set a timer, unit conversions, turn on/off lights or appliances, when is it going to rain, factual Google information, when was someone born, what are the sports scores; I like trivia so it could ask you questions while you were lounging around, and so many more things to be honest. I stopped using it like 3 years ago though; I can't have an always-on speaker in my house that sends all its info back to Amazon, no matter what assurances they give me.


Personal assistants can really make a difference to people with disabilities


That's an excellent point, maybe something that should be addressed on the site.


Yes. I still haven't figured out exactly how; any thoughts?


https://www.cuinsight.com/how-voice-assistants-can-benefit-p...

People with limited or no use of hands may use voice assistants, as may people with limited or no sight


I have both Google and Apple smart assistants in and around my home. I estimate 80% of that usage is to check the weather or set a timer. It would probably be great to not depend on some inscrutable listening device to do so.


My main usages for a voice assistant is adding items to the grocery list when I see something is missing in the kitchen, playing music, asking the weather, and timers.


Sailboat - My wife and I want a solution for our sailboat that allows for intelligent vocal control of ship systems. Quickly launching a series of activities with simple verbal commands or receiving verbal updates on conditions would be amazing. Easy things like automated voice capture to text for log books and maintenance would be helpful. "Oil level at 100%, Oil quality good, coolant level 100%, coolant quality good. Schedule oil change at 1500 engine hours." All of this must be done WITHOUT a constant internet connection.


How does this compare to some other open source, self hosted assistants?

rhasspy comes to mind, but i believe Mycroft can be self-hosted too.


I've tried to self-host Mycroft, but after tinkering with it for about 3-4 hours I gave up. They provide little to no information for self-hosting. Sure, you can use their pre-built Docker image, but you still have to create an account in their cloud and connect to it. And their privacy policy is not so great imo.


I did figure out how to do it at one point. I think you had to remove some default config from a JSON file to stop it connecting to their cloud. You still had to query Google directly for the STT though


I believe they hired the rhasspy dev though and he's been trying to change this for the better.

We'll see if he gets anywhere...


Without detracting from OP's efforts, also see Project Alice. https://github.com/project-alice-assistant/ProjectAlice

It is self-hosted, offline by default, with options to use various ASR and TTS engines, some online, depending on your own privacy, performance, or quality choices. It's quite mature and the maintainers are aiming for a 1.0.0 release. I have been running it as the primary voice interface to my home automation system for years.

As someone else said elsewhere, there are a few assistants around now. Perhaps there is some benefit in sharing resources too, as they all struggle for contributors.


I think this would be awesome to develop a Tesla "skill" for. Since it's self-hosted, I don't have to worry about sharing my Tesla credentials with a 3rd party. I tried to create a quick and dirty Alexa skill to do things like check on the state of charge and lock the car, but didn't want to climb the mountain of storing "secrets" in Alexa/AWS. Keeping them local in Leon sounds much better.

I would also like to use Alexa to control my doors (garage, house, etc.), but making that available to the world doesn't sit right with me. With Leon I don't have to worry about someone hacking my Alexa account, or about sharing my credentials with a 3rd-party "skill".



Jasper can be considered his dad.


Can it be shared by a group of people? Is it possible to extend it with custom actions like answering questions from local data sources or calling an internal API?


Yes, it is possible to create skill actions based on the core; it is the main strength of Leon. For the sharing part, I will develop a platform to centralize skills so it will be easy to share them with the community. A bit similar to the npm registry.


FWIW the thread about my cheap & cheerful effort from a while back:

https://news.ycombinator.com/item?id=25718392

Nice to see all the other efforts. There are a decent bunch now. Go, as they say, team :)


Hi, is this still active? I'm looking to mess around with one of these. I'm particularly interested in programming some specific weather Services, like telling me the next time rain is forecasted.

I see you mention weather in the README under Services, but it is not listed as a current service module. Is that something you need help with?


Simple weather service added:

  https://github.com/iamsrp/dexter/blob/master/service/weather.py
Feel free to hack on it, if you're still game.


Still active but rather sporadic. (Time is very much at a premium these days alas.)

I never quite got around to doing a weather module but, if you fancy trying to put one together, it should not be too hard by cargo-culting from the other services. If things are confusing, open an issue and I will look to make them less so.


Checked out the repository's homepage. Is it just me or do other people hate the buzzword of "virtual brain"?

Seems like everything is being described as that or a "second brain" these days.


Those are beautiful animations / visual graphics on the landing page.

Awesome job!


Thank you! I try to express some thoughts by paying attention to the visual aspect of Leon.


Meanwhile, I have Rhasspy set up to do basic stuff, but lack the availability of Pi Zero 2 W’s to use as satellites :(


Do the microphones work well in a sat setup on rhasspy?


I have a PS Eye cam as mic for my single Pi 3 satellite, it works pretty well when there is no music playing. It does wakeword detection locally, and on recognition sends the audio stream via MQTT to the PI 4 base, which does the heavy lifting.


That sounds promising


Love that you have a Gitpod setup - make trying out and contributing to open-source projects so much easier!


Uh-oh, did you have to call it Leon?


Does it sound good, though? In the future I may allow hotword customization. And a female voice may arrive too.


What's wrong with Leon?


Thanks. Looks awesome.


self-hosted?

"You are in control of your data. Leon lives on your server"

Speech-to-Text: Google Cloud, IBM Watson, Coqui STT, Alibaba Cloud (coming soon), Microsoft Azure (coming soon)

So the AI assistant lives on my server, but if I want good-quality speech recognition, everything I say is sent through a US cloud service. The only offline option, Coqui, has a 7.5% word error rate [1] on LibriSpeech test-clean, which is worse than Deep Speech 2 from 2016 [2]. State of the art would be around 1.4% [3], meaning 81% fewer errors than Coqui.

[1] https://coqui.ai/blog/stt/deepspeech-0-6-speech-to-text-engi...

[2] https://paperswithcode.com/paper/deep-speech-2-end-to-end-sp...

[3] https://paperswithcode.com/paper/pushing-the-limits-of-semi-...
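For concreteness, the relative-error figure follows directly from the two quoted WERs:

```python
coqui_wer = 7.5  # % word error rate on LibriSpeech test-clean (per [1])
sota_wer = 1.4   # % word error rate, state of the art (per [3])

# Relative error reduction of SOTA vs. Coqui
reduction = (coqui_wer - sota_wer) / coqui_wer
print(f"{reduction:.0%} fewer errors")  # -> 81% fewer errors

# Or, equivalently, Coqui's error rate as a multiple of SOTA's
print(f"{coqui_wer / sota_wer:.1f}x the error rate")  # -> 5.4x the error rate
```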


They might be interested in integrating Vosk, it's a speech-to-text engine that is just a shared library (.so file on Linux) and comes with API support for a variety of languages:

https://alphacephei.com/vosk/

https://github.com/alphacep/vosk-api

Still, I've found that the Big players have much better recognition models, and the post-processing that I assume they do (grammatical, maybe syntactical inferences that improve the end result) are probably much more powerful too.


Yes, Vosk will definitely be part of Leon once the focus is on implementing new voice solutions.



There aren't any good speech-to-text models that are open source. If you think there is one, please reply with a link. The cloud ones are far superior.


I fully agree and I would love to change that. I mean my company already funded work in that direction... but I sadly predict that we won't have good open source real-time speech recognition anytime soon.

My napkin calculation is that you need about $100k for each attempt at training a Conformer-Transducer. There's a pre-trained NVIDIA model but it appears to have a bad choice of hyperparameters and performance is much worse than what one would expect based on research literature and I believe you're not allowed to execute it on non-NVIDIA hardware.

A skilled team will maybe need 5-10 attempts to discover a good set of hyperparameters, so the price to create the AI model will likely be around $1 million. But if you have such large expenses, you have to plan things as a business venture. And that means an open source release is highly unlikely.

(unless, of course, someone like stability.ai is happy to bankroll 200 A100 GPUs for a few months each per target language. In that case, please contact me)


Mozilla DeepSpeech is not bad, and supported by Mycroft: https://mycroft-ai.gitbook.io/docs/using-mycroft-ai/customiz...

You can contribute to improving it here, too: https://commonvoice.mozilla.org/en


It is terrible. Try it against any of the cloud providers. The problem is the lack of data.


Leon was using Mozilla DeepSpeech before. But then we moved to Coqui STT instead.


Right, and that's fine. The point is that if that's the case, it's incredibly disingenuous to say that you are in control of your own data if you use Leon.


You can choose what voice solutions you want to use. So you can choose the ones that run offline (which are the default ones).


That doesn't make it right to lie.


Where is the lie here? You are free to pick voice solutions, and the default ones are the ones that run offline. Your call.


I don’t think the open source ones need to be superior to the cloud ones, or even as good. If they come close enough for the most common, let’s say, 80% of use cases, that’s good enough for many people.


Currently, they are like 5x the error rate, which is significantly worse.


I don't find this such a big deal. Don't use the speech to text and just write messages to it instead. You still have way more control of your data using Leon compared to the commercial funded alternatives.


Looks like quite a lot of marketing put into this open-source project. Heavyweight glossy website with trendy TLD, emojis everywhere. Is this kind of thing typical in the JS world in particular? Seriously asking.

I'm trying to figure out what they are selling me, or what megacorp they are associated with, but I don't see it yet.


And yet, I set out to find what this thing can do. I read the README.

  Today, the most interesting part is about his core and the way he can scale up. He is pretty young but can easily scale to have new features (skills). You can find what he is able to do by browsing the packages list.
  Sounds good for you? Then let's get started!
The packages list is a dead link. https://github.com/leon-ai/leon/tree/develop/packages


> The packages list is a dead link. https://github.com/leon-ai/leon/tree/develop/packages

From the blog ...

"As of now, 'module' and 'packages' no longer exist. Instead, they’ve been replaced by 'skills'."

New link is https://github.com/leon-ai/leon/tree/develop/skills


All those folders just contain a single json file with the name of the skill category in it? I don't see any actual features?


Not all those folders. Try utilities or news or leon or games or social_communication. You could be forgiven for thinking it was all of them, though -- not having anything in weather or music_audio, for a moment I thought so too.


Right? I'm finding this problem everywhere. When checking out new software, it's becoming more and more difficult to determine what to do with "good looking marketing," and it cuts roughly three ways: you're likely either a dedicated whatever-size team making something great that happens to have good marketing; a small team pushing garbage and putting all your money into marketing; or a megacorp (i.e. likely not great).


This is overly suspicious. The guy made a nice effort and open sources it. Would you prefer he had it closed source or sold it?


Thanks.


> Is this kind of thing typical in the JS world in particular?

Yes, pretty common in the frontend world.


>Heavyweight glossy website with trendy TLD, emojis everywhere. Is this kind of thing typical in the JS world in particular?

yeah in frontend projects/dataviz stuff for sure


No megacorp. I'm just a passionate guy who spends his free time working on Leon. Is there anything wrong with that?


It's a showcase for a front-end framework. See the link at the bottom. https://vercel.com/


Vercel is not a front-end framework. Also that's a sponsorship link.


Sure, but vercel has nothing to do with the UI look. It’s a framework for developing the overall application, not the components or designs.


Am I the only one freaked out by the focus on a Leon instance setup unnecessarily framed as being "born" in their documentation materials here?

> $ leon create birth

> At this stage, Leon is born and can already start to run via this command: [...]

It seems like an unnecessary anthropomorphization.


Well, I just think it is fun.


Does it make you think twice before you kill Leon or its children?


kill -9 leon


Sure, anthropomorphization is unnecessary, but IMO it’s kinda cute. It made me happy.

I’m not equipped with the biology to give birth myself, so it was nice to be able to give birth virtually to a son. A son that could talk to me before he could walk. He hasn’t had much luck yet with walking.


"Leon uses AI concepts, which is cool." - somehow this really discouraged me from looking deeper into it.


It seems to be written by a young guy, English is his second language, and he's excited to be learning different technologies along the way.


Please don't pick the most provocative thing in an article or post to complain about in the thread. Find something interesting to respond to instead.

https://news.ycombinator.com/newsguidelines.html


To be fair, that's not the most provocative thing in the article, so it's okay.


English is his second language, I'd cut him some slack and create a PR to fix it and help him instead of criticizing.


> somehow this really discouraged me from looking deeper into it.

I was also discouraged by that remark. But I've never had (or even used) Alexa or Siri or whatever; they're a cool idea, but I'm not prepared to rely on either of those service providers. So I'm interested.


How can I improve it?


But what can it actually do?


At the moment, the main focus is on the core.

"Once the official release shipped, the big focus will be to build many skills along with the community and cover most of the basic cases and beyond of existing closed source assistants"

You can find more here: https://blog.getleon.ai/a-much-better-nlp-and-future-1-0-0-b... and there: https://blog.getleon.ai/a-much-better-nlp-and-future-1-0-0-b...


Scale!


It's still not clear what you can do with it.

There are no examples at all.

Can it (he) show me the weather? Execute a shell script, for example to update a server? Trigger a webhook?

A lot of details are lacking.



