I started building Leon in late 2017 in my free time. There is no big organization behind Leon AI, just me, and later more and more the community.
At the moment the whole core is being rewritten, and many new features are coming. Please note that, for now, the main strength of Leon is its core, not its skills. Once the official release has shipped, we will all focus, together with the community, on building new skills on top of the core so that Leon can do meaningful things.
Among the upcoming changes are new offline STT/TTS solutions.
It's going to take some time to build a respectable personal assistant that respects our privacy, but we are getting there.
If you have any questions or just wanna chat, feel free to join us on Discord.
Thanks for your work on this. Maybe it’s time to introduce a paid license to support the project and even offer consulting hours to get professional support?
Thanks for saying so. I have some ideas in mind, like creating some courses specific to Leon. In the meantime, it is possible to support the project via GitHub Sponsors: http://sponsor.getleon.ai
I don't know how well Leon works but Mycroft is not running locally on your own server. It's still a cloud service. I believe it can be run locally but it's very complex and not well supported. At least this was the state last time I looked at it.
For this reason (and their failed Kickstarter) I've never really been into Mycroft. After all it's kinda the same as Alexa/Siri just a slightly more trustworthy party when they say they're not using your data.
I'm just not interested in alternatives until they can run fully locally (not necessarily on device, on a local server or docker is fine)
If this Leon does what it says on the tin it might be just the thing for me.
I thought the detection engine was too complex and not fully open? I remember them looking at Mozilla DeepSpeech or something but that being one of those never ending projects.
I was asking about self hosting on their forum and they were like "just use our hosted service, you can trust us". That's not different enough from Apple/Amazon/Google/Microsoft so I dropped it.
But like I said this was a few years ago. Never looked again since.
Most of my 30-something friends have ditched their Alexas and Google Assistant devices because of concerns around security and privacy. The ability to say "set a timer for 30 minutes" is nice, but not enough to invite always-on microphones into private spaces.
I think this is the right interpretation and you are replying to a joke taking advantage of this ambiguity. Your sibling nell is probably (jokingly) meaning that this is a joke their dad could have made.
Disclaimer: I don't have one of these and don't particularly want one. The privacy concerns kinda creep me out. That said, I've been to friends' homes and seen them use it.
As far as I can tell the primary use case for these things is to be <random place in your home> and then just say out loud "Alexa, set a timer for ....". I've heard that you can order stuff from Amazon also using your voice. I think a third use case starts with "Alexa, tell me a joke".
I'm assuming that there are other things you can do with these (and would love to know, if anyone's willing to share).
So - if the solution to the privacy concern is to walk over to the device and push a button then that seems to remove most of the usefulness of the device.
Speaking as someone who doesn't want one / doesn't have one of these things I can totally see how eliminating the "voice control from anywhere" feature leads to opting out of it.
(When I'm walking around my home I've always got my phone on me (which, to be fair, has a bunch of privacy concerns too) so I can more easily set a timer / buy something on Amazon / Google for jokes by fishing my phone out of my pocket and then using that, rather than walking over to push a button.)
(The "what else can you do with these" is a genuine question - if people are comfortable sharing I'd love to hear what you can do with these)
There's a lot of little use cases. Hands-free cooking stuff (set a timer, how many tablespoons in a pint). Device control: it's faster to turn off a TV with voice than dig for the remote, or to play a playlist/skip songs. None of those really save that much more time than the old-fashioned way, so concerns about privacy mean things get done the old way.
I know some people like having them in if they have frailty or mobility concerns, which is probably the only really new usecase.
Definitely. We can think of connecting such basic skills with more advanced skills too. The fact that Leon has a modular architecture makes it very flexible. We just need to let our imagination drive us.
I have Alexa, although I'm going to remove her and replace with a locally-hosted thing.
I've tied mine in with home automation stuff. So I can turn on and off lights using voice, even if I'm not at home. I sometimes forget to turn off my workshop and I can do that from anywhere.
I'd like to figure out a way to reset my internet, because I access cameras, and it goes out sometimes. I'm very sure this can be done.
I also use her for weather, although I'm annoyed about some of her limitations there, and I intend to get exactly what I want by coding. I want to be able to ask things like "when will it rain next", but Alexa can't do that.
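For what it's worth, the "when will it rain next" query is not much code once you have a forecast in hand. Below is a minimal sketch, assuming an hourly forecast is already available as (timestamp, rain probability) pairs; the data shape, the `next_rain` name, and the 0.5 threshold are all made up for illustration and do not come from Alexa, Leon, or any particular weather API.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical forecast shape: (timestamp, probability of rain between 0 and 1).
Forecast = List[Tuple[datetime, float]]


def next_rain(forecast: Forecast, now: datetime,
              threshold: float = 0.5) -> Optional[datetime]:
    """Return the earliest forecast time at or after `now` whose rain
    probability reaches `threshold`, or None if no such entry exists."""
    upcoming = [t for t, p in sorted(forecast) if t >= now and p >= threshold]
    return upcoming[0] if upcoming else None


forecast = [
    (datetime(2024, 1, 1, 9), 0.1),
    (datetime(2024, 1, 1, 15), 0.7),
    (datetime(2024, 1, 1, 12), 0.6),
]
answer = next_rain(forecast, now=datetime(2024, 1, 1, 10))
```

The hard part in practice is getting a reliable forecast feed and turning the result into a spoken sentence, not the lookup itself.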
She can also do reminders in a week or whatever, I use that some. And I ask very simple questions that she can query Google for, but honestly she's terrible at it.
I also think she's too verbose, even with verbosity turned down. She just goes on and on sometimes without being asked, like giving instructions on resetting the router if she can't contact Amazon.
I have also tried Google Assistant and Bixby. I use my watch for a lot of the things you said you use your phone for.
Anyway I'm not happy with any of them. I plan to work a bunch on some skills as my next project, after the current one is done.
> I'd like to figure out a way to reset my internet, because I access cameras, and it goes out sometimes. I'm very sure this can be done.
Just get one of those controllable outlets that can be controlled locally, it'll allow you to power cycle your modem/router/wifi AP. Shelly.cloud makes one and it can be controlled through REST [0] with a call like http://192.168.0.40/relay/0?turn=on
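A power cycle with that kind of call could look roughly like the sketch below. It only relies on the URL form quoted above (`/relay/0?turn=on`, and `turn=off` by analogy); the IP address, the 10 second pause, and the helper names are assumptions for illustration, not something taken from Shelly's documentation.

```python
import time
import urllib.request
from typing import Callable


def relay_url(host: str, state: str, channel: int = 0) -> str:
    """Build the relay URL in the form quoted in the comment above."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    return f"http://{host}/relay/{channel}?turn={state}"


def http_get(url: str) -> None:
    """Send a plain GET request and discard the response body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        resp.read()


def power_cycle(host: str, off_seconds: float = 10.0,
                get: Callable[[str], None] = http_get,
                sleep: Callable[[float], None] = time.sleep) -> None:
    """Switch the outlet off, wait, then switch it back on."""
    get(relay_url(host, "off"))
    sleep(off_seconds)
    get(relay_url(host, "on"))


# Dry run with a fake `get` so nothing is actually sent over the network.
sent = []
power_cycle("192.168.0.40", off_seconds=0, get=sent.append, sleep=lambda s: None)
```

One caveat: if the outlet reaches the network through the same router or AP it is switching, the second request will never arrive, so that setup needs the outlet to turn itself back on, which this sketch does not cover.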
I noticed that you referred to an AI with a disembodied voice as "her". Not that unusual, as sailors (even today) refer to ships as "her", and Davy Crockett named his rifle "Old Betsy".
But, it does beg the question: will you feel bad when you 'remove' her? (Will she? Shouldn't you ask?)
I have some skill ideas for Leon after the official release. The idea is to centralize all of them on your own hardware that you control. Then Leon can be seen as a "second brain".
Let's say we build a budget tracker, then we can ask Leon "How much did I spend on groceries last month?". Mini apps are coming to Leon; feel free to read the latest blog post, where I share some thoughts on them.
We can also think of a tracker skill where Leon can understand your location habits. For example, if you spend 10+ minutes at a specific place, you can flag it as "gym". Then when you go to this place again, Leon will start a counter and track the time you are at the gym. The next week you can ask "Leon, how long did I spend at the gym last week?". It can be gym, office, or anything you could think of.
Such use cases will be possible to make after the official release and the mobile client.
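To make the tracker idea a bit more concrete, here is a minimal sketch of the "how long was I at the gym last week" part. It assumes visits have already been detected and labeled; the `Visit` tuple, the `time_at` function, and the way the 10 minute rule is applied are hypothetical and not part of Leon's actual API.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical record of one stay: (place label, arrival, departure).
Visit = Tuple[str, datetime, datetime]

MIN_STAY = timedelta(minutes=10)  # shorter stays are ignored


def time_at(visits: List[Visit], label: str, week_start: datetime) -> timedelta:
    """Total time spent at `label` during the 7 days starting at `week_start`."""
    week_end = week_start + timedelta(days=7)
    total = timedelta()
    for place, arrived, left in visits:
        if place != label or left - arrived < MIN_STAY:
            continue
        # Count only the part of the visit that falls inside the week.
        overlap = min(left, week_end) - max(arrived, week_start)
        if overlap > timedelta():
            total += overlap
    return total


visits = [
    ("gym", datetime(2024, 1, 1, 18, 0), datetime(2024, 1, 1, 19, 0)),
    ("gym", datetime(2024, 1, 3, 18, 0), datetime(2024, 1, 3, 18, 5)),   # too short
    ("office", datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 17, 0)),
    ("gym", datetime(2024, 1, 9, 18, 0), datetime(2024, 1, 9, 19, 0)),   # next week
]
gym_time = time_at(visits, "gym", week_start=datetime(2024, 1, 1))
```

Detecting the visits themselves (location sampling on the mobile client, deciding what counts as "the same place") is the harder part and is left out here.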
Half the reason I originally bought it was to simply be able to turn the light and fan on and off without getting out of bed.
The other half was music. Google Play Music was a godsend for a long time before they killed it. I can't stand Youtube Music and don't pay for other services, so I just don't use voice to listen to music anymore. Actually pretty angry and haven't given Alphabet a single cent since.
I ask it the weather every day. It answers 'when will it rain' with the hour and/or day of the week it might rain next.
I used to use it for relaxing sounds like rain, but one day they replaced the realistic rain sound with one that sounds to me like generated white noise only somewhat resembling rain, so it kind of annoys me now.
I constantly set alarms and timers for various reasons. Reminders and calendar events also, which sync with google services of course so I get them on my phone.
It can make notes. Any time I think of something in the shower and can't write it down I consider buying another one for the bathroom.
I can ask it where my phone is and it'll make it ring.
General queries are more or less as good as what searching Google gives you at the top. Still useful when wondering something. I can ask it to define words or look for synonyms when I'm writing, without taking my mind away from the text. Or random stuff like 'what day of the week will September 22nd fall on,' 'how many days until Easter,' etc.
I frequently use it as a calculator. Easier to just speak a lengthy list of numbers than to type them all.
The most important thing about all of this is I don't have to move a muscle, and don't have to avert my eyes from what I'm focused on. Whether I'm passing out in bed, have my hands full while late for an appointment, or working hard at my PC, that's invaluable to me. Maybe not as important to everyone though.
Do I recommend Google's assistant specifically? Not exactly, but I don't like the other options either. Alexa will constantly break my train of thought by advertising what it can do with suggestions and whatnot, which is a main reason I don't use it, but my housemate doesn't mind. Other assistants just don't seem as polished and useful. Google's interoperability with my phone is a big reason I use it.
For $25 just get one for your desk and/or bedroom. There's still a lot of room to grow in this space before there's a better option without privacy concerns.
This is a good list of thoughtful, interesting uses.
I can totally empathize with not wanting to get out of bed in the morning, and getting it to ring my phone would actually be really useful. Like, embarrassingly useful :)
That's just one of several privacy concerns. It's possible to parse voice locally, as in TFA and eg Mycroft('s open source, self-hosted version, anyway), but for "some reason" mainstream assistants don't do it. Sure, you can hold a button, and Google will only hear about your timer request and nothing more, but some people find the idea of Google knowing when you're setting timers to be upsetting. Or at least worthy of avoiding.
Siri, G Assistant, Alexa, Bixby, Sonos all perform at least some processing locally. It seems the major issue is large dictionaries (eg music libraries) or complex queries. Most had an article about how basic features (timers, smart home) work entirely on device.
If I have to hit a button to activate the voice assistant, that removes use-cases like "my hands are full but I want to turn on the smart lights" and "I'm cooking and want a timer, but my hands are too dirty." These are the use-cases where the tool really shines because it has no competition.
Without such a use-case, the tool gets put in the back-of-mind. Sure it might be marginally easier to use than swiping and poking, but my mental model of using the phone is already swiping and poking.
When I used to use Alexa there were a lot of unexpected things that came up, and I used it at least 15 times a day. Things like what’s the weather like, what time is it, set a timer, unit conversions, turn on/off lights or appliances, when is it going to rain, factual Google information, when was someone born, what are the sports scores. I like trivia, so it could ask you questions while you were lounging around, and so many more things to be honest. I stopped using it like 3 years ago though; can’t have an always-on speaker in my house that sends all its info back to Amazon, no matter what assurances they give me.
I have both Google and Apple smart assistants in and around my home. I estimate 80% of that usage is to check the weather or set a timer. It would probably be great to not depend on some inscrutable listening device to do so.
My main uses for a voice assistant are adding items to the grocery list when I see something is missing in the kitchen, playing music, asking the weather, and timers.
Sailboat - My wife and I want a solution for our sailboat that allows for intelligent vocal control of ship systems. Quickly launching a series of activities with simple verbal commands or receiving verbal updates on conditions would be amazing. Easy things like automated voice capture to text for log books and maintenance would be helpful. "Oil level at 100%, Oil quality good, coolant level 100%, coolant quality good. Schedule oil change at 1500 engine hours." All of this must be done WITHOUT a constant internet connection.
I've tried to self-host Mycroft, but after tinkering with it for about 3-4 hours I gave up. They provide little to no information for self-hosting. Sure, you can use their pre-built Docker image, but you still have to create an account in their cloud and connect to it. And their privacy policy is not so great imo.
I did figure out how to do it at one point. I think you had to remove some default config from a JSON file to stop it connecting to their cloud. You still had to query Google directly for the STT though
It is self-hosted, offline by default, with options to use various ASR and TTS engines, some online, depending on your own privacy, performance or quality choices. It's quite mature and the maintainers are aiming for a 1.0.0 version release. I have been running it as the primary voice interface to my home automation system for years.
As someone else said elsewhere, there are a few assistants around now. Perhaps there is some benefit for sharing of resources too, as all struggle for contributors.
I think this would be awesome to develop a Tesla "skill" for. Since it's self-hosted, I don't have to worry about sharing my Tesla credentials with a 3rd party. I tried to create a quick and dirty Alexa skill to do things like check on the state of charge and lock the car, but didn't want to climb the mountain of storing "secrets" in Alexa/AWS. Keeping them local in Leon sounds much better.
I would also like to use Alexa to control my doors (garage, house, etc.) but making that available to the world doesn't sit right with me. But with Leon I don't have to worry about someone hacking my Alexa account, or even sharing my credentials with a 3rd party "skill".
Can it be shared by a group of people? Is it possible to extend it with custom actions like answering questions from local data sources or calling an internal API?
Yes, it is possible to create skill actions based on the core. It is the main strength of Leon.
For the sharing part, I will develop a platform to centralize skills so it will be easy to share skills with the community. A bit like the npm registry.
Hi, is this still active? I'm looking to mess around with one of these. I'm particularly interested in programming some specific weather Services, like telling me the next time rain is forecasted.
I see you mention weather in the README under Services, but it is not listed as a current service module. Is that something you need help with?
Still active but rather sporadic. (Time is very much at a premium these days alas.)
I never quite got around to doing a weather module but, if you fancy trying to put one together, it should not be too hard by cargo-culting from the other services. If things are confusing then open a problem request and I will look to make them less so.
I have a PS Eye cam as mic for my single Pi 3 satellite; it works pretty well when there is no music playing. It does wakeword detection locally, and on recognition sends the audio stream via MQTT to the Pi 4 base, which does the heavy lifting.
"You are in control of your data. Leon lives on your server"
Speech-to-Text: Google Cloud, IBM Watson, Coqui STT, Alibaba Cloud (coming soon), Microsoft Azure (coming soon)
So the AI assistant lives on my server, but if I want to have good quality speech recognition, everything I say is sent through a US cloud service. The only offline option, Coqui, has a 7.5% word error rate [1] on LibriSpeech test-clean, which is worse than Mozilla DeepSpeech 2 from 2016 [2]. State of the art would be around 1.4% [3], meaning 81% fewer errors than Coqui.
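The 81% figure is the relative reduction in word error rate, which is easy to check from the two numbers quoted above. The function name below is just for illustration.

```python
def relative_error_reduction(baseline_wer: float, better_wer: float) -> float:
    """Fraction of the baseline's errors that the better system avoids."""
    return (baseline_wer - better_wer) / baseline_wer


# WER values as quoted in the comment: 7.5% (Coqui) vs. about 1.4% (state of the art).
reduction = relative_error_reduction(7.5, 1.4)
print(f"{reduction:.0%}")  # prints 81%
```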
They might be interested in integrating Vosk, it's a speech-to-text engine that is just a shared library (.so file on Linux) and comes with API support for a variety of languages:
Still, I've found that the big players have much better recognition models, and the post-processing that I assume they do (grammatical, maybe syntactical inferences that improve the end result) is probably much more powerful too.
There aren't any good speech-to-text models that are open source. If you think there is one, please reply with a link. The cloud ones are far superior.
I fully agree and I would love to change that. I mean my company already funded work in that direction... but I sadly predict that we won't have good open source real-time speech recognition anytime soon.
My napkin calculation is that you need about $100k for each attempt at training a Conformer-Transducer. There's a pre-trained NVIDIA model but it appears to have a bad choice of hyperparameters and performance is much worse than what one would expect based on research literature and I believe you're not allowed to execute it on non-NVIDIA hardware.
A skilled team will maybe need 5-10 attempts to discover a good set of hyperparameters. So the price to create the AI model will likely be around $1 million. But if you have such large expenses, you have to plan things as a business venture. And that means an open source release is highly unlikely.
(unless, of course, someone like stability.ai is happy to bankroll 200 A100 GPUs for a few months each per target language. In that case, please contact me)
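Spelled out, the napkin math uses only the two figures from the comment above (about $100k per training attempt, 5 to 10 attempts). It gives a range whose upper end is the roughly $1 million quoted.

```python
COST_PER_ATTEMPT_USD = 100_000  # rough figure per training attempt, from the comment
ATTEMPTS = (5, 10)              # estimated number of attempts, from the comment

low, high = (n * COST_PER_ATTEMPT_USD for n in ATTEMPTS)
print(f"${low:,} to ${high:,}")  # prints $500,000 to $1,000,000
```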
Right, and that's fine. The point is that if that's the case, it's incredibly disingenuous to say that you are in control of your own data if you use Leon.
I don’t think the open source ones need to be superior to the cloud ones, or even as good. If they come close enough for the most common, let’s say, 80% of use cases, that’s good enough for many people.
I don't find this such a big deal. Don't use the speech to text and just write messages to it instead. You still have way more control of your data using Leon compared to the commercial funded alternatives.
Looks like quite a lot of marketing put into this open-source project. Heavyweight glossy website with trendy TLD, emojis everywhere. Is this kind of thing typical in the JS world in particular? Seriously asking.
I'm trying to figure out what they are selling me, or what megacorp they are associated with, but I don't see it yet.
And yet, I set out to find what this thing can do. I read the README.
Today, the most interesting part is about his core and the way he can scale up. He is pretty young but can easily scale to have new features (skills). You can find what he is able to do by browsing the packages list.
Sounds good for you? Then let's get started!
Not all those folders. Try utilities or news or leon or games or social_communication. You could be forgiven for thinking it was all of them, though -- not having anything in weather or music_audio, for a moment I thought so too.
Right? I'm finding this problem everywhere. When checking out new software, it's becoming more and more difficult to determine what to make of "good looking marketing," and it splits almost perfectly three ways: you're either a dedicated team of whatever size making something great that happens to have good marketing, a small team pushing garbage and putting all your money into marketing, or a megacorp (i.e. likely not great).
Sure, anthropomorphization is unnecessary, but IMO it’s kinda cute. It made me happy.
I’m not equipped with the biology to give birth myself, so it was nice to be able to give birth virtually to a son. A son that could talk to me before he could walk. He hasn’t had much luck yet with walking.
> somehow this really discouraged me from looking deeper into it.
I was also discouraged by that remark. But I've never had (or even used) Alexa or Siri or whatever; they're a cool idea, but I'm not prepared to rely on either of those service providers. So I'm interested.
"Once the official release shipped, the big focus will be to build many skills along with the community and cover most of the basic cases and beyond of existing closed source assistants"