As far as I know Mycroft does its speech recognition in the cloud, so your voice has to leave your network unfortunately. This is the reason why I don't have a voice assistant yet. There was snips.ai which tried to solve this problem locally but they were acquired by Sonos.
I've been having great success building homebrew voice assistants with Rhasspy [0] and voice2json [1] (they're sister projects from the same maintainer). The code's not ready to share yet, but I have a voice-controlled Raspberry Pi music server in my car now, working 100% offline (got it ready just in time to never have to drive anywhere, lol).
The pieces of Snips that were OSS before Sonos bought them and ditched the community are being leveraged, along with new work, in Project Alice: https://github.com/project-Alice-assistant It continues to strive to be modular and offline. By design, the choice of online/offline elements (including Google ASR and Amazon TTS) and the corresponding quality and privacy tradeoffs is yours. Come give a hand.
The default configuration is cloud based, but you do get to choose where your voice data goes. They also have a method to keep the TTS local. I haven't checked in on the project in a while, but they were either working on or already had a local home server, so you can make the whole system internal if you wish. I'd provide reference links, but as noted the site is currently down.
Edit: There are also several skills available that can point at full, local downloads of Wikipedia and the like. So if you prefer faster results and keeping some of your queries internal, as a sort of hybrid thing, that's an option as well.
They were definitely not the only ones, just the ones with the most aggressively spammy marketing team. HN mods actually stepped in once or twice about it.
Rhasspy is a really cool project that is compatible with a bunch of fully offline speech frameworks. PicoVoice is a super neat company too.
I have the (older, discontinued) 6+1 USB mic array, but I'm in general wary of the ReSpeaker stuff. I have the original ReSpeaker Core v1 hardware as well, and it's incredibly unstable. It comes with a tiny amount of flash that you're supposed to augment with a SD card (which then gets overlayfs'd onto the root filesystem), but the hardware is so buggy that running things from the SD card causes random SIGSEGV and SIGILL all over the place (and yes, I've painstakingly verified that this is not a software issue).
They essentially abandoned development of the v1 line (but appear to still be selling it!) for the v2 line, and I have a hard time trusting them.
Admittedly, I haven't done that much with the mic array I have, mainly because I got burned out dumping tens of hours into trying to get ReSpeaker Core v1 working (which I ended up trashing; it was that bad). I'd like to use it with a Raspberry Pi; hopefully that works out.
Seconding both ReSpeaker and Rhasspy/voice2json, I've been having great luck with that combo. The docs and example code for ReSpeaker are great, very easy to work with.
Oh, thanks for that, I've been looking for a good offline speech framework. Unfortunately, the hardest part for me is the microphone array, hopefully I can find something good there.
I haven't had time to play with Rhasspy, but the fact that you can plug and play components from different speech platforms is really promising, particularly when I want to interface it with my own assistant at some point.
Wake word detection, speech to text, intent recognition, etc. is all split out and able to be plugged into separately.
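The pluggable split described above can be sketched in a few lines. This is a toy illustration with interfaces I made up, not Rhasspy's actual API (its real components talk to each other over MQTT using the Hermes protocol); the point is just that each stage is an independently swappable function.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Intent:
    name: str
    slots: dict

class Pipeline:
    """Chains independently swappable stages: wake word -> STT -> intent."""
    def __init__(self, wake: Callable[[bytes], bool],
                 stt: Callable[[bytes], str],
                 nlu: Callable[[str], Optional[Intent]]):
        self.wake, self.stt, self.nlu = wake, stt, nlu

    def handle(self, audio: bytes) -> Optional[Intent]:
        if not self.wake(audio):
            return None          # no wake word: ignore the audio
        text = self.stt(audio)   # swap in Pocketsphinx, Kaldi, DeepSpeech...
        return self.nlu(text)    # swap in fsticuffs, fuzzywuzzy, Rasa...

# Toy stand-ins so the pipeline runs without any audio hardware:
def dummy_wake(audio: bytes) -> bool:
    return audio.startswith(b"WAKE")

def dummy_stt(audio: bytes) -> str:
    return audio.decode()[len("WAKE "):]

def dummy_nlu(text: str) -> Optional[Intent]:
    if text.startswith("turn on "):
        return Intent("TurnOn", {"device": text[len("turn on "):]})
    return None

pipeline = Pipeline(dummy_wake, dummy_stt, dummy_nlu)
print(pipeline.handle(b"WAKE turn on the lights"))
```

Swapping the STT engine then means passing a different function, with no changes to the rest of the pipeline.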
Keep in mind that DeepSpeech is an implementation of a speech recognition algorithm... the results will largely depend on the model used, which typically requires lots of manual labeling to build.
That is one of the toolkits used by mycroft. It isn't everything needed for an assistant but if you want to make one it is probably the best starting point.
I found out about Mycroft through their Kickstarter project, the Mycroft Mark II. What convinced me to back them was that they said that, theoretically, I should be able to self-host their server on my own hardware at home.
Unfortunately, they mismanaged Mark II so badly that I lost all faith in Mycroft in general.
I, too, saw the Indiegogo campaign and gladly handed over nearly $200 for the Mycroft Mark II.
Two and a half years later, there has been little progress, with no telling when they'll actually deliver the device to backers.
It's one thing for a company to delay production, but having lost money on this with no help from Indiegogo and a refusal from the company to refund backers, I have no hope for this product seeing the light of day any time soon.
That is the nature of crowdfunding. This is not a scam unless there is deliberate malice involved. A mere inability to deliver a product is not a scam. They cannot give you a refund because the money has been spent on development. If they had held your money so that they could now give you a refund, that would be extremely problematic.
I can't wait for a community-driven, open-source data model that we can take offline and plug into any software adapter. With a little version control and some voluntary data samples, a large enough community could get it going.
Well, there is some current research in Europe at least on how to make voice assistant technology more respecting of their user's privacy: https://www.compriseh2020.eu/ It's still far from market-ready, though...
The fact that they are online is stopping a bunch of people from shouting at nobody in particular to "dim the lights", "play this (local) playlist", "turn the AC on", "play two random episodes of Paw Patrol"... E.g. me.
Many of those who are concerned with apps phoning home would be the ones to keep their data and media locally too!
Very practical: there are even dictation apps with quite good accuracy which work offline (like some offline Nuance Dragon apps), and that is dictation, which is much harder because the sentences are free-form, not predefined.
One theory of security is that you can't ever be truly secure; but you can make it difficult/expensive for your security to be violated. If you lock the door, someone can always break a window, but that creates more noise and therefore more risk, making an intruder more likely to seek a softer target. A ten-foot wall can be breached, but it will likely dissuade anyone without an eleven-foot ladder.
The problem with mass surveillance is not that the NSA/etc can breach anyone they want; that's effectively always been true. It's that it's cheap to breach everyone, by default, all the time. Taking reasonable precautions such that you can't have your data swept up cheaply, instead requiring targeted human effort and/or a court order, does create a deterrent, and helps counteract the current imbalance of power between TLAs and We The People.
If you are a target of the NSA or some other state actor, then perhaps. If not, and you take a bit of care to disable the voice assistant on your devices (Siri, Google, etc.), then you should be fine.
That just disables the voice assistant. It does nothing to ensure that your mic isn't on, listening, and recording at any time. I've had too many targeted ads start popping up after conversations with people to be convinced otherwise.
My pet theory is that this is actually a geo-location type of tracking, combined with the recency illusion.
Let's say person A was doing research on a product they're interested in buying. That product category is now associated with person A. Person A goes and visits person B. Because our phones track where we go by default for most people, now there's an association between Person A and Person B, and a potential likelihood that person B is also interested in that product. Thus the occasional ad shows up for the product Person A was researching. Since Person A and Person B are friends, it's possible Person A will talk about the product they're interested in. Person B now notices that they're getting ads for the product that Person A talked about.
it's observer bias, nothing more. test it out. pick a phrase, my spouse and i chose "snowmobile". For the next week, we had random conversations about how cool snowmobiles are, how badly we wanted one, what price we were willing to pay, financing, etc. we peppered our conversations with click bait, honestly. neither of us particularly ever had any interest whatsoever in snowmobiles, so we figured this would be a pretty ok test for anecdata.
at the end of the week, we both saw no increase or even a mention in our targeted adverts regarding snowmobiles. likely what is happening is that you searched something somewhere on your phone, then brought it up in conversation, then saw an increase in ads related to that subject.
> At the end of the week, we both saw no increase or even a mention in our targeted adverts regarding snowmobiles.
While I do respect you and your spouse for trying this experiment, it's entirely too small a sample size to really prove that this doesn't happen. There are a ton of variables involved, the biggest being which advertisers may or may not be listening in at any given time. It's hardly a controlled experiment, and I can't say I can put any stock in it.
It also does not prove that it is observer bias. I have had things pop up that I am 100% absolutely positive I never in any way, shape, or form looked into, yet there the ads were after a short conversation with others who did have an interest.
One thing I thought of and wanted to test: what if you yourself don't do any internet queries but someone on the same network as you does? I've had conversations with people; one time we were talking about a concert venue which I did not look up, but I did start to receive ads for that concert venue. It creeped me out initially, but I believe the other person on the same home network was googling or looking for information on that venue on their phone. I believe we were just associated together for advertising purposes.
I’m pretty sure this happens. When I was doing work with NetSuite, my roommate (who has no interest in anything remotely related to enterprise software, and doesn’t have any other devices that could have been listening in) started getting ads for them on his laptop.
It makes sense for them to target by IP, so if one person in an office is researching the product they can start targeting the entire office and maybe get their ad in front of an exec who can make a purchasing decision.
Even if he doesn't, there are snowmobile resorts. People pay good money to go to places with a lot of snow, rent a snowmobile, and ride in the wilderness for a week.
Also, a fair number of people live where a snowmobile isn't practical but have a cabin where they are, and so they will be making weekend trips to where snowmobiles are practical.
Your point stands though: anyone who lives in the wrong area is unlikely to be targeted if the only indication they might want a snowmobile is a home conversation. If you are serious you will do other searches (tracked for sure) and have location history in a target area (which might not be trackable; these areas often have poor cell coverage).
I take active steps to minimize when and how the microphones around me listen in. In no way do I subscribe to the idea that "we are already screwed".
Please do not take the worst possible interpretation of what people say on HN. It's in the guidelines and generally looked down upon. This is not reddit.
Wow. The patent is for "Using voice commands from a mobile device to remotely access and control a computer". That's so obvious it's laughable. How was this patent granted?
Once at a meeting to discuss the patent application the presenter mentioned how Jeff Bezos patented "one click buy" - i.e. Buying something with a single click. I asked why we didn't patent "2-n click buy" to prevent any competitors from selling anything.
Everyone laughed but I meant it seriously. I still don't get how "1 click buy" is a thing you can patent but you can't patent "2 click buy". The patent process is incredibly stupid and arbitrary as far as I can tell - at least when it comes to software.
Patents are not granted for what they achieve, but for how they achieve it. So the patent is for a specific methodology of "using voice commands [...]". Not that the content of the patent is necessarily novel -- I have not looked deeply into it personally. It is likely a generic BS methodology that the troll never used for a real product.
Mycroft as software is used by a small group of users and seems pretty stable. More features are continuously added and the design principles look promising (open source, as private as possible).
The biggest problem is their hardware: they have the Mycroft v1, which (to me personally) is a prototype-like piece of hardware. There have been successful campaigns for a v2 release, with new hardware and an improved design.
However, they have failed to work with reliable partners, and there's still no working device that resembles the final production level. I was a backer of the Indiegogo campaign, and it's frustrating that they keep postponing the Mycroft v2. I really hope they can deliver the device at some point, but they keep rewriting software, and by the time they ship, the hardware will probably be outdated.
It's true v1 was more of a proof-of-concept device. The core hardware target has always been the Raspberry Pi family of devices since they are pretty well ubiquitous; the Pi 2 and 3 are the current set. The base software is Linux, so if your audio devices (mic and speaker) work with Linux then you're pretty well good to go. Most microphone hats for the Pi are supported. I use the Google AIY Voice Kit v1 with a Raspberry Pi 3 B+. Works a treat.
As with most open source efforts (especially early on) there's a lot of tinkering and DIY at the get-go, and they've designed their product to be supportive of this. Their "retail" devices are, much like Google's intent with Nexus, intended to be a best possible reference for other vendors to target, including the DIY crowd. Whether that's the right way to come at it is open to debate, but the premise that they don't have a set hardware target is at best misleading.
I think perhaps it’s hard to hit a hardware target when the software is still in a pretty big state of change. It’s hard to say what hardware will be needed and what is the best compromise between hardware performance and the DeepSpeech NN for STT. DeepSpeech is still in development as well.
I think the priority needs to be getting the STT to a good, neural net backed, open source engine. Once the software is stabilized I think there’s room for a whole ecosystem of hardware interfaces.
This is already a step-up from Google Home devices in that you can trust it's not sending audio to Google outside of what you intend to, but I'll be properly excited when the Open-Source speech-to-text component is working[1] and I don't have to send my voice to Google at all.
I might be wrong, so take this with a grain of salt; the main website is down. I believe the open source STT was working and merged in. As with many community projects it relies on folks contributing time to update documentation, and unfortunately Mycroft hasn't received the love of other projects, so I believe (again, I may be wrong) that the documentation is outdated.
I made a fully offline voice assistant on a Raspberry Pi with Pocket Sphinx and Festival Lite glued together with Python. Performance wasn’t great but it was a fun project nonetheless.
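A minimal sketch of that kind of glue, assuming the `pocketsphinx` Python package and the `flite` binary are installed (the command table and replies are made up for illustration). The phrase-matching logic is kept separate from the audio I/O so it can be exercised without a microphone:

```python
import shutil
import subprocess
from typing import Optional

# Map recognized phrases to spoken replies; pure logic, easy to test.
COMMANDS = {
    "what time is it": "sorry, i lost my watch",
    "hello computer": "hello there",
}

def respond(transcript: str) -> Optional[str]:
    """Return the reply for a recognized phrase, or None to stay quiet."""
    return COMMANDS.get(transcript.strip().lower())

def speak(text: str) -> None:
    """Use Festival Lite (flite) for offline TTS, if it's installed."""
    if shutil.which("flite"):
        subprocess.run(["flite", "-t", text], check=False)

def main() -> None:
    # LiveSpeech yields utterances from the default microphone, decoded
    # with the default acoustic/language models that ship with pocketsphinx.
    from pocketsphinx import LiveSpeech  # lazy import: needs mic + models
    for phrase in LiveSpeech():
        reply = respond(str(phrase))
        if reply:
            speak(reply)

# Call main() on the Pi to start the listen/respond loop.
```

As the parent says, recognition accuracy with the stock Pocketsphinx models is modest, but for a fixed command vocabulary it can be good enough.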
Most web pages need to be able to be updated. I think the two most popular options for that are a CMS or a static site generator. In general it is easier for "less technical" users to update the content in a CMS. Hence you get a database for rendering.
Alternate analysis: Personally it is not surprising to me at all that a marketing website would be run on Wordpress, Drupal, etc. And therefore it is quite clear that a database connection would be needed.
Even with WordPress you don't necessarily need the database connection on every request. Either a colocated reverse proxy can cache the result, or you can configure CloudFront or some similar third-party service so it won't hit the origin unless it has to. Or you can even use one of the WordPress plugins that caches the resources/pages as static files.
It requires an extra half hour of work, but... it's probably worth it.
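For the colocated reverse proxy option, a rough nginx microcaching sketch (the upstream address, cache paths, and zone name are placeholders; note the real WordPress login cookie name carries a hash suffix, so the bypass rule here is simplified for illustration):

```nginx
# Cache rendered pages briefly so most requests never reach PHP.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=wpcache:10m
                 max_size=100m inactive=10m;

server {
    listen 80;
    location / {
        proxy_pass http://127.0.0.1:8080;   # WordPress/PHP-FPM backend
        proxy_cache wpcache;
        proxy_cache_valid 200 60s;          # serve cached HTML for a minute
        proxy_cache_use_stale error timeout updating;
        # Simplified: skip the cache for logged-in users (the actual
        # WordPress cookie is wordpress_logged_in_<hash>).
        proxy_cache_bypass $cookie_wordpress_logged_in;
        proxy_no_cache $cookie_wordpress_logged_in;
    }
}
```

Even a 60-second cache absorbs nearly all of a traffic spike, since only one request per minute per page hits PHP and the database.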
The website is down but a lot of work happens at the forums and the forums are still up. If you're interested in some insight in to the community around the project have a look over there. https://community.mycroft.ai/
Stupid question: has anyone successfully hooked up a voice assistant to UPnP renderers and media servers? I'd be willing to invest some effort if I could use one to browse music. Seems quite tricky, particularly for non-native speakers :)
You can absolutely do this. I haven't played with it in a while, so I don't recall if that got put behind the service paywall. The wake word is processed locally though, and the software is open source, so absent the online management dashboard I'm certain you can hard-code it. They were also working on (or had) a home server for managing your devices entirely internal to your network, and it seems unlikely they didn't include a way to manage the wake word.
Because of the way PHP works (one process per request), it can't handle a lot of simultaneous requests, and anyone can crash ANY WordPress website with a single wrk command...
If you have all the content cached by your CDN it might be OK, but if you have a dynamic website with an API/DB, not just static content... WordPress is slow as hell in my experience.
I've been using WordPress for 4 years with good scaling and no downtime. It's only "slow as hell" if you don't actually know WordPress (which most WordPress users don't). ;)
Is there any good terminal assistant? Something that accepts these kinds of queries but within a CLI, preferably without the need of a constant internet connection.
Mycroft has a curses CLI: `start-mycroft.sh cli` or `mycroft-start cli` if services are running, `start-mycroft.sh debug` or `mycroft-start debug` to start services and go straight into the CLI.
> Mycroft is named in honor of Mike, the supercomputer in Robert A. Heinlein’s classic novel “The Moon is a Harsh Mistress”. Heinlein’s Mycroft was a “High-Optional, Logical, Multi-Evaluating Supervisor, Mark IV, Mod. L” – a HOLMES FOUR. Mycroft’s friend Manuel named him “Mycroft” after Sherlock’s elder brother Mycroft Holmes. This was later shortened to Mike.
Mycroft was actually more brilliant than Sherlock, though also lazy.
"My dear Watson," said he, "I cannot agree with those who rank modesty among the virtues. To the logician all things should be seen exactly as they are, and to underestimate one's self is as much a departure from truth as to exaggerate one's own powers. When I say, therefore, that Mycroft has better powers of observation than I, you may take it that I am speaking the exact and literal truth."
... "I said that he was my superior in observation and deduction. If the art of the detective began and ended in reasoning from an arm-chair, my brother would be the greatest criminal agent that ever lived. But he has no ambition and no energy. He will not even go out of his way to verify his own solution, and would rather be considered wrong than take the trouble to prove himself right. Again and again I have taken a problem to him, and have received an explanation which has afterwards proved to be the correct one. And yet he was absolutely incapable of working out the practical points which must be gone into before a case could be laid before a judge or jury."
edit: spelling