Sirius: An Open Intelligent Personal Assistant (github.com/jhauswald)
165 points by tectonic on March 15, 2015 | hide | past | favorite | 23 comments



Projects like this are great. Too many valuable services exist within walled gardens despite being straightforward pieces of software that anyone should be able to run without having to buy a specific cell phone or sign up for an account on an ad publisher's site or app.

I can't help but wonder, though, if the way we tend to go about it can really effect change in a meaningful way. How many people are actually going to go through the effort of setting this up and running it? And keep it running? And put up with inevitable bugs? And update it? How well can it interoperate with other types of services? While they've gone above and beyond to document a contribution policy and use GitHub Issues, is it only possible to add a feature by hacking on a huge, monolithic project?

I don't know the best path forward, but it seems like the first few questions could be addressed by something like Sandstorm or Docker. The rest is architectural, and it's not a problem with this project so much as a lack of a standard for extensibility or interoperability in the OSS community, which has enough trouble getting past disagreements over language and init-system preferences.


> monolithic project

Actually, it's not. Looking at what's on GitHub, the project is mainly about bridging a few open-source modules (Kaldi, Sphinx, PocketSphinx, OpenEphyra) into an end-to-end system. So if you want to contribute, you can always work on improving one of the base modules (which handle things like speech recognition, question answering, etc.).
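To make the "bridging" point concrete, the glue layer described above can be sketched as a short pipeline. This is purely illustrative: the function names and return values here are hypothetical stand-ins for the real Kaldi/Sphinx/OpenEphyra bindings, not Sirius's actual API.

```python
# Illustrative pipeline glue: each stage wraps one open-source module.
# The stubs below stand in for real decoder/QA-engine calls.

def speech_to_text(audio_bytes: bytes) -> str:
    # Stand-in for a Sphinx/Kaldi decoder invocation.
    return "where is the empire state building"

def answer_question(question: str) -> str:
    # Stand-in for an OpenEphyra-style question-answering call.
    return "New York"

def assistant(audio_bytes: bytes) -> str:
    """End-to-end pipeline: audio in, answer out."""
    question = speech_to_text(audio_bytes)
    return answer_question(question)
```

The point is that the system-level contribution lives in composing the stages, so improving any one stage (say, the recognizer) improves the whole assistant without touching the rest.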


Most of the files appear to be images, not source code.

> I can't help but wonder, though, if the way we tend to go about it can really effect change in a meaningful way. How many people are actually going to go through the effort of setting this up and running it? [...]

How do ideas and open source generally work? How does a tree grow?


I think (hope) fortune will favor more modular approaches to these types of suites/programs instead of huge monoliths. Since projects of this type, especially ones that are OSS, are so very new and few in number, it will take some time before we start to really figure out what the core components really should be, and even longer still to hammer out specs/APIs that allow interoperability.


As phones get more capable, you just package it and put it on the app store.

This project is trying to provide replacements for Google's various Android services:

https://github.com/microg

This fits into that sort of attitude. I have no idea how mature or usable microG is.


I'm all for seeing the source, but a better starting point to see what Sirius is capable of is their videos page at:

http://sirius.clarity-lab.org/category/watch/

Also, the site was having some DB connection errors under all the attention, so it looks like you can view the same videos directly on YouTube at: https://www.youtube.com/channel/UCEiLPIvZiW4jq9ZMonYdhcg


Digging a little deeper (since the Clarity Lab site is still throwing the DB connection errors), Jason Mars is a professor at the University of Michigan. A Sirius paper[1] is listed on his homepage[2]. The second publication link, about Protean Code, is also extremely interesting. It seems Prof. Mars focuses on Warehouse Scale Computing (WSC), and Sirius and Protean Code[3] are two noteworthy research results described by his group as having immediate commercial applicability.

[1] http://web.eecs.umich.edu/~profmars/wp-content/papercite-dat...

[2] http://web.eecs.umich.edu/~profmars/index.html%3Fp=8.html

[3] http://www.cse.umich.edu/eecs/about/articles/2014/Protean-Co...


Thanks for the links, they make it clearer what it can do and how it feels. Looks really interesting, and I like that it doesn't require some third-party server somewhere on the internet.


The Clarity Lab group reuses mature open source libraries for speech recognition, computer vision AND question answering.

They link these into a whole with what appears to be a minimal amount of glue, and then optimize certain components by porting bottlenecks to multithreaded (hence CPU multicore) or GPU implementations.

In my opinion, a major contribution is a GPU-optimized Gaussian Mixture Model (GMM) implementation that's plugged back into a well-known, frequently used speech recognition library (Sphinx).
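For context on why the GMM is the piece worth porting to a GPU: a Sphinx-style recognizer evaluates a mixture-of-Gaussians density for every audio frame against many acoustic states, which is dense, data-parallel arithmetic. A minimal CPU sketch of that per-frame score (diagonal covariances; all parameters here are made up for illustration, not taken from Sirius):

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, weights, means, variances):
    """log sum_k w_k * N(x; mean_k, var_k), via log-sum-exp for stability."""
    logs = [
        math.log(w) + log_gaussian(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))
```

Every frame-times-state evaluation is independent, which is exactly why batching them on a GPU pays off at datacenter scale.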

Superficially, one would expect a publication at an AI, ML, or robotics workshop/conference/journal. However, if you spend time digging through the source code, you will see that there is zero contribution in these areas. Hence their first publication appears to me to be a systems research publication: the conclusion of their paper talks about reducing total cost of ownership in datacenters by using GPUs, etc.

The amount of marketing and vague biz-dev type speak almost made me discard this whole site as BS, but now I see that if nothing else, they've given away a GPU accelerated ASR library under a BSD license. Cool work, dudes and dudettes.


Nice idea. But the demos are disappointing. The promotional video is awful. Talking heads and stock shots of a server farm. Please.

The demo video is better. But the question answerer seems rather dumb. There's no indication that it has any context; that is, it doesn't seem to use what you said previously in any useful way. The screen message for the recognition of the Empire State Building says "Parsed metadata", so they may not have actually recognized the image at all. Also, answering "Where is the Empire State Building" with "New York" is not particularly useful if you need directions.

Context is the hard problem. Without conversational context, it's like having voice input to a search engine. If you don't get the answer you want, you have to ask a new question, treated independently of the original one. You can't continue the conversation in the area of interest.


It seems to me that what you really want is proper implementation of coreference resolution [1]. This is extremely hard and extremely poorly done so far. Perhaps with more open sourcing, the problem would be more widely tackled. Or perhaps not.

[1]: http://en.wikipedia.org/wiki/Coreference#Coreference_resolut...
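A toy illustration of why coreference matters for a Q&A dialogue: resolve a bare pronoun to the most recently mentioned capitalized entity. Real coreference resolution is far harder than this naive heuristic (which is entirely hypothetical, not how Sirius works); it only shows what conversational context would buy you.

```python
import re

def extract_entities(text):
    # Crude heuristic: treat runs of capitalized words as named entities.
    return re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text)

def resolve(question, history):
    """Replace a bare 'it'/'there' with the last entity from the history."""
    entities = [e for turn in history for e in extract_entities(turn)]
    if not entities:
        return question
    return re.sub(r"\b(it|there)\b", entities[-1], question,
                  flags=re.IGNORECASE)

history = ["Where is the Empire State Building?"]
resolved = resolve("How tall is it?", history)
# With this toy heuristic, "it" becomes "Empire State Building".
```

Without a step like this, each spoken question is answered in isolation, which is exactly the "voice input to a search engine" failure mode described above.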


Coreference resolution is a start. You need that just to disambiguate questions.

I wonder how well Hello Barbie does at this.


question-answer.tar.gz contains JAR files like Javelin.jar, whose code has vanished from the public web.

"Ephyra also contains some components that have been adopted from the JAVELIN system," though the link is dead: https://web.archive.org/web/20081203004045/http://durazno.lt... and it seems no archived source code of it exists: https://web.archive.org/web/*/http://durazno.lti.cs.cmu.edu/... (not archived). JAVELIN was used in the TREC competition, and there are a lot of papers about it, e.g. http://www.cs.cmu.edu/~llita/papers/lita.javelin-trec2005.pd...

I wrote the maintainer of Ephyra an email.


It appears that the JAVELIN homepage has now moved to https://mu.lti.cs.cmu.edu/moin/Javelin_Project. However, the code and data page appears to be locked behind an authentication wall.


Thanks a lot!

I haven't received an answer yet from the Ephyra maintainer.


I just learned that something can vanish from the public web. Thanks. :) (I just watched part of 'Cave of Forgotten Dreams', and suddenly I started to think that the internet is a new kind of cave for humanity to express itself.)


Science Friday did a good podcast on a related project:

http://www.sciencefriday.com/segment/11/21/2014/would-you-tr...

They discuss "Amy" from x.ai. Turns out that x.ai needs to employ an "AI trainer" (discussed at 7min in) to handle all the situations that the algorithms can't.


So... just saying here, but with "Siri" in the name, the first thing the title made me think was "oh, is it a clever hack to pipe Siri's processed text into a different program?".


"Siri" -- Apple, closed source -- amended with "us" -- the community. Not a clever hack but a pretty awesome statement.


Actually, I took it a different way at first. I thought of Sirius satellite radio, and without noticing the link I assumed they were going to start offering some sort of personal-assistant service to compete with OnStar or similar services.


I'm worried by the lack of package definitions in the Java source I looked at. You shouldn't be putting Java classes into the global namespace. I have to wonder what they were thinking.


Pretty sure they were thinking about implementing AI. You'll often see computer scientists running roughshod over engineering best practices for the simple reason that they aren't engineers. And you know what? That's okay!


They all use Sphinx for speech recognition under the hood, which sadly works very poorly for me.



