Intel open sourced Stephen Hawking’s speech system (msdn.com)
432 points by btzll on Aug 17, 2015 | 64 comments



Fun fact: the latest version of this software uses SwiftKey under the hood - http://swiftkey.com/en/blog/swiftkey-reveals-role-professor-... (Disclaimer: I used to work for SwiftKey)


As a loyal SwiftKey on Android user, the prediction engine for SwiftKey is unnervingly good. Glad to see it's put to a better use than helping me write Facebook and HN posts.

It begs the question though, why isn't there a proper SwiftKey keyboard for Windows? The OSK on 8.1 is awful compared to SwiftKey on Android. The Windows 10 one is an improvement, but I'd still pay for a better one.


Have they improved performance of SwiftKey? I had the paid version for years, was even a VIP (won a t-shirt and everything), but I switched off last year in favor of Fleksy as a faster, more lightweight alternative.


I think the performance of Swiftkey improved about half a year ago. But I think around the same time I started noticing a significant drop in Swiftkey's accuracy. It "feels" less accurate to me than it was, although I use it with 3 enabled languages at once and I imagine that also brings lower accuracy by default. Still, I think it has become quite a bit worse than before, and I worry they did that on purpose as a compromise to improve performance.


Was a loyal user. Then they started making themes, emojis and all those bells and whistles and thrusting it all in the app which made it really slow and bulky.


Does this mean that SwiftKey is now open source, that what they've open-sourced doesn't include the actual prediction engine, or that what they've open-sourced is not the latest version? The GitHub page says they use Presage.


Interesting, since Presage is GPL 2 but this project is Apache licensed. Intel says "Integration with Presage is through the Windows Communication Framework" which I guess is a roundabout way of avoiding the GPL?


The Apache license v2 is GPL-compatible, which was actually one of the major reasons for having a v2.


Apache licensed code can be included in a GPL project, not the other way around. If the project includes any GPL code, the whole thing is GPL.


I think it is better to think of it from the perspective that this code and this project are Apache, but any binaries that result from linking this code to that code are under GPL.


Bear in mind this project was started around the same time there was a ton of uncertainty around the future of Silverlight and WPF. Alas, one did die, one lives on for now. But nobody knew that at the time, including Intel, or apparently Microsoft. WinForms has never faced any forward compatibility uncertainty so it is a good long-term bet.


It's also a compelling choice if you need to operate in a resource-constrained environment. I don't know what kind of hardware is in Hawking's system nowadays, but back when this was first being built a low-power portable system that runs a higher-activity WPF UI smoothly could have been fairly expensive.


You mean Silverlight died? I've seen that said over the years, but:

A) Microsoft still maintains and updates Silverlight.

B) It's mentioned as one of the components of Windows 10[1].

So I'm really confused here as whether Microsoft will kill it already, or keep it alive. They seem to be willing to kill it[2][3], but Netflix alone is enough reason to keep it alive.

[1] https://www.microsoft.com/en-us/privacystatement/default.asp...

[2] http://www.digitaltrends.com/computing/microsoft-wont-includ...

[3] http://www.windowscentral.com/microsoft-confirms-its-new-edg...


Dead can mean different things. Often it just means dead-end technology in the sense that MS does not intend to develop the technology further. Since MS have a strong focus on backwards compatibility, dead-end technologies often can keep running a very long time. VB6 for example is clearly a dead-end technology, but nobody prevents you from running and even developing VB6 apps.

Silverlight has been a dead-end technology for some time, but it will continue to be supported in Internet Explorer in "legacy mode". You can still run ActiveX-controls in IE (in legacy mode), so expect to be able to run Silverlight apps for the foreseeable future.



Running Firefox 41 and it still loads the Silverlight plugin and explicitly asks for it if disabled... Also, Netflix needing Silverlight drove the development of Pipelight for Linux: http://pipelight.net/cms/about.html

It got my attention that Chrome supports HD "up to" 720p, unlike the rest that get up to 1080p. Why would that be?


WinForms is still a great way to write desktop software if you don't need all the features of WPF, I just started a new project with it and have been really productive.


WinForms is not simple. There are so many core classes that have counterintuitive edge-cases and overcomplicated behavior, and so many things you'd expect to work by default don't.

Databinding is a complete trainwreck, the Combo-box class is horribly overcomplicated by its double-duty as text-entry and drop-down-list, the DataGridView is a complete beast of leaky abstractions, and the layout engine completely falls apart if somebody alters the DPI unless you obsessively test DPI alterations yourself.

I don't blame Microsoft for any of this - it was 2000 and they were making a wrapper around some terrifying legacy code.

But this thing should have been tossed in the dustbin of history a long time ago.


I believe that no technology is simple on its own; it all depends on the abstractions that you are used to. Counterintuitive is dependent on how things are expected to work.

Data binding is not solid, but it is a quick hack to display data. The solution is using a business model and MVP or MVC.

What do you find complicated about the combo class?

Data grid view... It is a train wreck, but then again, there is not much need to use it if you have a proper model behind.

The layout engine does suck... The only alternative I found is to use the DevExpress layout control. The rumor is that 4.6 solves this.

Win forms is solid and it has very little chance of disappearing. Areas of the screen can be controlled independently, which means UI encapsulation is there... Something not easily done in HTML.


What a hyperbole of a comment. It really isn't that bad.


Sure. But - serious question - what offering from Microsoft would you replace it with?


I keep using it because I'm used to all its warts and idiosyncrasies. So I don't know what properly-supported alternative one should use. I just get annoyed at how many brand-new fresh-out-of-college developers I meet that use it. They need something better.


Having used both WPF and Silverlight until 2010 (at which point I abandoned the Microsoft stack altogether) I agree, but I don't think the answer is either of those technologies.

Have you tried building GUI apps in Racket? That's the sort of thing I was wishing for when using either Java or .NET to build Windows GUIs.


I've done some academic intro-to-FP stuff in Racket, but haven't really got my feet wet with a non-toy application in it. So the GUI framework is good?


Yup. I haven't built anything of significance in it (yet) but it's proved really easy to learn, and (again, in my limited experience) rock-solid stable and fast enough:

A trivial example:

  #lang racket
  (require racket/gui/base) ; the only library this example uses

  (define (menu-file-exit-click item control)
    (exit 0))

  (define frame
    (new frame% [label "Demo"] [height 480] [width 640]))

  (define menu
    (new menu-bar% [parent frame]))

  (define menu-file
    (new menu% [parent menu] [label "&File"]))

  (define menu-file-exit
    (new menu-item% [parent menu-file] [label "E&xit"] [callback menu-file-exit-click]))

  (send frame show #t)

... is all you need to create a basic GUI app with a File -> Exit menu option. And that really is all there is - no resource compilation, no code-behind, no separate languages for expressing the UI and the actions connected to it.


WinForms is still the easiest way to write desktop software quickly. If you don't need a fancy UI for customer-facing work, it's a great platform for internal use.


... for some definition of "easiest." Maybe I would agree if you don't need to support a custom look-and-feel (Win32-looking apps don't really fly anymore), responsive layout, high DPI, touch/pen input, system theme colors, accessibility, localization, and haven't learned XAML.

WPF and UWP apps are both far easier.


Win32 is the style that Windows apps are supposed to be and what people expect in a Windows app. What are you talking about? In what way does the win32 look "not fly"?

I've tried to use WPF and it was just a major pain and felt like a mess. There is no impediment to "responsiveness".

Where did you get the idea winforms apps don't follow the system colors?


> Win32 is the style that Windows apps are supposed to be and what people expect in a Windows app.

Maybe in the Windows XP era. None of the built-in apps in Windows 10 look like Win32 apps, apart from legacy stuff that hasn't been ported yet and now looks sorely out of place.

https://msdn.microsoft.com/en-us/library/dn894631.aspx?f=255...

Much of the Windows shell isn't even written in Win32 anymore, it's all UWP (source: I work on the start menu).


Have a look at lazarus (http://www.lazarus-ide.org/). It uses a different language (FreePascal instead of C#), but for me it's much more productive, and the programs written run without any framework and feel much snappier.


The data binding in WPF seemed more powerful to me... enabling MVVM approaches, although I'm sure the same is possible in WinForms as well.

Does WinForms still get love in the new versions of the SDK or is it considered done now?


WinForms is in maintenance mode. The only updates it gets are to make sure it runs on new versions of Windows & maybe some security-related updates.

(There still are bugs around, but I think they will not be addressed ever. Fixing those could potentially break existing applications relying on that behaviour.)


I don't know, I've done both (Silverlight MVVM and Winforms) plenty and I prefer the manual approach. Databinding is cool until you have any sort of complexity in your app, and then it becomes unwieldy quickly.


Databinding (and MVVM by extension) is ok only in very simple scenarios. Once you hit some complexity you end up in a world of hurt.

Databinding is not a leaky abstraction, it's a fucking flood abstraction.


Um, if you're using MVVM, then by definition you should have a clean separation between the properties of your Model, which is part of your domain, and the properties of your View-Model, which is part of your presentation tier and directly tied to a particular View. If you have this, what exactly can be leaked by databinding? The only things you're binding to should be properties of your View-Model.
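
To make that separation concrete, here is a minimal sketch in Python rather than XAML/C#. Every name in it (Account, AccountViewModel, Label, bind) is made up for illustration, and the hand-rolled callback list only stands in for a real binding engine; it is not how WPF implements bindings.

```python
class Account:
    """Model: domain data only, knows nothing about the UI."""

    def __init__(self, balance_cents):
        self.balance_cents = balance_cents


class AccountViewModel:
    """View-model: presentation-ready properties plus change notification."""

    def __init__(self, account):
        self._account = account
        self._listeners = []

    def bind(self, listener):
        """Register a callback and push the current value to it immediately."""
        self._listeners.append(listener)
        listener(self.balance_text)

    @property
    def balance_text(self):
        # Formatting for display lives here, not in the model or the view.
        return f"${self._account.balance_cents / 100:.2f}"

    def deposit(self, cents):
        self._account.balance_cents += cents
        # Tell every bound view that the displayed value changed.
        for listener in self._listeners:
            listener(self.balance_text)


class Label:
    """Stand-in for a view widget; it only ever sees view-model properties."""

    def __init__(self):
        self.text = ""

    def set_text(self, value):
        self.text = value


label = Label()
vm = AccountViewModel(Account(1250))
vm.bind(label.set_text)
vm.deposit(250)
print(label.text)
```

The view never touches Account directly, so the only surface the binding can reach is what the view-model chooses to expose.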


While that's largely true when dealing with pure business logic, there's a maddeningly large amount of UI logic, and hybrid business/UI logic that becomes really annoying (or outright impossible) to handle in pure MVVM with WPF. Reasons for this include that many user interface properties aren't exposed nicely to allow data-binding.

Probably the most infamous one is that the WPF ListView with multiple select enabled doesn't allow you to bind to the collection of selected items. Instead you have to do all sorts of workarounds that individually aren't too bad but, put together, make all the other hard work you put into doing MVVM on the components you fully control super frustrating.


> many user interface properties aren't exposed nicely to allow data-binding

You hit the nail on the head with this. This is why I don't like to use WPF outside of writing small utilities.


I've worked on large and complex apps with MVVM in WPF and on the web with Angular, and have not regretted using MVVM for a second.

Databinding is indeed a leaky abstraction, but at the same time it's a very powerful one. I'm willing to learn the inner workings of the binding system to avoid performance pitfalls and other weirdnesses. I'm also willing to continually wrap all sorts of not MVVM-ready components to make them data-binding friendly. When people talk about databinding being a leaky abstraction, what I hear is "I was promised magic and it's not actually magical."

Also - there are many different approaches to how it's done, with various tradeoffs. Compiled bindings on Android and the new Windows platforms look interesting, and you should also check out how ReactiveUI approaches it.

In the end though, I've never been able to achieve a satisfactory level of loose coupling, testability, and portability without databinding. Despite the overhead and occasional surprises, it's paid off in spades as far as quality and productivity.


As far as I can tell, it seems to be done.


Discussion of a previous article, focusing on the difficulties during the development of ACAT and tailoring it towards Stephen Hawking: https://news.ycombinator.com/item?id=8686757



To everyone who's interested in programming and is thinking "what should I do/program?": I'm sure there are a lot of small things or applications you can build to help other people in need. See it as a learning experience and something that might have a huge impact on other people's lives. How's that for a motivator? Kudos and respect to all the people behind this project and to Stephen Hawking himself.


My mom had ALS, and used single-switch input for a while, after typing and writing on paper weren't possible. She wrote out little notes about what she was thankful for, prayers, practical messages (she had Type 1 diabetes, and told folks her insulin doses), and recipes this way. Eventually she had to switch to giving messages to a human holding a letter board by looking towards them for 'yes' and away for 'no'--cameras or Hawking's infrared-laser-based system weren't really feasible.

We looked at some software called EZ Keys from a company called Words Plus. (I don't think she used it specifically, at least for long--I know she used another program, a DOS-based one called Living Better that ran in 40-col. mode and that I can't find anything about on the Internet.) EZ Keys looked more or less like Intel's thing -- scan rows, scan items in a row, completions/predictions over at left. It even had an option to use a frequency-sorted keyboard like the Intel one, with the common letters pushed to the top left (since those are the first rows/cols to be scanned). Hawking apparently used EZ Keys, so it's possible the Intel folks intentionally gave their thing a similar interface to make the transition easy.
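
A rough sketch of why the frequency-sorted layout helps with row/column scanning. Everything here is an assumption for illustration: the 6-column grid, the cost model (one step per row scanned plus one per item scanned), and the letter-frequency ordering, which is a commonly cited approximation and not taken from EZ Keys or ACAT.

```python
import string

# Approximate English letter-frequency order (a commonly cited ordering;
# exact rankings vary by corpus).
FREQ_ORDER = "etaoinshrdlucmfwypvbgkjqxz"
COLS = 6  # hypothetical grid width


def scan_cost(row, col):
    """Scan steps to reach a cell: step through rows, then through columns."""
    return (row + 1) + (col + 1)


def alphabetical_layout():
    """Map each letter to (row, col), filling the grid left to right."""
    return {ch: divmod(i, COLS) for i, ch in enumerate(string.ascii_lowercase)}


def frequency_layout():
    """Give the cheapest cells (nearest the top left) to the most frequent letters."""
    rows = -(-len(FREQ_ORDER) // COLS)  # ceiling division
    cells = [(r, c) for r in range(rows) for c in range(COLS)]
    cells.sort(key=lambda rc: (scan_cost(*rc), rc))
    return dict(zip(FREQ_ORDER, cells))


def total_cost(text, layout):
    """Total scan steps to spell the letters in `text` (other characters ignored)."""
    return sum(scan_cost(*layout[ch]) for ch in text.lower() if ch in layout)


message = "thank you for the insulin"
print("alphabetical:", total_cost(message, alphabetical_layout()))
print("frequency-sorted:", total_cost(message, frequency_layout()))
```

Under this toy model the frequency-sorted grid needs noticeably fewer scan steps for the sample message, and with a single switch every saved step is time the user doesn't spend waiting.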

It is worth remembering that no user cares if it's WinForms or whatever. Some folks might like a nicer voice if they haven't gotten used to theirs like Hawking ;), but the main concern is just getting the message across. Intel seems to have worked on the right stuff: better prediction (Presage http://presage.sourceforge.net/, which looks interesting) and context-sensitive controls. The infrared-laser-based input method sounds cool, too.

This is a neat space: an optimization/prediction problem where improvements can be a significant help to someone. (There are also practical optimizations that don't have much to do with the general word-prediction problem: sometimes people have to say things about their care, food, etc., or generic 'hi' and 'bye', and it's good if those are fast.) A Web page or Chrome extension can do a lot--how close can you get to smoothly operating the Web with just the spacebar? the arrow keys and Enter? or plain old typing, but slowed down and using 0-9 for completions?

I've heard that nowadays, people with communication trouble and enough movement use text-to-speech on mobile gadgets with their nifty and highly refined predictive input and that's awesome.


Thinkpad love there as well. His "custom" computer appears to be an X220 tablet in an enclosure.


Would make sense, easily available commodity hardware that is reliable and in a decently small form-factor.


Is Hawking's speech synthesis included at all (there was an article saying that his voice is based on some hardware device http://www.wired.com/2015/01/intel-gave-stephen-hawking-voic... )? As I understand it, this is "just" a "navigation" system (replacing the mouse and keyboard with a virtual key triggered by facial movement). If so, the title (the "speech system") is misleading.

The project also doesn't use SwiftKey but

https://github.com/01org/acat

"Presage, an intelligent predictive text engine created by Matteo Vescovi."


It seems that the company which now owns the "Paul" voice preset and DECtalk is SpeechFX:

http://www.speechfxinc.com/dectalk.html

I don't know how different that is from the CallText 5010 which was, as the Wired article states, eventually bought by Nuance Communications. Still, as per Wikipedia:

https://en.wikipedia.org/wiki/DECtalk

"The CallText 5010 is still listed on Hawking's site as of 2015.[9]"


@acqq: Shameless plug. This doesn't seem to have speech. However, I've tried to build a text to speech synthesizer in www.voiceclonr.com. Appreciate if you could try and leave feedback.


If I understood correctly, the open-sourced version uses Microsoft's Speech API as an example. Searching for it, I find gems like this:

https://connect.microsoft.com/VisualStudio/feedback/details/...

"System.Speech has a memory leak - by eoghanoh

Status: Closed as Won't Fix"

I see your work is based on http://hts.sp.nitech.ac.jp/ Can you tell us what are your changes?

Edit: I see HN already commented on your work:

https://news.ycombinator.com/item?id=9812734


So much developer rage in that Status :) On the HMM stuff, it was pretty much the baseline code from the link. The things I recall experimenting were more about getting it done faster (threading some training phases, different gcc options during synthesis etc).


So the core and a really important part (speech synthesis) is still closed source.

That's a shame really, I was really looking forward to trying it out. And the title is grossly misleading.


If anyone is considering downloading this to use his voice to annoy your best friend John with things like "Hello, my name is Stephen Hawking. The universe is big, but not as big as John's mother.", let it be known that this software doesn't sound like Hawking.


Presage is great, but to clarify some other comments - it doesn't involve any specific dataset, like SwiftKey - it simply does nice smoothed predictions when given a large database of n-grams (groups of words) and their frequencies. It's fairly easy to chop up a corpus into n-grams using NLTK or other tools, and there's a good port for Python called Pressagio.
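
For anyone curious what that looks like, a minimal sketch in plain Python. This is not Presage's or Pressagio's API: the function names and toy corpus are made up, and the smoothing is a simple linear interpolation of bigram and unigram frequencies with an arbitrary weight, which may differ from what Presage actually does.

```python
from collections import Counter, defaultdict


def build_counts(tokens):
    """Count unigrams and bigrams in a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        bigrams[prev][word] += 1
    return unigrams, bigrams


def predict(prev_word, prefix, unigrams, bigrams, weight=0.7, k=3):
    """Rank words starting with `prefix` by an interpolated score.

    score = weight * P(word | prev_word) + (1 - weight) * P(word)
    `weight` is an arbitrary illustrative value, not a tuned one.
    """
    total = sum(unigrams.values())
    following = bigrams.get(prev_word, Counter())
    following_total = sum(following.values())
    scores = {}
    for word, count in unigrams.items():
        if not word.startswith(prefix):
            continue
        p_uni = count / total
        p_bi = following[word] / following_total if following_total else 0.0
        scores[word] = weight * p_bi + (1 - weight) * p_uni
    # Highest score first, ties broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]


corpus = "the black hole is black and the black cat is big".split()
unigrams, bigrams = build_counts(corpus)
print(predict("the", "b", unigrams, bigrams))
```

The unigram term is what keeps a word in the list when the previous word has never been seen before; a real system would use a much larger corpus and longer n-grams.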

My startup Spoken - http://spokenaac.com - uses n-gram predictions to help users with aphasia or other language disorders speak. The user interface challenges aren't quite as intense as Stephen Hawking's binary input, but it's an interesting field if you're into design and big data.


Forcing a disabled man to use Internet Explorer. Surely this is the basest form of cruelty.


Now security researchers will analyze the code to find vulnerabilities to exploit Stephen Hawking's speech system. Next headline: Stephen Hawking's voice sounds like Justin Bieber's voice, lol.


The code in the GitHub repository [1] is pretty interesting to look around in.

[1] https://github.com/01org/acat


Does it make sense to be more aggressive in predicting by giving the user a second level from which to choose? For example, if he types 'b', then it could offer to type 'black' or 'black hole'.

On the iPad, for example, if I type 'f', I get shown 'for', then if I accept, I always see 'example' and 'instance'.


I believe there is a predictive typing keyboard based off of Tries that is floating around out there for one of the mobile OSes.
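
For reference, a trie makes prefix lookup cheap because all words sharing a prefix sit under the same node. A minimal sketch, with class and method names chosen for illustration; it returns completions alphabetically, whereas a real keyboard would rank them by likelihood.

```python
from itertools import islice


class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.is_word = False  # True if a word ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, node, prefix):
        """Yield every word below `node` in alphabetical order."""
        if node.is_word:
            yield prefix
        for ch in sorted(node.children):
            yield from self._walk(node.children[ch], prefix + ch)

    def complete(self, prefix, limit=3):
        """Return up to `limit` stored words that start with `prefix`."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return list(islice(self._walk(node, prefix), limit))


trie = Trie(["black", "blue", "big", "hole"])
print(trie.complete("b"))
print(trie.complete("bl"))
```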


Suspect "sometimes, but rarely," because choosing "hole" after "black" is pretty cheap--you need your expected savings from potentially saving that choice to exceed the cost of bumping your worst option from the list. It's more likely to be practical when you can offer lots of choices rather than the typical three on mobile, though (bumping a 10th choice is cheaper than bumping a 3rd choice).
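
A back-of-the-envelope version of that tradeoff, with entirely made-up probabilities and step counts (the function name and parameters are hypothetical too):

```python
def worth_offering_phrase(p_phrase, steps_saved, p_bumped, steps_lost):
    """Compare expected savings from a phrase slot with the cost of the bumped word.

    p_phrase:    chance the user wants the two-word phrase
    steps_saved: selections saved when the phrase is picked in one go
    p_bumped:    chance the user wanted the word pushed off the list
    steps_lost:  extra selections needed when that word is no longer offered
    """
    return p_phrase * steps_saved > p_bumped * steps_lost


# Made-up numbers: a 3-slot list bumps a fairly likely word,
# a 10-slot list bumps an unlikely one.
print(worth_offering_phrase(0.15, 1, 0.10, 4))  # 3-slot list
print(worth_offering_phrase(0.15, 1, 0.01, 4))  # 10-slot list
```

With these numbers the phrase slot loses on the short list and wins on the long one, which is the "bumping a 10th choice is cheaper" point.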

Separately, predefined phrases/templates can be really practical for things related to care, food, saying hi and bye, etc. That's a special case--user likely cares more about getting it done efficiently than choosing the exact wording they want each time.



[flagged]


think sagivo meant to respond to https://news.ycombinator.com/item?id=10072738


Ah, that makes a lot more sense.


That about sums up my reaction. I feel like maybe this should be flagged for being off topic, but people seriously need to pay attention to what they commit...


.Net WinForms? No wonder it sounds so bad.



