Microsoft in advanced talks to buy Nuance (cnbc.com)
89 points by jbredeche on April 11, 2021 | 65 comments



What’s the angle here? Microsoft wanting a position with interactive voice products for dataset curation? Getting into the automotive supplier space? Surely it’s not for the speech tech. Nuance has largely failed to fully embrace deep learning approaches.


>>What’s the angle here?

Probably the patent portfolio that Nuance aggressively asserts[1]. They also went on a 51-company (!!!) acquisition spree up to 2018[2] after ScanSoft acquired Nuance[3] in 2005 (but kept the Nuance name), back when the two were #1 and #2 in the speech recognition space.

While Nuance may have failed to embrace DL approaches, Apple has been willing to license Nuance technology to power Siri for many years[4].

EDIT: They recently acquired another company in February 2021.[5]

[1] https://www.ft.com/content/dd908d81-4859-4d19-a1e8-e5f7b4f01...

[2] https://en.wikipedia.org/wiki/Nuance_Communications

[3] https://www.computerworld.com/article/2556291/scansoft-to-bu...

[4] https://www.forbes.com/sites/rogerkay/2014/03/24/behind-appl...

[5] https://www.fiercehealthcare.com/practices/nuance-acquires-s...


I was at one of the companies Nuance absorbed. They laid us all off (nice touch), but it turned out they needed me. I got all my demands met with zero resistance, which only means I was asking for too little.

I got to see inside the beast, and it was just out-of-control growth by acquisition. The IBM voice patents had eclipsed what ScanSoft's were.

It's kind of strange that MS would want it for anything but the patents. The business was shit. As a side benefit, they could shut down Siri, since the Apple deal was such a linchpin.


This could pair very well with the Discord buy.


>What’s the angle here?

As someone who works professionally in this space, I can say that most commenters thus far have completely missed it. The angle here is Conversational AI in the Enterprise Call Center market. This is a market worth hundreds of billions of dollars annually. This announcement is almost certainly a reaction to the partnership between Genesys and Google to create Google Cloud Contact Center AI. I know a lot of people hate customer service chatbots; it doesn't matter. Companies are spending a fortune on them to replace agents, and Microsoft/Google/Amazon want a slice of that pie.


Your reply intrigued me. I worked with Nuance about 20 years ago, around the time VoiceXML was being created. I was writing code in a language that was one of the precursors of VoiceXML, and then for a while I was writing code that generated VoiceXML for telephony user interfaces and that depended on Nuance.

I even wrote a VoiceXML interface to Zork that used Nuance. The user could say stuff like, 'Go east', 'use axe', etc. It was pretty silly but you could play original Zork via the telephone :)
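For anyone who never saw it, here is a rough sketch of the sort of thing such a generator produced. This is only an illustration; the grammar file name and submit URL are made up:

    # Minimal sketch of generating a VoiceXML form for one Zork-style turn,
    # roughly in the spirit of those telephony UIs. The grammar file name and
    # submit URL are invented for illustration.
    from xml.sax.saxutils import escape

    def zork_turn(prompt, grammar_src="zork-commands.grxml"):
        # One <form> that plays a prompt, listens against a command grammar,
        # and posts the recognised command back to the game server.
        return (
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
            '  <form id="zork_turn">\n'
            '    <field name="command">\n'
            f'      <prompt>{escape(prompt)}</prompt>\n'
            f'      <grammar src="{grammar_src}" type="application/srgs+xml"/>\n'
            '      <filled>\n'
            '        <submit next="http://example.com/zork" namelist="command"/>\n'
            '      </filled>\n'
            '    </field>\n'
            '  </form>\n'
            '</vxml>\n'
        )

    print(zork_turn("You are standing in an open field west of a white house. What now?"))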

20 years ago people were talking about the big money in voice recognition being in enterprise call centers. And so it's interesting to me that this is still the big market force.


Playing Zork with voice sounds really fun. Back a few years ago, when conversational AI was the hot thing to invest in, I wondered if text adventures had a chance of coming back because of it. Probably too geeky still.


> Nuance has largely failed to fully embrace deep learning approaches.

This may be a feature, not a bug. Explainability is very difficult with deep learning systems, and that often makes deep learning less suitable for some compliance applications like government usage. Microsoft may also still be feeling more conservative after their Twitter bot learned so well a few years ago.


The pre-deep-learning speech recognition methods don't have any reasonable explainability anyway; they're also based on large statistical models, so I wouldn't consider explainability a reason.


Also healthcare: Nuance is a big player in the healthcare market, a whole segment where DL approaches are seen as unsuitable due to lack of transparency.


You won't use speech recognition to heal patients or make a diagnosis.

And if you use it to control medical devices, then transparency is not what matters. A protocol matters, like the machine repeating the commands and asking for confirmation.


No, you use it for recording patient interactions and making notes.

The main difference in voice for these apps is an intentionally limited and specialised vocabulary, where you have to be pretty certain that they said aberrant rather than apparent, or anuresis rather than enuresis. A lack of something humoral would mean you are lacking a particular fluid, while a lack of a humerus would mean you are missing a bone. The difference between an apparent mole and an aberrant mole is pretty huge, so you want to be sure your AI isn't over-fitting.
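As a toy illustration of why that narrow vocabulary matters (plain Python, invented word list, not a real recognizer): the confusable terms are only an edit or two apart, so you want the recogniser choosing within a known specialist lexicon rather than over all of English.

    # Toy illustration, not a real recognizer: snap a noisy hypothesis onto a
    # small specialist lexicon, which is roughly what a restricted-vocabulary
    # medical product buys you. The word list is invented for this example.
    from difflib import get_close_matches

    MEDICAL_LEXICON = ["aberrant", "apparent", "anuresis", "enuresis",
                       "humoral", "humerus", "melanoma", "naevus"]

    def snap_to_lexicon(hypothesis, lexicon=MEDICAL_LEXICON, cutoff=0.75):
        # Return the closest in-vocabulary word, or None if nothing is close enough.
        matches = get_close_matches(hypothesis.lower(), lexicon, n=1, cutoff=cutoff)
        return matches[0] if matches else None

    for heard in ["aberent", "enurisis", "humoural", "banana"]:
        print(heard, "->", snap_to_lexicon(heard))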

Also, the wider ecosystem here is that you have to have a full software suite for editing and checking the notes that is compliant with local healthcare regulations, plus standard interfaces into medical software. And penetration into the healthcare market is notoriously hard.

I would imagine synergies in this space include things like Microsoft Teams being in a good place to take on remote consultations compared to Zoom et al, as you could have a seamless flow from consultation to record in the healthcare system and medically-accurate transcription.


I'm going to write this all with the speech to text on my droid and add line breaks after.

Aberrant mole apparent mole. Works like a charm. Now let me blow your mind with another speech to text: ganglioneuralgia.

I don't think this acquisition has anything to do with DL vs traditional approaches. We're running opaque DL models with no transparency at a major academic hospital to help treat patients. This is a cakewalk by comparison. Specialized vocabulary? Just pay a few people to read some medical textbooks and you're done.

I think they just want the reputation, customers, and product suite.

Also, they're making integrations out the wazoo with EHRs. I actually can't believe it, but Epic let Microsoft integrate Teams. Integrating and licensing voice is easily a next step.


Microsoft is having success with aggressively pushing Teams because most enterprise IT is too small-minded to explore anything else.

So it's the most "modern" solution they can offer their company without having to do any real additional security work.


> Microsoft is having success with aggressively pushing Teams because most enterprise IT is too small-minded to explore anything else.

Or alternatively, no other solution offers similar price/performance; they are having success because for most corporate subscriptions it is free. It's also the most commonly used solution for business in the UK, so I have fewer problems with people joining Teams calls than with any other provider.

I don't think it's small-mindedness; what other solution should they offer?

* Zoom - Awful internal messaging capability. A full Office 365 subscription costs almost the same as a standalone Zoom licence.

* Slack - No support for external conference calling without integration to another service.

* Google Meet - Makes sense if you use Google Workspace, but not really if you use 365. Other than that, not too bad.

* Webex - Awful client that requires a download, which causes people to have issues half the time.


The big deal here is that Epic has classically been closed off to outside software. The fact that they partnered with Microsoft to create an integration is astounding.


Yes, and as you say, you don't need interpretability for those use cases. You need performance, as in most situations. Also, people cannot explain why they heard a sentence; they just heard it. So explainability doesn't really mean anything in speech recognition. Performance is key, and Dragon was used because it was better.


1. It's not like Hidden Markov Models (the approach that dominated recognition prior to the deep learning revolution) are any more explainable than deep learning models.

2. You generally don't gain more confidence in the accuracy of a particular word by looking at less context. This is neither how human nor machine recognition works.


I'm not saying that a particular technical approach is better, I'm just saying that from a product perspective the medical industry has specific requirements which are currently satisfied in Dragon and not in the generalised speech libraries Apple / Google have at the moment.

I've not got direct experience in healthcare, but do have experience in industrial voice, and this is an area where Apple/Google generalised libraries perform significantly worse than specialised software (Dragon is also big in this industry, albeit at the SDK level). In industrial voice the main requirements are high levels of background noise, restricted vocabulary (20-30 words) and people speaking very quickly.


Monopoly money - Microsoft's stock price is really high. They are buying revenue and market share.

Nuance has a very strong presence in medical transcription, retail and a few other areas. Microsoft can move them to Azure and onto their own speech-to-text technology.
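For reference, the Microsoft side of that already exists as the Azure Speech service; a minimal one-shot dictation call with the Python Speech SDK looks roughly like this (key and region are placeholders, error handling omitted):

    # Minimal sketch of Azure's existing speech-to-text via the Python Speech SDK
    # (pip install azure-cognitiveservices-speech). Key and region are placeholders.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

    result = recognizer.recognize_once()  # listens once on the default microphone
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print(result.text)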


The main reason I am worried about this is Microsoft killing off yet another viable on-premise speech technology in favor of forcing people to pay for Azure subscriptions.


When people want to pursue antitrust legislation, these sorts of purchases are what we should be going after. If we continue letting incumbents buy all the tech (coughFacebook's purchase of Instagramcough) with their country-sized budgets, no wonder the incumbents are unkillable.


if they kill off this patent troll it's a net benefit for humanity.


As long as MS continues to sell an offline, on-device version of Dragon (NaturallySpeaking) in the future, I can agree. But otherwise I'm stuck on Pro Individual 15.


The automotive business was spun off into Cerence.


I'd wager services for Azure.


Everyone wants to take on Twilio.

Last September, Microsoft first announced Azure Communication Services.

A huge focus at Twilio now is moving upstream to the Call Center, where Nuance is a significant player. So Microsoft picking up Nuance makes sense.

It’s clear Microsoft sees communication services as a strategic core part of their business.

(Even at the consumer / gamer level with the rumored Discord acquisition talks)

https://techcrunch.com/2020/09/22/microsoft-challenges-twili...


Companies too big to fail grow even bigger. This is not sustainable. At some point (sooner rather than later) we need legislation that splits up companies once they go over a certain size. Big companies don't pay the same high taxes as small companies, and besides sheer size, this is another unfair competitive advantage: it stifles development, keeps wages low, and doesn't put enough money into taxes, so ordinary people have to pick up the tab. We end up with CEOs flying private jets while paying hardly any tax, and workers getting peanuts in comparison, paying something like 40% of what they make, and going to work in a beaten-up Toyota.


Visiting a parking lot in Redmond might help you reconsider your image of pitiful, suffering Microsoft workers driving beaten-up Toyotas.


I'm going to go against the groupthink here and suggest that if these companies had done ICOs instead of IPOs, people could vote with their feet against them by walking away from their currency. The way the current financial system is set up, these companies have their market caps because the state a) protects the currency and b) heavily regulates where and how people can invest. As the sibling comment suggests, the system incentivizes the status quo.


An ICO seems a better idea, indeed, but you still have a situation where the real customer is the shareholder and the clients are the product. I don't think this is in the spirit of capitalism.


Size imposes a penalty on companies anyway. And companies shrink as well as grow. IBM shrank. Microsoft closed up its entire mobile efforts after acquiring Nokia (another company that shrank). Then there’s Polaroid, Blockbuster, Xerox, DEC, ... the list goes on and on.

Even Sears was once unassailable. Until it wasn’t. Similarly, Amazon might seem too entrenched now, but really it’s just one storefront among many and could easily go the way of Sears.

There are also lots of antitrust tools available to regulators, including breakup.

Saying “acquisitions are bad” and imposing an arbitrary limit on size is short-sighted nanny-state thinking. The market should be regulated, but market forces are also an important regulator and that fact needs to be recognised more widely.


> Then there’s Polaroid, Blockbuster, Xerox, DEC...

I wonder what the correlation is between long-term business health and acquisitions. Facebook, I would bet, would have shrunk substantially by now if not for its acquisition of Instagram. If Google's long purchase history of various supplementary services (like the tech behind Google Earth or YouTube) had been blocked, they would have at least taken a lot longer to get to their current position.

By contrast, I don't think any of your examples of shrunken companies went on anything like the acquisition spree that these tech giants have. At this point they don't have to innovate because they have enough money to keep buying the new hotness forever.

In general I agree with your opposition to "short-sighted nanny-state thinking", but this seems like the lightest-touch solution to the problem of tech/wealth consolidation that I've seen; I'm certainly interested in hearing about lighter ones.


> I don't think any of your examples of shrunken companies went on anything like the acquisition spree that these tech giants have.

IBM has a long history of acquisitions[1]. Xerox hasn't done it quite as much, but they started as a photographic paper company (Haloid) and actually acquired their photocopying business (the Rectigraph Company). They've made many other acquisitions more recently[2], but of course on a scale that suits their budget.

To the bigger point about the multi-billion dollar acquisitions that current tech giants do, those are a function of the massive cash piles the companies are sitting on and the low interest rate regime worldwide (which causes stock prices to go up). It's not like these acquisitions "lock up" competition or anything. Indeed, by ending Nuance as an independent specialized voice-to-text company, there'll be more pressure now to fund alternative independent implementations. Google already has one that's nipping at Nuance's heels. You might see others fund a third effort now.

[1] https://en.wikipedia.org/wiki/List_of_mergers_and_acquisitio...

[2] https://www.news.xerox.com/investors/acquisitions


> Microsoft closed up its entire mobile efforts after acquiring Nokia

Sometimes what happens is that a big company sees something useful a small company is doing, buys them, picks the part it needs, and then closes down the rest.

> Amazon might seem too entrenched now, but really it’s just one storefront

Then they should pay taxes just as a storefront.

> The market should be regulated, but market forces are also an important regulator and that fact needs to be recognised more widely.

The regulation is very weak. There is no level playing field, as big companies have the money to buy politicians and regulation that favours them and penalises competition. The whole concept of legalised corruption, i.e. lobbying, goes against the free market and should be illegal.


> Then they should pay taxes just as a storefront.

If they haven’t paid whatever tax is due, then the relevant national tax body should throw the book at them.

But I suspect you’ll find their tax lawyers & accountants ensure they’re in full compliance with relevant local laws. Cases like [1] come to mind.

[1] https://www.reuters.com/article/us-eu-apple-tax-idUSKCN24G10...


I understand your point, but I have MSFT shares; I win if Microsoft wins. My point is that I and many others profit from the status quo. If anything, I would want that 40% tax reduced and the surplus invested in shares, such as MSFT.

Not saying it's an ideal situation.


In my country shareholders don't pay the same income tax as workers - this also plays in favour of the rich. It is insane that people who make money from other people's work pay less tax than the workers who create all those profits.


Good. Microsoft's own speech-to-text in Windows 10 continues to have poor accuracy in long-form dictation. I was spending so long editing that I went back to typing.

My android phone was more accurate so now I dictate there and email myself the text.


Don't get your hopes up too high. Dragon isn't that great, in my experience. I use it for prose text to help limit my typing, and I do spend a significant portion of the time frustratedly editing dumb mistakes that should obviously be something else from the context (wrong grammar or nonsensical combinations of words). Correcting and re-correcting and training and retraining words it keeps getting wrong is really frustrating. It may be that their NZ English model was built from a smaller corpus, so YMMV.

I'd rather not use a cloud service like Google.


You might be interested in this: https://github.com/biemster/gasr (not production-grade, but depending on your workflow you can hack on it)


If this goes through, Microsoft could do something positive with some tech that Nuance owns that probably isn't making much money anymore anyway. One of the text-to-speech (speech synthesis) engines that Nuance got from its various acquisitions is called ETI-Eloquence. It does pure speech synthesis, not concatenation of pre-recorded human speech like most modern TTS engines. So it's particularly popular among blind power-users because it can speak very fast while still being quite intelligible. At the same time, it doesn't hold much if any appeal for mainstream applications. But the codebase has been abandoned for a while. So it would be awesome if Microsoft would open-source ETI-Eloquence. Hey, I can dream.


My thought exactly. An open source version could be nice in its own right, or lead to good improvements in other open source speech synthesizers such as eSpeak. I think current TTS research/software is only focused on sounding nice and human. This is cool if you want to replace a human voice with something computer generated, but not ideal if you want an efficient speech output that can convey as much info as possible as fast as possible. Predictability is key here: I could proofread a text in Dutch (my native language) with ETI-Eloquence set to English, and just by the sound of certain letter combinations I would know if there was a spelling mistake. I couldn't do that with any other "better sounding" synth.


Nuance used to buy every company they could get their hands on. It didn't work out so well for them in the long term. Microsoft has been doing the same recently. Every week there is another big company they are trying to acquire (Pinterest, Discord, etc). They are becoming another Salesforce, in my opinion.


I will note that at least on iOS devices Microsoft's speech recognition is Not Great. To try it, install Swiftkey and try the MS powered voice recognition. Speak only in fully formed sentences with no pauses or you'll get a bunch of sentence fragments instead.


Nuance speech recognition tech has a long and interesting history of acquisitions.


Like L&H, which in turn acquired Dragon Dictate?


Yes, and former Nuance employees are responsible for starting speech teams at companies like Google.


Didn't Dragon come out of IBM?

Fun fact: NUAN was my favorite stock in '05/'06... 15 years too early, I guess...


Wasn't this company the one that also made the original Swype keyboard for Android? I remember it having its own "Dragon" dictation system built in after a while.


Nuance acquired Swype and then shut it down.

Disclosure - I worked at Swype and was part of the acquisition.

https://www.theverge.com/2011/10/06/confirmed-nuance-acquire...

https://techcrunch.com/2018/02/20/nuance-ends-development-of...


Sorry to hear that.

Swype was _by far_ the best keyboard for all-screen smartphones, and I miss it very much. It was markedly better than any other swiping keyboard implementation and had a substantially better feature set.

If MS quietly swapped Swype's recognition and features into its SwiftKey product, but maybe kept SwiftKey's prediction engine – the only good thing about it – I would be happy to buy it again.


I think you can still get it if you look into your past purchases on Google Play, and I think last year I even managed to somehow download language packs for the multi-language support, indicating that the servers that host those are still up in some form?

EDIT: Yep, just checked and downloaded the app & the German language pack successfully; it will be fun swiping again. Neither SwiftKey nor Gboard could replicate how well Swype used to do it.


Oh yes. On Android, my licence still works, even on new devices.

But I got an iPhone 6S+ about 3 years ago, and the last time I owned an iPhone was before iOS supported replacement keyboards. After a lot of fussing around, I managed to copy a friend's installation onto it, but every time I restarted the phone, he had to authenticate it with _his_ Apple ID and password. Not convenient at all.

Also, AFAICS, the iOS version misses 2 crucial features from the Android version:

• it doesn't seem to have the edit-keys screen, with a cursor square and dedicated keys for delete forwards, delete backwards, cut, copy and paste;

• or the dialling-keypad screen (with the numerals in a phone-style square, as distinct from the numbers-and-symbols screen).

I've not seen any other Android keyboard with these. Some other swiping keyboards now support looping over a letter to double it (very handy in English, which is full of double letters – e.g. "cuter"/"cutter", "planing"/"planning"). I don't think I've seen any others which used swiping upwards off the screen to capitalise a word. Nor can any other I've seen handle swiping down to a punctuation mark to include it and keep going: "cant"/"can't", "were"/"we're", "ill"/"I'll", for example.


Ah - damn. Was one of the few apps I never regretted spending money for on my phone. Must've been fun to work on while it lasted though! Do you have any interesting anecdotes from developing such an influential app that you can talk about?


Well, that's a surprise. MS never used to be all that interested in speech; I guess they changed their minds?


Windows has had built-in speech recognition for years.


A decade ago it was also purely local. It didn't require sending any data to third party clouds, and worked faster (no network roundtrips involved).


Since Windows Vista, at least. I remember being quite surprised by how well it worked - I could dictate and there were very few mistakes.

I was particularly taken with how it understood commands. It was amazing to me that it could understand "open Firefox" and "open mIRC", since I figured it must have had to guess how arbitrary program names were pronounced.

But yeah, here we are with pocket-sized devices supposedly more powerful than the computers of the time, which have to send off everything to The Cloud for approximately the same quality recognition. Progress!


You're right. I wrote my first program using MS Speech API around 2006/2007.

It was even better than that! One thing it had, that doesn't seem easily available to current cloud systems, was an easy API for restricted grammar. You'd write a piece of XML that described what kind of things could follow which other kind of things, to build e.g. a graph of commands.
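To give a feel for the idea (a conceptual sketch only, not the actual Speech API XML): the grammar was essentially a small graph of which words could legally follow which, something like this:

    # Conceptual sketch only: the real thing was an XML grammar handed to the
    # Speech API, but what it described was a graph of legal continuations.
    # The words and structure here are invented for illustration.
    COMMAND_GRAPH = {
        "<start>":  ["play", "stop", "volume"],
        "play":     ["playlist", "<end>"],
        "playlist": ["jazz", "rock", "classical"],
        "volume":   ["up", "down"],
    }

    def is_valid_command(words):
        state = "<start>"
        for word in words:
            if word not in COMMAND_GRAPH.get(state, []):
                return False
            state = word
        # The command is complete if nothing more is required after the last word.
        followers = COMMAND_GRAPH.get(state, [])
        return not followers or "<end>" in followers

    print(is_valid_command(["play", "playlist", "jazz"]))  # True
    print(is_valid_command(["volume"]))                    # False, needs "up"/"down"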

The other big thing it had was training. You could repeatedly read text to the system, and it would improve its recognition of your voice. If at any point the system had a problem understanding you, you could just pop into Control Panel, type in what you tried to say, do a training run on it, and from then on it would understand you just fine.

Using this, I built a simple music control program that offered playback control, volume control and playlist selection. I defined the words to use and described the grammar in XML, and had my program control a WinAMP instance (good ol' days when people cared about interoperability - all I had to do was send WM_USER messages to the WinAMP window...). I soldered a 3m cable to a $0.10 microphone I bought in an electronics supply store, glued it to the wardrobe, and had Star Trek-like music control.
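(For the curious, the Winamp side really was that trivial: the classic SDK drives the playback buttons with WM_COMMAND and richer queries with WM_USER. A rough ctypes equivalent of the button route is below; the command IDs are as I remember them from the old SDK, so treat them as assumptions.)

    # Rough ctypes equivalent of poking Winamp over window messages (Windows only).
    # The command IDs are the classic Winamp SDK values as remembered; treat them
    # as assumptions and double-check against the SDK headers.
    import ctypes

    WM_COMMAND = 0x0111
    WINAMP_PLAY, WINAMP_PAUSE, WINAMP_STOP = 40045, 40046, 40047

    user32 = ctypes.windll.user32
    hwnd = user32.FindWindowW("Winamp v1.x", None)  # Winamp's window class name
    if hwnd:
        user32.SendMessageW(hwnd, WM_COMMAND, WINAMP_PLAY, 0)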

I printed out all the common combinations of spoken commands and trained the Speech API backend with them for an hour, just reading the commands while changing my location and the type and loudness of the music playing in the background during training. I.e. the same text, done {near microphone, from bed, from doorway} x {quiet, medium, loud} x {pop, classical, radio}, IIRC. After doing this, the system worked flawlessly: I could listen to very loud music on the loudspeakers, and it had no problem understanding me from any point in the room. This was a much better experience than I can achieve today with my overpriced smartphone and Google's advanced cloud AI.

I really miss offline-first, low-latency, end-user-trainable, transparent, bullshit- and surveillance-free speech recognition.


MS has close to zero presence in terms of papers and few notable industry positions. Them building a (bad) toy system means little. Google and FB are on a totally different level in comparison.

I have a vague memory of reading that the higher ups just didn't believe in voice-based interfaces.


Sadly that does not include Nuance Document Imaging, since that was already sold to Kofax.


Good, let one (former) cancerous company swallow another.

As soon as Siri is no longer powered by Nuance, things might start to get better.


Nuance isn't the problem here. The component Nuance provides (voice to text) works great in Siri. She often understands what words you said and then completely misses their meaning.[0][1][2][3][4]

[0] https://i.redd.it/btg8ky48ka551.jpg

[1] https://i.redd.it/vj4hngj5qip51.jpg

[2] https://i.imgur.com/ugh8LWV.jpg

[3] https://i.imgur.com/Yt8DycO.jpg

[4] https://imgur.com/BANa4XG


Siri has not been "powered by Nuance" in several years.



