The backstory of Alexa’s Indian makeover

justboxing · on Nov 29, 2017

> “We had to think outside the box of just understanding English. We had to train Alexa to understand the proper noun in Tamil, Hindi, Telugu, Punjabi, Malayalam, among others,” says Kumar

Alexa India's true test will be on how it handles ____glish (where ____ = the base language) that's very common in everyday spoken interactions, esp. in the South.

Ex:

Tanglish = Tamil + English mixed-in. https://en.wikipedia.org/wiki/Tanglish

Kanglish = Kannada + English mixed in. https://en.wikipedia.org/wiki/Kanglish

Even TV programs have this. For instance some of the Tamil Soap operas on youtube have this message at the beginning

"Marrakama Bell Symbol-la Press Pannunga, Subscribe Pannunga".

= 3 Tamil words, 4 english words. 1 Tamil+English word.

Translation. "Don't forget to press bell symbol and subscribe."

onion2k · on Nov 29, 2017

Could Amazon just treat those as new, separate languages with a corpus of words that includes the main language plus English? It'd mean the dictionary is much bigger but that shouldn't make a big difference (as far as I know, which isn't very far...)

lozenge · on Nov 29, 2017

Without a corpus, the training will "teach" the computer that an English word will always be followed by another English word.

kaybe · on Nov 29, 2017

I'm not sure this is true anywhere, and it's changing constantly.

Here is my favorite sentence from a German ad:

'We' eröffnet einen neuen Store für Men und Women.

(A shopping chain called 'we' opens a new shop for men and women.)

This is by no means normal everyday language but marketing is full of these things.

(Not that all Germans understand it, I remember a TV show going around asking people what they thought the marketing slogans meant - it was hilarious.)

TeMPOraL · on Nov 29, 2017

Hopefully there will be more work done in mixing languages. That's a particular issue for me with Google Now - it will recognize sentences in Polish, or sentences in English, but... that's not how think in my head. Most of the time I do a free-form mix of those two languages, often at the level of an individual sentence.

(To be clear: I'm not whining. Speech recognition is getting absurdly good as it is. It's just that being able to mix languages in a free-form way would be... perfection :).)

vadimberman · on Nov 29, 2017

1. It's not the same as the original language plus English. 2. It's not easy to find data for these languages. It's easier to add the modifications, apparently.

zaarn · on Nov 29, 2017

Alexa can't even understand proper Bavarian, how can I expect it to understand Bavarian-English?

dingo_bat · on Nov 29, 2017

> “Every 100 km there is language change and in every 30 km the dialect changes,”

This is so true. The term number of languages is a bit inadequate when you talk about India. Another issue is the high amount of migration between states. People who moved a generation ago from Gujarat to Tamil Nadu speak Tamil with a few Gujarati mannerisms thrown in. I have seen examples of this in my friends and I have seen the vice versa too. A Tamil family living in Gujarat for a couple of decades. I think Alexa needs to learn the dialect and mannerisms per user. That will be revolutionary.

nkoren · on Nov 29, 2017

Yes, this. I'm American but have cumulatively spent several years in India -- mostly Delhi, with about 6 months in Ahmedabad -- over the past couple of decades. It's given me the the ability to comprehend (but not really speak) basic "chaste" Hindi, with a bit of Gujarati on the side.

In practical terms this is essentially meaningless, because most of the time, people are speaking several languages at once. Last night I was in a supermarket in London, listening to the staff who were animatedly speaking what I think was a mashup of Bhojpuri (some of which I could understand) and Malayalam (no chance). Was totally baffling. Elsewhere I've seen Salman Rushdie describe the language of Bombay as "HUG-ME": "Hindi Urdu Gujarati Marathi English", with the apparent objective being to cram words from all five languages into every sentence. He's not wrong.

This will be one hell of an ML challenge.

_0nac · on Nov 29, 2017

Eh, this kind of thing is surprisingly common. Singlish, the lingua franca of Singapore, is loosely based on English but incorporates vocab and grammar from a slew of Chinese dialects (Hokkien, Hakka, Teochew, Cantonese, Mandarin), Malay and various Indian languages as well. And English itself is also an unholy mess of imports from everywhere.

naturalgradient · on Nov 29, 2017

Now this is a very negative opinion in a general anti-consumerist sense but:

Why does one buy items like an Alexa? I get that it performs some features, it slightly increases convenience but it does not enable anything new or different.

Something I do not understand about hackernews: There are a lot of people on here who, on average, care about the environment. By the same token, there are many users here who have an interest in electronics and technology, and are just generally interested in new gadgets.

How do you square buying gadgets with being somewhat conscious of the mounting environmental problems? The amount of electronic waste, the almost literal slave labour to mine the resources, the energy necessary to transport and assemble the parts, and the final device.

All to not use one's smartphone's assistant but a home assistant.

The consumer does not ask himself 'why do I consume', every decision is in the framework of making consumerist choices (e.g. buy a Tesla instead of a Jeep), the option 'I do not need to consume most of these things at all' is barely present.

So I am honestly asking to people buying these things, what is the thought process? Is there a feeling of happiness or joy in having new gadgets? Does it last? Are environmental or more generally philosophical considerations on whether you need to buy new things part of your thought process?

briandear · on Nov 29, 2017

Alexa/Siri et al aren’t about “consumerism” — that’s the naïve interpretation— that’s just a surface look. These devices represent the future of how man interacts with machine. Right now, they are mostly toys used for “consumer” purposes, but look at the internet itself — during the early days one could have asked “why should we be building networks for the military? Dismissing Alexa/etc. as mere vehicles for consumption is missing the greater possibilities. This tech has to start (and be financed) somewhere: thus Amazon using this as a tool to buy things is fine — because it ultimately will push the future forward despite your worries about slave labor. If we stopped buying everything tomorrow, there would still be slave labor and bad things in the world as there always has been. People need to eat, clothe themselves and make shelters. Stopping weapon production doesn’t end wars nor would stopping AI personal assistants reduce some poorer people working for some less poor people to fill an economic need. That’s reality.

megaman22 · on Nov 29, 2017

I really hope that we never have to control our computers with something as ambigous and fuzzy as speech. It's going to happen, because people think Star Trek is cool, but, gah, I'm not interested in it. Interfaces are getting progressively less useful, as far as I'm concerned; people try to displace mouse and keyboard, but it's always worse, and drags down the standard for everything.

naturalgradient · on Nov 29, 2017

So any consumption is justified because it supports the overall advancement of technology and we need not think about consumption because things will be consumed either way?

I am sorry I do not buy this - where is the justification for the assumption that this is the only way we can advance tech? As an AI researcher, I strongly strongly disagree that I have to enable more consumption for tech to progress.

dominotw · on Nov 29, 2017

I saw a billboard this morning on the highway for alexa which said

"Alexa buy me an Kenmore washing machine"

wtf.

kraig911 · on Nov 29, 2017

I just got an Echo for black friday. I can't get how dumb the thing is with all this press?

Alexa play some ambient music...

"I think you'll like this... playing Selena Gomez"...

Alexa I hate Selena Gomez...

"Shuffling Selena Gomez..."

Alexa Stop

"Playing Stop & Erase by Selena Gomez"

Literally happened here just now.

nmstoker · on Nov 29, 2017

I don't want to come across as an Alexa fanboy and defend it, but it would help to understand which parts of your interaction you think of as dumb.

I agree Alexa Stop was clearly a mistake (albeit, not one I've ever observed) but for the rest, it's understanding what you said, you simply have different expectations of what the best response is. Were it a human, one wouldn't say they were dumb, just that they had differing taste or ideas.

In particular what did you expect from "I hate Selena Gomez"? Have you seen any other music player that handles that kind of ambiguous "command"?

rdtsc · on Nov 29, 2017

> it would help to understand which parts of your interaction you think of as dumb.

All of them were dumb. That's the problem with AI conversational interfaces. Once people talk to an "assistant" they intuitively expect the level of intelligence of a human. Because of that, mistakes are not tolerated much and conversation quickly goes from "ah this is so awesome we are living in the future" to "this is the dumbest thing ever, just give me a search box and I'll type what I need or I'll click with the mouse".

Usually irritation gets amplified because with each mistake, the AI forces the human to repeat or explain things, which just makes it worse.

> Have you seen any other music player that handles that kind of ambiguous "command"?

Other music players are not anthropomorphized AI assistants. So people don't expect that from them. Once it has a human sounding name, and you are addressing it by that name, the expectation is that it would act and perform like a human.

> Were it a human, one wouldn't say they were dumb, just that they had differing taste or ideas.

That's interesting solution perhaps, what if it responded with "I don't know much about music, so I can't decide or pick what to play". I think some people would have preferred that instead of Selena Gomez. Even further maybe start a dialog about "can you give me some example of the artists or songs you like, I'll find some similar ones". That might be less irritating than simply playing Selena Gomez.

> In particular what did you expect from "I hate Selena Gomez"?

No OP, but I would expect it to never pick or play Selena Gomez again, unless explicitly asked.

vadimberman · on Nov 29, 2017

The phrase can and should be handled as a synonym of "stop" in the current context.

tehwebguy · on Nov 29, 2017

I would agree with OP that if it can't understand "I hate $musical_artist" it is almost worthlessly dumb

aaronbrethorst · on Nov 29, 2017

A natural language interface should adapt to you as opposed to requiring you to adapt to it. The part where it comes across as being "dumb" is where it Requires. The. Human. Being. To. Conform. To. Its. Very. Strict. And. Narrow. Language. Processing. Abilities.

BoorishBears · on Nov 29, 2017

I mean, does “Play something else” work?

That’s one step removed from expecting Alexa to understand human emotions but still a rather natural interaction

derefr · on Nov 29, 2017

If phone trees can jump straight to a representative when you swear at them, Alexa should be able to treat an angry tone of voice as a generic undo.

BoorishBears · on Nov 29, 2017

If phone trees can jump to an option in a tree whose options you could probably map out by hand in a few minutes... NLP should recognize your tone of voice and undo actions that have an uncounterable number of subtle interactions with different systems? Isn’t that a bit of a stretch?

And generic undo of playing a station is... playing a different station? Not stopping the music altogether? The concept of generic undo would be hard for people to come to a consensus on, let alone an NLP system.

lozenge · on Nov 29, 2017

"Alexa, play I'm going crazy"

"Sorry. Turning off".

kraig911 · on Nov 29, 2017

I tried that. It did not work.

BoorishBears · on Nov 29, 2017

Looks like a bug:

https://www.reddit.com/r/alexa/comments/6s7tgp/play_somethin...

arkades · on Nov 29, 2017

> Have you seen any other music player that handles that kind of ambiguous "command"?

Siri does sorta; I just tested it now. I said “I hate this song,” and I got the feedback “I’ll remember you hate that song,” and a different song from another artist immediately began. “I hate this artist” immediately transitioned to a song by a different artist.

“I hate (artist name)” didn’t work, but the framework is clearly there.

This doesn’t seem like an unreasonable functionality to expect at all. It’s already mostly there. In Siri.

nindalf · on Nov 29, 2017

I bought an Alexa yesterday and the second thing I asked it to do was “play the Desi playlist from my Spotify”. I tried 5 different variants of this, and also in different accents but it was unable to understand it. It’s cool that they’re doing work to improve understanding in Desi contexts, which makes it that much stranger that it couldn’t understand “Desi”.

justinjlynn · on Nov 29, 2017

Idea... "Alexa, how do I (get you to do/ask for)..." "Sorry, I'll have to think about that for a few minutes" In that time, a support tech reviews the question, perhaps with some machine leaning assistance, and then gets back to you with the answer. "(first name), about your request, you can (insert instruction or apology)." Amazon already has mechanical turk... no reason why Alexa shouldn't take advantage.

freeone3000 · on Nov 29, 2017

Because mechanical turk MUST be supervised. The reward is disassociated from performance - it's the agent problem, writ large, since workers are effectively anonymous and paid piecework. In any paper using mechanical turk, half or more is trying to get the humans to do what they're told correctly. (They behave more like mischevious djinn than computers if allowed to act freely.) Just dropping in mechanical turk without a feedback system would lead to arbitrary responses, unhandled queries, and unsolvable inconsistencies.

justinjlynn · on Nov 30, 2017

> support tech

Amazon employees with more to lose than a few cents per hour.

rljy · on Nov 29, 2017

This. I really don't understand why companies are so allergic to human intervention.

justinjlynn · on Nov 30, 2017

I would imagine it's because, unlike with computerised services, the per-interaction costs aren't entirely consistent and have non-negligible cost. It's also much much harder to hire and retain people than it is to buy a million new blackbox computers.

SoulMan · on Nov 29, 2017

I was really surprised how it is able to recognise accents, I have friends with so many different accents and it could recognise them all.

atomicnumber1 · on Nov 29, 2017

Considering how successful we have been in speech recognition in recent years, I think this will a next big challenge for speech recognition.

amingilani · on Nov 29, 2017

I'm so happy because this means we're one step closer for an Alexa for Pakistan.

needcaffeine · on Nov 29, 2017

Does Pakistan have a similar number of dialect shifts within the nation? It's at least a far fewer number of languages right?

rurban · on Nov 29, 2017

Nope, about the same crazyness as in India.

Urdu, Pashto, Punjabi, Sindhi, English, Saraiki, lasi, Kutchi, Thari, Balochi, Brahui, Hazaragi, Chitrali, Kohistani, Hindko, Kashmiri, Shina and Balti are the main languages and dialects spoken in Pakistan. https://en.wikipedia.org/wiki/Provincial_languages_of_Pakist...

amingilani · on Nov 29, 2017

Yeah, but the emphasis on Urdu as our national language means almost everyone everywhere is comfortable speaking Urdu!

sreejithr · on Nov 29, 2017

Does everyone there in Pakistan accept and use Urdu as the de-factor language? Here in India, that's unfortunately not the case.

amingilani · on Nov 29, 2017

Yes, actually. Everyone in the country accepts Urdu as a national language and atleast speaks it as a second language.

DiabloD3 · on Nov 29, 2017

*defacto

amingilani · on Nov 29, 2017

The national language in Pakistan (Urdu) is incredibly similar to Hindi. It's just written with a different alphabet and has different words every now and then.

Urdu is fairly standard across the country with slightly different accents..

The regional languages are different though. Some (like Punjabi) are spoken in India too, and then there's Pashto (also spoken in Afghanistan) but then there are Sindhi and Seraiki which can be similar and different to other languages.

However, what excites me is the fact that Hindi is so similar to Urdu, the same tech could be repurposed for Urdu with much less work than a whole new language.