
If you like this you may enjoy the "Pink Trombone", an interactive speech synthesizer that you control by moving a virtual tongue and lips around: https://dood.al/pinktrombone/


Some past discussions:

Pink Trombone - https://news.ycombinator.com/item?id=18912628 - Jan 2019 (75 comments)

Pink Trombone: Speech Synthesis Simulation in JavaScript - https://news.ycombinator.com/item?id=14135658 - April 2017 (52 comments)

A mouth simulator - https://news.ycombinator.com/item?id=13973261 - March 2017 (1 comment)


> of consciousness, we have agents that perceive the environment and act, in order to maximize rewards

I don't believe "agents acting to maximize rewards" is a good description of any human I know, or if it is, the reward function is certainly unknown.


> if it is the reward function is certainly unknown

It is a collection of reward channels related to the functioning of the body (food, shelter), learning (curiosity), socializing (including physical touch), and physical integrity (avoiding harm). Even newborn babies like to be held and are curious about objects around them - they are already learning to maximize rewards.


Like Bret Victor? Who seems to have been involved in presenting this to Jobs? I'd say he's a pretty interesting guy and probably not worth firing even if you don't like one of his proposals.

https://twitter.com/worrydream/status/641705818585337856


I agree with the first half of this comment, but I don't follow you here: "Until there is an element of randomness fueled by context, they aren't creating anything new. That's where genius lies."

I suspect reframing of problems, seeing things in a new light, paradigm shifts, new ontologies, whatever you want to call it, are not quite so simple as context-dependent randomness! I don't think we understand this process very well right now.

Finally, I think there is an artfulness in copying existing patterns, because the way in which you "abduce"[1] an observation-explaining theory out of the infinite space of possibilities is a creative and aesthetic process.

[1] https://en.wikipedia.org/wiki/Abductive_reasoning


What I mean by thinking of context is this:

Lil Wayne has a wonderful line in his song "6 Foot 7 Foot": "real G's move in silence like lasagna". Unbelievable.

Let's extrapolate that pattern:

"Real H's move in silence like phonebooks".

Not really the same ring to it, huh? It's not just pulling out a silent letter to emphasize the silence; there's also the context of Lil Wayne being a rapper and self-proclaimed G (gangster).

Is our extrapolation "new"? Sure, in that no one has (likely) ever said that. But while it mimics Weezy's style, it doesn't understand its context. Similarly, if Katy Perry sang the same line as Lil Wayne, the context wouldn't make sense (Katy Perry is no gangster...)

EDIT

A better example, specific to art, would be Warhol. He explicitly copied real-world objects as art, to create something new. But the newness wasn't that he made a faithful copy; it was that his copies reflected the shift in materialism that came with mass production. Warhol mass-producing "art" was a social comment that resonated _at that time_.

His art was more than the process; it was the context in which it was made and what that said more broadly about society. That's why it resonated.


> Lil Wayne has a wonderful line in his song "6 Foot 7 Foot": "real G's move in silence like lasagna". Unbelievable.

Sorry, I'm too thick. I mentally pictured a gangster moving laterally, splayed out, and wondered how that would be quiet. I had to google it to understand that he meant silent like the 'g' in 'lasagna'.


Even harder for me, since I'm Italian and the g in lasagna is not silent at all.


I think most English speakers would pronounce the gn as ñ.


That's also how it is pronounced in Italian. Gn is a digraph, pronounced like ñ in Spanish. The g is not silent, as it is part of the digraph.


Wow, this is more complex than I thought. It looks like in Italian you have a single consonant, ɲ. In English, ɲ doesn't exist; we say nj, a two-consonant cluster, and most people can't properly distinguish the two sounds.

The end result is that in English the g signals a sound change after the n, which basically qualifies it as a silent letter. This is a subtly different sound from the Italian version, where the g merges with the n and does not qualify as silent.


A neural network generating rap lyrics would be trained to contain 'G' in its vocabulary.


Extrapolation in text space is not the same as more abstract movement - even simple things like word2vec can capture a lot of meaning through context[0]. Probably not enough to construct wordplay like this yet, but it is not outside the realm of possibility. At least a few journalists were fooled by a walk through latent space in a more complex model[1].

[0] https://blog.acolyer.org/2016/04/21/the-amazing-power-of-wor...

[1] https://www.theguardian.com/technology/2016/may/17/googles-a...
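To make the word2vec point concrete: "meaning through context" shows up as plain arithmetic on the learned vectors. Here's a minimal sketch using gensim's pretrained vectors (the model name is the gensim-data identifier; the exact neighbors you get back depend on the model):

    # Word2vec "meaning through context": analogies become vector arithmetic.
    import gensim.downloader as api

    # ~1.6 GB download on first use; 300-dimensional Google News vectors.
    vectors = api.load("word2vec-google-news-300")

    # The classic analogy: king - man + woman ≈ queen.
    print(vectors.most_similar(positive=["king", "woman"],
                               negative=["man"], topn=3))

    # Nearness in the space also captures looser associations.
    print(vectors.similarity("rapper", "gangster"))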


Rap Genius broke that line down a few years ago if anyone is curious.

http://genius.com/72892


> context-dependent randomness

Dependent randomness? Or, in other words, deterministic randomness? A paradoxical notion.


Unicode Technical Report #51, which is where emoji are specified, talks a bit about the committees' current thinking on this[1]:

> The longer-term goal for implementations should be to support embedded graphics, in addition to the emoji characters. Embedded graphics allow arbitrary emoji symbols, and are not dependent on additional Unicode encoding. Some examples of this are found in Skype and LINE—see the emoji press page for more examples.

> However, to be as effective and simple to use as emoji characters, a full solution requires significant infrastructure changes to allow simple, reliable input and transport of images (stickers) in texting, chat, mobile phones, email programs, virtual and mobile keyboards, and so on. (Even so, such images will never interchange in environments that only support plain text, such as email addresses.) Until that time, many implementations will need to use Unicode emoji instead.

[1] http://unicode.org/reports/tr51/#Longer_Term
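The reason plain-text interchange "just works" for emoji is that they are ordinary Unicode code points, not attached images. A quick Python illustration (expected output shown in comments):

    import unicodedata

    # An emoji is just a sequence of code points, so it survives any
    # plain-text channel - unlike an embedded graphic or sticker.
    for ch in "👍🏽":  # thumbs up + medium skin tone modifier
        print(f"U+{ord(ch):05X}  {unicodedata.name(ch)}")

    # U+1F44D  THUMBS UP SIGN
    # U+1F3FD  EMOJI MODIFIER FITZPATRICK TYPE-4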


ITA Matrix has nice features, like flexible dates on a multi-leg trip, that Google Flights doesn't offer.


Google Flights has gradually added ITA Matrix stuff over time.


I really like your Hidden Unit Zoo here http://colinmorris.github.io/rbm/zoo/ as a window into what this thing is actually "thinking" about. The "top matches" for a given hidden unit are pretty helpful.
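For anyone unfamiliar with the idea, a "top match" for a hidden unit is, as I understand it, one of the dataset examples that activate that unit most strongly. A rough numpy sketch of that ranking, where W, b_hid, and data stand in for a trained RBM's weights, hidden biases, and training set:

    import numpy as np

    def hidden_activation(v, W, b_hid):
        # p(h=1 | v) for every hidden unit, given visible vector(s) v
        return 1.0 / (1.0 + np.exp(-(v @ W + b_hid)))

    def top_matches(unit, data, W, b_hid, k=5):
        # indices of the k examples that excite `unit` most strongly
        acts = hidden_activation(data, W, b_hid)[:, unit]
        return np.argsort(acts)[::-1][:k]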


Not officially, although people have started reverse-engineering the website ;).

We can get an officially supported one up if there's interest. Email me at xavier@whirlscape.com!


Yeah, I saw that you guys take a query text and spit out JSON with emojis :). It would be nice to have an official one so you don't cut IPs off :)


Well, you can search in Dango, too ;)

But yeah, our main focus is suggestions. You can use Dango concurrently with the normal emoji keyboard, of course! It can just sit there "ambiently" showing you emoji you might not know about.


It seems like you're so close to having a full predictive virtual keyboard (with nothing but dynamically-generated keys).

Have you given any thought to integrating this with some sort of Bluetooth thimble-like button (Makey Makey?) on each finger for untethered typing?

I've written more about this line of reasoning here[1] if you're interested. Feel free to ping me on twitter if there's any way I can help. Congrats on this awesome project!

[1]: https://news.ycombinator.com/item?id=11223697


The possibilities for how language will work in the future are really exciting and interesting! We made the Minuum keyboard, too, which also explores how machine learning assistance can open up new ways of communicating.

One reason we're interested in visual communication with Dango, though, is that regular text input is pretty good already. Chorded keyboards exist and are way faster, but people mostly can't be bothered to use them; QWERTY is just good enough. But the field is wide open for rich communication with images - nothing out there is particularly good yet.


I might add that this is pretty much exactly what Elon Musk is asking for with his "Neural Lace" idea to merge humans and machines in "symbiosis".

Replace thimble-keys with OpenBCI and you already have it.

Urbit.org sounds like a good fit for the immutable, append-only, content-addressable private keylog (now would be the time for a portmanteau generator). I would love to help in any way to make this happen.


Yeah, so this is a legitimate concern. Of course sometimes it's fun to say "let's eat pizza :pizza_emoji:", but that's not hugely valuable.

However, Dango's training data includes people using emoji to augment rather than repeat their sentence. So if there are two different interpretations and an emoji could disambiguate, the ideal is that Dango has seen people use that phrase both ways, and that it suggests both possibilities so you can pick the one you meant. In many cases this works now; in many cases we still have work to do.

It also suggests based on messages sent to you, so if there are a couple of different replies it can show you them all (although this feature still needs work).
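In the abstract, the disambiguation idea looks something like embedding messages and emoji in a shared space and returning the nearest candidates, so both readings of an ambiguous phrase surface. Everything in this toy sketch, including the vectors, is invented for illustration - it is not Dango's actual model:

    import numpy as np

    EMOJI_VECS = {                        # made-up 3-d "embeddings"
        "🔥": np.array([0.9, 0.1, 0.0]),  # literal fire
        "😍": np.array([0.1, 0.9, 0.2]),  # "that's amazing"
        "🍕": np.array([0.0, 0.2, 0.9]),  # food talk
    }

    def suggest(msg_vec, k=2):
        # rank emoji by cosine similarity to the message embedding
        def cos(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        ranked = sorted(EMOJI_VECS,
                        key=lambda e: cos(msg_vec, EMOJI_VECS[e]),
                        reverse=True)
        return ranked[:k]

    # "this mixtape is fire" sits between literal fire and praise,
    # so both candidate readings come back for the user to pick from.
    print(suggest(np.array([0.6, 0.6, 0.1])))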

