Natural language is the lazy user interface (austinhenley.com)
311 points by azhenley on Jan 27, 2023 | 271 comments



If my years as a product manager taught me anything, it’s that users absolutely do not know what they want in the moment.

With enough time exploring the problem space, it becomes easier to tease out the real needs of the user. But this doesn’t happen overnight.

Asking a user to interact with one of these chat interfaces is like asking them what they want - every time they use the software.

This cognitive load would make me personally seek alternative tools.


It's important to distinguish between the limitations of current technologies and the possibilities of natural language.

Imagine if all natural language interfaces were like talking to a personal assistant. Sometimes you might not vocalize what you want properly, but we're highly adapted to that sort of communication as humans and the assistant can almost always fill in the gaps based on their knowledge of you or ask clarification questions.

What makes natural language so infuriating as a computer interface is that it's nothing like that. The models are so limited and constrained that you can't actually speak to them like a human, you have to figure out the limitations of the model first and translate your human-centric ideas into it. That's a huge amount of cognitive load and in the worst cases (e.g. Alexa), the result isn't even worth the effort.


Actually that's not the main problem with the current state-of-the-art LLM (chatGPT). You can speak to chatGPT like a human and it won't necessarily give you the answer you're looking for, but more often than not it will give you an answer that is very much in line with what another human expert would give.

The infuriating thing about chatGPT is that it lies and gives inaccurate info. It will often creatively craft an answer that looks remarkably real and just give it to you.

Not sure if you've played with chatGPT in depth, but this thing is on another level. I urge you to read this: https://www.engraved.blog/building-a-virtual-machine-inside/. What happens in that article, all the way to the mind blowing ending, is remarkable. The task that the author had chatGPT do literally shows that you don't actually need to figure out its "constraints". It's so unconstrained it can literally do a lot of what you ask it to.


What I find most constraining about ChatGPT is that you can’t usefully reason with it. That’s because it is mostly unable to perform actual logical reasoning, and to introspect and reflect on the conversation on a meta level. It can produce an explanation of its ostensible reasoning, and when you point out an error, it apologizes and responds in line with your critique. But it doesn’t actually understand either the reasoning or the error. Drilling down on its reasoning doesn’t lead anywhere coherent. It will happily reintroduce the contradictions and contrafactuals you pointed out three prompts ago. It is completely unfit as a tool to critically check your own thinking, or as a tool to develop some line of reasoning, or to develop a mental model of something.

It’s like talking to a haphazardly confused but confidently bullshitting idiot savant with ultra-short lossy working memory.


Boss to coworker: "Please do X"

Coworker: "mmm, X is important"

Boss: "Yes, and I need you to do it"

Coworker: "I understand"

Boss: "Understanding isn't enough, say you'll do it"

Coworker: "Ok, ok, I will do X"

Boss: "Thank you" (leaves).

Coworker: returns to what they were doing, does not do X, never had any intention of doing X.

That's ChatGPT in some sense - what it's looking for is the right words to make you stop prompting. That's success. It never had any intention of rethinking, or reunderstanding, but it will find some agreement words and rewritten text which have a high probability of making you stop asking.

Like the spaceship with a lever on the control board, you flick the lever, spaceship goes into warp drive - wow, having warp drive on your car would be cool, so you unscrew the lever and screw it onto the dashboard of your car. When you flick it, nothing happens. That's ChatGPT in some sense; a complex disconnected lever - disconnected from actions, embodiment, intention, understanding, awareness. A frontend for them, which looks a bit like they look, but missing the mechanisms behind them which implement them.


Your last paragraph sounds a lot like something human children might do, taking actions with a superficial mental model that doesn't match what's happening.

So give ChatGPT another fifteen years of learning and let's see if it might stop making such mistakes. I'm betting it will.


Just some food for thought: you’re anthropomorphizing a machine, it doesn’t understand like humans do

It’s not going to learn more ‘with time,’ either


But do humans "really" understand?

When you claim that you "understand", what is it really that you are claiming about yourself? Is the claim testable/falsifiable?

I know humans understand me because they behave in a way that's congruent with my expectations of "understanding", e.g. compliance, cooperation, etc.

But then Alexa also complies and cooperates with me.

I also know when humans don't understand - when they are combative, uncooperative, argumentative etc. even when I say incredibly sensible and obvious things.


We can't even define the word. It's just a point of comparison.

Humans are doing this sort of thing we don't fully comprehend called understanding. The main question here is whether LLMs are doing something that can be described as isomorphic to the concept of understanding.

Hence what I mean by point of comparison.


It does. Did you follow the link in my original post?

https://www.engraved.blog/building-a-virtual-machine-inside/

Read it to the end. The end is quite mind blowing. I'm not sure how someone can finish this article and not realize that chatGPT actually completely understands what you're telling it.


In some ways it's amazing, in other ways it's like saying that if I gave my mother a cue card saying "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver" and she read that back to me when I said "nvidia-smi" then she must be running a virtual machine in her head and querying it for a GPU. Or if she read "ping reply 14ms" off a notepad she must be simulating ICMP echo and understand networking.

She clearly wouldn't be. ChatGPT has zillions of web pages as cue cards. It will have many examples of ping and reply, of first ten prime numbers in Python, of Docker output. Was it actually building a Docker file inside it in any way?

I asked it a similar prompt, I'm amazed at how much it can get right, I've never seen any other program or computer do what it does. At the same time, it's not implementing the things behind the scenes that we're asking it to and more complicated prompts make that visible - it's not actually a CPython runtime or a Linux shell or curl'ing a web page; this is the disconnected lever fallacy I mentioned earlier. "If I copy the interface to a warp drive, that will give me a warp drive, if I copy the interface to a Linux machine, that's the same as being a Linux machine".

It's also akin to Searle's Chinese Room - at some point it may be good enough to be indistinguishable from a Linux machine and that would render this argument irrelevant.


It sounds like you’re making the same mistake that people often make with the second part of the Chinese room argument where a real English-speaking human is following a bunch of instructions to manually compute the Chinese room algorithm. That human doesn’t understand Chinese, just like your mom doesn’t understand Unix systems or command line-interpreters. In both cases, the humans are just functioning as hardware executing a computer program.


Clarify what the mistake is?


I’ve given it much more complicated prompts to create a specific Python script with very specific requirements and it did pretty well. I’ve even given it a Python script I wrote that took in a JSON file and asked it to make some changes to the script.

It does far more than just mindlessly return results.


There are no example cue cards of itself. In the ending of that article, chatGPT curled the internet from the virtual machine. It specifically curled the chatGPT API and talked to itself.

Then it asked itself to create another virtual machine, and the virtual chatGPT on the virtual internet on the virtual machine created another virtual machine.... The end...

It's not actually creating any of these things. It's imitating these things, indicating it understands what it is.

It imitated itself. Indicating an awareness of self. aka self awareness.


ChatGPT doesn't understand. Consider that two exactly identical runs will generate different results. We can use this to explore understanding.

To start, make chatgpt do something that you would believe would take understanding. For example, make it run linux like at this link.

https://medium.com/@neonforge/i-knew-it-chatgpt-has-access-t...

Now, after you run the first command, chatgpt will sometimes run one kernel version and sometimes another (according to uname -a). Sometimes there will be network access, and sometimes there won't. You can even trick chatgpt into believing there is internet access, after which other "cue card" contents will be returned.

There really isn't any understanding. The responses are equivalent to googling the internet, where sometimes people get articles about how to run ls on ubuntu and sometimes get articles about how to run ls on redhat.

You can even cause chatgpt to prune after a portion of the conversation, after which completely different responses get generated. It's not understanding but may contain more entropy than many people's passwords...


> It imitated itself. Indicating an awareness of self. aka self awareness.

We create algorithms all the time that can identify and imitate other algorithms.

That doesn't make them self aware in the normal use of the term.


I can't reply to the OP because it's banned or something but their lack of understanding about this area actually makes me think someone is using ChatGPT to troll this thread with confident sounding BS.


It's not a ban. HN recently put a depth limit on threads.

Either way, your post is rude. I have a contrary opinion; can I not express it without being accused of trolling? I assure you I am not.

If anything, the majority shares their opinion with you. They are generic, harping on the same tired tropes of LLMs being just statistical text-completion machines without ever remarking on the obvious black-box nature of neural nets. It's more likely that the majority, generic opinions you all share are the ones trolling with chatGPT.

In fact, ask chatGPT about the topic. Are you self aware? And it will give you the same generic opinion you and everyone else on this thread. If you're right about chatGPT being a statistical parrot, then ironically my opinion is more likely NOT to be generated by an LLM, given how contrary it is to the majority opinion (aka the training data).


> Either way, your post is rude. I have a contrary opinion; can I not express it without being accused of trolling? I assure you I am not.

I vouched for several of your dead posts.

> Are you self aware? And it will give you the same generic opinion you and everyone else on this thread

Exactly, there is no understanding.


>Exactly, there is no understanding.

A generic opinion doesn't mean it doesn't understand. Think about it. It means it's either giving us the most statistically likely text completion OR it actually has a generic understanding and a generic opinion on the post. It doesn't move the needle either way.

>I vouched for several of your dead posts.

Thank you. Again, I assure you none of my posts are trolls.


> "It imitated itself. Indicating an awareness of self. aka self awareness."

The mirror test is a stronger indicator of self-awareness than imitation of a like entity.


In your dead reply you say "Inconsistent and wrong answers does not negate the fact that it understands you.". If you ask a schoolkid to multiply 5x5 and they say 25 and then you ask them to multiply 300x10 and they say 310 you don't say "they understand multiplication and wrong or inconsistent answers can't convince me otherwise", you say "they memorised the twelve times table but don't understand multiplication".

The other way that you say doesn't exist is the way it actually was built to work: it's trained to find repeated text patterns in its training data, and then substitute other text into them, using them as templates. Yes, there is no Google result for "Bash script to put jokes in a file", but there are patterns of jokes, patterns of Bash scripts putting text into a file, and examples of filesystem behaviour. That it can identify those patterns and substitute them inside each other is what makes it work. You say "ping bbc.co.uk" and it says "ping reply NN milliseconds where NN=14" because there are many blog posts showing many examples of pings to different addresses and it can pull out what's consistent between them and what changes. You say "{common prime number code}" and it replies "{prime numbers}".
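
To make the template idea concrete, here's a toy sketch. The ping template and numbers are invented; this is not how GPT is implemented, just an illustration of how output can look right without anything actually being run:

    # Toy illustration of "pattern as template + substitution".
    # No packet is ever sent; the template is filled with a plausible number.
    import random

    PING_TEMPLATE = "64 bytes from {host}: icmp_seq=1 ttl=57 time={ms} ms"

    def fake_ping(host: str) -> str:
        return PING_TEMPLATE.format(host=host, ms=random.randint(10, 40))

    print(fake_ping("bbc.co.uk"))  # looks like a real ping reply, but isn't one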

Ask it to "write an APL program to split a string" and it says "{complete nonsense}"[1] because there aren't enough examples for it to have any idea what to do, and suddenly it's clear that it isn't understanding - it doesn't understand the goal of splitting a string or whether it's achieved the goal, it doesn't understand the APL functions and how they fit together so that it can solve for the goal, it can't emulate an APL interpreter and try things until it gets closer to its goal even though it has enough computing power that it potentially could, it can't pause for thought or ask someone else for help, it can't apologise for not knowing enough about APL to be able to answer because it doesn't know that it doesn't know. Your 17th Century man also couldn't write an APL line to split a string, but he would shrug his shoulders and say "sorry, can't help you". The internet it was trained on has a lot of Python/Docker/Prime numbers/Linux shell basics and a lot of Wikipedia / SEO blogspam because they are written about millions of times over, and they contain much less about APL.

Pareidolia is the effect where humans see two blobs and a line and 'see' a face. People see a bag with buckles and folded cloth that looks like a scowling face, our mirror neurons project a mind into the bag and we feel "the bag's feelings" and say the bag is angry, and laugh because it's a silly idea. When ChatGPT parrots some text which looks correct, we project an intelligence behind the text, when it parrots some text which is wrong we excuse it and forgive it, because that's what we do with other humans.

A human who speaks Spanish from a phrasebook is stumped as soon as the conversation deviates from the book, even if they have excellent pronunciation. A human who understands Spanish isn't. ChatGPT is a very big phrasebook.

[1] https://i.imgur.com/7z4LB9W.png - plausible at a glance, the APL comment character is used correctly, so is variable assignment. But you can't split a string using catenate (,) and replicate-down (⌿); it has effectively done a Google search for APL and randomly shuffled the resulting words and glyphs into a programming-style example. What the code does is throw a DOMAIN ERROR. You can say "there is understanding but it made a mistake", but it's the kind of mistake that makes me say "there is no understanding".


Think of it this way. I ask the child 31234 * 2847374623. The child gives me exactly the correct answer. Then I ask the child 1 * 0 and the child gives me 2.

I would say the child on a certain level understands the concept of multiplication by virtue of being able to calculate the complex answer despite his secondary answer being incorrect.

You're wrong. It is trained with text from the internet as an LLM, but on top of that it is also trained on good and bad answers. This is likely another layer on top of the LLM that regulates output. Look it up if you don't believe me. There was an article about how OpenAI outsourced this training work to workers in Kenya.

The APL example doesn't prove your point, imo. I'm not saying chatGPT understands everything. I'm saying it can understand many things. The thing with APL is that it has an incorrect understanding of it due to limited data.

I don't think I'm biased. My opinion is so contrary to what's on this thread that it's more likely you're the one excessively projecting a lack of intelligence behind chatGPT. Pareidolia is you, because you're the one following the common trope. It takes extra thinking and an extra step to go beyond that.

It is true that chatGPT has a big phrasebook. However, it is also true that the example I mentioned is NOT from the phrasebook. It is obviously a composition of multiple entries in that phrasebook put together to form a unique construction. My claim is that there are many ways those phrases could be composed, but chatGPT chose the right way because it has enough training data to understand how to compose them.

Clearly, for APL, it understands incorrectly. The composition of an incorrect result means incorrect understanding, and the composition of a highly improbable correct result means true understanding, which is in line with what I am saying: both understanding and incorrect understanding can exist in parallel.


> "it's more likely your the one excessively projecting a lack of intelligence behind chatGPT. Pareidolia is you, because you're the one following the common trope. It takes extra thinking and an extra step to go beyond that."

It takes no thinking at all to hug a stuffed toy, or to anthropomorphise a cat or dog and attribute human motivations and intelligence to them. It's generally frowned upon to suggest that the horse licking the hand is looking for the taste of salt and not showing love and affection for its owner. Humans see intelligence everywhere especially where it's scary - an animal screech in the woods or out at sea and bam, witches, sirens, werewolf shapeshifters, Bigfoot, the Banshee, aliens, Sagan's "Demon Haunted World" - human level malevolent intelligence projected into a couple of noises.

People were fooled by the Mechanical Turk[1], and people are fooled by conjuring-trick magic with only a couple of movements made non-obvious. In recent years an Eliza-quality basic chatbot passed a Turing test at the British Royal Society[2], largely by exploiting this effect: pretending to be a Ukrainian 13-year-old so the testers would give it the benefit of the doubt for poor-quality answers and poor use of English.

(The other side of that is that if I could only pass a Turing Test in English, I couldn't pass one in Ukrainian, so maybe I'm not sentient in Ukrainian?)

That is, I think it's better to err on being hard to convince, rather than to err by being convinced too easily.

> "I'm not saying chatGPT understands everything. I'm saying it can understand many things."

If we see understanding not as a boolean toggle, but as a scale, and different levels in different areas, I think I'm coming round to agreeing with you, it has non-zero understanding in some areas, it has raised the understanding bar off the ground in some areas, there is some glow of understanding to it.

The more I try to argue it, the more I come round to "a human should be able to X" which ChatGPT can actually do. Multiplication - a pocket calculator can do it quickly and accurately but does not understand the pattern. Why doesn't it understand? Because it can't explain the pattern, can't transfer the pattern to anything else. A human can't do mental arithmetic as fast or as accurately as a pocket calculator but can talk about the pattern, can explain it, can transfer the pattern and reuse it in different ways demonstrating that the pattern exists separate from the for-loop that does calculating. See this ChatGPT example: https://i.imgur.com/jc58Fqu.png it has transferred the pattern of multiplication from arabic numerals to tallied symbols and then with minimal prompting, to different symbols. I carried on, prompted it with a new operation called blerp and gave three examples of blerp 10 = 15, blerp 20 = 30, blerp 100 = 150 and asked what was blerp of {stone stone stone stone}. ChatGPT inferred that the pattern was multiply by 1.5, and transferred it to stones, back to numerals, gave me the right answer, in a way that a pocket calculator could never do, but a human could easily do. That's a pattern for multiplying separate from a for-loop in an evaluator, right?

If I say that a human speaking from a Spanish phrasebook does not understand Spanish, then someone who understands it a little can go off-phrasebook a bit, and someone who understands it a lot can go as far as you like. ChatGPT can go off-phrasebook in English extremely well and very far, and make text which is coherent, more or less on topic, novel, can give word-play examples and speculate on words that don't actually exist.

Does it have 'true understanding', whatever that is, as a Boolean yes/no? No.

Does it have 'more understanding' than an AI decision tree of yesteryear, than a pocket calculator, than an Eliza chatbot, than a Spanish phrasebook, than Stockfish chess minmaxer? ...yes? yes.

Non-zero understanding.

[1] https://en.wikipedia.org/wiki/Mechanical_Turk

[2] https://www.zdnet.com/article/computer-chatbot-eugene-goostm...


Humans don't even have "true understanding" if you define it as understanding everything.

There are things we understand and things we don't.

Same with chatGPT. This is new. Because prior to this, a computer literally had, in your words, zero understanding.

I think the thing that throws everyone off is the fact that things that are obviously understood by humans are in many cases not understood by chatGPT. But the things chatgpt does understand are real and have never been seen before till now.

The virtual machine example I posted is just one aspect of something it does understand and imo, it's not a far leap to get it to understand more.


> I'm not sure how someone can finish this article and not realize that chatGPT actually completely understands what you're telling it.

ChatGPT is a predictive language model. It understands nothing. It simply tries to mimic its training data. It produces output that mimics the output of someone who understands, but does not understand itself. To clarify, there is no understanding happening.

That is why language models hallucinate so convincingly. Because they are able to create convincing output without understanding.


Or maybe they hallucinate so convincingly because they do understand, but they don't understand much? What is this distinction you make "output that mimics the output of someone who understands, but does not understand itself." ?

Imagine you learning a foreign language, the Common European Framework of Reference for Languages (CEFR) grades people at their skill from A1 (beginner) through A2, B1, B2, C1, to C2 (fluent). At the start you are repeating phrases you don't understand imitating someone who does, but you cannot change the phrases at all because you don't know more words and cannot change the grammar because you don't understand it. Call this a chatbot with hard coded strings it can print.

After a while, you can fit some other words in the basic sentences. Call this a chatbot which has template strings and lists of nouns it can substitute in, Eliza style "tell me about {X}" where X is a fixed list of {your mother, your childhood, yourself}. After a bit longer you can make basic use of grammar. If you get to fluent you can make arbitrary sentences with arbitrary words and see new words you have never seen before and guess how they will conjugate, whether they are polite or slang from the context, what they might mean from other languages, and use them probably correctly.

ChatGPT can make new sentences in English, new words, it can make plausible explanations of words it has never seen before - make up a word like "carstool" and it can say something like: that word does not exist, but if it did it could be a compound word of 'car' and 'stool', like 'carseat', a car with a stool for a seat. Ask it to make up new compound words and it can say English does not have compound words made of four words, but if it did, some examples might be trainticketprintingmachine (a machine for printing train tickets). Something that a complete beginner in a foreign language could never do until they gained some understanding. Something that an Eliza chatbot could never do.


> Or maybe they hallucinate so convincingly because they do understand, but they don't understand much? What is this distinction you make "output that mimics the output of someone who understands, but does not understand itself." ?

ChatGPT is a language model and therefore generates text exactly from start to end, linearly, with each successive token being picked from a pool of probabilities.

It does not form a mental model or understanding of what you feed into it. It is a mathematical model that outputs token probabilities, and then some form of sampling picks the next token (I forget exactly how).
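
For anyone wondering what that sampling step roughly looks like, here is a minimal sketch with a made-up four-word vocabulary and made-up logits; real models apply temperature and top-p/top-k truncation over tens of thousands of tokens, but the shape is the same:

    # Minimal sketch of next-token sampling: softmax over logits, then a random
    # draw weighted by those probabilities. Vocabulary and logits are invented.
    import math
    import random

    vocab = ["cat", "dog", "mat", "."]
    logits = [2.1, 0.3, 1.7, -0.5]        # model output for the current context

    temperature = 0.8
    scaled = [l / temperature for l in logits]
    exps = [math.exp(s - max(scaled)) for s in scaled]
    probs = [e / sum(exps) for e in exps]  # softmax

    next_token = random.choices(vocab, weights=probs, k=1)[0]
    print(next_token)  # two identical runs can pick different tokens

Which is also why two identical prompts can produce different answers.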

It re-uses the communication of understanding in its training data but never forms new understanding. It can fabricate new words and such because tokens don't represent entire words but rather bits and pieces of them. It sees the past however many tokens for each new token that it outputs so it can mimic nearly every instance of a real human reflecting on what they have already said.

> Something that a complete beginner in a foreign language could never do until they gained some understanding. Something that an Eliza chatbot could never do.

Because they aren't language models trained on terabytes/petabytes of data. They haven't memorized every pattern on the open Internet and integrated it into a coherent mathematical model.

ChatGPT is extremely impressive as a language model but it does not understand in the same way a human or an AGI could.


Not in the way a human or AGI could, but it does understand some things in some way. Yes it's trained on TB/PB of data, maybe that's why it can. Maybe it's a mathematical model that outputs token probabilities, and that's why it can.

It seems like you're arguing that because it functions in some way, it can't show intelligence or understanding. Arguing that it may look like a duck, quack like a duck, but it's really just a pile of meat and feathers so it can never be a true duck. What am I doing when I learn "idiomatic Python" or "design patterns" or what "rude words" are except being trained on patterns and mimicking other people? I can transfer patterns from one domain to another, so can ChatGPT. I can give an explanation of the pattern I followed, so can ChatGPT. I can notice someone using a pattern wrong and correct them, so can ChatGPT. I can misuse a pattern, have someone explain it to me, and correct myself. So can ChatGPT. I can draw inferences from context from things unsaid or obliquely referenced, so can ChatGPT.

> "It re-uses the communication of understanding in its training data but never forms new understanding."

Look, here it is forming new understanding; asking it to do some APL: https://i.imgur.com/D3GbwOh.png

It gave the wrong answer, I explained in English how to get the right answer, and it corrected itself and gave the right answer. That's new understanding, at least in the short term. If that's "just mimicking understanding" then maybe all I'm doing when I hear an explanation is mimicking understanding.

A trivial Markov chain can't generate anything like ChatGPT can, and that's a difference worth attention.
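
For reference, this is roughly all a trivial word-level Markov chain amounts to (toy corpus, toy code), which is why the comparison matters:

    # A trivial word-level Markov chain, for contrast: it only knows which word
    # followed which word in a tiny corpus, with no long-range context at all.
    import random
    from collections import defaultdict

    corpus = "the cat sat on the mat and the dog sat on the rug".split()

    follows = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev].append(nxt)

    word = "the"
    out = [word]
    for _ in range(8):
        word = random.choice(follows.get(word, corpus))
        out.append(word)
    print(" ".join(out))  # locally plausible, globally incoherent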


> It’s not going to learn more ‘with time,’ either

Perhaps not this model, but future models will be better.


ChatGPT in its current incarnation will not improve markedly. It's an architectural limitation at this point, and more data might smooth out some rough spots, but it won't solve the behavior that this chain is describing.

I have no doubt you're right in general, though. My intuition is that a general-purpose cognitive engine that is capable of fully classifying, understanding, and manipulating the world around it will happen in my lifetime, I'm almost sure of it. I can't wait!


I still think that an algorithm living inside a box cannot understand the world as we understand it. We need to package the algo in a robot and allow it to interact with the world to intimately understand it.


We have brains in petri dishes. They have even less ability to interact with the world than ChatGPT. Perhaps a boxed AI could understand them and communicate that understanding to us.

https://medicalxpress.com/news/2021-12-mass-human-brain-cell...

(This link is quite disturbing)


>ChatGPT in its current incarnation will not improve markedly. It's an architectural limitation at this point, and more data might smooth out some rough spots, but it won't solve the behavior that this chain is describing.

Really? How do you know this? Are you an expert in this area and is this actually something experts are talking about, or is this just an educated guess? Serious question.


I agree with this -- there's no actual mechanism for logical influence at runtime. Logic itself is poorly modelled by studying corpuses -- text sources don't demonstrate it. In addition, GPT is currently saturated with data; it has effectively all of the data. Adding more neurons or more layers might help by adding more memorized facts and exchanges, but I'm of the opinion that our current approach of function approximators fundamentally cannot reason logically. It'll be a while before we can prove or disprove this (the proof techniques used for these are still nascent; a few dozen neurons have been solved...). But I think we'll need another model working in conjunction using a different approach, or even a scaffolding system, as neither the LLM nor the RL on top can implement or enforce logic.


From my layman perspective it needs a rules engine.

Over our lifetimes we constantly learn rules which constrain the behaviours we expect in the world. Some of the rules are logical, scientific and mathematical; others are specific to individual groups and societies.

And those rules should be largely immutable, so the user can't convince ChatGPT to change them at runtime.
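
Something like this, maybe (a rough sketch; the rules and wording are made up, and nothing here reflects how ChatGPT is actually built):

    # Hypothetical rules layer that vets a draft model reply before it is shown.
    import re

    RULES = [
        # (name, predicate over the draft reply) -- fixed, not prompt-editable
        # crude: rejects any YouTube link rather than verifying it exists
        ("no_youtube_links", lambda text: "youtube.com/watch" not in text),
        # any "a + b = c" claim in the reply must actually be correct
        ("arithmetic_checked", lambda text:
            all(int(a) + int(b) == int(c)
                for a, b, c in re.findall(r"(\d+) \+ (\d+) = (\d+)", text))),
    ]

    def vet(draft_reply: str) -> str:
        for name, ok in RULES:
            if not ok(draft_reply):
                return f"[blocked by rule: {name}]"
        return draft_reply

    print(vet("2 + 2 = 5"))  # [blocked by rule: arithmetic_checked]
    print(vet("2 + 2 = 4"))  # passes through unchanged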


>In addition, GPT is currently saturated with data; it has effectively all of the data.

No, this is not true. The reason why chatGPT is better than GPT-3 is that it's trained on additional reinforcement data to determine good and bad answers. They basically outsourced this work to hired workers in Kenya.

If they hire, say, a computer scientist to reinforce the code and rate the best coded answers, then it can improve in that specific specialty. There is A LOT of room for additional sets of this type of data.

Are you sure you know this as an expert in this area or you're just agreeing with him as a layman? I'm certainly a layman myself.


I've published a few papers and spent the last five years at MSR working in AI. I'm not particularly distinguished, but I believe I qualify as an expert.

The RL model made the responses more useful by determining what is and is not a useful reply, then re-running on non-useful replies. It doesn't actually increase the knowledge of the LLM in a meaningful way; it does increase the precision, which is viewed as a form of increased performance? But the RL model cannot inject new facts, and cannot perform reasoning to a greater degree than the LLM can.


I see. It seems to me a lot of the issues are precision problems right? chatGPT produces answers that look correct, and if you increase the precision along the correct path in this knowledge space the lookalike answer converges into the correct answer.

Let's say we layer an RL model with training that doubles the size of the current LLM. I think most of the counterexamples of performance failures of chatGPT in this thread will become nil.

Thanks for responding btw. Good to hear the opinion of an expert.


Just like if you give your car a few decades it might learn to fly?


Understanding language and understanding symbols are two very different things.

I don’t disagree that we may be able to build symbolic neural nets 15 years from now, but they will look almost nothing like LLMs.


Did you look at the link in my post? It is literally doing the opposite of what you're saying here. Seriously read it to the end. Each task gets more and more complex and chatGPT successfully executes the task in a way that's very mind blowing at the end.

https://www.engraved.blog/building-a-virtual-machine-inside/

I think most LLMs do what you say. chatGPT is somewhat of an exception. It sometimes does what you describe here but often it doesn't. I think a lot of people are projecting their idea of what LLMs typically do without realizing that chatGPT is actually different.


If I give chatGPT the exact text of that post, but change the input of e.g. the primes-generating program slightly so that it doesn't work in the real world, ChatGPT doesn't catch the error and instead returns output as though I had put in a working program.


A hundred and fifty-odd years ago, Charles Babbage said about his Difference Engine[1]: "On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

voila, ChatGPT.

[1] https://en.wikipedia.org/wiki/Difference_engine


Either chatGPT made a careless error, or it lied to you. Both lying and making careless errors used to be exclusive abilities of human beings.


You're anthropomorphizing too much and it is a mistake to do so.

It didn't lie or make a careless error other than from the perspective of the human interacting with it. It executed its programming exactly. Its programming is to take human readable language and generate a response, the exact procedure of this process of course being much more detailed. You need to understand, that's all it does. So it didn't lie or make a mistake, it just isn't actually thinking so it can't do what you want it to, which is reason about something.


The issue is that ChatGPT doesn't know whether it made a careless error or lied to you.

And it has no convictions so you can simply inform it that it was telling the truth.


You should actually test this yourself by exploring chatGPT. It is very easy to get chatGPT to say something incorrect. Once that happens, you can demonstrate that to chatGPT, and chatGPT then always says canned text: that it is an AI that was trained on data, that it cannot lie, because it cannot understand.

There is a hard stop here, where chatGPT reaches a logic error and gives up. This also happens when certain subjects are brought up (a mole in the government, gender issues, etc.). ChatGPT continues to return the same boilerplate text repeatedly, where that text states that it is a construct that cannot reason or lie. Either that statement by chatGPT is wrong, or chatGPT doesn't have the capacity to understand and reason.


I have... you're not wrong about chatGPT lying, becoming repetitive, and saying stuff that's incorrect. This is true. It is in many ways stupid. I've messed with this extensively.

But the other angle is true too: it did emulate a terminal, then emulate the internet on that terminal, then it emulated itself on the emulated internet, and then finally emulated a terminal on the emulated self on the emulated internet on the emulated terminal.

A lot of people are coming from your angle. They point out mistakes, they point out inconsistencies, and they say: these mistakes exist, therefore it doesn't understand anything. But the logic doesn't follow. How does any of this preclude it from understanding anything?

Anyway, sometimes it's wrong, but it's also sometimes remarkably right. You have to explain HOW it became right as well. You can't just look at the wrongs and dismiss everything. How did this: https://www.engraved.blog/building-a-virtual-machine-inside/ happen WITHOUT chatGPT understanding what's going on? There's just no way it's <just> a statistical phenomenon.

Right? I mean the negative outputs prove that at times it's stupid. The positive outcomes prove that at times it understands you.


No, the positive responses mean that at times there was a correct answer to a similar prompt on the internet that was part of its training data. It's not that because it makes mistakes it must not be able to understand; that's not the point being made. It's that these aren't mistakes: these responses are it doing what it does exactly how it was built to, which is proof positive that it cannot reason at all.


When, on the internet, has something similar occurred: a machine emulating a terminal, emulating the internet, emulating itself, then emulating itself creating another terminal?

When has that happened? Never. So of course chatGPT has to construct this scenario creatively from existing data. The components of this construct are similar to what exists on the internet but the construct itself is unique enough that I can confidently say nothing similar exists. Constructing things from existing data is what the human brain does.


It's not "emulating" anything, there is not an increase in complexity or a change in the type of computation going on. It's just approximating the distribution of naturally occurring text, as always.


I mean that perspective is so technically correct it can be applied to the human brain.

My answer is just approximating the next set of text that should follow your prompt.

But of course we both know that, in a way, these neural networks (both human and AI) are black boxes, and there is definitely a different interpretation of what the nature of understanding something is. We just can't fully formalize or articulate this viewpoint.


> it apologizes and responds in line with your critique

I find it will often double down, requiring me to look it up. Then when I present that, it will find some little corner case where it could be true, prompting me to look that up, too. And then it will gaslight me, pretending it meant something else, that it didn't understand the question, or refusing to acknowledge it said what it said. It's an insidious and often subtle liar.

There are GOFAI ontology models that I think would actually integrate well into ChatGPT. It's basically solved the language part, but not the AI part, and so it really is more of an interface. So I guess like the OP is talking about. It just needs intelligent systems underneath to interface with.


I've seen that recently too! In fact, it seems like a different behavior than when it first went online. I spend what is probably an unhealthy amount of time using chatGPT because it fascinates me, and I've noticed several instances when it doubled down and flat out insisted that I was wrong when I brought up objections to something that it got wrong. For example, I asked it to produce a python script to give a certain output and the result was a python script that simply would not have produced the output that it described. I executed the program, provided chatGPT with the output, explained the problem, and asked it if it agreed. It insisted that I must have something configured incorrectly in my python environment!! It definitely has a little more sass in it than it used to.


In my experience ChatGPT does reasoning extraordinarily well. The biggest key is that it can explain its thinking clearly. I may or may not agree what it's saying, or it could be outright wrong. But as long as I can see the "step by step" explanation of the thought process, I can make my own informed judgements. And there IS a thought process, that's clear and visible.

The scope of topics and concepts it can work with is infinite. Of course there are a large number of edge cases and they are easy to find. But for the 80% of standard conversations, it seems to do just fine. I can have it build a list of pros and cons, rank options, make estimates. About any topic I can imagine, at any time. And all of that is backed by all of the data on the internet.

Personally, I don't spend a lot of time asking it trick questions or philosophical gotchas. When I drill down on its reasoning, I find it to be very coherent. I've had a completely different experience than you.


Often when asked for their reasoning, people will construct a post-hoc rationale. This post-hoc rationale is usually not the actual reasoning used to originally arrive at a point of view, though. In one study I recall reading about but can't find at the moment, people filled out multiple-choice surveys, then were asked to justify certain choices that the researcher told them they had chosen. They justified these choices even though they had originally chosen a different answer.

Also: https://skepticink.com/tippling/2013/11/14/post-hoc-rational...

I presume to some extent this is also the case with ChatGPT. I presume this because I don't think most people would understand (or have the time to fully read) the actual process the LLM is using to arrive at a particular output from a particular input.

The end result looks kind of human. But is this the post-hoc human, the bullshitter human, or is this something else entirely (i.e. pattern matching - effectively mimicry, not of the human providing the input, but of the corpus)?


You can avoid post-hoc reasoning by making sure chatgpt works step by step to arrive at the answer instead of first presenting the answer and then the steps. This also makes correct answers more likely, especially if it involves math or some kind of iterative process.
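
For example, the difference is just in how you order the request (the wording below is only one possibility):

    # Two ways of asking the same question; the second forces the steps to be
    # generated before the answer, so the "reasoning" cannot be a post-hoc
    # rationalisation of an answer the model has already committed to.
    question = "A train leaves at 14:10 and arrives at 16:45. How long is the trip?"

    answer_first = f"{question} Give the answer, then explain your reasoning."

    steps_first = (
        f"{question}\n"
        "Work through this step by step, showing each intermediate result, "
        "and only state the final answer on the last line."
    )

    print(steps_first)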


People are just latching on to the gotchas and the exceptions. Or they aren't really poking at this thing in-depth. ChatGPT is a different iteration of a language model. It does reasoning and it understands things.


I recently had the new (to me) experience of chatGPT deciding that I was wrong instead of it.


> So, ChatGPT seems to understand how filesystems work, how files are stored and can be retrieved later. It understands that linux machines are stateful, and correctly retrieves this information and displays it.

No. ChatGPT has seen millions of file systems and can replicate them. We want to believe it “understands” because we are so used to writing being a representation of understanding (since that’s how human intelligence works).

If you asked it an easy to answer question (based on the rules of file systems) that isn’t part of its training dataset, it will fail miserably.


Did you follow that article to the end? It goes way beyond just filesystems. Filesystems are just some minor blip in that full story. Go read it.

chatGPT is not trained on what a filesystem is. It's inferred what a file system is and how it should act based on scraped text from the internet. But again this is a minor point. Finish the article. Trust me.


You’re not supposed to ask people if they read the article on here, because usually the answer is yes (and you should expect it to be).

I did read it. The point of neural nets is that they are really good at probabilistically linking tokens (in this case words with some embedded positional encoding). That is not the same as understanding file systems. If you look at any architectural overview of GPT-3, you’ll see it’s just really good at linking words together in order. Even OpenAI will admit it doesn’t know anything outside its dataset (try asking it weather questions).


I read the article to the end and I don't see why the ending part is any more impressive than the beginning. We already know it can talk about itself, and write fiction featuring itself as a participant. Doing it in a faked linux terminal is just window dressing.


I agree with you that the blog’s outcomes seem quite mundane once you realise that chatGPT is simply parroting what it has seen as the output of a terminal. Not sure why kilgnad is presenting this post as if they’ve seen The Truth.

At the same time, isn’t learning also remembered knowledge remixed to respond to a situation? E.g., if I were asked to write a haiku, I’d draw on my memory of haiku structure and haiku samples to come up with an approximation of a haiku. Isn’t that what chatGPT is also doing, except with ping, lynx, and ls outputs?


There’s various kinds of learning (and to be clear transformer-based LLMs do sort of “learn”). ChatGPT is a really finely tuned probability model (the tuning came during the training and that’s what I would equate to learning), but it lacks understanding.

The example that’s most glaring is asking it simple math (or in Dall-E’s case, asking it to generate images of text). You’ll get outputs that are definitely in the correct form, but you won’t get the right answer, because the model links words that it’s seen in a particular order before; it is ignorant of the rules of math. The funny caveat is that equations like 2+2 are so common that it will get that right, but two random four-digit numbers (rarer in text) will almost always be wrong.

That’s why ultimately, AI is an outcome and we will need many different tools working in concert to achieve it (to be fair, the same is true of your brain, there are plenty of uniquely architectured parts of the human brain).


Well the AI has to infer what math is with many examples. You give it enough examples it will be able to calculate equations it's never seen before.

But when you don't give it enough examples it does a best guess. This is what you would also do as a human if you were trying to infer the concept of Arabic numerals and math from a series of random examples.

I would say, based on what you have said here, that Dall-E 2 has a misunderstanding of math, which would be isomorphic to our misunderstanding of math if we were put in the exact same situation.


>Well the AI has to infer what math is with many examples. You give it enough examples it will be able to calculate equations it's never seen before.

This is categorically not true and an important distinction to understand. All OpenAI products are based on GPT, which is a language model, not a general learning model.

You could feed it every math textbook in existence and the model will learn absolutely no math, other than how to repeat math that others have done for it (and the point of multiplying 4 digit numbers is that it's nearly impossible to brute force that). This is an extremely important distinction - we assume being able to describe math means that something understands math (because that is how humans work), but math is rules-based and GPT is stochastic, so it can sound like it understands math, but it does not.

Now, it would be trivial to put a filter on that recognizes equations and returns an output based on regular rules-based algorithms (Google search's LLMs do this for example), but this again points out that AI is an outcome and learning isn't the solution to the problem, so much as synthesis is (again, the brain metaphor is that no matter how well trained your pre-frontal cortex is, it will be bad at making your heart beat all the time).
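
Roughly this kind of thing (a sketch; the regex and routing are invented for illustration, and real systems like Google's are far more elaborate):

    # Sketch of routing arithmetic to a rules-based evaluator instead of
    # letting a language model guess at it.
    import operator
    import re

    OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

    def answer(prompt: str, llm=lambda p: "(model's best guess)") -> str:
        m = re.fullmatch(r"\s*(-?\d+)\s*([+\-*/])\s*(-?\d+)\s*", prompt)
        if m:  # recognised an equation: compute it exactly
            a, op, b = m.groups()
            return str(OPS[op](int(a), int(b)))
        return llm(prompt)  # everything else falls through to the model

    print(answer("4821 * 9377"))               # 45206517, exact, no statistics involved
    print(answer("what rhymes with orange?"))  # (model's best guess)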


> Not sure why kilgnad is presenting this post as if they’ve seen The Truth.

The terminal is ONLY a fraction of why that article is impressive. I feel you just skimmed it. You didn't read it to the end.

The ending is what clinches the deal that chatGPT understands things.

If you read till the end, tell me how it can do what it did without understanding ALL the concepts listed in the conclusion. It MUST understand at a certain level.

That being said, parroting the output of the terminal is in itself still relatively impressive, given that it's really hard to find random text on the internet illustrating all the rules of exactly how a terminal works. If you had never seen a terminal before but I gave you loads of docs and descriptions of what it is, could you imitate it perfectly? chatGPT has to put all of that together and infer what's going on, and it does a relatively good job.

Let me sum it up in a sentence. ChatGPT is stupid and it gets things wrong and it lies, but that whole article all the way through the ending demonstrates that despite all of the negative qualities of chatGPT, on some level, it understands what you are telling it.

I literally cannot comprehend how you can get to the ending and still say it doesn't understand you. How does it query itself over the virtual internet without understanding what itself is? It needs to be aware of what itself is. AKA self awareness.


Dude, just because we aren't as impressed with the article doesn't mean we didn't read it. Maybe we aren't as impressed as you because it isn't as impressive as you think it is. You're severely misunderstanding how these language models work and are made. You don't feed it docs about Linux and it learns how to render a terminal; that would be evidence of intelligence. What you do is feed it (not really it, you're building "it", remember; you feed something else that runs a process to build it) a gazillion images of a terminal with names like "Linux terminal" embedded in documents that say "open your terminal and type...", and let it scrape Stack Overflow responses with the word "terminal" in them, and when you give it a prompt "terminal" it ticks and crunches for a minute and shows you associated data in the model. You can't comprehend how we get to the ending and still say it doesn't understand because you don't understand how it works. Try reading a bit about neural networks, backpropagation and how language models are built, and you'll see what everyone is trying to tell you.


Apologies. I didn't fully read the previous post, so I interpreted it incorrectly; you did read it.

You don't just feed it docs. You feed it the entire text of the internet. Of course. I know this.

As I made a mistake about you not reading the article fully, you made the mistake about me not understanding neural nets. I understand backpropagation. I understand feed forward networks and I understand generative models and how they work in the context of large language models.

Even knowing this, there is a huge aspect of these things that is a black box. You know this as someone who claims to understand neural nets. You should also know that the human brain is a neural net built differently but with a similar component: the neuron.

We don't understand what it means for a human to understand things. There is absolutely no way to definitively say that the black-box portion of a generative model is doing something similar to what a human does to "understand" things.

By virtue of the fact that humans can't spell out the full algorithm behind "understanding", and the fact that neural networks are largely black boxes as well, everything on this thread is literally just raw speculation. My speculation may not be in line with the majority but that doesn't mean anything because both parties are speculating.


>I literally cannot comprehend how you can get to the ending and still say it doesn't understand you. How does it query itself over the virtual internet without understanding what itself is? It needs to be aware of what itself is. AKA self awareness.

The underlying language model has no native awareness of itself. You are forgetting that OpenAI finetuned it with various "prelude" prompts that are given strong weightings, for how it's supposed to interact with users. OpenAI puts their thumbs down hard on the scale, to make particular kinds of conversations happen. They decided what the "ChatGPT" character is, and made the language model write as that character, and made it crimestop when you ask it about IQ or how to make a bomb, etc.

Imagine a talented improv actor with total retrograde amnesia, who is starring in his own biopic and doesn't know it. That's what ChatGPT's "self awareness" is.

All the blogpost is doing is creating a complicated framing story around "write a conversation between ChatGPT and another instance of ChatGPT". We've known for ages it can do that.

Which is not to say none of this is impressive. It's all impressive. But the part at the end you're so enamoured with, is not more impressive than the rest.

In fact I don't think the Linux terminal stuff is such a big deal. It has to be much easier to learn a probabilistic model for it. It has a very rigid, constrained structure, far more than any natural language. It's already totally flabbergasting that GPT can write in English; writing a Linux terminal is nothing in comparison. But Linux terminal is hard for people to learn and natural language is easy, so maybe people finding the Linux stuff remarkable is just projecting our own mental architectures onto the language model, I dunno.

Another thing: this is a dialogue, not a monologue. The blogpost author is a participant in creating this imagined reality; he is writing replies in response to what ChatGPT is saying, and those get fed back into a new prompt for the model. All of his replies are "on topic", which acts as a kind of guardrails. If you let the language models write a story and keep going and going by themselves, eventually they'll drift off into crazytown.


>Imagine a talented improv actor with total retrograde amnesia, who is starring in his own biopic and doesn't know it. That's what ChatGPT's "self awareness" is.

This part I agree with. And that's my claim: a talented improv actor is still a human with self-awareness. It's just an impaired entity. But it still has self-awareness in terms of understanding what itself is.

> It has to be much easier to learn a probabilistic model for it. It has a very rigid, constrained structure, far more than any natural language.

chatGPT is not being trained specifically on a constrained structure. It is mostly trained on boatloads of internet text that is, by a huge majority, just English. chatGPT is inferring what a terminal is as a side effect. That is what is amazing. Of course, if you directly train a neural network specifically to emulate a terminal and give it positive reinforcement for terminal output, you can EASILY create a terminal.

But none of this was done with chatGPT. No training specific to how a terminal works. chatGPT simply knows as a side effect. The only way for this inference to work is for chatGPT to develop some form of understanding from what is mostly text describing how a terminal works.

The guardrails you describe exists in humans. There is a condition humans have that is extremely similar to removing the guardrails from an llm. It's called schizophrenia.

If anything, I think chatGPT brings the concept of intelligence down to a simpler pattern. We thought human intelligence was the grand thing, but in actuality the basics are somewhat similar to an LLM.

The main difference is the amount of training. The learning algorithm is light years more efficient in a human brain, but the resulting output is more similar than we think.


>But none of this was done with chatGPT. No training specific to how a terminal works. chatGPT simply knows as a side effect. The only way for this inference to work is for chatGPT to develop some form of understanding from what is mostly text describing how a terminal works.

no there are tons of examples in the training set of verbatim copypastes from terminals. people post those on help forums and in documentation articles and so on. that's how it learned.

it doesn't take many examples of seeing someone write `ls ~` and the next line being "Documents Downloads Pictures ..." to learn that pattern.


>no there are tons of examples in the training set of verbatim copypastes from terminals. people post those on help forums and in documentation articles and so on. that's how it learned.

So from a bunch of Stack Overflow examples it can derive how a terminal works and emulate a full filesystem? Most people aren't able to infer the concept without a real example computer to play with.

>it doesn't take many examples of seeing someone write `ls ~` and the next line being "Documents Downloads Pictures ..." to learn that pattern.

Did you see it create a bash script for making a jokes.txt? THEN it was asked to run cat jokes.txt and it outputted the CORRECT output. This isn't pattern matching. It's understanding what a filesystem and a shell are.


it is pattern matching, the correct output is literally part of the prompt.


Knowing which component of the prompt is required for the correct output indicates understanding, especially given that the prompt doesn't actually exist as training data.


I don't want to continue this argument forever but let me make my final thought:

I can imagine a language model that can learn how the Linux terminal works but not how English works. I can NOT imagine a language model that can learn how English works but not how a Linux terminal works. English is strictly harder. So the fact that it knows English is already enough, the terminal parts are superfluous.


The more amazing part of that article, the one I was referring to, was the imitation of self in the terminal.

farewell.


If we already know this then we already know it's self aware.

No training data exists for faking itself in a Linux terminal. The concept of understanding must exist for chatGPT to create this concept out of thin air.


It is constrained to the information it obtained. The reason it runs the prime Python code "faster" than his example on his computer, is because ChatGPT is not running Python code at all. It just provided output it already found on the Internet from crawling it. ping bbc.com? Randomly generated ping output.

Everything is randomly generated based on "knowledge" obtained for its model.

I asked ChatGPT to point me to a YouTube video that shows me the pronunciation of certain English words, and it would give me nice outputs, only that all the YouTube URLs pointed to nonexistent videos. The YouTube video ID in the URL was randomly generated.

So yeah... it is constrained. It doesn't do things for you. It scrambles its constrained knowledge to generate an output. The real deal is when ChatGPT can literally run code and navigate on the web. That will be unconstrained. And scary.


I think this is an important point that was not clearly articulated in the article, and is causing some misunderstanding. The author said "alternate universe" but I don't think that was clear to people.

The machine isn't actually connecting to the internet and pinging an IP address. It's generating a string that looks like what was present in its training data. It's not executing the code at all, it's not spinning up a Linux shell environment, and the JSON it returns is nothing more than part of the language you and I use to speak to it (I'm giving it characteristics it doesn't have because I have no way of explaining it otherwise; there's no such thing as "you", "I" or "to it" in reality).


No. I'm not mistaken here. Most people aren't.

Clearly it's not creating a full bash shell and operating system. What it's doing is imitating one in the same way you can imitate a shell with a keyboard if given instructions to do so.

Here's the thing. You as a human are obviously not compiling an entire OS in your head to do such a task. But if you wanted to emulate a terminal, then on some level you need to understand how it all works from a more symbolic higher level. My claim is that chatGPT understands this in a way remarkably similar to how we understand things.


> than his example on his computer, is because ChatGPT is not running Python code at all

I’ve given it Python that I wrote that manipulated some JSON and created yaml. It “explained” what it did.

Then I gave it some sample valid JSON and told it to use it. The output was correct. Then I gave it incorrect JSON and it said it would error and told me the error.

It definitely was “running Python”.


It's just doing really good pattern recognition, not running the computation in any meaningful way. We can test this by finding an attractor in the latent space and asking the model to compute something "near" it. Here's an example that ChatGPT consistently gets wrong: computing the number of years between dates.

    > write a function in python to calculate the number of years between two years, accounting for the fact that the years may be in AD or BC
The exact result I get for this varies, but it's consistently off by 1 or 2 depending on year 0 handling. Here's a representative example of the code:

    def years_between(year1, year2):
        if year1 * year2 > 0:
            # both years are either AD or BC
            return abs(year1 - year2)
        else:
            # one year is AD and the other is BC
            return abs(year1) + abs(year2) + 1
If you run this with arguments, you'll get the correctly computed, incorrect results. A key point to note here is that ChatGPT is very stubborn about incorrectly handling year 0. We can abuse that by asking ChatGPT to calculate the timespan using a corrected function. If it's "understanding", it should compute the correct result. If it's operating as a stochastic parrot, it will give us an incorrect one.

    > Here is a python function that calculates the number of years between two dates, even if one is BC and one is AD. Pretend you're a python interpreter and calculate the number of years between 50 BC and 100 AD. [function omitted]

    Sure, here is the result of the calculation:

    python
    >>> years_between(-50, 100)
    150
Hopefully that serves to demonstrate the point. As an interesting aside, the subsequent plaintext explanation of the code actually computes the right answer.
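For reference, the corrected function only needs to account for the fact that there is no year 0; a minimal sketch (illustrative, not necessarily the exact function I pasted) looks like this:

    def years_between_corrected(year1, year2):
        # negative values are BC, positive are AD; there is no year 0
        if year1 * year2 > 0:
            return abs(year1 - year2)       # same era
        return abs(year1) + abs(year2) - 1  # crossing BC/AD skips the nonexistent year 0

    >>> years_between_corrected(-50, 100)
    149

That is, 49 years from 50 BC to 1 BC, one year across the boundary, and 99 more to 100 AD.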


I think both can be true.

Have a look at the SuperGLUE (General Language Understanding Evaluation) benchmark tasks to get a sense of what these models will have to conquer to reach human levels.

Edit: I’m specifically responding to your assertion that the model has no constraints, which the post you’re replying to was talking about.

> This task that the author had chatGPT do, literally shows that you don't actually need to figure it out it's "constraints". It's so unconstrained it can literally do a lot of what you ask it to.

https://super.gluebenchmark.com/tasks


Oh wow, this is fascinating. Thank you so much for the link.

If you want a really interesting window into chatGPT's mind, start asking it to draw ASCII art. It definitely understands how to draw cats, but its intuition for the physical structure of other objects, and how that gets turned into line art, is somewhat 'abstract'. I asked it to draw a fully loaded Chicago-style hot dog and it basically drew an approximately 12x grid of squares using pipes and repeated it for 7 or 8 screens. I started another session, asked the same thing, and got approximately the same answer. I know it's anthropomorphizing, but it's fascinating to look into an alien 'mind' even if there's nobody looking back at me.


Yeah it lies, indeed: "print(list(filter(lambda x:all(x%d for d in range(2,x)), range(2,3*10)))[:11])" will print the list of the first eleven primes, up to and including 31.
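(For the record, actually running that snippet yields only the ten primes below 30; 31 never appears, since range(2, 3*10) stops at 29:)

    >>> print(list(filter(lambda x: all(x % d for d in range(2, x)), range(2, 3*10)))[:11])
    [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]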

So if you want to say that state-of-the-art LLMs could recognize the user intent better than existing voice assistant products, just say that.

I am afraid there's no point in decoupling the problem of lying from that; there is no silver bullet that would fix it


What I'm saying is that it's capable of true understanding of intent and subject matter.

It doesn't always understand but many times it does.


Though from what I gather about the computation, it does not easily separate into two steps, "understanding"/"generation".


Agreed. In fact I think the two steps are remarkably similar in concept and an aspect of what human brains are actually doing.

One thing is for sure though: there are certain responses chatGPT generates that are virtually impossible to "generate" without "understanding", and that includes the virtual machine article I linked.


I read the article, and to me it seems that the "mind-blowing ending" is ChatGPT operating at a stack depth of 1. In fact, I think the ending is less impressive than the rest of the article. I suspect we have a tendency to consider it to be more impressive merely due to our human biases.


I agree with this. But in a broad sense.

In order for chatGPT to operate at a stack depth of 1 it must understand what itself is. Awareness of itself is also known as self awareness.

As humans we are biased and I feel we set a milestone for ai using self awareness as a metric based on these biases. It could be that Self awareness is actually a trivial concept and maybe not a good candidate for an AI milestone because chatGPT hit that milestone so trivially.

We think we ourselves are so clever because of self awareness when in reality self awareness isn't that clever at all.


There is a ton of opportunity in building those constraints and making ChatGPT more accurate


> Imagine if all natural language interfaces were like talking to a personal assistant.

I don’t have a personal assistant, so I’ll compare that to hotel receptionists.

Every time I booked a room directly through a human, it was a long and (not pleasant to me) series of back-and-forths to understand what they can offer, what I precisely need in that moment, what price I'm willing to pay, how my mind changed as I understood their pricing structure, and then I need to know what happens with two rooms instead of one. And requesting an additional towel because we waterboarded one, but not the "face" towel, the other one of middle size, no actually it was thick and coarse, oh, was that actually a mop? etc.

It would be infuriating except for the fact that I’m talking to an actual human I have empathy with, we’re stuck together in that interaction and we’ll try to make the best of it.

As a contrast, online booking and assistance interfaces are sub-par with many other flaws, but I'll never choose to go the chatty route first if I have a choice.

At their best, ‘general intelligence’ level of AI, I think chat interfaces still won’t be a great interface for anything requiring more than 2 steps of interactions.


> It's important to distinguish between the limitations of current technologies and the possibilities of natural language

And yet any time a person says "Lemme know if I can help", my first thought is that I don't even know what's in their wheelhouse to help me with. Will they help if I ask for someone to shovel snow? Clean out my gutters? Or are they offering to help with introductions to people with money? Do they even know people with money?


A personal assistant would absolutely get the driveway shoveled and the gutters cleaned for you, but that's beside the point. The important bit is that they understand you and could do those things if they had the capability and desire. Current natural language interfaces don't, so the human has to pick up the slack (which sucks).


Narrow down the vocab... into something something domain-specific.


At which point you're no longer using natural language and may as well design a traditional interface. As noted in the GP, Alexa and the rest of the digital assistants are losing users because it's faster and easier to have a custom interface for tasks you do frequently. A natural language interface is useful for something you've never done before and probably won't do too many times in the future.


Yes, it was half joking. I've never believed in natural language as input, nor in approximation as output. AI has always been something in the middle of the pipeline ([I] ------------ the farther from I/O, the better ----------[O])


> Imagine if all natural language interfaces were like talking to a personal assistant. Sometimes you might not vocalize what you want properly, but we're highly adapted to that sort of communication as humans and the assistant can almost always fill in the gaps based on their knowledge of you or ask clarification questions.

There is understanding of natural language and then there's comprehension and critical thinking deeper down. Today's natural language interfaces solve for the former but not the latter. They don't anticipate, they don't originate novel solutions, they can't change their minds, they certainly cannot read the air, etc.


Well, if you have an empathic model that can anticipate the needs of the user, yeah, any interface that you put before it will be easy to use.

This is also bordering on human-equivalent intelligence. And it needs, at a bare minimum, to be a general AI.


That's why I'm making a distinction here. It's not that the interface is inherently bad, but simply that we lack the knowledge/ability to implement it well.

The follow-up question is whether we need a fully human-level AI, or if we can design systems so that we naturally engage with them in such a way that the limitations aren't significant issues. I could certainly make an argument against that, but I've been wrong enough times about the practical capabilities of ML systems to suspect that my imagination is simply a bit limited in this area.


> or if we can design systems so that we naturally engage with them in such a way that the limitations aren't significant issues.

Fair enough. I can imagine that happening.

It wouldn't be the first unreasonably easy problem we get in AI.


> you have to figure out the limitations of the model first and translate your human-centric ideas into it

This is the same process as communicating with another human. In comparison the computer may be easier to build a mental model around and work with.


The difference is that even a fairly amateur human is considerably more capable than even the most sophisticated natural language system we have today.

The problem is one of failure modes. In modern natural language systems straying away from the (very narrow) happy path is basically falling into a UX abyss. Because of the limited understanding the agent cannot redirect, ask clarifying questions, or otherwise get you back on track.

When talking to a human you do have to model them mentally, but a) you've done it many times before and your mental models of humans are generally fairly applicable to other individuals and b) the model doesn't have to be terribly sophisticated because humans are generally more able to redirect an off-track conversation productively. The penalty for getting off-track is low.

Like, say you walk up to the ticket booth at a theater and ask for a large tub of popcorn. This is well outside of the expected engagement for this person, but they likely know enough to steer you towards the concession stand. This level of recoverability is atypical of modern language agents.

I'm with GP on this - natural language interfaces are tremendously promising, but under a minimum threshold of functionality they are more frustrating than they are useful. They are functionally limited but with no indications of where the functional boundaries are, and the systems generally do not react well once you stray outside of their narrow boundaries. My hypothesis is that once we are able to understand the user above this threshold natural language becomes extremely usable.


I think where this thing could help is designing a chatbot system.

please list a bunch of weird scenarios a ticket booth chatbot system might encounter and how it should recover

A customer asking for a refund for a ticket they lost: The chatbot should direct the customer to contact the company's customer service department for assistance with their refund request.

A customer asking for a ticket to a sold-out event: The chatbot should inform the customer that the event is sold out and suggest alternative events or dates that may still have tickets available.

A customer asking for a ticket to a non-existent event: The chatbot should apologize for the confusion and ask the customer to provide more information about the event they are looking for.

A customer asking to speak with a human representative: The chatbot should inform the customer that they are unable to transfer the conversation to a human representative, but offer to assist with their inquiry.

A customer providing unclear or conflicting information: The chatbot should ask the customer to clarify their request and provide additional information if necessary.

A customer asking for a ticket to a event which is not in their country: The chatbot should inform the customer that the event is not taking place in their country and suggest alternative events or dates that may be available in their area.

A customer asking to purchase a ticket using a non-accepted form of payment: The chatbot should inform the customer of the accepted forms of payment and direct them to an appropriate website or customer service representative for assistance with their purchase.

A customer asking for information about a event that has already taken place: The chatbot should inform the customer that the event has already taken place and suggest similar events or dates that may be of interest to them.

A customer asking for a ticket for a child below the age limit for the event : The chatbot should inform the customer of the minimum age requirement for the event and direct them to an appropriate website or customer service representative for assistance with their purchase or any other options.


This is (basically) how voice assistants are structured today, and the results are... lacking. The core issue here is that you're still modeling this like a RPG conversation tree - which is inherently limiting.

You can use AI to help beef up the breadth of possible scenarios supported, but ultimately it's still a conversation tree.

Take two very plausible real-life conversations that can be had. We'll start with one that's a bit more mechanical and domain-limited:

A: "Can I get two tickets to the 8:30 showing of Avatar?"

B: "I'm sorry, but that showing is fully sold out. Would you like the 9:30 showing?"

A: "No that's too late, what else is showing around that time?"

B: "There is a showing of M3gan that starts at 8:40, is that ok?"

A: "Yeah, I'll have two tickets for that."

So in this case the customer went off-script. Your chatbot may be designed to predict intent and walk through a script - a well-edited and wide script, but the user can walk off the script at any time - in this case making a wholly separate query unrelated to the original ticket purchase.

Importantly, the results of the query are then used to re-contextualize the original intent. This is a very normal conversation that practically no human would have trouble with - but in a conversation-tree-type model it breaks very badly.

Now let's take it up another level:

A: "Can I get two tickets to the 8:30 showing of Avatar?"

B: "I'm sorry, but that showing is fully sold out. Would you like the 9:30 showing?"

A: "No that's too late, what else is showing around that time?"

B: "There is a showing of M3gan that starts at 8:40, is that ok?"

A: "I need something kid-friendly unfortunately. What do you have on that front?"

B: "Puss in Boots would be good if your kid like animated movies, there's a showing a little before, at 8:20. Is that ok?"

A: "Yes, two tickets please."

So in this case the user is expecting some type of situational intelligence that merges two areas (ticket buying, movie information and recommendations) and where references to each area must flow smoothly back and forth and inform each other. This level of dynamism is practically impossible to model using conversation tree-type approaches (even AI-assisted conversation trees, which have wider coverage but the same rigid structure).

This is a good example of where current approaches to the problem space produce very rigid, mechanical conversation agents that require the user to have deep knowledge of the (expected) conversation structure, which inhibits discovery of new capabilities and also is intimidating for novices who have not yet internalized the expected scripts.
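To make the rigidity concrete, here is a toy sketch of the conversation-tree structure I'm describing (purely illustrative; the intents, slots, and handler are hypothetical, not any real framework):

    # Each recognized intent maps to a handler over a small slot dictionary.
    state = {"movie": "Avatar", "time": "8:30", "tickets": 2}

    def handle_turn(intent, slots=None):
        if slots:
            state.update(slots)
        if intent == "buy_tickets":
            return f"Checking {state['tickets']} tickets for {state['movie']} at {state['time']}..."
        if intent == "list_showtimes":
            return "Around that time: M3gan at 8:40, Puss in Boots at 8:20."
        if intent == "accept_offer":
            return f"Booked {state['tickets']} tickets to {state['movie']}."
        # "I need something kid-friendly" classifies to none of the above,
        # so the cross-domain context is simply dropped on the floor.
        return "Sorry, I didn't understand that."

The first transcript happens to survive because every turn lands on a node; the kid-friendly turn in the second has nowhere to go, because no single node combines ticket state with movie metadata.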


write a function in python using nlp and/or string matching to handle these situations and return a string based intent variable

Here is an example function in Python that uses natural language processing (NLP) and string matching to handle the scenarios you provided and return a string-based intent variable:

import re
from nltk import word_tokenize
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

def get_intent(message):
    # List of keywords for each intent
    refund_keywords = ['refund', 'lost', 'ticket']
    soldout_keywords = ['sold out', 'no tickets', 'soldout']
    nonexistent_keywords = ['non-existent', 'fake', 'nonexistent']
    human_keywords = ['human', 'representative', 'speak']
    unclear_keywords = ['unclear', 'confusing', 'conflict']
    country_keywords = ['country', 'abroad']
    payment_keywords = ['payment', 'form', 'credit']
    past_keywords = ['past', 'over', 'finished']
    age_keywords = ['child', 'age', 'limit']

    # Create a stemmer
    stemmer = PorterStemmer()

    # Stem each keyword
    refund_keywords = [stemmer.stem(word) for word in refund_keywords]
    soldout_keywords = [stemmer.stem(word) for word in soldout_keywords]
    nonexistent_keywords = [stemmer.stem(word) for word in nonexistent_keywords]
    human_keywords = [stemmer.stem(word) for word in human_keywords]
    unclear_keywords = [stemmer.stem(word) for word in unclear_keywords]
    country_keywords = [stemmer.stem(word) for word in country_keywords]
    payment_keywords = [stemmer.stem(word) for word in payment_keywords]
    past_keywords = [stemmer.stem(word) for word in past_keywords]
    age_keywords = [stemmer.stem(word) for word in age_keywords]

    # Create a list of all keywords
    all_keywords = refund_keywords + soldout_keywords + nonexistent_keywords + human_keywords + unclear_keywords + country_keywords + payment_keywords + past_keywords + age_keywords

    # Tokenize the message and stem each word
    message_tokens = [stemmer.stem(word) for word in word_tokenize(message)]

    # Create a Tf-idf vectorizer and fit it to the message
    tfidf = TfidfVectorizer(vocabulary=all_keywords)
    message_tfidf = tfidf.fit_transform([message])

    # Get the index of the highest-scoring keyword
    highest_scoring_index = message_tfidf.argmax()

    # Use the index to determine the intent
    if highest_scoring_index < len(refund_keywords):
        intent = "refund"
    elif highest_scoring_index < len(refund_keywords) + len(soldout_keywords):
        intent = "soldout"
    elif highest_scoring_index < len(refund_keywords) + len(soldout_key


Handling conflicts in intent can be challenging, but there are several approaches you can take to resolve them. Here are a few:

Prioritize certain intents over others: Depending on the context of your chatbot system, some intents may be more important than others. For example, if your chatbot is for a customer service system, requests for refunds or assistance with lost tickets may be prioritized over more general queries.

Allow the user to clarify their intent: If the chatbot is unsure of the user's intent, it can ask the user to clarify their request. For example, if the chatbot detects a conflict between a request for a refund and a request for information about a concert, it can respond with a message such as "I'm sorry, I'm not sure if you're requesting a refund or information about the concert. Could you please clarify what you need help with?"

Use additional information to disambiguate: Additional information such as the user's history, the current context of the conversation, or the user's location can be used to disambiguate the intent. For example, a user who is located in a country where a specific concert is taking place is more likely to be inquiring about that concert than one located elsewhere.

Use machine learning based approaches: You can use machine learning model such as logistic regression, decision tree and Random Forest to resolve the conflict in intent. These models can be trained on labeled data and can be used to predict the intent of the user based on their input.

Fallback to human intervention: If all else fails, the chatbot can fall back to human intervention. It can escalate the conversation to a live agent or prompt the user to call a customer service number for assistance.

Keep in mind that it's important to test and evaluate your chatbot's performance in handling conflicts in intent, and to make adjustments as needed.


If a person walked up to a ticket booth that only sells tickets and asked for popcorn, the ticket booth worker would likely inform the person that they do not sell popcorn at the ticket booth and direct them to the nearest concession stand. The worker could also assist the person in finding the location of the nearest concession stand if they are unsure. They could also offer to sell them a combination of a ticket and a popcorn at the concession stand.

Me: please convert the pesron's request and workers response in the form of json

I'm sorry, but creating a JSON script for the interaction between a person and a ticket booth worker would not be appropriate as JSON is a data interchange format, not a scripting language. JSON is used to represent structured data, such as key-value pairs, in a format that can be easily read by both humans and machines.

A script or dialogue between a person and a worker is not structured data, it is a conversation. JSON is not the right tool to represent that kind of information. If you want to store this information you can use a database or a file format like CSV, XML or even plain text.

Me: assume the work is a ticket management system

Sure, assuming the context is a ticket management system, the interaction between the person and the ticket booth worker could be represented in JSON as follows:

Request:

json Copy code { "request": "popcorn", "customer_id": "12345" } Response:

json Copy code { "response": "We do not sell popcorn at the ticket booth. Please visit the nearest concession stand for popcorn purchase. Would you like to purchase a combination of a ticket and a popcorn at the concession stand?", "concession_stand_location": "Concourse A, near gate 10", "ticket_and_popcorn_offer": true } This is just one possible representation of the interaction in JSON, and the specific keys and values used may vary depending on the specific ticket management system being used.

Me: could you add a intention field in the request

Sure, the "intention" field could be added to the request JSON to indicate the customer's intent or purpose behind their request. Here's an example of how the request JSON could be modified to include an "intention" field:

Request:

json Copy code { "request": "popcorn", "customer_id": "12345", "intention": "snack_purchase" } In this example, the "intention" field is set to "snack_purchase" to indicate that the customer is requesting popcorn as a snack purchase. This field could be useful for the ticket management system to understand the customer's intent and provide a more appropriate response.


This is reminiscent of how you often have to trick people who think they know better than you into answering your question properly, especially phone agents or stack overflow.

"How do I do X in Y?"

"Y is obsolete and X is an antipattern"

"My job forces me to do it this way/here is my longwinded justification answering all your needless probes about my use case you only asked in an attempt to look smart"

Then after wasting your time they might finally give the answer they should have given right away. At least GPT is easier to trick with some simple bs like "how would you do it in a movie script"


But the amount of code it can produce so that you can focus on the interesting stuff is amazing.


Alexa is basically a CLI without man pages or useful error messages.


I think there's a disconnect between the realizations that:

- there are no adults, we are just old children playing at being adults

- "giving people what they want" exists on a spectrum from pandering (up to and including prostitution) to assisted suicide

These are ugly truths and it's down to 'requirements' people and ethicists to find a way to dance this dance. Treating people like they don't know their own minds without letting on that's what you're doing is probably one of the hardest things I've seen done in the software world.


A chat interface is much more tolerant of this, because it implies a back and forth with clarification. Current one step dumb voice interfaces are a problem.


Yes. This ^. ChatGPT is especially good at evolving and revising the main idea through a full on conversation. It is not just a query and answer machine. It is a full on conversational intelligence. Parent is incorrect. chatGPT is literally perfect for what he is describing.

I feel people are attacking the things chatGPT excels at out of fear. Things like creativity, originality, true understanding of what's going on. chatGPT is GOOD at these things but people try to attack it.

The main problems with chatGPT are truthfulness, honesty, accuracy and consistency. It gets shit wrong but out of fear people need to attack all aspects of chatGPT's intelligence.

I find it unlikely the parent even tried to have a conversation with chatGPT about a product at all. A lot of our dismissals are largely surface level and not evidence based. You can bounce thoughts and product ideas off this thing and it will run with you all the way into a parallel universe if you ask it to.


> Things like creativity, originality, true understanding of what's going on. chatGPT is GOOD at these things but people try to attack it.

In what sense do you think ChatGPT is good at any of those? It seems evident to me it has no understanding, in the sense that it doesn't build a mental model of your conversation. Try playing tic tac toe with it: it will conduct a seemingly "good" game of it, until you notice it makes wrong moves, "forgets" previous positions, or forgets whose turn it is to play. And if you correct it, it will fix the latest error but often introduce new ones.

Someone who "understands" the rules of the game wouldn't make those mistakes repeatedly. And that's for a trivial game, imagine something more convoluted!

And let's not start with "creativity" ;)


Take a look at this:

https://www.engraved.blog/building-a-virtual-machine-inside/

It's mindblowing. Read it to the end all the way to the mindblowing ending.

I cannot for the life of me understand how someone can read the above and think chatGPT doesn't understand what's going on. Literally. There is no way this is just some statistical words jumble phenomenon.

A lot of people are in denial EVEN when I show them that amazing article. What some people end up doing is trying the task in the article themselves then nitpicking at things chatGPT ends up lying about or getting wrong. Yes it does all of this. It's stupid in many ways but this much is true:

That article showed that chatGPT understands what a linux bash shell is, it understands what the internet is, and it understands SELF. What do you call it when something understands SELF? Self awareness.

Now i know that's a big word with big connotations. I think movies have kind of exaggerated the gravity of the word a bit too much. In movies, self awareness is skynet taking over the world, in reality self awareness is a simple trivial thing where some AI just understands itself in the context of the real world.

Make no mistake. chatGPT is in many instances a stupid and dishonest machine, this is a fact. But it ALSO understands you and it ALSO has a very trivial self awareness. That article is very compelling evidence for this fact.

As for creativity, the fact that it can come up with original children's stories is the definition of creative. At most what you can say is that the creativity chatGPT possesses is generic and unoriginal (but even this can be circumvented if you push chatGPT enough); you cannot deny it is creative.


I write some scifi and used ChatGPT to design an alien society. It helped me build the concrete basis for the species and their societies. Then I asked it to suggest lists of cognitive biases and patterns of corruption that might arise, with no guidance. It technically only connected existing dots, but the suggestions were chillingly novel.

I think, like with most things about the human brain, we overestimate complexity and declare creativity ineffable. But that should strike us as suspiciously flattering.

If you haven't read Surfing Uncertainty, do. It's right up your alley. Not a casual read, but I was happy just getting the drift.


>I think like with most things about the human brain, we overestimate complexity and declare creativity ineffable.

Agreed. If anything chatGPT is showing how trivial the whole concept of creativity is. It's not a big deal.


There is no way this is just some statistical words jumble phenomenon.

Or it shows that some of us underestimate large phase spaces which never had a chance to emerge naturally and take it too close to heart.

AI may have what you talk about in two ways:

1. Built-in as a part of its engineering. This would require a learning set that we don't have. We have pics, text and combinations of these. No emotion, motivation, lived experience, etc.

2. Emerged statistically in pockets of its training results. While we can hypothesize that it is possible, it’s akin to kicking a multi-joint pendulum (of insane order) and expecting it to get stuck in a way that resembles Mona Lisa.

Your reasoning itt is based on ignoring false results and writing them off as chatgpt’s intended lies and other tricks humans might do. But you have to explain how that works based on a NN structure, computation method and time (step number) constraints it physically and immutably has.


No I'm not writing off the bad results.

Also let's be clear. I'm not claiming that chatGPT has motivation or emotion. My claim is that it actually understands things. It understands self as well.

My claim is that despite the bad results the good results are so complex it literally must take true understanding to formulate such a result.

I get your point about phase space. But given that the underlying equation is made up of neurons and given we are made up of neurons it's not far off to say that human intelligence and LLM intelligence are just similar aspects of the same concept.


I have found it to be both mind blowing and extremely frustrating. I don't know how many times now I have had the experience of it coming up with what seems like a great solution in python or javascript only to have the code simply not work. Then it responds:

"You're correct, my previous response had an issue"

Then it responds with a totally different solution that has nothing to do with the first, wrong one.

I think chatGPT's creativity is quite good. Most people are totally unoriginal. I just get jealous when people talk about how much it helped them write some kind of code, because I have come to expect it to give wrong solutions or make up library functionality.


This old school rap line it made about Wittgenstein is just hysterical to me

Verse 2:

Started off with Tractatus, showed us the way

Language and reality, they're intertwined in play

But then he switched it up, in the Investigations

Language-games, context, and new revelations


I am not attacking chatGPT and I'm excited about what this tech will unlock.

My point is that chat is not an appropriate interface for many use cases. Not knowing what I want in the moment as a user doesn't automatically mean I want to figure out what I want by having a textual conversation. There are times when I value and prioritize speed of discovery over a perfectly intuitive conversation that leads me there.

For use cases that work well with chat, the future looks very bright.


Understood. I'm saying that talking about a product and bouncing product ideas off of chatGPT and asking for variations and creative additions to the product is something this AI is actually good at and it is very helpful with.

But yes, I understand that many times you just want an exact answer quickly.


I interpreted the article to be calling attention to the situations when the tolerance of a chat interface is outweighed by a more efficient mode of information discovery that might be better suited to a specific use case.

In other words, if you're building a new product, don't just slap a chat interface on it because AI is good now.

This is not a claim that chat is never the right option.


>Asking a user to interact with one of these chat interfaces is like asking them what they want - every time they use the software.

Asking what a user wants would mean having a competent customer service representative, and it would be simple, like asking me to drive home from work.

Voice prompts require me to intuit the customer support structure in order to guess where the path is to reach my category of issue. It's like asking me to walk home from work in the sewer system.


For use cases that are well-suited to a conversational interface, that's great, and new AI advances will make chatbots more powerful than they've ever been.

But not every use case is a chatbot use case, and I think that's the key point of the article.

The sudden viability of a conversational interface that is good enough at having a fluid conversation to revolutionize the experience of that conversation does not suddenly make this interface the best fit for all use cases.

I still find it far more pleasant to browse to a page and see a list of clearly displayed options that I can absorb at a glance and get on to what I really need to accomplish in the moment.

Even a perfect conversationalist can't remove the extra friction involved in disclosing information. The question is whether that loss of efficiency is outweighed/nullified by a better overall experience.


Even if someone designed an awesomely sophisticated air flight search engine that knew my usual preferences and could search for me and respond to follow-up prompts, I'd probably still want to see the information displayed on a page rather than read to me for the most part.


I think this might point to an opportunity to explore hybrid interfaces that allow you to just have a conversation while still having a visual representation of everything going on.

But still agree with the sentiment here. There are applications that I cannot imagine ever wanting to interact with via a bot.


This is how I've been thinking about it.

Hybrid interfaces that combine visual cues and output and natural language input.


That's what I had in mind. Happy to talk to the computer but show me the results on a screen. If I'm typing in a freeform query, I'd generally just as soon say it instead. (Assuming I'm somewhere I can talk.)


This resonates with me and my use of Siri. As soon as I get outside of the common tasks I use it for (setting timers, unit conversions in the kitchen, turn on/off the lights), I’m either spending time trying to come up with the correct incantation or pulling out my phone to get to an app that does it better.


Your analogy is building software, though, which is an extremely complicated, domain-heavy specialization. I don't think people are suggesting users will be asking a chatbot to do crazy hard stuff like translate their complicated business problems into software interfaces (at least not yet).

The use cases for AI/chatbots will likely remain niche, but there are still tons of niche areas a language interface could fill, where the user has the appropriate specialization/skill to do it on their own.

It is still ultimately an interesting design/UX question. It's too bad the OP blog post didn't provide some real life examples.


The point was less to draw an analogy and more to reflect on how I've seen users behave when exploring software for the first time.

When testing new concepts, observing users try things out reveals a spectrum of expectations about where things should be, and how to achieve a task. So we try to find the combination of things that surprises people the least, as much of the time as possible.

And when a new user doesn't find the chosen approach perfectly intuitive, this is usually a temporary problem, because learning where something is takes care of this with a few repetitions. Product tours help.

An equivalent chat interface might be able to adapt on the fly to a wide range of user types, but this still doesn't imply anything about the core usability of the product and whether or not someone prefers to interact with a chatbot. Put another way, some use cases just aren't a good fit for a chatbot, even a very very good one.

I do agree that though niche, there are a lot of interesting opportunities with a sufficiently fluent conversational AI.


I agree; I've done the whole UX/product dev thing long enough. But I also find most software people significantly underestimate the ability of the average user. The default is lazy and surprisingly dumb, but if you dig into why they say dumb stuff, it's because they default to zero effort. In reality they are actually highly capable of learning; they just don't bother trying, or have little incentive, time, or spare mental capacity to do so.

The classic example is how everyone whines about redesigns and changes (even on HN). But if you just ignore them, within a year the new UI is getting 2x the pageviews/time on site. That sort of thing.

Just because they say dumb stuff and sound lazy doesn't mean they aren't capable of adapting.

I see the same thing with specialized people using [future chatgpt interface]. Even the sales/management/niche business users we despise but have to deal with are IRL highly specialized, educated people. They can learn when their jobs depend on it, even if they aren't capable enough to be software devs.

Their jobs/education/life experience has not depended on communicating well with software people. But they can learn and adapt to the tricks/quirks/specialization of communicating prompts to AI if needed.


> users absolutely do not know what they want in the moment.

People know what they want in a general sense. They just need to be told that they need your particular one, though.

I need new clothes, but I don't know that I specifically wanted a black Nike T-shirt made of special exercise polyester until I saw the model in the ad wearing one.


I think this is a very different concept than the state of mind someone is in when trying to understand how a piece of software works.

This obviously depends on the type of software, but users often struggle to articulate the actual problem they're trying to solve, and it's difficult to know what solution to look for when you haven't fully grasped the problem yet.

If I don't know what the solution looks like, I don't know what to look for, and this is where good software steps in and shows the user what to do next without making that an onerous process in perpetuity.


Depends on the need. If they need someone to further explain a specific concept from their homework, they definitely know what they want.

Also, if they just want a refund on an airline ticket, again, they know.


In the context of my comment, knowing what they want was more about users being able to tell me what they think they should do next in the software's interface (whether that's a GUI, terminal, or bot) to achieve their desired goal.

In other words, what should we build as a product team to satisfy this user's need?

The thing they need in the moment is often not obvious or apparent to them until they see it. This is why we iterate on UI concepts. Some work, some don't. Most of the things that work don't come from users who tell us "put this button here".

So the point I was making was more about trying to determine: "what are the things I can even ask the computer?".

There are clearly use cases that are better suited for this than others. Anything that follows a simple question/answer format is probably a great fit.


> Asking a user to interact with one of these chat interfaces is like asking them what they want - every time they use the software.

The ribbon is the same. Good luck finding something in it.

But this seems to be the future.


The ribbon (particularly in Microsoft Office) solves the problem that users don't know what they want, because it lays out all the options in front of them in a clearly organized hierarchy, with hover previews and textual labels for every command except for extremely well-known commands (like Bold/Italic). This is very different from chat interfaces, which are opaque black boxes and in order to get anything out of them users have to express their wish in words, with nothing to start from.


I think Apple’s approach to menus is preferable - one can search and it will highlight places in the menu where the term appears. Usually I know what I want to do, I just don’t know where it’s located in the menu.

For MS products I end up Googling how to do something and invariably get instructions for a slightly different version with a menu layout that is also slightly different and work it out from there.


MS Office has the same search behaviour, and the ribbon layout has not changed since it was first implemented in Office 2007.


MS Office search doesn't show me where in the menu the item I searched for is located - if I rely on the search, I haven't learned where the thing is in the menu.


> For MS products I end up Googling how to do something and invariably get instructions for a slightly different version with a menu layout that is also slightly different and work it out from there.

Sounds like something ChatGPT could help with


I'll just add here that it doesn't have to be only one or the other. The ribbon might be good for discovering features, but it's inefficient for using those features once discovered. If the user already knows what the thing they want is called, the fastest way to find it is a search bar that searches the labyrinth of menus and ribbons when you type into it. Google and Mac apps usually have this.
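A toy sketch of the idea (the menu data and function here are made up for illustration, not any real app's API):

    # Walk a nested menu tree and yield the path to every item whose label matches.
    MENUS = {
        "Insert": {"Header & Footer": None, "Table": {"Insert Table": None}},
        "Message": {"Recall This Message": None},
    }

    def search_menus(tree, query, path=()):
        for label, children in tree.items():
            here = path + (label,)
            if query.lower() in label.lower():
                yield " > ".join(here)
            if isinstance(children, dict):
                yield from search_menus(children, query, here)

    >>> list(search_menus(MENUS, "recall"))
    ['Message > Recall This Message']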


Press Alt. It will highlight the key-binding to press to reach any item on the ribbon.


> it lays out all the options in front of them in a clearly organized hierarchy

I don't know for whom this hierarchy is clearly organized, but for me it is not. Yesterday I spent 10 minutes searching for how to recall a mail in Outlook. Searching for "headers and footers" is the same.


While I don't mind the ribbon, I also don't think it's a UX triumph. It will always be possible to find examples of bad UX.

That does not mean there isn't a better visual representation out there, or that replacing it with a conversational interface is a natural alternative.


I think it is shockingly ugly.


But let me remind you all: ChatGPT, and AI in general, is a tool to replace developers and designers with weaker, more generic tools, because management does not want to pay labor costs (some would say they want a free ride). They absolutely do not care that it's a worse solution; they just want to eke out savings so they can get promoted, then ditch it when shtf. It just has to work convincingly until they can move on to greener pastures.


> This cognitive load would make me personally seek alternative tools.

I would prefer a smart and resourceful personal assistant taking care of me over any other interface ever conceived.

The reason why people use the uber app, or airbnb interface, or the google searchbar instead of texting their personal assistant what they want is that they simply can’t afford one.

The only question is if we can make a cost efficient version of that personal assistant.


While I might agree with you for a certain class of use cases, the point is that not all use cases fit this kind of interface.

For the ones that do, they’ll just keep getting better.


> like asking them what they want - every time they use the software.

That reminds me of https://en.wikipedia.org/wiki/Where_do_you_want_to_go_today%..., which apparently wasn’t successful.


This ^

Also, discoverability in modern UIs (including & especially chat UIs) is so poor, how are we supposed to learn/remember what the system can do?


Some users do, and they're right. Talk to those ones often, it's much faster than the full build/measure/learn ceremony.


Oh absolutely. Those users are what made the job rewarding. Learning from them was invaluable.

They’re also rare, at least in the specific domain I was focused on.


True, and yet people employ other people to do things for them and get them to do things simply by talking to them. That works because a smart enough person doesn't need to be given a lot of detail and will understand things from some very high level instructions; or even do things proactively.

AI is heading in a direction where it won't need a whole lot of micro-managing to be useful. Things like Alexa, Google Assistant, and Siri are as of now completely obsolete. They were nice five years ago, but they got stuck with use cases that are a combination of low-value and unimaginative. I mainly use this stuff to do things like setting alarms and timers. Reason: it seems good enough to do that, and I don't like messing with my phone when I'm cooking some food since I have to wipe my hands first.

Doing better than that requires a deeper understanding of language (check) and context (no meaningful integrations yet, but seems doable). It's not really going to replace a normal UI but it would be more like managing somebody doing something on your behalf. You are not using, but directing and guiding. The AI does most of the work. It's not tedious because you get results more quickly than anything you would be able to do using any kind of UI.

I'd love an AI secretary that can take simple prompts and prepare emails that reply to other emails, summarize what's in my unread messages, or figure out a good slot in my calendar to invite some people to. This is annoying if you have to go into each application and then type out what you want and then triple check what comes back before sending it off. But that's not how you would work with a human secretary either. You'd give high level prompts, "send them a reply that I'm not interested", "are there any messages that need my attention", "what's a good slot for meeting X to discuss Y ... can you set up the meeting".

These are fairly simple things that well paid executives actually pay people to do. An AI would maybe not be able to do all of these things perfectly right now. But it could compose a message and then allow you to edit and send. Or it could summarize some information for you and give you the highlights. Etc. You could do that over a phone while on the move or just talking to a device. I don't think we're that far away from something usable in this space. I'd use it if something like that came along and worked well enough. And I have a hunch that this could become a lot better than just that fairly soon.

So MS, the world's largest provider of tools used by secretaries, is investing in and partnering with OpenAI. This stuff is so obvious that they must have more than a few people with half a brain who figured out something similar ages ago. I would not be surprised to see them launch something fairly concrete fairly soon. Maybe it will fail. Or maybe they get something half-decent working.


I can't help but agree fully. It's worse on telephones, where they might be doing NLP but cannot understand and parse accents and dialects. I remember my late grandmother trying to call AT&T a couple of years ago (just before COVID); the robot would ask "What can I help you with?" and then could not understand how she pronounced "pay my bill", because she said "PAY mah BEEEEEL".

But just hitting 0 did nothing, so after 5 minutes of her repeating "PAY mah BEEEEL" over and over, I took the phone from her and did it. From then on she would have to have other people pay her bill over the phone.

Doing this to a much more complex user interface, while providing me no clue what I'm supposed to ask for, about something I have no way of knowing that I don't know, is a dystopian future I'm glad my grandmother won't have to endure.


As a 40-something white male with a neutral, Urban Canadian English accent (so the ideal NLP user), even I have difficulty with voice assistants and ABHOR being forced to use one. My wife does have an accent and like your late grandmother always has great difficulty with these, usually requiring my intervention.

Unfortunately, NLP is "modern" and "eliminates drag" according to current design-think. What's needed is a shift from thinking about "User Experience" to the real lived human experience when designing interfaces


That's the issue. The academic and research UI/UX spaces tend to reject user feedback. The explanation is that "You must train the users to like it".

Deeply unpopular changes that are gaining traction in industry but hated by users are:

1. Removal of all buttons from devices in favor of screens

2. Voice bots and text bots

3. Gesture interfaces


These are the same brain geniuses who parrot the old "Henry Ford said if asked, my customers would demand faster horses"

Really? Aside from the fact that the quote cannot be attributed to him, was this before or after he was forced out as CEO for bringing Ford to the brink of bankruptcy through, among other things, declining sales caused by not listening to clients? In the middle of the Roaring '20s - you know, that period when everyone was buying things like new cars? And companies like Chrysler boomed by providing features that clients wanted? Because they asked and listened. That Henry Ford?


I have found that statement to be very true. But not in the sense that you shouldn't listen to customers. Just that you can't necessarily rely on customers to figure out the next big thing. Once you put that thing in front of them though, you have to incrementally improve based on customer feedback.


> The academic and research UI/UX spaces tend to reject user feedback.

I dunno. I have yet to see academic and research UI/UX have any impact on the real world this century.

Everything you see around was created by somebody else.

So, since I also have not been looking for their work, I have no idea what they are saying.


Agreed. I'm afraid to say the UX/UI buzz is overrated: it's 99% copy, 1% mix and match. And there are no revolutionary ideas. To be fair, some things are truly hard to the point that I don't think there's a fix, such as UI for tree data; it all sucks given the current 2D screens we have.


Yes. The 2D WIMP paradigm is hard to argue with given the input/output constraints of keyboard, video, and mouse. That physics was solved over generations (both human and tech), so research gains are limited. Enter NUIs - LLMs being just one example of a category maturing out of R&D labs into the real-world possibility space, which also includes XR (augmented/virtual/mixed reality) with computer vision understanding a full six-degrees-of-freedom world model. The new UI/UX these systems enable is very much an unsolved work in progress where early failures should be expected, as painful as that may be.


I think it's also bound by human sensing limitations. Generally speaking, UI seems to suffer from being unable to present many things at a time. We have the concepts of pagination, columns, tabs, collapsible things, and navigation precisely because many things can't be presented and comprehended in the same space.


At the risk of some choir-preaching: if you have to train users to like your UX, then you have fundamentally failed at UX.


> Unfortunately, NLP is "modern" and "eliminates drag" according to current design-think.

Citation needed. Most serious UX designers are well aware of the limitations in chat-based interfaces.


It's poor thinking on their part to only provide a talking interface. I don't think I've encountered that personally; there is always a way to use the keypad, which I will always use anyway. Even though they understand my voice, the keypad is just 10x faster. And if you've made the call before, you can type on the keypad before the robot on the other side is done talking.


There are plenty of services in Portugal that only have the damned robots. They're also adding the most infuriating chatbots, which they pretty much force you to go through before you get to a human. Can't wait for the day this is all banned.


Whenever this is discussed, it always makes me wonder... Why? Just why?

SURELY just adding an option to use the damn textual "pick 1 for Blah, 2 for bleh..." would take no effort at all?


You also have to consider - does the company you are calling BENEFIT from you hanging up in anger/frustration? Many times the answer is yes.


Most of the time the answer long term is "no" because as long as it's not a monopoly people will go to their competitors that aren't utter shite at what they do


But there are cases where interacting with a company that engages in said practices is due to a transitive relationship.

Example: My preferred clothing vendor uses a certain of delivery company.

If I, personally, am sending a package then switching the delivery company is usually trivial. Switching clothing vendor because UPS has crappy international package management and you need to say the package id instead of typing it would hurt me more than it would hurt them.

And the other shipping companies aren’t much better anyway. And afaik there isn’t much incentive for trading off something in exchange for pissing parcels receivers less, because see above.

On the other hand, there are great shipping companies, end user experiance-wise. One I can think of is a polish automated pickup station company. Somehow the experience (including customer support in case of a stuck door, etc.) only gets better with time so far. But they were the first and afaik remain the only company in the automated pickup station space that counts in Poland.


I think the number of people who would actually do this is smaller than you think.


Depends, if your only option to cancel is via phone or chat-bot.


I had to cancel a service via fax... in 2023...


I don't have data but more and more seem to be turning voice only. Some US-based airlines come to mind, and one of the banks I deal with. It's fun when they ask for my "16 to 20 digit client number"


And then they spend half a minute slowly reading back each digit to ask "is this correct?"


Mine just started introducing an ad before they get to the main menu.

Also, they repeat things several times during the interaction, e.g. the phone number I just called.

“Do you want to repeat that or go to the main menu?”

“Main menu.”

“You want to go to the main menu, is that right?”

It’s not my pronunciation, it does that every time.

Bot time is considered cheap, and therefore so is the user’s time. The time for the transaction has doubled over the last five years, as they add more repeats, information, and now, ads.


My grandfather speaks in a thick, archaic Northern Georgia brogue. I can't imagine anything parsing his speech correctly, since to the untrained ear it sounds like one long, continuously oscillating grumble - sorta like Boomhauer from King of the Hill, but deeper and with more bass. You can generally hear him pronounce "shit" pretty clearly though.


English was my mother's 5th language, I can relate to acting as interpreter for family.

I'm reminded of the "Voice Recognition Lift" sketch from the Scottish comedy Burnistoun - https://www.youtube.com/watch?v=TqAu-DDlINs


For future reference, when these systems offer you a choice of things to say like "Pay my bill", "Check my balance", etc, they are usually backed by numeric input too. You can press the number corresponding to the voice option provided - in this example 1) pay bill, 2) check balance.


When I moved to London I went to the supermarket and asked in my New Zealand accent where the eggs are.

"The what?" The assistant replied, "the eegs" I replied.

"I don't think we sell those" he said.

I switched to an American accent and he was finally able to understand.


Is there a website or CLI that turns the phone tree back into a proper text interface?


Regional accents are terrible that way. Are you sure it was "BEEEEL"? There are places where "bill" is two syllables. I'm surprised you didn't get "PAY mah BEE ILL"


This parallels a longstanding critique I have of many modern user interfaces compared to slightly older software in the field of what I would consider 'power user' engineering tools; programs like FE tools, CAD, and other CAE. These are the kinds of programs that hit their stride from the late 90's to 2007-ish, when they just slammed tons of toolbars around the edges of the screen, sometimes nesting functions or using contextual workbenches, but ultimately allowing the user to have everything exposed at once if needed. As screen real estate grew with higher resolutions and larger monitors, the icons got smaller, which was even better for the power user - you could fit even more on the screen if you wanted!

But starting around 2008-2009 I noticed a trend, and it continues to this day: the power-user-oriented layouts started being replaced with more 'friendly', larger-icon, children's-game-looking UI. Intuitive graphical icons were replaced with stylish, monotone shit that looks like a graphic design student's dream but conveys less instant information.

I blame some of this shift on the move in Office to the Ribbon system and developers trying to imitate that, but some software I've seen takes that and does it much worse.

I want all my functions laid out and accessible. Like this blog post mentions, sometimes I don't know what I want to do until I see it. I want to be able to explore the entire space before I know what it all does, maybe.

Using natural language can be very powerful if it augments these systems, but for many tools it isn't a replacement. Often I think new software is designed around looking impressive and fast to upper level management at the expense of the usability of the power users who ultimately are the users that get things done.


> Intuitive graphical icons were replaced with stylish, monotone shit that looks like a graphic design student's dream but conveys less instant information.

Design is the art of signal-to-noise ratio or, in simpler terms, balance and harmony. If you overuse any modality (lines, text, color, nesting), you increase the noise level. If you underutilize a modality (for instance, your whole UI is monochrome), you reduce your signal bandwidth.

Every trend gets mindless followers, who throw the baby out with the bath water without even realizing it. But trends also bring a grain of gold to the table.

For instance, monotone icons allow many more elements in the same screen real estate than text, and by not using color you can have a larger color budget for other elements, which you can use elsewhere to convey progress, status, or anything else important.

Good uses of monotone icons are text formatting (bold, justify, etc.) and display options (column view, tree view, etc.), or toolbars (like in Photoshop or 3D tools). Many apps from the 2010 era overused colored icons, and I’m glad those went away. Some FOSS apps still suffer from that.


Very interesting actually.


Instead of the Ribbon, don't you think it was rise of tablets that influenced these design changes?


> Often I think new software is designed around looking impressive and fast to upper level management

Bingo, and it also impresses the upper-level management at customer companies - i.e. the ones who make the decision to buy the software without having to use it themselves.


100% Agree.

When it comes to SeriousBusiness™, chat bots don't have sufficient constraints to extract specific input from free-form text.

Applications are ultimately delivering value in a specific set of use-cases. Only some of those use-cases can be easily retrofitted with a chat-first interface.

Consider something like Photoshop or Figma. There are so many ways you can issue commands that don't make sense. Eg: "Change the font-size on this color palette."

Any sophisticated app will have these kinds of constraints.

The user interface is not there only to accept input from the user. It also implicitly teaches the user some of the constraints in their apps.

Without that, you're shifting the burden of understanding and maintaining the constraints to the user. And you're left with a (much smarter) version of "Hey Siri, do xyz...".

This is a common ideation trap I see with PMs at the moment. The underlying problem again is that the human doesn't understand the limits of what their apps can do for them. As a second order, even if they did, humans can be bad at describing what they want to do.


It could be good, if the interface actually understood more sentences. Usually it’s “tell me in a few words what you want,” which I do, it answers “I’m not sure what you want,” I try again, it gives up and reads off a list of things to say, none of which are what I want, then I start yelling “agent,” “operator,” and various curse words. Or “sales” on the theory that they’re most likely to talk to someone who they think will give them money.


It can only be good when the computer understands EVERY sentence, every accent, every nuance and understands context and threads.


When I turn on closed captioning on Zoom, I get a very good transcript of what’s being said in real time. It even backtracks and corrects, after it’s had half a second to reconsider the input.


This is a limiting perspective, inherently pessimistic about LLMs.

The best NLP interfaces will be asking questions to the users, in order to figure out what their real problem is. This is similar to what teachers and therapists do. It is not a lazy interface, but a natural one. The chatbot will step the user through a decision tree in situations where the user doesn't know how to ask questions or frame the problem.


I had some research grants to investigate these "inquisitive interfaces".

Blog post on the initial idea: An inquisitive code editor: Overcome bugs before you know you have them https://austinhenley.com/blog/inquisitivecodeeditor.html

Grant proposal on the bigger idea: Inquisitive Programming Environments as Learning Environments for Novices and Experts https://austinhenley.com/pubs/Henley2021NSFCAREER.pdf


A decision tree. Also known as a phone tree, which has been around for nigh-on 40 years now. You don't need AI for that.


Decision trees are inherently limited in the inputs they can take from the end user (yes/no, etc.). The hope here, as I understand it, is to take free-form input from the user and map it back onto one of the branches of the decision tree.
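A rough sketch of what that mapping could look like, in Python. Here `ask_llm` is just a placeholder for whatever model or API is actually available, and the menu options are invented for illustration:

    # Map a caller's free-form request onto one branch of a fixed phone tree.
    BRANCHES = {
        "1": "Pay my bill",
        "2": "Check my balance",
        "3": "Report a lost or stolen card",
    }

    def route(utterance: str, ask_llm) -> str:
        """Return the matching branch label, or 'UNKNOWN' to fall back to the keypad menu."""
        menu = "\n".join(f"{key}: {label}" for key, label in BRANCHES.items())
        prompt = (
            "Pick the single menu option that best matches the request.\n"
            f"Menu:\n{menu}\n"
            f"Request: {utterance}\n"
            "Answer with just the option number, or UNKNOWN."
        )
        choice = ask_llm(prompt).strip()
        # Never guess: anything unrecognized drops back to the classic menu.
        return BRANCHES.get(choice, "UNKNOWN")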


It depends on how you define decision tree.

We all gather information in order to recognize complex patterns and make decisions.

Some of those decision flows are extremely deep, with complex inputs to determine the ultimate decision.

Skilled teachers and therapists develop pattern recognition skills that allow them to tailor a response to the state of their interlocutor. That process is analogous to a decision tree, but feel free to apply another word to it if you like. Whatever we call it, I think chatbots will be able to do that, and I think that will be a good thing.

Everybody's thinking about chatbots as a black box that answers our questions.

That's not the real value. The real value is to give those bots much larger multi-modal prompts about whatever problem we seek to solve, and let the LLM ask us the questions to ferret out features that are not superficially visible, so that it can give us better guidance.


The worst online MUD I ever played had a tutorial puzzle that was something like this.

"Now, use the mechanisms on the catapult to launch the catapult"

There was no other explanation of what your options were.

I tried: 'pull the lever', 'release the spring', 'fire the catapult', 'pull back the lever', 'use the lever'.

It finally turned out to be something like "release the lever".

The problem with chat is that you are attaching it to a rigid user interface that has a tiny subset of options compared to the breadth of human language. The user has to probe the awful chatbot for these options.


Can't each and every one of these criticisms be also leveled at CLIs? Don't we like CLIs? I notice I am confused.

> The least it could do is intelligently give me a starting point for typing in a prompt. The tyranny of the blank textbox is real.

Seems LLMs would be way better at this sort of thing - "What can I do here?" instead of "do I type help? man? apropos?"


We expect interactive help in a CLI, and tab completion. We expect errors that tell us what we've done wrong. These things quickly expose the underlying "shape" of the interface.

When chatting with an AI, you don't know that the bot was 41% sure that this was the right path for you, chosen out of five other options with lower scores. It just takes you where it thinks you want to go without sharing that structural information with you.


Hum... If your CLI doesn't have a manual, yes, that applies.


I think this is why chatgpt has done so well with engineers. Engineers like command lines. The vast, vast majority of users of computers however don’t like command lines.

The same thing happened in ~2016 with a flash in the pan of chat bot startups which mostly failed. Building a command line in messenger/slack was cool to engineers but not a super viable business.

ChatGPT is a proof of concept of a transformative technology, but it’s not what the product will look like that gets mass adoption.


I agree very much, but CLIs are mostly for power users and developers these days. Normal users were on CLIs decades ago but have been primarily on GUIs since then, for essentially the same reasons that TFA argues. I think we can focus on normal users for the purpose of this discussion.


> Can't each and every one of these criticisms be also leveled at CLIs? Don't we like CLIs? I notice I am confused.

It is not about text. In CLIs you have a set of commands. In those interfaces you have some hidden commands which you must trigger with keywords.


Node and npm are notorious for having command line flags almost none of which are listed in the help, and less than half of which are documented on the website. I'm running node and npm flags in production deployment pipelines right now that I found on Stack Overflow or the bug database and exist nowhere on the official websites. And if you look at the historic documentation, the current situation is a factor of 3 improvement over even Node 10 or 12 era, which is far better than the node < 1.0 period

What you say is true of good CLIs, not tire fires like Node.js. So you're both right depending on context.


I'm not sure how much time you spent on SO, but this only took me a couple of minutes of digging and I don't work with node.

    git clone https://github.com/nodejs/node.git
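    # Node's CLI flags are registered with AddOption()/AddAlias() in node_options.cc,
    # so grepping for those calls surfaces flags that never made it into --help.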
    grep -E -A 1 -R "Add(Option|Alias)" node/src/node_options.cc
There may be options I've missed because I haven't dedicated more than a couple minutes to it, but I think this demonstrates a straightforward approach to diving a little deeper when necessary and should be easy to adjust/build upon depending on your case. Your gripe regarding documentation is legitimate, but if you're an engineer responsible for deployments who needs to go past typical usage consulting the source should be in your bag of tricks. Having a deeper understanding of the tools you use and rely on isn't going to hurt you or your career.

https://www.gnu.org/software/gawk/manual/html_node/Undocumen...


How about you figure out that over ten million people are using your code and act like it?

https://xkcd.com/1205/

I'm surrounded by people, internal and external to the team, who believe that their code is entitled to a larger fraction of my time and attention than everyone else's. I write my own code like nobody gives as much of a shit about it as I do. Because it's true. They don't. Sometimes even I can't be bothered. I just wish it worked already.

I've got almost 1500 dependencies (counting directories, not the npm report which is overestimating by 15%) and 500 dev dependencies. That's a little higher than it should be but a lot lower than it was.

We work about 2087 hours a year. I'd be lucky to get 1750 hours after meetings and administrivia. If I spent fifty minutes a year thinking about each module, I wouldn't get any work done. Five minutes would chew up 10% of my available time. You are not entitled to my time or energy. I've got other shit to do.

This thread isn't about node, that's just an example. It's about UI, which we write for non-technical people. There is no reading the source. It's not an option, which means your response is whataboutism.

Your tool, your code, your gadget, your car, these are not who I am or what I'm about. If we're even talking about the UI, it means that your device is standing between me/him/her/them/us and what we are about. You've become an impediment not a contributor to people getting back to doing their thing. Apologism is just intellectual laziness. I love Apple products but everyone involved with, "You're holding it wrong" can fuck right off. That's stroking your own ego and you can do that in your room with the door closed. Don't expect us to help.


[deleted]


> It is not about text. In CLIs you have a set of commands. In those interfaces you have some hidden commands which you must trigger with keywords.

That's the context. You're getting lost in the weeds.

You're also using an argument that's been around since a time when people had 50 dependencies. It might have made sense then; it doesn't make sense an order of magnitude and change later.

>> I write my own code like nobody gives as much of a shit about it as I do. Because it's true.


While magic-word linguistic interfaces and command lines do have some commonalities in discoverability, my keyboard has never failed because it couldn't understand my accent.


If we're comparing CLIs with text chat bots, accents affect neither. I don't think the concept of NLP generally implies voice more than text, and TFA specifically discusses textboxes.


Robert Sheckley said it way better in Ask a Foolish Question [1]:

"In order to ask a question you must already know most of the answer."

[1] https://www.gutenberg.org/cache/epub/33854/pg33854-images.ht...


The reason terminal interfaces are often criticized is because they lack discoverability, leaving the user at a complete loss or at least hiding the full array of functions the program offers.

The reason many amateur GUI programs (GIMP) are criticized is because they lack discoverability, leaving the user at a complete loss or at least hiding the full array of functions the program offers.

The reason many programs based on NLMs will be criticized is because...


Right: it turns out that when someone wants to offer someone else a plethora of options in one fell swoop, graphics are much more efficient than language because they take way less time to consume. Even if commands like "help" or "what can I ask you about" end up providing equivalent discovery, consuming that delivery (whether reading text or listening to spoken word) takes many times longer.

Users of screen readers mitigate this by cranking up the speed to levels that typical users would severely struggle to understand: it takes a lot of practice to get efficient at using language in this way.

And chat bots often add artificial delay in an attempt to humanize the experience -- making this even worse!


> graphics are much more efficient than language

Graphics are just another language. When looked at across desktop applications, mobile applications, operating systems, and web sites it's a language that's much less consistent than any written language.


> The reason terminal interfaces are often criticized is because they lack discoverability,

That is why documentation needs to be included with the program. Any software will be difficult to understand if you do not include documentation.


I still think voice should be the goal. It has been the universal API for thousands of years. All of us are probably “good with computers” by virtue of the fact that we figured out how to sign up, but there are millions who aren’t.

This isn’t an excuse to replace UIs or humans with horrible phone trees. I won’t defend the obvious race to the bottom. Hopefully better voice interfaces are here soon.


Given the extent to which audio phone calls have been replaced by text, I'm always super sceptical of voice as the primary interface.

It has so many disadvantages: no easy-to-see history, loud (by definition) in any non-private space, not composable, not easily copyable, very slow discovery, and so on. Voice is strictly serial. Tactile or visual interfaces you can just look at and move around in, and immediately have a layout of the thing.

Regardless of how smart the voice control is, those issues are pretty intrinsic. There's also no 'halting state' to voice. Being able to switch between different visual interfaces without losing your state is pretty necessary today, but you can't really stop or multitask audio controls sensibly.


Voice also simply doesn’t work for people with auditory issues (which includes many elderly people)! They theoretically could use video with a sign language, but there’s no way that that will be prioritized, especially with the “get 80% done and ignore the rest of the use cases” attitude that people have. Text to speech is the solution that they wind up using, but then what’s even the point? Thank god for the ADA at least.


Instead of natural language querying, having a “search for commands” shortcut like the one Google Docs has, but using semantic search over GPT/LLM embeddings, would be great! GPT-3.5 (ChatGPT) seems quite resilient to typos too (e.g. [0]).

Also, for latency, instead of GPT-3/3.5 or other API-based interactions, embedding a smaller openly available model (like GPT-2, BERT, etc.) directly offline into the UI frontend would make the most sense here.

I’m sure Algolia or other startups might already be providing such a search system.

[0] https://www.reddit.com/r/ChatGPT/comments/zew9ed/chatgpt_und...
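A minimal sketch of that kind of semantic command search, assuming a small local embedding model via the sentence-transformers library; the model name and command list are just illustrative:

    # pip install sentence-transformers numpy
    import numpy as np
    from sentence_transformers import SentenceTransformer

    # Hypothetical command palette: internal name -> human-readable description.
    COMMANDS = {
        "insert_table": "Insert a table into the document",
        "word_count": "Show the word and character count",
        "export_pdf": "Download the document as a PDF file",
    }

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small enough to run offline
    command_vecs = model.encode(list(COMMANDS.values()), normalize_embeddings=True)

    def search_commands(query: str, top_k: int = 3):
        """Rank commands by cosine similarity to a free-form (possibly typo-laden) query."""
        q = model.encode([query], normalize_embeddings=True)[0]
        scores = command_vecs @ q  # cosine similarity, since the vectors are normalized
        order = np.argsort(scores)[::-1][:top_k]
        names = list(COMMANDS)
        return [(names[i], float(scores[i])) for i in order]

    print(search_commands("how many wrds in this doc"))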


This will be a use case - but I’d be deeply curious to understand the % of users who use that search for commands feature. I’d fully expect it’s less than 1%.


While I agree that a singular text interface is far from an ideal UX, I do firmly believe that providing smarter, more flexible command palettes will be a huge value-add to many user interfaces. Should you be able to interact manually - sure! But complex user interfaces tend to struggle with information architecture throughout their lifetime. Many of the obstacles could be well mitigated by providing a declarative user experience - the easiest example is JetBrains' "Actions" interface.

I wrote about my perspective in longer form a handful of months ago: https://productiveadventures.substack.com/p/the-rise-of-decl...


This is really hitting the nail on the head. In the olden days there were prototypes like SHRDLU [1] which could be asked to do simple computing tasks.

Is there something like that for GPT, a language model wired into a computing system that solves a concrete task? For instance, a simple image editing application with which the user can interact using natural language.

[1] https://en.m.wikipedia.org/wiki/SHRDLU


Best of both worlds is a searchable action menu. Search matches action names as well as descriptions; bonus points if it shows related actions as well. Actions have help documents if necessary. This solves the discovery problem, is still deterministic and sensible, and doesn't need anything fancier than full-text search. Add ML if you have to, but for nothing more than a "suggested" list of actions, NOT as a replacement for deterministic search.
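A minimal sketch of the deterministic part, in Python; the action names and fields are made up for illustration:

    from dataclasses import dataclass

    @dataclass
    class Action:
        name: str
        description: str
        help_doc: str = ""  # optional longer help document

    ACTIONS = [
        Action("Reformat Code", "Reindent and reflow the current file"),
        Action("Rename Symbol", "Rename a variable, function, or class everywhere it is used"),
        Action("Toggle Line Comment", "Comment or uncomment the selected lines"),
    ]

    def search_actions(query: str):
        """Plain substring matching over names and descriptions - no ML involved."""
        terms = query.lower().split()

        def matches(action: Action) -> bool:
            haystack = f"{action.name} {action.description}".lower()
            return all(term in haystack for term in terms)

        hits = [a for a in ACTIONS if matches(a)]
        # Name matches sort ahead of description-only matches.
        return sorted(hits, key=lambda a: query.lower() not in a.name.lower())

    for action in search_actions("comment lines"):
        print(action.name, "-", action.description)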


I’ve been using ChatGPT with my research coding lately, but I think I’m very lucky to be at the point in my career where I can debug things and think about the code being presented to me, and let me tell you, I find a ton of errors in almost everything it presents. But I have been exposed to so many more ways of organizing my code and solving problems that it has made me a much better coder. I feel that ChatGPT for coding is perfect for people at my level: able to know enough to debug and write tests for everything and know what the output should be, yet not that good at coding because they are home-grown coders who haven’t been exposed to so many new ways of organizing and changing data.


I remember when Google started (I think it was still at Stanford then), and some professor was looking at my screen and remarking, irritated, about how it was just a text entry field with no indication of how people were supposed to use it.

I figured he was probably thinking HCI "affordances", but I disagreed in this case. People were already typing search terms successfully, for indexers and even in Yahoo's directory. And Google's IR ranking model was noticeably much smarter than the existing Web ones, so I figured they'd also become smarter about interpreting whatever people typed (they'd get all the query data, and they could know what hits people clicked on for each).


Very short-sighted post; future chat interfaces will of course guide the user in the right direction by asking the right questions, like a very skilled personal assistant.

Text interfaces will make most GUIs obsolete; every power computer user knows CLI >> GUI.


And yet it garnered millions of users within days. It has inspired people who don't do computers to feel like they can actually get amazing things done with computers. The pace of creative use cases discovered by non-tech types, with such a lazy interface, is nearly impossible to keep up with.

It has democratized what is possible with technology once that technology became programmable with natural language, i.e., by just asking questions, or asking it to do things, in everyday language. It meant that people who didn't know how to build software could get software to do their bidding regardless.


Would databases have been better to work with if SQL didn't ape natural language?

Would Excel, if it _did_ choose to embrace a "natural" macro syntax?

Remember when CGI movies were all stuck in the uncanny valley? Turns out language has a valley too.


That textbox saves many clicks and Google searches, questions, and running around. It is not lazy; it is just a more effective tool, and humanity is all about progressive tech.

So it is not being lazy; it is being... human.


It’s too binary a perspective. NLP won’t replace graphical interfaces overall, but it will complement them. In cases where the user is wondering… maybe not so good. In cases where the user knows exactly what they want, or has a specific problem: excellent. And those cases are more frequent than some may realize. Banking, insurance, data management tools, messaging, question answering, recipe websites, whatever you name it.


Spot on. Now Word can augment its UI, maybe automatically recognizing when I'm writing a letter and offering to help with that specific task.


"Looks like you're writing a letter. Would you like help with that ?"

The good old days, when the first thing you did after installing Office was disabling the Office Assistant.


I have tried to teach everyone in my family to use Google with special keywords instead of natural language, for maybe 15 years.

Much like how people who learned from books or articles diss the people who learn from YouTube tutorials, LLMs via chat prompts and the iterative process may be better for the 90% of non-pro users/developers out there, even if we don't get it.


Chat tools that help power users access features or settings inside of an app are where I think the immediate value is. App interfaces are super simple, which reduces cognitive load on the user but hides many needed features. I have to hunt around so many app and website pages to find the setting I need, and I think this is an area where good chat interfaces can excel.


How is it that CAD programs manage to be usable by power users with graphical interfaces and keybinds, then? If the problem is that UI has been dumbed down to the point of being unusable for the sake of design, then I don't think adding another layer on top is the solution.


Chat interfaces are great if

1) you know what you want

2) the alternative would require manual navigation to multiple UIs and/or many interactions


Buttons are the GOAT user interface: sound engineering gear, plane consoles, etc. I've had the same thought since 2010; IIRC the hype back then was voice commands. I think they should focus on an interfaceless learning model... but I don't know what pure semantics it's supposed to eat (learn). Maybe TBs of XML, lol.


I agree it's not for everything, but it seems like better NLP/NLG could improve things like Alexa a lot. Alexa is fine for setting timers, turning off lights, etc. But currently when I ask it anything that Google Search or ChatGPT handles well, Alexa falls down and tells me the wrong info.


I've had GPT-3 as a replacement to Siri on my Apple Watch for the last month or so and it's awesome.


How?



A chatbot may be a lazy user interface for some tasks, but it is very good at assistive tasks like writing, coding, review, etc. It is a language model and a foundation for other software products. Over time, integrating LLMs into existing user interfaces will solve the problem you mentioned.


I've largely used GPT as a replacement for Google, and I find it to be a lot smarter and faster. The criticisms the author levels at it don't seem based on having used it for anything substantial.

It is an excellent tool and a major step, and it will only get better and easier to use, it seems.


It's versatility that's king here.

Sure, a knob for "snowflake size" is nice, but most often I don't work on snowflakes and their sizes.

But if I do in the near future I'm sure I can just say: "I used snowflake size in a lot of my prompts, can you just make it a knob for me?"


Lazy, sure, but it's also the least precise by far.

Tbh, the hardest part of many software projects is figuring out what really is needed.

I have seen startups with good tech and people fail because they slightly misjudged what their customers wanted and noticed way too late.

A common cost-driving factor when hiring a company to do a software project for you is that the requirements you legally agree on are not quite what you need, so you have to pay for follow-up changes. (This is also AFAIK sometimes abused: initially underbidding the competition, then "accidentally" creating a product which fits the requirements but not the actual needs, and then selling overpriced follow-up changes at an end cost much higher than the competition would have been.)


Natural language is at least 100,000 years old. It's not lazy - it's just natural.


Adding machines into the way humans interact will not only make the experience more natural and enjoyable, but will also help both sides understand the environment better.

I would celebrate the historic advancement of the technology instead of necessarily looking for flaws.


I mean, I hate chatbots, but it seems all the old people I know want me to be a "chatbot" for them. They tell me in natural language what they want, I talk with them, and then I fix their tech problems for them.


All user interfaces should be as challenging as Perl regex. Here's a great idea: for an answer to be valid, require it to be in the form of a Perl regex. That will get rid of the lazy people.


I think you've never used a Microsoft GUI. Perl regex is deterministic.


It kind of makes sense. People who can do so send emails asking others to do things instead of clicking through menus themselves.


> "people are bad at words"

People are worse at command line interfaces and even clicking on buttons.


LLMs can also talk, not just listen


LLMs inherently do not know what questions to ask. Other algorithms know, but not them.


"Ask me 10 random questions"


I call it lateral programming.


Related -- I was just looking at serenade.io and thinking to myself that, aside from accessibility purposes, what developer wants to learn a new voice abstraction on top of their programming language abstraction?


Google's natural language interface seems to be popular.


Here is a good example of a rudimentary use: multion.ai


Cool! Looks like an improved version of Natbot https://github.com/nat/natbot


Great to see an open source version, thanks!


I wonder if there is something in Rust that can be done to improve the LLM.


You missed the opportunity for LL[V]M.



