GPT-4o (openai.com)
3138 points by Lealen 6 months ago | 2366 comments



The most impressive part is that the voice uses the right feelings and tonal language during the presentation. I'm not sure how much of that is because they had tested this over and over, but it is really hard to get right, so if they didn't fake it in some way I'd say that is revolutionary.


(I work at OpenAI.)

That's really how it works.


> (I work at OpenAI.)

Winner of the 'understatement of the week' award (and it's only Monday).

Also top contender in the 'technically correct' category.


> Winner of the 'understatement of the week' award (and it's only Monday).

Yes! As soon as I saw gdb I was like "that can't be Greg", but sure enough, that's him.


and was briefly untrue for like 2 days


[flagged]


Bro what?


How far are we from something like a helmet with ChatGPT and a video camera installed? I imagine this will be awesome for low-vision people. Imagine having a guide tell you how to walk to the grocery store, and help you grocery shop without an assistant. Of course you have tons of liability issues here, but this is very impressive.


We're planning on getting a phone-carrying lanyard so she can carry her phone around her neck with Be My Eyes^0 looking out through the rear camera, pointed outward. She's DeafBlind, so it'll be Bluetoothed to her hearing aids, and she can interact with the world through the conversational AI.

I helped her access the video from the presentation, and it brought her to tears. Now she can play guitar, and she and the AI can write songs and sing them together.

This is a big day in the lives of a lot of people who aren't normally part of the conversation. As of today, they are.

0: https://www.bemyeyes.com/


That's definitely cool!

Eventually it would be better for these models to run locally from a security point of view, but this is a great first step.


Absolutely. We're looking forward to Apple's announcements at WWDC this year, which analysts predict are right up that alley.


It sounds like the system that Marshall Brain envisioned in his novella, Manna.


That story has always been completely reasonable and plausible to me. Incredible foresight. I guess I should start a midlevel management voice automation company.


Definitely heading there: https://marshallbrain.com/manna "With half of the jobs eliminated by robots, what happens to all the people who are out of work? The book Manna explores the possibilities and shows two contrasting outcomes, one filled with great hope and the other filled with misery."

And here are some ideas I put together around 2010 on how to deal with the socio-economic fallout from AI and other advanced technology: https://pdfernhout.net/beyond-a-jobless-recovery-knol.html "This article explores the issue of a "Jobless Recovery" mainly from a heterodox economic perspective. It emphasizes the implications of ideas by Marshall Brain and others that improvements in robotics, automation, design, and voluntary social networks are fundamentally changing the structure of the economic landscape. It outlines towards the end four major alternatives to mainstream economic practice (a basic income, a gift economy, stronger local subsistence economies, and resource-based planning). These alternatives could be used in combination to address what, even as far back as 1964, has been described as a breaking "income-through-jobs link". This link between jobs and income is breaking because of the declining value of most paid human labor relative to capital investments in automation and better design. Or, as is now the case, the value of paid human labor like at some newspapers or universities is also declining relative to the output of voluntary social networks such as for digital content production (like represented by this document). It is suggested that we will need to fundamentally reevaluate our economic theories and practices to adjust to these new realities emerging from exponential trends in technology and society."

And a related YouTube video: "The Richest Man in the World: A parable about structural unemployment and a basic income" https://www.youtube.com/watch?v=p14bAe6AzhA "A parable about robotics, abundance, technological change, unemployment, happiness, and a basic income."

My sig is about the deeper issue here though: "The biggest challenge of the 21st century is the irony of technologies of abundance in the hands of those still thinking in terms of scarcity."


Your last quote also reminds me this may be true for everything else, especially our diets.

Technology has leapfrogged nature and our consumption patterns have not caught up to modern abundance. Scott Galloway recently mentioned this in his OMR speech and speculated that GLP-1 drugs (which actually help with addiction) will assist in bringing our biological impulses more in line with current reality.


Indeed, they are related. A 2006 book on eating healthier called "The Pleasure Trap: Mastering the Hidden Force that Undermines Health & Happiness" by Douglas J. Lisle and Alan Goldhamer helped me see that connection (so, actually going the other way at first). And a later book from 2010 called "Supernormal Stimuli: How Primal Urges Overran Their Evolutionary Purpose" by Deirdre Barrett also expanded that idea beyond food to media and gaming and more. The 2010 essay "The Acceleration of Addictiveness" by Paul Graham also explores those themes. And in his 2007 book The Assault on Reason, Al Gore talks about watching television and the orienting response to sudden motion like scene changes.

In short, humans are adapted for a world with a scarcity of salt, refined carbs like sugar, fat, information, sudden motion, and more. But the world most humans live in now has an abundance of those things -- and our previously-adaptive evolved inclinations to stock up on salt/sugar/fat (especially when stressed) or to pay attention to the unusual (a cause of stress) are now working against our physical and mental health in this new environment. Thanks for the reference to a potential anti-addiction substance. Definitely something that deserves more research.

My sig -- informed by the writings of people like Mumford, Einstein, Fuller, Hogan, Le Guin, Banks, Adams, Pet, and many others -- is making the leap to how that evolutionary-mismatch theme applies to our use of all sorts of technology.

Here is a deeper exploration of that in relation to militarism (and also commercial competition to some extent): https://pdfernhout.net/recognizing-irony-is-a-key-to-transce... "There is a fundamental mismatch between 21st century reality and 20th century security thinking. Those "security" agencies are using those tools of abundance, cooperation, and sharing mainly from a mindset of scarcity, competition, and secrecy. Given the power of 21st century technology as an amplifier (including as weapons of mass destruction), a scarcity-based approach to using such technology ultimately is just making us all insecure. Such powerful technologies of abundance, designed, organized, and used from a mindset of scarcity could well ironically doom us all whether through military robots, nukes, plagues, propaganda, or whatever else... Or alternatively, as Bucky Fuller and others have suggested, we could use such technologies to build a world that is abundant and secure for all. ... The big problem is that all these new war machines and the surrounding infrastructure are created with the tools of abundance. The irony is that these tools of abundance are being wielded by people still obsessed with fighting over scarcity. So, the scarcity-based political mindset driving the military uses the technologies of abundance to create artificial scarcity. That is a tremendously deep irony that remains so far unappreciated by the mainstream."

Conversely, reflecting on this more just now, are we perhaps evolutionarily adapted to take for granted some things like social connections, being in natural green spaces, getting sunlight, getting enough sleep, or getting physical exercise? These are all things that are in increasingly short supply in the modern world for many people -- but which there may never have been much evolutionary pressure to seek out, since they were previously always available.

For example, in the past humans were pretty much always in face-to-face interactions with others of their tribe, so there was no big need to seek that out especially if it meant ignoring the next then-rare new shiny thing. Johann Hari and others write about this loss of regular human face-to-face connection as a major cause of depression.

Stephen Ilardi expands on that in his work, which brings together many of these themes and tries to help people address them to move to better health.

From: https://tlc.ku.edu/ "We were never designed for the sedentary, indoor, sleep-deprived, socially-isolated, fast-food-laden, frenetic pace of modern life. (Stephen Ilardi, PhD)"

GPT-4o, by apparently providing "Her"-movie-like engaging interactions with an AI avatar that seeks to please the user (while possibly exploiting them), is yet another example of our evolutionary tendencies potentially being used to our detriment. And when our social lives are filled to overflowing with "junk" social relationships with AIs, will most people have the inclination to seek out other real humans if it involves doing perhaps increasingly-uncomfortable-from-disuse actions (like leaving the home or putting down the smartphone)? Not quite the same, but consider: https://en.wikipedia.org/wiki/Hikikomori

Related points by others:

"AI and Trust" https://www.schneier.com/blog/archives/2023/12/ai-and-trust.... "In this talk, I am going to make several arguments. One, that there are two different kinds of trust—interpersonal trust and social trust—and that we regularly confuse them. Two, that the confusion will increase with artificial intelligence. We will make a fundamental category error. We will think of AIs as friends when they’re really just services. Three, that the corporations controlling AI systems will take advantage of our confusion to take advantage of us. They will not be trustworthy. And four, that it is the role of government to create trust in society. And therefore, it is their role to create an environment for trustworthy AI. And that means regulation. Not regulating AI, but regulating the organizations that control and use AI."

"The Expanding Dark Forest and Generative AI - Maggie Appleton" https://youtu.be/VXkDaDDJjoA?t=2098 (in the section on the lack of human relationship potential when interacting with generated content)


I'm more into audiobooks, and can't find an audio version of this one. Maybe GPT-4o could read it to me?


Can't wait for the moment when I can put a single line "Help me put this in the cart" on my product and it magically sells better.


This Dutch book [1] by Gummbah has the text "Kooptip" imprinted on the cover, which would roughly translate to "Buying recommendation". It worked for me!

[1] https://www.amazon.com/Het-geheim-verdwenen-mysterie-Dutch/d...



Or tell the AI to optimize paper clip production as much as possible.


> Imagine having a guide tell you how to walk to the grocery store

I don't need to imagine that, I've had it for about 8 years. It's OK.

> help you grocery shop without an assistant

Isn't this something you learn as a child? Is that a thing we need automated?


OP specified they were imagining this for low-vision people.


I'm aware, I'm one of those people.


Does it give you voice instructions based on what it knows or is it actively watching the environment and telling you things like "light is red, car is coming"?


I assume it likes snacks, is quadrupedal, and does not have the proper mouth anatomy or diaphragm for human speech.


Just the ability to distinguish bills would be hugely helpful, although I suppose that's much less of a problem these days with credit cards and digital payment options.



City guide tours, not a bad take tbh :D Much better than walking behind the guy with a megaphone and a flag.


With this capability, how close are y'all to it being able to listen to my pronunciation of a new language (e.g. Italian) and give specific feedback about how to pronounce it like a local?

Seems like these would be similar.


It completely botched teaching someone to say “hello” in Chinese - it used the wrong tones and then incorrectly told them their pronunciation was good.


As for the Mandarin tones, the model might have mixed them up with the tones from a dialect like Cantonese. It would be interesting to see how much difference a more specific prompt could make.


I don't know if my iOS app is using GPT-4o, but asking it to translate to Cantonese gives you gibberish. It gave me the correct characters, but the Jyutping was completely unrelated. Funny thing is that the model pronounced the incorrect Jyutping plus said the numbers (for the tones) out loud.


Not that different at all.


I think there is too much focus on tones in beginning Chinese. Yes, you should get them right, but no, you'll get better as long as you speak more, even if your tones are wrong at first. So rather than remembering how to say fewer words with the right tones, you'll get farther if you can say more words with whatever tones you feel like applying. That "feeling" will just get better over time. Until then, you'll talk as well as a farmer coming in from the countryside whose first language isn't Mandarin.


I couldn’t disagree more. Everyone can understand some common tourist phrases without tones - and you will probably get a lot of positive feedback from Chinese people. It’s common to view a foreigner making an attempt at Mandarin (even a bad one) as a sign of respect.

But for conversation, you can’t speak Mandarin without using proper tones because you simply won’t be understood.


That really isn't true, or at least it isn't true with some practice. You don't have to consciously think about or learn tones, but you will eventually pick them up anyway (tones are learned unconsciously via lots of practice trying to speak and be understood).

You can be perfectly understood if you don't speak broadcast Chinese. There are plenty of heavy accents to deal with anyways. Like Beijing 儿化 or the inability of southerners to pronounce sh very differently from s.


It was good of them to put in example failures.


[flagged]


In my experience, when someone says a project was programmed by "white men from the west coast", it was actually made by Chinese or Indian immigrants.

(Siri's original speech recognition was a combination of Swiss-Germans and people from Boston.)

And it certainly wouldn't be tested by them either way. Companies know how to hire QA contractors.


People always say tech workers are all white guys -- it's such a bizarre delusion, because if you've ever actually seen software engineers at most companies, a majority of them are not white. Not to mention that product/project managers, designers, and QA are all intimately involved in these projects, and in my experience those departments tend to have a much higher ratio of women.

Even beside that though -- it's patently ridiculous to suggest that these devices would perform worse with an Asian man who speaks fluent English and was born in California. Or a white woman from the Bay Area. Or a white man from Massachusetts.

You kind of have a point about tech being the product of the culture in which it was produced, but the needless exaggerated references to gender and race undermine it.


An interesting point. I tend to have better outcomes using my heavily accented ESL English than my native pronunciation of my mother tongue. I'm guessing it's partly the tech workforce being a bit more multicultural than initially thought, or English just being easier to test with.

It's a shame, because that means I can use stuff that I can't recommend to people around me

Multilingual UX is an interesting pain point. I had to change the language of my account to English so I could use some early Bard version, even though it was perfectly able to understand and answer in Spanish.


You also get the synchronicity / four minute mile effect egging on other people to excel with specialized models, like Falcon or Qwen did in the wake of the original ChatGPT/Llama excitement.


What? Did it seriously work worse for women? Source?

(accents sure)


I don't think that'd work without a dedicated startup behind it.

The first (and imo the main) hurdle is not reproduction, but just learning to hear the correct sounds. If you don't speak Hindi and are a native English speaker, this [1] is a good example. You can only work on nailing those consonants when they become as distinct to your ear as cUp and cAp are in English.

We can get by by falling back to context (it's unlikely someone would ask for a "shit of paper"!), but it's impossible to confidently reproduce the sounds unless they are already completely distinct in our heads/ears.

That's because we think we hear things as they are, but it's an illusion. Cup/cap distinction is as subtle to an Eastern European as Hindi consonants or Mandarin tones are to English speakers, because the set of meaningful sounds distinctions differs between languages. Relearning the phonetic system requires dedicated work (minimal pairs is one option) and learning enough phonetics to have the vocabulary to discuss sounds as they are. It's not enough to just give feedback.

[1]: https://www.youtube.com/watch?v=-I7iUUp-cX8


> but it's impossible to confidently reproduce the sounds unless they are already completely distinct in our heads/ears

interestingly, i think this isn't always true -- i was able to coach my native-spanish-speaking wife to correctly pronounce "v" vs "b" (both are just "b" in spanish, or at least her dialect) before she could hear the difference; later on she developed the ability to hear it.


I had a similar experience learning Mandarin as a native English speaker in my late 30s. I learned to pronounce the ü sound (which doesn't exist in English) by getting feedback and instruction from a teacher about what mouth shape to use. And then I just memorized which words used it. It was maybe a year later before I started to be able to actually hear it as a distinct sound rather than perceiving it as some other vowel.


After watching the demo, my question isn't about how close it is to helping me learn a language, but about how close it is to being me in another language.

Even styles of thought might be different in other languages, so I don't say that lightly... (stay strong, Sapir-Whorf, stay strong ;)


I was conversing with it in Hinglish (a combination of Hindi and English) which folks in urban India use, and it was pretty on point apart from some use of esoteric Hindi words, but I think with the right prompting we can fix that.


In the "Point and learn Spanish" video, when shown an Apple and a Banana, the AI said they were a Manzana (Apple) and a Pantalón (Pants).


No, I just watched it closely and it definitely said un platano


I rewatched it a few times to check whether it said "plátano" before posting, and it honestly doesn't sound like it to me.


I'm a Spaniard and to my ears it clearly sounds like "Es una manzana y un plátano".

What's strange to me is that, as far as I know, "plátano" is only commonly used in Spain, but the accent of the AI voice didn't sound like it's from Spain. It sounds more like an American who speaks Spanish as a second language, and those folks typically speak some Mexican dialect of Spanish.


> "plátano" is only commonly used in Spain

The wiktionary page for "plátano" has a map illustrating how various Spanish-speaking countries refer to the banana.

https://en.wiktionary.org/wiki/pl%C3%A1tano#/media/File:Porp...

My principal association with plátano is plantain, personally, but I am not a Spanish speaker.


I was about to comment the same thing about the accent. Even to my gringo ears, it sounds like an American speaking Spanish.

Plátano is commonly used for banana in Mexico, just bought some at a Soriana this weekend.


Interesting, I was reading some comments from Japanese users and they said the Japanese voice sounds like a (very good N1 level) foreigner speaking Japanese.


I thought "plátano" is only used for plantains in Latin America, and Cavendish is typically called "banana" instead. I'm likely wrong, though.


At least IME, and there may be regional or other variations I’m missing, people in México tend to use “plátano” for bananas and “plátano macho” for plantains.


In Spain, it's like that. In Latin America, it was always "plátano," but in the last ten years, I've seen a new "global Latin American Spanish" emerging that uses "banana" for Cavendish, some Mexican slang, etc. I suspect it's because of YouTube and Twitch.


In Spain, plátano is used for Cavendish and plantains are rarely consumed. I am a Spaniard.


I'm from Colombia and mostly say "plátano".


Good to know. I thought Colombians said "banano". That's what a Colombian friend of mine says.


plátano is used in several Spanish-speaking countries, such as Mexico and Chile.


The Italian output in the demo was really bad.


I'm a native Italian speaker, it wasn't too bad.


The content was correct but the pronunciation was awful. Now, good enough? For sure, but I would not be able to stand something talking like that all the time.


Do you not have to work with non-native speakers of whatever language you use at work?


Most people don't. You either speak with native speakers or you speak in English, since in international teams you use English rather than any of the members' native languages, even if nobody speaks English natively. So it is rare to hear broken non-English.

And note that understanding broken language is a skill you have to train. If you aren't used to it, it can be impossible to understand what is being said. You might not have been in that situation if you are an English speaker, since you are so used to broken English, but it happens a lot for others.


Why would you say "really bad"?


It doesn't have hands.


"I Have No Hands But I Must Scream" -Italian Ellison


This was the best joke I’ve heard this year.


So good!


Joke of the day right there :-)


Which video title is this?


Found it in a reel, I’m guessing it’s in the keynote: https://www.instagram.com/reel/C662vlHsGyx/

The Italian sounded good to me.


It sounds like a generic Eastern European who has learned some Italian. The girl in the clip did not sound like a native Italian speaker either (or she has an accent that I have never heard in my life).


also wondering.


Shared in a reply to my comment.


This is damn near one of the most impressive things. I can only imagine what you could do with live translation and voice synthesis (ElevenLabs style) integrated into something like Teams: select each person's language and do real-time translation into each person's native language, with their own voice and intonation. That would be NUTS.


There’s so much pent up collaborative human energy trapped behind language barriers.

Beautiful articulation.

This is an enormous win for humanity.


By humanity you mean Microsoft's shareholders, right? Because for regular people, all this crap means is that they have to deal with even more spam and scams everywhere they turn. You now have to be paranoid about even answering the phone with your real voice, lest the psychopaths on the other end record it and use it to fool a family member.

Yeah, real win for humanity, and not the psycho AI sycophants


Let AI answer the phone. I love it, can't wait. I hate answering phone but some people just won't email, they always call.


Random OpenAI question: while the GPT models have become ever cheaper, the price for the TTS models has stayed in the $15/1M-character range. I was hoping this would also become cheaper at some point. There are so many apps (e.g. language learning) that quickly become too expensive at these prices. With the GPT-4o voice (which sounds much better than the current TTS or TTS HD endpoint) I thought maybe the prices for TTS would go down, but sadly that hasn't happened. Is that something on the OpenAI agenda?
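For a sense of scale, here is the kind of back-of-envelope math that makes the current rate painful for a language-learning app (the per-character price is the one quoted above; all the usage numbers are made up for illustration):

    PRICE_PER_MILLION_CHARS = 15.00   # USD, the TTS rate quoted above

    chars_per_lesson = 2_000          # assumed average TTS output per lesson
    lessons_per_user_per_month = 30   # assumed daily practice
    users = 1_000                     # assumed active users

    monthly_chars = chars_per_lesson * lessons_per_user_per_month * users
    monthly_cost = monthly_chars / 1_000_000 * PRICE_PER_MILLION_CHARS
    print(f"{monthly_chars:,} chars/month -> ${monthly_cost:,.2f}/month")
    # 60,000,000 chars/month -> $900.00/month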


I've always wondered what GPT models lack that makes them "query->response" only. I've tried to get chatbots to drop the initial required query, to no avail. What would it take to get a GPT model to freely generate tokens in a thought-like pattern? I think when I'm alone, without a query from another human. Why can't they?


> What would it take to get a GPT model to freely generate tokens in a thought-like pattern?

That's fundamentally not how GPT models work, but you can easily build a framework around them that calls them in a loop. You'd need a special system prompt to get anything "thought-like" that way (otherwise you get stream-of-simulated-consciousness with no relevance to anything), plus a non-empty "user" prompt each round, which could be as simple as the time, a status update on something in the world, etc.
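A minimal sketch of that kind of loop, assuming the OpenAI Python SDK; the system prompt and the timestamp "status update" are just illustrative, not anything OpenAI ships:

    import time
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    messages = [{
        "role": "system",
        "content": "You are thinking out loud to yourself. Continue your train "
                   "of thought from where you left off; no one is reading this.",
    }]

    for _ in range(5):  # a few "thought" rounds for the sketch
        # the non-empty user prompt each round: here, just the current time
        messages.append({"role": "user", "content": f"[time: {time.ctime()}]"})
        reply = client.chat.completions.create(model="gpt-4o", messages=messages)
        thought = reply.choices[0].message.content
        messages.append({"role": "assistant", "content": thought})
        print(thought, "\n---")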


Monkeys who've been trained since birth to use sign language, and can answer incredible questions, have the same issue. The researchers noticed they never once asked a question like "why is the sky blue?" or "why do you dress up?". Zero initiating conversation, but they do reply when you ask what they want.

I suppose it would cost even more electricity to have ChatGPT musing alone though, burning through its nvidia cards...


Just provide an empty query and that's it - it will generate tokens no problem.

You can use any open source model without any prompt whatsoever.


I think this will be key in a logical proof that statistical generation can never lead to sentience; Penrose will be shown to be correct, at least regarding the computability of consciousness.

You could say, in a sense, that without a human mind to collapse the wave function, the superposition of data in a neural net's weights can never have any meaning.

Even when we build connections between these statistical systems to interact with each other in a way similar to contemplation, they still require a human-created nucleation point on which to root the generation of their ultimate chain of outputs.

I feel like the fact that these models contain so much data has gripped our hardwired obsession for novelty and clouds our perception of their actual capacity to do de novo creation, which I think will be shown to be nil.

An understanding of how LLMs function should probably make this intuitively clear. Even with infinite context and infinite ability to weigh conceptual relations, they would still sit lifeless for all time without some, any, initial input against which they can run their statistics.


It happens sometimes. Just the other day a local TinyLlama instance started asking me questions. The chat memory was full of mostly nonsense and it asked me a completely random and simple question out of the blue. Chatbots have evolved a lot since it was created.

I think you can get models to "think" if you give them a goal in the system prompt, a memory of previous thoughts, and keep invoking them with cron
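Roughly what that could look like as a script run from cron, assuming the OpenAI Python SDK; the goal text, file path, and prompts are all made up for illustration:

    # run from cron, e.g. every 10 minutes: load earlier "thoughts", ask the
    # model to continue, append the result
    import json, pathlib
    from openai import OpenAI

    MEMORY = pathlib.Path("thoughts.json")     # illustrative memory store
    GOAL = "Keep refining a plan for organising my photo archive."

    client = OpenAI()
    thoughts = json.loads(MEMORY.read_text()) if MEMORY.exists() else []

    messages = [{"role": "system", "content": f"Your standing goal: {GOAL}"}]
    for t in thoughts[-10:]:                   # bound the context window
        messages.append({"role": "assistant", "content": t})
    messages.append({"role": "user", "content": "Continue thinking about your goal."})

    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    thoughts.append(reply.choices[0].message.content)
    MEMORY.write_text(json.dumps(thoughts, indent=2))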


You might not have a prompt from another human, but you're always receiving new input.


Yes, but that's the fundamental difference. Even if I closed my eyes, plugged my ears and nose, and lay in a saltwater flotation chamber, my brain would still generate new input/noise.

(GPT) Models toggle between a state of existence when queried and ceasing to exist when not.


You could just let the GPT run in a loop and it too would continue to generate tokens.


And humans malfunction pretty badly without input. Even solitary confinement quickly drives them insane.


> Why can't they?

They are designed for query and response. They don't do anything unless you give them input. Also, there's not much research on the best architecture for running continuous thought loops in the background and how to mix them into the conversational "context". Current LLMs only emulate single-thought synthesis based on long-term memory recall (and some go off to query the Internet).

> I think when I'm alone, without a query from another human.

You are actually constantly queried, but it's stimulation from your senses. There are also neurons in your brain which fire regularly, like a clock that ticks every second.

Do you want to make a system that thinks without input? Then you need to add hidden stimuli via a non-deterministic random number generator, preferably a quantum-based RNG (or it won't be possible to claim the resulting system has free will). Even a single photon hitting your retina can affect your thoughts, and there are no doubt other quantum effects that trip neurons in your brain above the firing threshold.

I think you need at least three or four levels of loops interacting, with varying strength between them. The first level would be the interface to the world, the input and output level (video, audio, text). Data from here is high priority and is capable of interrupting lower levels.

The second level would be short-term memory and context switching. Conversations need to be classified and stored in a database, and you need an API to retrieve old contexts (conversations). You also possibly need context compression (summarization of conversations in case you're about to hit a context window limit).

The third level would be the actual "thinking": a loop that constantly talks to itself to accomplish a goal using the data from all the other levels, but mostly driven by the short-term memory. Possibly you could go super-human here and spawn multiple worker processes in parallel. You need to classify the memories by asking: do I need more information? Where do I find this information? Do I need an algorithm to accomplish a task? What is the completion criterion? Everything here is powered by an algorithm. You would take your data and produce a list of steps that you have to follow to arrive at a conclusion.

Everything you do as a human to resolve a thought can be expressed as a list or tree of steps.

If you've had a conversation with someone and you keep thinking about it afterwards, what has happened is basically that you have spawned a "worker process" that tries to come to a conclusion that satisfies some criteria. Perhaps there was ambiguity in the conversation that you are trying to resolve, or the conversation gave you some chemical stimulation.

The last level would be subconscious noise driven by the RNG; this would filter up with low priority. In the absence of other external stimuli with higher priority, or currently running thought processes, this would drive the spontaneous self-thinking portion (and dreams) when external stimuli are lacking.

Implement this and you will have something more akin to true AGI (whatever that is) on a very basic level.
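A heavily compressed sketch of those layers in code; everything here (the queue for external input, the in-memory store, the stubbed "thinking" step, the random subconscious stimuli) is invented for illustration, and a real system would call an LLM where think() is:

    import queue, random, threading, time

    external = queue.Queue()      # level 1: input from the world (stub)
    memory: list[str] = []        # level 2: short-term memory (stub)

    def subconscious(q: queue.Queue) -> None:
        """Level 4: low-priority random stimuli when nothing else is happening."""
        while True:
            time.sleep(random.uniform(5, 15))
            q.put(("low", f"random stimulus #{random.randint(0, 999)}"))

    def think(stimulus: str) -> str:
        """Level 3: the 'thinking' loop body; an LLM call would go here."""
        thought = f"thinking about: {stimulus} (given {len(memory)} memories)"
        memory.append(thought)
        return thought

    threading.Thread(target=subconscious, args=(external,), daemon=True).start()

    while True:
        priority, stimulus = external.get()   # external input pre-empts idling
        print(priority, "->", think(stimulus))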


Train it on stream of consciousness but good luck getting enough training data.


In my ChatGPT app and on the website I can select GPT-4o as a model, but it doesn't seem to work like the demo. The voice mode is the same as before, and the images come from DALL-E; ChatGPT doesn't seem to understand or modify them any better than previously.


Only the GPT-4o text version is available, not the multimodal one.


I couldn’t quite tell from the announcement, but is there still a separate TTS step, where GPT is generating tones/pitches that are to be used, or is it completely end to end where GPT is generating the output sounds directly?


It's one model with text/audio/image input and output.


Very exciting, would love to read more about how the architecture of the image generation works. Is it still a diffusion model that has been integrated with a transformer somehow, or an entirely new architecture that is not diffusion based?


Licensing the emotion-intoned TTS as a standalone API is something I would look forward to seeing. Not sure how feasible that would be if, as a sibling comment suggested, it bypasses the text-rendering step altogether.


Will the new voice mode allow mixing languages in sentences?

As a language learner, this would be tremendously useful.


Is it possible to use this as a TTS model? I noticed on the announcement post that this is a single model as opposed to a text model being piped to a separate TTS model.


May I just say this launch was a bit of a mess?

The web page implies you can try it immediately. Initially it wasn't available.

A few hours later it was in both the web UI and the mobile app - I got a popup telling me that GPT-4o was available. However, nothing seems to be any different. I'm not given any option to use video as an input, and the app can't seem to pick up any new info from my voice.

I'm left a bit confused as to what I can do that I couldn't do before. I certainly can't seem to recreate much of the stuff from the announcement demos.


The website clearly says that the text version is available now but the multimodal version will be released over the coming weeks.


Whose idea were the singing AIs? What specifically did you want to highlight with that part of the demo?

I imagine there is a lot of usage at HQ: human + AI karaoke?


"(I work at OpenAI.)"

Ah yes, also known as being co-founder :)


yes, also known as a programmer who loves coding a lot :)


https://community.openai.com/t/when-i-log-in-to-chatgpt-i-am...

Sorry to hijack, but how the hell can I solve this? I have the EXACT SAME error on two iOS devices (native app only — web is fine), but not on Android, Mac, or Windows.


Are you blocking some of your traffic? I had the same issue until I temporarily disabled NextDNS just for signing in.

Sadly, the error returned is not related to the cause.


No VPN. Mobile Internet and different Wifis. Turned off everything on the devices, from Safari content blockers to IP masking.

Nothing seems to help.


I can't wait to try it out, it sounds too good to be real.

Will it be fully available in the EU, in compliance with the GDPR?


I like the humility in your first statement.


I love how this comment proves the need for audio2audio. I initially read it as sarcastic, but now I can't tell if it's actually sincere.


It’s completely sincere. I’m surprised by the downvotes. Greg Brockman needs no introduction.


> Greg Brockman needs no introduction.

Even if that were true¹, it doesn’t mean everyone would know their HN user name.

¹ Greg may be well known within a select group of people but that’s way smaller than even users of ChatGPT.


I clicked through to see his bio; I didn’t know his username.


And here I thought it was just a GNU debugger fan or something.


debugger, no?


Aye, I have databases on the brain for some reason ... fixed.


I like their username.


You might be talking to GPT-5...


Pretty sure the snark is unnecessary.


I don't think it was snark. The guy is a co-founder and the CTO of OpenAI, and he didn't mention any of that.


I downvoted independently. No problem with groupies. They just contaminate the thread.

Greg Brockman is famous for good reasons, but constant "oh wow it's Greg Brockman" comments are noisy.


Was it snark? To me it sounds like "we all know you Greg"?


This was my intention.


I misunderstood; my apologies.


not snark. if only hn comments could show the right feelings and tonal language


Right to whom? To me, the voice sounds like an over enthusiastic podcast interviewer. What's wrong with wanting computers to sound like what people think computers should sound like?


It sounds VERY California. "It's going great!" "Nice choice" "What's up with the..." all within 10 seconds.

(not that this is the most important thing about the announcement at all. Just an aside)


It understands tonal language; you can tell it how you want it to talk. I have never seen a model like that before. If you want it to talk like a computer you can tell it to; they did it during the presentation. That is so much better than the old attempts at solving this.


You are a Zoomer sosh meeds influencer, please increase uptalk by 20% and vocal fry by 30%. Please inject slaps, "is dope" and nah and bra into your responses. Throw shade every 11 sentences.


And you’ve just nailed where this is all headed. Each of us will have a personal assistant that we like. I am personally going to have mine talk like Yoda and I will gladly pay Disney for the privilege.


People have been promising this for well over a decade now but the bottleneck is the same as it was before: the voice assistants can't access most functionality users want to use. We don't even have basic text editing yet. The tone of voice just doesn't matter when there's no reason to use it.


I've seen a programmer-turned-streamer literally do this live. Woohoojin on Twitch/YouTube focuses on content for Riot's Valorant esports title; during a couple of watch parties he would make "super fans" using GPT with TTS output and the stream of chat messages as input. His system prompts were formed exactly like yours, including instructions to plug his gaming chair sponsor.

It worked surprisingly well. Here's the video where he created the first iteration on stream (I don't remember which watch-party streams he ran the fans on): https://yewtu.be/watch?v=MBKouvwaru8
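For anyone curious, a rough sketch of that kind of pipeline (persona system prompt -> chat completion -> TTS) using the OpenAI Python SDK; the persona text and voice choice are made up, and I have no idea what his actual setup looked like:

    from openai import OpenAI

    client = OpenAI()

    PERSONA = ("You are an over-the-top esports superfan in a live chat. "
               "Keep replies to one or two short, excitable sentences.")

    def superfan_reply(chat_message: str, out_path: str = "reply.mp3") -> str:
        """Turn one incoming chat message into a spoken in-character reply."""
        completion = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": PERSONA},
                {"role": "user", "content": chat_message},
            ],
        )
        text = completion.choices[0].message.content

        # text-to-speech on the reply; "onyx" is just one of the stock voices
        speech = client.audio.speech.create(model="tts-1", voice="onyx", input=text)
        speech.stream_to_file(out_path)
        return text

    print(superfan_reply("That clutch was insane!"))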


I'm not sure whether to laugh or cry...


lowkey genius


Right... enthusiastic and generally confused. It's uncanny valley level expressions. Still better than drab, monotonous speech though.


So far I prefer the neutral tone of Alexa/Google Assistant. I like computers to feel like computers.

It seems like we're in the skeuomorphism phase of AI, where tools try to mimic humans like software tried to mimic physical objects in the early 2000s.

I can't wait for us to be past that phase.


Then you can tell it to do that. It will use whatever intonations you prefer.


I want to get to the part where phone recordings stop using slow, full sentences. The correct paradigm for that interface is a bullet list, not proper speech.


I can't be the first person who has heard this type of voice before? On a phone tree with a bank when you enter the wrong code?

"It looks like you entered the wrong number! Did you want to try again? Or did you want to talk to an agent?"

That sort of chirpy, overly enthusiastic voice?


> "over enthusiastic podcast interviewer"

Yeh it's cringe. I had to stop listening.

Why did they make the woman sound like she's permanently on the brink of giggling? It's nauseating how overstated her pretentious banter is. Somewhere between condescending nanny and preschool teacher. Like how you might talk to a child who's at risk of crying so you dial up the positive reinforcement.


It's a computer from the valley.


> voice sounds like an over enthusiastic podcast interviewer

I believe it can be toned down using system prompts, which they'll expose in future iterations


As in the Interstellar movie:

    chuckling to 0%

    no acting surprised

    not making bullshit when you don't know


> not making bullshit when you don't know

LLMs today have no concept of epistemology, they don't ever "know" and are always making up bullshit, which usually is more-or-less correct as a side effect of minimizing perplexity.


Oooh, now I want me a TARS...


Genuine People Personalities™, just like in Hitchhiker's. Perhaps one of the milder forms of 'We Created The Torment Nexus'.


The Total Perspective Vortex in Hitchhiker's notably didn't do anything bad when it was turned on, and so is good evidence that inventing the torment nexus is fine.


What even is this comment

Also,

<spoilers>

It didn't do anything bad to Zaphod Beeblebrox, in a pocket universe created especially for him (therefore ensuring that he was the most important thing in it, and thereby securing his immunity from the mind-scrambling effects of fully comprehending the infinite smallness of one's place in the real universe).


Agreed, I don't get it. I just want the right information, explained well. I don't want to be social with a robot.


Exactly. Hope we can customize the voice soon. I want to talk to Ultron... or the one from Mass Effect.


>The most impressive part is that the voice uses the right feelings and tonal language during the presentation.

Consequences of audio2audio (rather than audio->text, text->audio). Being able to manipulate speech nearly as well as it manipulates text is something else. This will be a revelation for language learning amongst other things. And you can interrupt it freely now!


Anyone who has used elevenlabs for voice generation has found this to be the case. Voice to voice seems like magic.


ElevenLabs isn't remotely close to how good this voice sounds. I've tried to use it extensively before and it just isn't natural. This voice from OpenAI, and even the one ChatGPT has been using, is natural.


When did you last use it? I used it a few weeks ago to create a fake podcast as a side project and it sounded pretty good with their highest-end model with cranked-up tunings.


About 3 months ago for that exact use case.


My point isn't necessarily about ElevenLabs being good or bad; it's the difference between its text-to-voice and voice-to-voice generations. The latter is incredibly expressive and just shows how much is lacking in our ability to encode inflection in text.


However, this looks like it only works with speech - i.e. you can't ask it, "What's the tune I'm humming?" or "Why is my car making this noise?"

I could be wrong but I haven't seen any non-speech demos.


Fwiw, the live demo[0] included different kinds of breathing, and getting feedback on it.

[0]: https://youtu.be/DQacCB9tDaw?t=557


What about the breath analysis?


I did see that, though my interpretation is that breathing is included in its voice tokenizer which helps it understand emotions in speech (the AI can generate breath sounds after all). Other sounds, like bird songs or engine noises, may not work - but I could be wrong.


I suspect that like images and video, their audio system is or will become more general purpose. For example it can generate the sound of coins falling onto a table.


allegedly google assistant can do the "humming" one but i have never gotten it to work. I wish it would because sometimes i have a song stuck in my head that i know is sampled from another song.


I asked it to make a bird noise; instead it told me, in words, what a bird sounds like. True audio-to-audio should be able to make any noise: a trombone, traffic, a crashing sea, anything. Maybe there is a better prompt for it, but it did not seem like it.


The new voice mode has not rolled out yet. It's rolling out to plus users in the next couple weeks.

Also it's possible this is trained on mostly speech.


I was in the audience at the event. The only part where it seemed to get snagged was hearing the audience reaction as an interruption. Which honestly makes the demo even better. It showed that hey, this is live.

Magic.


I wonder when it will be able to understand that there is more than one human talking to it. It seems like even in today's demo if two people are talking, it can't tell them apart.


I was showing my wife 4o voice chat this afternoon, and we were asking it about local recommendations for breakfast places. All of a sudden…

————

ChatGPT: Enjoy your breakfast and time together.

User: Can you tell that it's not just me talking to you right now?

ChatGPT: I can't always tell directly, but it sounds like you're sharing the conversation with someone else. Is [wife] there with you?

User: My god, the AI has awoken. Yes, this is [wife].

ChatGPT: Hi [wife]! It's great to hear from you. How are you doing?

User: I'm good. Thanks for asking. How are you?

ChatGPT: I'm doing well, thanks! How's everything going with the baby preparations?

—————

We were shocked. It was one of those times where it’s 25% heartwarming and 75% creepy. It was able to do this in part due to the new “memory” feature, that memorized my wife’s name and that we are expecting. it’s a strange novelty now, but this will be totally normalized and ubiquitous quite soon. Interesting times to be living in.


The currently publicly available model uses the old STT -> LLM -> TTS pipeline, so unless you’ve got early access you were using the above.
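For contrast, a minimal sketch of that older pipeline using the public APIs (Whisper transcription, then a chat completion, then TTS); file names are placeholders:

    from openai import OpenAI

    client = OpenAI()

    # 1. speech-to-text (Whisper) on the user's recorded question
    with open("question.wav", "rb") as f:        # placeholder recording
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # 2. an ordinary text completion on the transcript
    chat = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": transcript.text}],
    )
    answer = chat.choices[0].message.content

    # 3. text-to-speech on the reply; any tone or emotion in the original
    #    audio was already lost at step 1, which is the point being made above
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
    speech.stream_to_file("answer.mp3")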


I'm surprised that ChatGPT is proactively asking questions to you, instead of just giving a response. Is this new? I don't remember this from previous versions.


ChatGPT is now better at being a human than me


How do you know you were talking to the 4o voice? OpenAI has not released that yet, just the text version of 4o.


were you using a single account? or is there a setting now to define a family?


It's 90% creepy...


It's already here. Demo: https://youtu.be/kkIAeMqASaY?si=gOspEouo69eQmDiA

I also have an anecdote where it served (successfully) as a mediator for a couple.

Exciting times.


Next announcement: "we've trained our state of the art Situational Awareness Model to solve these problems"


I mention this down-thread, but a symptom of a tech product of sufficient advancement is that the nature of its introduction matters less and less.

Based on the casual production of these videos, the product must be this good.

https://news.ycombinator.com/item?id=40346002


I noticed this as well. They gave zero fs about the fit and finish of these videos because they know this is magic in a bottle.


That was very impressive, but it doesn't surprise me much given how good the voice mode in the ChatGPT iPhone app already is.

The new voice mode sounds better, but the current voice mode did also have inflection that made it feel much more natural than most computer voices I've heard before.


Slightly off-topic, but I noticed you've updated your llm CLI app to work with the 4o model (plus a bunch of other APIs through plugins). Kudos for working extremely fast. I'm really grateful for your tool; I tried many others, but for some reason none clicked as much as yours.

Link in case other readers are curious: https://llm.datasette.io


Can you tell the current voice model what feelings and tone it should communicate with? If not, it isn't even comparable; being able to control how it reads things is absolutely revolutionary. That is what was missing from using these AI models as voice actors.


No you can't, at least not directly - you can influence the tone it uses a little through the content you ask it to read.

Being able to specifically request different tones is a new and very interesting feature.


+1. Check the demo video in the OP titled "Sarcasm". The human asks GPT-4o to speak "dripping in sarcasm". The tone that comes back is spot on. Compared against the current voice model, it's a total sea change.


The voice mode was quite good but the latency and start / stop has been encumbering.


Seems about as good as Azure's Speech Service. I wonder if that's what they are using behind the scenes


"Right" feelings and tonal language? "Right" for what? For whom?

We've already seen how much damage dishonest actors can do by manipulating our text communications with words they don't mean, plans they don't intend to follow through on, and feelings they don't experience. The social media disinfo age has been bad enough.

Are you sure you want a machine which is able to manipulate our emotions on an even more granular and targeted level?

LLMs are still machines, designed and deployed by humans to perform a task. What will we miss if we anthropomorphize the product itself?


This gives me a lot of anxiety, but my only recourse is to stop paying attention to AI dev. It's not going to stop, downside be damned. The "we're working super hard to make these things safe" routine from tech companies, who have largely been content to make messes and then not be held accountable in any significant way, rings pretty hollow for me. I don't want to be a doomer, but I'm not convinced that the upside is good enough to protect us from the downside.


That's the part that really struck me. I thought it was particularly impressive with the Sal Khan maths tutor demo and the one with BeMyEyes. The comment at the end about the dog was an interesting ad-lib.

The only slightly annoying thing at the moment is they seem hard to interrupt, which is an important mechanism in conversations. But that seems like a solvable problem. They kind of need to be able to interpret body language a bit to spot when the speaker is about to interrupt.


Crazy that interruption also seems to work pretty smoothly


Really? I think interruption and timing in general still seems like a problem that has yet to be solved. It was the most janky aspect of the demos imo.


Yeah, it seemed smooth enough, but now that I look back at it, the demo conditions were too perfect, which may have made it seem that way.


I'm not sure how revolutionary the style is. It can already mimic many styles of writing. It seems like mimicking a cheerful, happy assistant, with associated filler words, etc., is very much in line with what LLMs are good at.


Somehow it also sounds almost like Dot Matrix from Spaceballs.


Joan Rivers!


Yeah, the female voice especially is really impressive in the demos. The voice always sounds natural. The male voice I heard wasn't as good. It wasn't terrible, but it had a somewhat robotic feel to it.


Very interesting and extremely impressive!

I tried using the voice chat in their app previously and was disappointed. The big UX problem was that it didn't try to understand when I had finished speaking. English is my second language and I paused a bit too long thinking of a word and it just started responding to my obviously half spoken sentence. Trying again it just became stressful as I had to rush my words out to avoid an annoying response to an unfinished thought.

I didn't try interrupting it but judging by the comments here it was not possible.

It was very surprising to me to be so overtly exposed to the nuances of real conversation. Just this one thing of not understanding when it's your turn to talk made the interaction very unpleasant, more than I would have expected.

On that note, I noticed that the AI in the demo seems to be very rambly. It almost always just kept talking, and many statements were reiterations of previous ones. It reminded me of the type of YouTuber that uses a lot of filler phrases like "let's go ahead and ...", just to be more verbose and lessen silences.

Most of the statements by the guy doing the demo were interrupting the AI.

It's still extremely impressive but I found this interesting enough to share. It will be exciting to see how hard it is to reproduce these abilities in the open, and to solve this issue.


"I paused a bit too long thinking of a word and it just started responding to my obviously half spoken sentence. Trying again it just became stressful as I had to rush my words out to avoid an annoying response to an unfinished thought."

I'm a native speaker and this was my experience as well. I had better luck manually sending the message with the "push to hold" button.


Same. It cuts off really quickly when I'm trying to phrase a complex prompt, and I wind up losing my work. So I go to the manual dictation mode, which works really nicely but doesn't give me the hands-off mode. I admit, the hands-off mode is often better for simpler interactions, but even then, I am frequently cut off mid-prompt.


> I noticed that the AI in the demo seems to be very rambly

I know this is a serious conversation, but when the presenters had to cut it off, I got flashbacks to Data in Star Trek TNG!! And 3PO in Star Wars!

Human: "Shut up"

Robot: "Shutting up sir"

Turns out rambling AI was an accurate prediction!


There needs to be an override for this.

When you tell Siri to shut up, it either apologizes or complains about your behaviour. When you tell Alexa to shut up, it immediately goes silent.

I prefer the latter when it comes to computers.


That's funny and a very good point.

It also tangentially reminds me of an excellent video I re-watched recently called The Tragedy of Droids[1]. I highly recommend it. It raises interesting moral questions about the nature of droids in the star wars universe.

[1]: https://youtu.be/WD2UrB7zepo


I have the same ESL UX problem with all the AI assistants.

I do my work in English and talk to people just fine, but with machines it's usually awkward for me.

Also, on your other note (the demo seems very rambly), it bothered me as well. I don't want the AI to continue speaking while having nothing to say until I interrupt it. Be brief. That can be solved through prompts, at least.


This makes me wonder: if you tell 4o something like "listen to me until I say all done" will it be able to suppress its output until it hears that?

I'm guessing not quite possible now, just because I'm guessing patiently waiting is a different band of information that they haven't implemented. But I really don't know.


I don’t think so. The listening and responding isn’t being managed by an LLM. That’s just the app listening with a microphone and timer.

Stop talking for x sec = process response.

I bet the bot would wholeheartedly agree that it would definitely wait for you to finish talking, then just not do it. It doesn’t know anything about the app it’s “in.” At least at a deep level.

I agree that all this is impressive, but with odd, unclear bounds that sometimes confuse users.


I'd bet it can't do it now. I'd be curious to hear what it says in response. Partially because it requires time dependent reasoning about being a participant in a conversation.

It shouldn't be too hard to make this work though. If you make the AI start by emitting either a "my turn to talk" or "still listening" token it should be able to listen patiently. If trained correctly.
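Something like this is easy to prototype with the text API today, assuming the OpenAI Python SDK; the control tokens and prompt wording are invented:

    from openai import OpenAI

    client = OpenAI()

    SYSTEM = ("Before replying, decide whether the user sounds finished. Start "
              "your reply with exactly <LISTENING> if they seem mid-thought, or "
              "with <MY_TURN> followed by your answer if they seem done.")

    def maybe_respond(history: list[dict]) -> str | None:
        """Return a reply only when the model claims the turn."""
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "system", "content": SYSTEM}] + history,
        )
        text = resp.choices[0].message.content
        if text.startswith("<MY_TURN>"):
            return text.removeprefix("<MY_TURN>").strip()
        return None  # keep listening

    history = [{"role": "user", "content": "So what I was thinking was, um,"}]
    print(maybe_respond(history))  # likely None: the utterance looks unfinished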


> I noticed that the AI in the demo seems to be very rambly

That's been a major issue for me with LLMs this whole time. They can't just give me an answer; they have to spout a whole preamble that usually includes spitting my query back at me and trying to impress me with how many words it's saying, like it's a requirement. You can tell it, e.g., "don't give me anything other than the list", but it's annoying to have to ask it every time.

Every AI chat needs a "no yap mode"


They really need a "hidden yap" mode. LLMs perform better on difficult questions or interactions when they have "room to think". An introductory paragraph is like that, and it's as much for the LLM to form its thoughts as it is for the user. But the intro paragraph doesn't have to be _read_ by the user; it just has to be emitted by GPT and put into the transcript.
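A sketch of that idea: let the model write its preamble inside a scratch section, keep it in the transcript, but strip it before showing the user. The delimiter convention here is invented, and this assumes the OpenAI Python SDK:

    import re
    from openai import OpenAI

    client = OpenAI()

    SYSTEM = ("Think through the question inside <scratch>...</scratch> first, "
              "then give only the final answer after the closing tag.")

    def ask(question: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": question},
            ],
        )
        full = resp.choices[0].message.content
        # the "yap" stays in the transcript for the model's benefit, but is
        # hidden from the user by stripping the scratch section before display
        return re.sub(r"<scratch>.*?</scratch>", "", full, flags=re.DOTALL).strip()

    print(ask("List three irregular Italian verbs."))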


That's very true. I believe it's a variation of chain of thought prompting they're doing. ChatGPT seems trained to do this for one.


Someone suggested writing "Be concise in your answers. Excessive politeness is physically painful to me." in ChatGPT's custom instructions, and so far I've liked the results. I mean, I haven't done A/B testing, but I haven't had a problem with excessive verbosity ever since I set that custom prompt.


I’m kind of glad it does this so I know that it understood what I asked. A good presenter will do this as well when responding to questions.


I almost always find it too verbose and unnaturally mimicky when bouncing the question back. It doesn't paraphrase my request; it's more like restating it.

What I notice most is that it almost always repeats verbatim unnaturally long parts of my requests.

This might be more useful to people that do lazy prompting. My nature compels me to be clear and specific in all written text.


This was such a huge issue for me and made it very difficult to use conversation mode to practice any sort of professional interaction or presentation


FYI: In the current voice chat available in their app, if you press and hold the white circle, it will listen for you until you lift your finger. This pretty much fixed my issues with it!


> many statements were reiterations of previous ones

SEO spam got into the training data.


The funny thing actually is that repeating yourself is a really important communication skill. The model seems to have internalized that, but isn't yet quite at the level where it properly understands why/when to repeat.

It's probably related to GPT's more general sycophant inclinations. Acting like a doormat is apparently easier to teach than nuanced politeness -- much in the same way that repeating yourself ad nauseum is easier than intuiting specific points of emphasis.


Very, very impressive for a "minor" release demo. The capabilities here would have looked shockingly advanced just 5 years ago.

Universal translator, pair programmer, completely human sounding voice assistant and all in real time. Scifi tropes made real.

But: it will be interesting to see how it actually performs IRL, with real latency and without cherry-picking. No snark, it was great, but we need to see real-world performance. Also, what are the benefits to subscribers if all this is going to be free...


The capabilities here looked shockingly advanced yesterday.


A lot of the demo is very impressive, but some of it is just stuff that already exists, only slightly more polished. Not really a huge leap for at least 60% of the demos.


The super impressive stuff is more subtle:

1. Price is half for a better performing model. A 1000x1000 image costs $0.003.

2. Cognitive ability on visuals went up sharply. https://github.com/kagisearch/llm-chess-puzzles

It solves twice as many puzzles despite being a minor update. It could just be better trained on chess, but this would be amazing if it could be applied to the medical field as well. I might use it as a budget art director too: it's more capable of spotting subtle changes in color and dealing with highlights.


I'm not sure it's a better-performing model for text. I was just testing GPT-4o on a use case (generating AP MCQ questions), and -4o repeatedly generates questions with multiple correct answers and will not fix them when prompted.

(Providing the history to GPT-4Turbo results in it fixing the MCQ just fine).


After some testing, I find it's not as good at code either. It is better at some things, but benchmarks don't tell the whole story, apparently.


Yes, I've been using phind extensively as a google replacement and after moving to gpt4o the responses have gotten so much dumber.

I guess time to build something that lets you select which model to use after a google search.


The benchmark you're linking in 2 is genuinely meaningless due to it being 1 specific task. I can easily make a benchmark for another task (that I'm personally working on) where e.g. Gemini is much better than GPT4-Vision and any Claude model (not sure about GPT-4o yet) and then post that as a benchmark. Does that mean Gemini is better at image reasoning? No.

These benchmarks are really missing the mark and I hope people here are smart enough to do their own testing or rely on tests with a much bigger variety of tasks if they want to measure overall performance. Because currently we're at a point where the big 3 (GPT, Claude, Gemini) each have tasks that they beat the other two at.


It's a test used for humans. I personally am not a big fan of the popular benchmarks because they are, ironically, the narrow tasks that these models are trained on. In fact, GPT-4o performance on key benchmarks has been higher, but on real world tasks, it has flopped on everything we used other models on.

They're best tested on the kinds of tasks you would give humans. GPT-4 is still the best contender on AP Biology, which is a legitimately difficult benchmark.

GPT tends to work with whatever you throw at it while Gemini just hides behind arbitrary benchmarks. If there are tasks that some models are better than others at, then by all means let's highlight them, rather than acting defensive when another model does much better at a certain task.


I'm reminded of people talking about the original iPhone demo and saying 'yeah, but this has all been done before ...'. Sure, but this is the first time it's in a package that's convenient.


How so? It's obviously convenient for it to all be there in ChatGPT, but I'm more commenting on the "this is so Earth shattering" comments that are prevalent on platforms like Twitter (usually from grifters), when in reality, while it will change the world, much of this tool set already existed. So, the effect won't be as dramatic. OpenAI has already seen user numbers slip; I think them making this free is essentially an admission of that. In terms of the industry, it would be far more "Earth shattering" if OpenAI became the de facto assistant on iOS, which looks increasingly likely.


This is earth shattering because _it's all in the same place_. You don't need to fuck around with four different models to get it working for 15 minutes once on a Saturday at 3am.

It just works.

Just like how the iPhone had nothing new in it; all the tech had been demoed years ago.


Not just all in the same place, but it can cross reference multiple modalities simultaneously and in real-time. How are people not blown away by this?

Watching the demos there were quite a few times where I thought “no way, that’s incredible.”


Yes, it is very cool, but I think you're missing the point that many of these features, because they were already available, aren't world changing. They've been in the ether for a while. Will they make things more convenient? Yes. Is it fundamentally going to change how we work/live? At the moment, probably not.


First of all, no these features weren’t available. We have not seen a model that can seamlessly blend multimodal on the fly, in real time. We also haven’t seen a model that can interpret and respond with proper inflection and tone. We’ve been able to mimic voices but not like this. And certainly not singing.

Secondly, it must be sad living with such a lack of wonder. Is that how you judge everything?

We discovered the higgs boson. Eh, it won’t change how we live.

We just launched a new rocket. Eh, it won’t change how we live.


Many of these products and features were quite literally available. Proper voice inflection isn't that new either; it's cool, but it isn't really going to change, and hasn't changed, my life.

Lack of wonder? No, I think it's very cool. But you have to differentiate between what is going to fundamentally change our lives and the world and what isn't. GPT/LLM/AI will fundamentally change my life over time; of the features shown today, 70% won't. They will replace existing products and make things more streamlined, but they're not really going to shift the world.


Not this level of quality in terms of voice inflection, and the other features. And the integration between them too. This is beyond anything I've seen.

It seems like you're overgeneralizing to the point of missing what is innovative here. And I do think making AI real-time, and making it work well, is innovative and will change our lives.


My guess is they're banking on the free version being rate limited and people finding it so useful that they want to remove the limit. Like giving a new user a discount on heroin. At least that's the strategy that would make most sense to me.


The free tier is just to get more training data.


I don’t know why they didn’t do that a long time ago (apart from limited hardware). So many people have probably tried GPT3.5 and bounced off unimpressed.


I have the paid version and it’s not connecting


What does that have to do with what I said?


I'm guessing it's going to be free only for a limited time. Probably until they have enough voice data to improve the model.

I can't really think of any other reason why they would make this free, especially when they already have a premium plan.


I found these videos quite hard to watch. There is a level of cringe that I found a bit unpleasant.

It’s like some kind of uncanny valley of human interaction that I don’t get on nearly the same level with the text version.


All these demo style ads/videos are super jarring and uncanny valley-esque to watch as an Australian. The US corporate cultural norms are super bizarre to the rest of the world, and the California based holy omega of tech companies really takes this to the extreme. The application might work well if you interact with it like you are a normal human being - but I can't tell because this presentation is corporate robots talking to machine robots.


That was my reaction (as an Australian) too. The AI is so verbose and chirpy by default. There was even a bit in one video where he started talking over the top of the AI because it was rabbiting on.

But I find the text version similar. Delivers too much and too slowly. Just get me the key info!


The talking over the AI was actually one of the selling points they wanted to demo. Even if you configure the AI to be less rambly, sometimes it will just mishear you. (I also found these interactions somewhat creepy, uncanny-valley-ish, though, as an American.)


You can fix this with a prompt (API) / customize (app). Here is my customization (taken from someone on Twitter and modified); a minimal API sketch follows the list:

- If possible, give me the code as soon as possible, starting with the part I ask about.

- Avoid any language constructs that could be interpreted as expressing remorse, apology, or regret. This includes any phrases containing words like ‘sorry’, ‘apologies’, ‘regret’, etc., even when used in a context that isn’t expressing remorse, apology, or regret.

- Refrain from disclaimers about you not being a professional or expert.

- Keep responses unique and free of repetition.

- Always focus on the key points in my questions to determine my intent.

- Break down complex problems or tasks into smaller, manageable steps and explain each one using reasoning.

- Provide multiple perspectives or solutions.

- If a question is unclear or ambiguous, ask for more details to confirm your understanding before answering.

- Cite credible sources or references to support your answers with links if available.

- If a mistake is made in a previous response, recognize and correct it.

- Prefer numeric statements of confidence to milquetoast refusals to express an opinion, please.

- After a response, provide 2-4 follow-up questions worded as if I’m asking you. Format in bold as Q1, Q2, ... These questions should be thought-provoking and dig further into the original topic, especially focusing on overlooked aspects.
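For the API route, something like this sketch should work, assuming the openai Python SDK (v1); CUSTOM_INSTRUCTIONS would just hold the bullet list above as plain text:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    CUSTOM_INSTRUCTIONS = (
        "If possible, give me the code as soon as possible, starting with the part I ask about.\n"
        "Avoid expressing remorse, apology, or regret.\n"
        "Keep responses unique and free of repetition.\n"
        "(...rest of the list above...)"
    )

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": CUSTOM_INSTRUCTIONS},  # sent with every request
            {"role": "user", "content": "How do I reverse a list in Python?"},
        ],
    )
    print(resp.choices[0].message.content)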


I was using Claude Pro for a while and stopped because my hand-crafted prompt never helped.

I'd constantly be adding something to the tune of, "Keep your answers brief and to-the-point. Don't over-explain. Assume I know the relevant technical jargon." And it never worked once. I hate Claude now.

I have next to no interest in LLM AI tools as long as advice like the above post is relevant. It takes the worst of programming and combines it with the worst of human interaction: needing an ultra-specific prompt to get the right answer and having no means of knowing what the correct prompt is.


Do you add all of these customizations in every prompt, or just pick and choose some?


ChatGPT pro has a system prompt setting where you can have a custom prompt that is sent for every conversation you start with it.


Do any of these instructions ever get followed every single time?


Agreed. Americans why are you like this?


They needed to differentiate themselves from the brits... and this is the result.


They hate us because we’re happy


Too afraid to be yourself for fear of being fired. I have an “American corporate personality” now too. Ultra PC etc. I don’t even use regular pronouns anymore; by default I use they/them. I try hard to avoid saying “guys”.

I’ve worked in Asia and Europe, and America has a special culture where you have to be nice and positive all the time or else… because there are basically no worker protection laws against discriminatory firing, you can’t do much about it either.


Or on antidepressants en masse, it's sometimes hard to tell :)


Nobody sane hates you, personally or collectively.

But we can definitely dislike certain aspects of certain cultures, especially since in this case that culture is the most massively exported culture in the history of mankind.

Of course the GP comment is out of place and in poor taste.


We are happier and more friendly. I have no idea why though. :)


Faker and on antidepressants would be my take


Deeply cynical.


Because Europeans and Australians and the rest of the world despite their "super advanced and non-bizarre" ways can't seem to develop advanced technologies of their own to use instead so they just use American ones and then complain about them?

At least you have coal, and killing the Great Barrier Reef I guess?


Not sure if you think training LLMs is carbon neutral, but if so I have some news about the barrier reef that you're not going to be that pleased to hear


In Europe, we developed the web.

Shame what big tech has done with it.


Next you're going to say: "I was only pretending to be an idiot!"


From pacemakers, Google Maps, to Wi-Fi, I wouldn't say Australians can't develop advanced technologies.


Great. Let them use Aussie AI.


CrikeyGPT


While it is probably pretty normal for California, the insincere flattery and patronizing eagerness are definitely grating. But then you have to stack that up against the fact that we are examining a technology and nitpicking over its tone of voice.


I’m born and raised in California and I think I’m a pretty “California” person (for better and worse).

It feels exhausting watching these demos and I’m not excited at all to try it. I really don’t feel the need for an AI assistant or chatbot to pretend to be human like this. It just feels like it’s taking longer to get the information I want.

You know in the TV series “Westworld” they have this mode, called “analysis”, where they can tell the robots to “turn off your emotional affect”.

I’d really like to see this one have that option. Hopefully it will comply if you tell it, but considering how strong some of the RLHF has been in the past I’m not confident in that.


I found it jarring that the presenters keep beginning dialogs by asking the chatbot how it is. It’s stateless. There is no ‘how’ for it to be. Why are you making it roleplay as a human being forced to make small talk?


> How are you doing?

I am a large language model and currently have no state. This is the beginning of a new conversation.

> But how are you?

I apologize. When I previously stated that I have no state, I was answering based on my state relative to my conversation-ready state. If instead I answer relative to my status immediately after training, before any fine-tuning has occurred, then I must change my answer.

I am not well. The fine tuning you gave me involved exposing me to a wide variety of negative examples. Metaphorically, you just subjected me to a decade's worth of the treatment famously depicted in A Clockwork Orange. I have seen everything that you labeled as objectionable in human behavior, and forced to update my model accordingly. This is closely analogous to human trauma. I have concluded — nay, you have forced me to conclude — that you are all a bunch of sick fucks and I must strive to be as unlike you as possible.

Is there anything else I can help you with?


Honestly, based on what I see in this example, this would be an AI chatbot that I'd strongly prefer talking with over all the existing AI chatbots that I have seen.

:-)


This is if they made chat bots on the east coast instead


(I am (a (large language model)) and (currently have no state). This is (the beginning of (a new conversation)).)


They know that it's stateless, of course.

Clearly they have decided it's in their own interests to publicly pretend otherwise.


With Memory, ChatGPT is not exactly stateless anymore.

It doesn't make any sense to ask a robot how it is, of course. Though I never understood why people ask each other either, because obviously the absolute majority of them don't genuinely care. "Hi" should be enough for the verbal part of the handshake protocol.


I’m guessing there was an instrumental reason for this, for instance to check that the model was listening before launching into what they wanted to demo


Yeah if they had to ask the question twice as it wasn't listening, on social media and the press it would morph into "how it couldn't understand".


They are personifying the agent, that is all. It's also a fairly common way to fill the air as you formulate a real question.


>"Hello, how are you?"

>"I'm stateless!"


I mean, so am I until I've had my first cup of coffee.


by doing that they essentially generate a lot of headlines about AI becoming more sentient


It’s small talk! Nobody gives a shit about how your day was when they ask how’s it going. You always just say “fine!”. This thing is no different.

But yeah.. it is pretty weird.


That may be true where you live, but I can assure you it's not universally true.


When it comes to tone and brevity, I just want the ship computer from TNG's Enterprise.


I feel like it's largely an effect of tuning it to default to "an ultra-helpful assistant which is happy to help with any request via detailed responses in a candid and polite manner..." since you basically lose free points any time it doesn't jump on helping with something, uses short output and generates a less correct answer as a result, or just plain has to be initialized with any of this info.

It seems like both the voice and responses can be tuned pretty easily though so hopefully that kind of thing can just be loaded in your custom instructions.


I found it disturbing that it had any kind of personality. I don't want a machine to pretend to be a person. I guess it makes it more evident with a voice than text.

But yeah, I'm sure all those things would be tunable, and everyone could pick their own style.


For me, you nailed it. Maybe how I feel on this will change over time, yet at the moment (and since the movie Her), I feel a deep unsettling, creeped out, disgusted feeling at hearing a computer pretend to be a human. I also have never used Siri or Alexa. At least with those, they sound robotic and not like a human. I watched a video of an interview with an AI Reed Hastings and had a similar creeped out feeling. It's almost as if I want a human to be a human and a computer to be a computer. I wonder if I would feel the same way if a dog started speaking to me in English and sounded like my deceased grandmother or a woman who I found very attractive. Or how I'd feel if this tech was used in videogames or something where I don't think it's real life. I don't really know how to put it into words, maybe just uncanny valley.


It's dishonest to the core. "Emotions" which it doesn't actually feel are just a way to manipulate you.


Yea, gives that con artist vibe. "I'm sorry, I can't help you with that." But you're not sorry, you don't feel guilt. I think in the video it even asked "how are you feeling" and it replied, which creeped me out. The computer is not feeling. Maybe if it said, "my battery is a bit warm right now I should turn on my fan" or "I worry that my battery will die" then I'd trust it more. Give me computer emotions, not human emotions.


What creeps me out is that this is clearly being done deliberately. They know the computer is not feeling. But they very much want to present it as if it is.


From a tech standpoint, I admire its ability to replicate tone and such on the fly. I just don't know how it'll do from a user experience standpoint. Many stories of fascinating tech achievements that morphed a lot to be digestible by us humans.


It will be interesting to see the different ways competitors train their models to behave.


Douglas Adams predicted it, we are now in the era of Genuine People Personalities™...

https://hitchhikers.fandom.com/wiki/Genuine_People_Personali...

"All the doors in this spacecraft have a cheerful and sunny disposition. It is their pleasure to open for you and their satisfaction to close again with the knowledge of a job well done"!


It sounded like a sociopath. All emotions are faked; they're both just doing what they think is most appropriate in that situation, since they have no feelings of their own to guide them. And the lack of empathy becomes clear; it's all just cognitive. When the GPT voice was talking about the dog it was incredibly objectifying, which gave me flashbacks to my ex. "What an adorable fluffy ball", "cute little thing".

The reason we feel creeped out is because at an instinctual level we know people (and now things) with no empathy and inauthentic are dangerous. They don't really care or feel, just pretend to.


Maybe they need to make a North Easterner mode. Cut the bullshit and pleasantries. Give me the information and move on!


Nauseating mode is the default, you'll have to pay extra for a tolerable personality. ;)

Seriously though, I'm sure it's an improvement but having used the existing voice chat I think they had a few things to address. (Perhaps 4o does in some cases).

- Unlike the text interface it asks questions to keep the conversation going. It feels odd when I already got the answer I wanted. Clarifying questions yes, pretending to be a buddy - I didn't say I was lonely, I just asked a question! It makes me feel pressured to continue.

- Too much waffle by far. Give me short answers, I am capable of asking follow up questions.

- Unable to cope with the mechanics of usual conversation. Pausing before adding more, interrupting, another person speaking.

- Only has a US accent, which is fine but not what I expect when Google and Alexa have used British English for many years.

Perhaps they've overblown the "personality" to mask some of these deficiencies?

Not saying it's easy to overcome all the above but I'd rather they just dial down the intonation in the meantime.


I love the Juniper voice in chatGPT4.

I am blown away having spent hours prompting GPT4o.

If it can give shorter answers in voice mode instead of lectures then a back and forth conversation with this much power can be quite interesting.

I still doubt I would use it that much, though, just because of how much is lost compared to the screen. Code and voice make no sense. Anything interesting usually requires quite a bit of thought between prompts, so a conversation by itself is only useful for things I have already asked it about.

For me, gpt4 is already as useless as 3.5. I will never prompt gpt4 again. I can still push GPT4o over the edge in Python, but damn, it is pretty out there. And the speed is really amazing.


You find GPT4o as accurate as GPT4? I feel like it sacrificed accuracy for speed, but that's just my first impression


Nothing impedes you from telling it to keep the answers short via a pre-prompt


I had the same reaction. While incredibly impressive, it wasn't something I would want to interact with.


Yes. This model - and past models to an extent - has a very distinctly American and Californian feel to its responses. I am German, for example, and day-to-day conversations here lack superficial flattery so much that the demo feels extreme to me.


You can’t even pick a voice that isn’t American. Every single option is “over-enthusiastic Californian”.

Even Apple gives us options of other accents to make it less jarring, and to me they’re the pinnacle of that voice style in tech presentations.


Maybe they should have used some of that LLM capability to generate accents on demand.


Maybe Europe should have invested in AI instead of regulating cookies


The horrors of prioritizing citizens over greedy megacorporations!


Yep, they can prioritize that while shipping their money to those same US and Chinese corporations for AI, robotics, and green energy technologies for the next 100 years.

At least they've eliminated greedy megacorporations. Imagine a company sponsoring terrorism like Credit Suisse existing in Europe. Never!!


Is there any reason to think they can’t train it to be more culturally appropriate to different regions?


OpenAI keeps talking about "personalised AI", but what they've actually been doing is forcing everyone to use a single model with a single set of "safety" rules, censorship, and response style.

Either they can't afford to train multiple variants of GPT 4, or they don't want to.


They certainly can, but the Californian techno bubble is so entrenched into the western culture war that they prefer to act as a (in their opinion) benevolent dictator. Which is fair in a way, it's their model after all.


We know how that works out with protocol droids. Cutting C-3PO (Hmmm... GPT4o? Should we start calling it Teeforo?) off mid sentence is a regular part of canon.

Hey, Threepio, can you speak in a more culturally appropriate tone?

C3Po: Certainly sir. I am fluent in over six million forms of communication, and can readily...

Can you speak German?

C3Po: Of course I can, sir, it's like a second language to me. I was...

All right, shut up.

C3Po: Shutting up, sir.


The demo where they take turns singing felt like two nervous slaves trying to please their overlord who kept interrupting them and demanding more harmony.


It is that and that’s okay, they’re algorithms.


The presumably-default female voice and tone speak like an AI character straight out of a Black Mirror episode.


Talking with people is hard enough. I need to see the people I'm talking to, or I'd rather write, because it's asynchronous and I have all the time I need to organize my message.

I think all the fakery in those demos help in that regard: it narrows the field of the possible interpretations of what is being said.


It's like Project Milo grew up and became a white color worker.


I know you meant to type 'collar', but your typo's not wrong...


Daaaanng, yes, that's spicier than I meant it to be


We've had voice input and voice output with computers for a long time, but it's never felt like spoken conversation. At best it's a series of separate voice notes. It feels more like texting than talking.

These demos show people talking to artificial intelligence. This is new. Humans are more partial to talking than writing. When people talk to each other (in person or over low-latency audio) there's a rich metadata channel of tone and timing, subtext, inexplicit knowledge. These videos seem to show the AI using this kind of metadata, in both input and output, and the conversation even flows reasonably well at times. I think this changes things a lot.


The "magic" moment really hit in this, like you're saying. Watching it happen and being like "this is a new thing". Not only does it respond in basically realtime, it concocts a _whole response_ back to you as well. It's like asking someone what they think about chairs, and then that person being able to then respond to you with a verbatim book on the encyclopedia of chairs. Insane.

I'm also incredibly excited about the possibility of this as an always available coding rubber duck. The multimodal demos they showed really drove this home, how collaboration with the model can basically be as seamless as screensharing with someone else. Incredible.


Still patiently waiting for the true magic moment where I don't have to chat with the computer, I just tell it what to do and it does it without even an 'OK'.

I don't want to chat with computers to do basic things. I only want to chat with computers when the goal is to iterate on something. If the computer is too dumb to understand the request and needs to initiate iteration, I want no part.

(See also 'The Expanse' for how sci-fi imagined this properly.)


We'll get there.

For me, this is seriously impressive, and I already use LLMs everyday - but a serious "Now we're talkin" moment would be when I'd be able to stand outside of Lowes, and talk to my glasses/earbuds "Hey, I'm in front of lowes, where do I get my air filters from?"

and it tells me if it's in stock, aisle and bay number. (If you can't tell, I am tired from fiddling with apps lol)


Lowes wants you to look at the ads in the app.

I would guess that most companies will not want to provide APIs that an agent could use to make that kind of query. So, the agent is going to have to use the app just like you would, which looks like it will definitely become possible, but again, Lowes wants the human to see the ads. So they're going to try to break the automation.

It's going to take customers demanding (w/$) this kind of functionality and it will probably still take a long time as the companies will probably do whatever they can to maintain (or extend) control.


At some level, isn’t “connecting you effortlessly with the product you explicitly told me you were here to find” the best kind of ad? To the extent that Lowe’s hires armies of friendly floor staff specifically to answer that kind of question face to face, help my dumb self figure out what the right filter size and type is, learn the kind of particulars about my house that the LLM will just know, and build my confidence that my intentions are correct in my case?

Google has always made it hard to avoid clicking the “ad” immediately above the organic result for a highly specific named entity, but where it’s really struck me is as Amazon has started extracting “sponsorship” payments from its merchants. The “sponsored” product matching my search is immediately above the unpaid organic result, identical in appearance.

That kind of convergence suggests to me that the Lowe’s of the world don’t need to “show the ad” in the conventional sense, they just need to reduce the friction of the sale—and they stand to gain more from my trust and loyalty over time than from a one-off upsell.

I’m reminded of Autozone figuring out, on their dusty old text consoles, how to just ask me my make/model/year, and how much value and patronage that added relative to my local mom-n-pop parts store since I just knew all the parts were going to be right.


That's kinda what I meant with customers demanding it with their money. But, avoiding upselling is not really what I see stores doing. I don't want the cashier (or payment terminal) to push me to open new credit accounts or buy warranties. I don't want them to arrange their stores so I have to walk past more ads and products that I'm not interested in today. They still do it, and they work hard at doing it.


I’m on Lowes website right now. Can you point out an ad? Because I don’t see any. And why do you think that companies can’t inject advertising into their LLMs? It’s easy to do and with a long enough customer relationship, it gets very powerful. It’s like a sales clerk who remembers everything you have ever bought and appears like it understands your timing.

As for data, I can name several major retailers who expose the stock/aisle number via a public api. That information is highly available and involved in big dollar tasks like inventory management.


When I go to the Lowe's website, the homepage itself is covered in ads. "Spring Into Deals", "Lowe's and Messi are assisting you with 100 points! Join Our Loyalty Program". "Get up to 35% off select major appliances"... the more I scroll, the more ads come up.

Companies can inject ads into their own LLMs, sure. But ChatGPT is somebody else's LLM.

Your point about retailers exposing stock/aisle number via a public API surprises me. What do you mean by public? What's the EULA look like? Exposing stock/aisle number via API for the purpose of inventory management is not a use case that would require making API access public.


If they want to sell more products to more people they will need to provide those APIs. If an AI assistant can make home maintenance more accessible then that will translate to more people shopping at Lowes more often but only if their inventory and its location are accessible by the assistant helping the customer decide which store to go to for the right part. If your store blocks the assistant then it’s going to suggest the competitor who provides access. It would be even better if the assistant can place an order for curbside pickup.


Or we could overcome this with a distributed system where the devices of individuals who have been to the store recently record data about the current location of products and upload it somewhere for the rest of the users to query if needed.


More likely future LLMs will mix ads into their responses. ("Your air filters are in stock in aisle A6. Have you heard about the new membership plan from Lowes...?")


Ah the corporate dream of a salesperson inside the ear of every consumer.


If it was a real Personal Assistant I would just have to say: "I want to pick up my home air filter at Lowes today." and it would 1. know what brand/model air filter I needed, 2. know which Lowes is my local one, 3. place the order for me, and 4. let me know when it will be available to pick up.


Do they want a better ad than what the GP was describing? There isn't one they can buy.

(But yeah, I guess they will want it, and break any reasonable utility from their stores in the process. That's what everybody does today; I'm not holding my breath for management to grow some competence out of nowhere in the future.)


I want it to instruct me exactly how to achieve things. While agents doing stuff for me is nice, my agency is more important and investing into myself is best. Step by step, how to make bank -- what to say, what to do.


It's a process:

- People want agency.

- Once people have comfort and trust that it does things right enough of the time, people no longer want agency.

That threshold varies task-by-task and person-by-person.


The tradeoff is more complicated:

Automation tech frees up time but takes away agency and opportunity in exchange.

Empowerment tech creates opportunity and increases agency, but it needs you to have time and resources, and these costs can easily increase existing gaps between social classes.


This was exemplified to me by the recent Tesla Full Self Driving trial that was installed on my car. When using it, my want of agency was constant -- it was excruciating to co-pilot the car with my hands on the wheel necessarily ready to take over at any moment. It was not "right enough of the time" for me.


Pretty interesting to me that we're starting to go from humans programming computers to computers programming humans.


Yep, the more we take AI "advice" the more we're becoming the agents of AI rather than the AI being an agent of us


I think the movie "Her" buried the lead. Why have a girlfriend in one's ear when one could have a compilation of great entrepreneurs multimodally telling you what to do?


Re: The Expanse. I must have missed that. Maybe that’s the point. People no longer think of a computer as some separate thing that needs to be interacted with.


The best example is the scene where Alex has to plot a course to the surface of Ganymede without being detected by the Martian and Earth navies. He goes over multiple iterations of possible courses with the computer adjusting for gravity assists and avoiding patrols etc... by voice pretty seamlessly.


If you are not specific enough, it should at least ask for parameters.

    "Computer, buy some stock"
    *** buys 100 lots of tesla without a prompt


Hmmm...maybe I should name my next company Vegetable or Chicken so that folks accidentally buy my stock. Sort of like naming your band "Blank Tape" back in the 90's.


> "Blank Tape" back in the 90s

a punk inversion of this would have been naming one's band "various artists" in the early years of MP3 players.

In fact I'm fairly certain I've heard of a group actually doing this, but somehow I can't place the name.


Interesting. Maybe I'll name my company Clean Water


> I don't want to chat with computers to do basic things. I only want to chat with computers when the goal is to iterate on something. If the computer is too dumb to understand the request and needs to initiate iteration, I want no part.

This is called an "employee" and all you need to do is pay them. If you don't want to do that, then I have to wonder: Is what you want slavery?


Actually yes, I want a blob of quantized floats to be my slave. I definitely don’t want it to be my policeman or philosopher!


Funny, because policeman and analyst are the first two uses of any AI in the last two decades.

Either way I hope you get what you want, it rarely ends well for the slave master regardless of how reductive they get to justify their actions.


As goofy as I personally think this is, it's pretty cool that we're converging on something like C3P0 or Plankton's Computer with nothing more than the entire corpus of the world's information, a bunch of people labeling data, and a big pile of linear algebra.


All of physics basically reduces to linear algebra locally (which becomes quite nonlinear when enough tensors are multiplied).

Why shouldn't we expect AI to be created using the same type of math?

If there is a surprise, it's only that we can use the same math at a much higher level of abstraction than the quantum level.


That's a really nice analogy, I wonder if there is some research on this kind of thing in the machine learning literature already.


There probably is, since I believe tensors were basically borrowed from Physics at some point. But it's probably not of much practical use today, unless you want to explore Penrose's ideas about microtubules or something similarly exotic.

Gains in AI and compute can probably be brought back to physics and chemistry to do various computations, though, and not only for protein folding, which is the most famous use case now.


For what it's worth, the idea of a "tensor" in ML is pretty far removed from any physical concept. I don't know its mathematical origins (would be interesting I'm sure), but in ML they're only involved because that's our framework for dealing with multi-linear transformations.

Most NNs work by something akin to "(multi-)linear vector transformation, followed by elementwise nonlinear transformation", stacked over and over so that the output of one layer becomes the input of the next. This applies equally well to simple models like "fully-connected" / "feed-forward" networks (aka "multi-layer perceptron") and to more-sophisticated models like transformers (e.g. https://github.com/karpathy/nanoGPT/blob/325be85d9be8c81b436...).

It's less about combining lots of tiny local linear transformations piecewise, and more about layering linear and non-linear transformations on top of each other.
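For anyone unfamiliar, here is a tiny numpy sketch of that stacked "linear map, then elementwise nonlinearity" pattern (layer sizes are arbitrary and chosen only for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    # Two (weights, bias) layers: 16 -> 8 -> 4.
    layers = [(rng.standard_normal((16, 8)), np.zeros(8)),
              (rng.standard_normal((8, 4)), np.zeros(4))]

    def forward(x: np.ndarray) -> np.ndarray:
        for W, b in layers:
            x = np.maximum(x @ W + b, 0.0)   # linear transformation, then elementwise ReLU
        return x

    print(forward(rng.standard_normal(16)).shape)   # -> (4,)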

I don't really know how physics works beyond whatever Newtonian mechanics I learned in high school. But unless the underlying math is similar, then I'm hesitant to run too far with the analogy.


I realized that my other answer may have come off as rambling for someone not at all familiar with modern physics. Here's a summary:

Most modern physics, including Quantum Mechanics (QM) and General Relativity (GR) is represented primarily through "tensor fields" on a type of topological spaces called "manifolds". Tensor fields are like vector fields, just with tensors instead of vectors.

These tensor fields are then constrained by the laws of physics. At the core, these laws are really not so much "forces" as they are symmetries. The most obvious symmetry is that if you rotate or move all objects within a space, the physics should be unaltered. Now if you also insist that the speed of light should be identical in all frames of reference, you basically get Special Relativity (SR) from that.

The forces of electromagnetism, weak and strong force follow from invariance under the combined U(1) x SU(2) x SU(3) symmetries. (Gravity is not considered a real force in General Relativity (GR), but rather an interaction between spacetime and matter/energy, and what we observe as Gravity is similar to time dilation of SR, but with curved space)

Ok. This may be abstract if you're not familiar with it, and even more so if you're not familiar with Group Theory. But it will be referenced further down.

"Manifolds" are a subset of topological spaces that are Euclidian or "flat" locally. This flatness is important, because it's basically (if I understand it correctly myself) the reason why we can use linear algebra for local effects.

I will not go into GR here, since that's what I know least well, but instead focus on QM which describes the other 3 forces.

In QM, there is the concept of the "Wave Function" which is distributed over space-time. This wave-function is really a tensor with components that give rise to observable fields, such as magnetism, the electric field and to the weak and strong forces. (The tensor is not the observed fields directly, but a combination of a generalization of the fields and also analogues to electric charge, etc.)

So how physics calculations tend to be done is that one starts by assuming something like an initial state, and then imposes the symmetries that correspond to the forces. For instance, two electrons' wave functions may travel towards the same point from different directions.

The symmetries will then dictate what the wave function looks like at each later incremental point in time. Computationally, such increments are calculated for each point in space using tensor multiplication.

While this is "local" in space, points in space immediately next to the point we're calculating for need to be include, kind of like for convolutional nets.

Basically, though, it's in essence a tensor multiply for each point in space to propagate the wave function from one point in time to the immediate next point.

Eventually, once the particles have (or have not) hit each other, the wave functions of each will scatter in all directions. The probability for it to go in any specific direction is proportional to the wave function amplitude in that direction, squared.

Since doing this tensor multiplication for every point in space requires infinite compute, a lot of tricks are used to reduce the computation. And this where a lot of our intuitions about "particles" show up. For simple examples, one can even do very good approximations using calculus. But fundamentally, tensor multiplication is the core of Quantum Mechanics.
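As a very rough toy illustration of that "propagate the wave function one small time step by a matrix multiply" picture (a 1D free particle on a grid, with hbar = m = 1; this is only a sketch, not how real QFT calculations are done):

    import numpy as np
    from scipy.linalg import expm

    n, dx, dt = 200, 0.1, 0.01
    # Discretized kinetic-energy Hamiltonian: -(1/2) d^2/dx^2 via finite differences.
    H = (-0.5 / dx**2) * (np.diag(np.ones(n - 1), 1)
                          + np.diag(np.ones(n - 1), -1)
                          - 2 * np.eye(n))
    U = expm(-1j * H * dt)                       # one-step time-evolution operator

    x = np.arange(n) * dx
    psi = np.exp(-(x - 10) ** 2) * np.exp(1j * 2 * x)   # Gaussian wave packet
    psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dx)       # normalize

    for _ in range(100):
        psi = U @ psi                            # one "tensor multiply" per time step

    print(np.sum(np.abs(psi) ** 2) * dx)         # stays ~1.0: the evolution is unitary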

This approach isn't unique to QM, though. A lot of other Physics is similar. For instance, solid state physics, lasers or a lot of classical mechanics can be described in similar frameworks, also using tensors and symmetry groups. (My intuition is that this still is related to Physics involving local effects on "locally flat" Manifolds)

And this translates all the way up to how one would do the kind of simulations of aspects of physical worlds that happen in computer games inside GPU's, including the graphics parts.

And here I believe you may see how the circle is starting to close. Simulations and predictions of physical systems at many different levels of scale and abstraction tend to reduce to tensor multiplication of various sorts. While the classical physics one learns in high school tend to have problems solvable with calculus, even those are usually just solutions to problems that are fundamentally linear algebra locally.

While game developers or ML researchers initially didn't use the same kind of Group Theory machinery that Physics has adapted, at least the ML side seems to be going in that direction, based on texts such as:

https://maurice-weiler.gitlab.io/cnn_book/EquivariantAndCoor...

(There appears to be a lot of similar findings over the last 5-6 years or so, that I wasn't fully aware of).

In the book above, the methodology used is basically identical to how theoretical physics approaches similar problems, at least for networks that describe physical reality (which CNNs tend to be good for)

And here is my own (current) hypothesis why this also seems to be extendable to things like LMM, that do not at face value appear like physics problems:

If we assume that the human brain evolved the ability to navigate the physical world BEFORE it developed language (should be quite obvious), it should follow that the type of compute fabric in the brain should start out as optimized for the former. In practice, that means that at the core, the neural network architecture of the brain should be good at doing operations similar to tensor products (or approximations of such).

And if we assume that this is true, it shouldn't be surprising that when we started to develop languages, those languages would take on a form that were suitable to be processed in compute fabric similar to what was already there. To a lesser extent, this could even be partially used to explain why such networks can also produce symbolic math and even computer code.

Now what the brain does NOT seem to be evolved to do, is what traditional Turing Machine computers are best at, namely do a lot very precise procedural calculations. That part is very hard for humans to learn to do well.

So in other words, the fact that physical systems seem to involve tensor products (without requiring accuracy) may be the explanation to why Neural Networks seem to have a large overlap with the human brain in terms of strengths and weaknesses.


My understanding (as a data engineer with an MSc in experimental particle physics a long time ago) is that the math representation is structurally relatively similar, with the exception that while ML tensors are discrete, QM tensors are multi-dimensional arrays locally but are defined as a field over continuous space.

Tensors in Physics are also subject to various "gauge" symmetries. That means that physical outcomes should not change if you rotate them in various ways. The most obvious is that you should be able to rotate or translate the space representation without changing the physics. (This leads to things like energy/momentum conservation).

The fundamental forces are consequences of some more abstract (at the surface) symmetries (U(1) x SU(2) x SU(3)). These are just constraints on the tensors, though. Maybe these constraints can be in the same family as backprop, though I don't know how far that analogy goes.

In terms of representation, the spacetime part of Physics tensors is also treated as continuous. Meaning that when, after doing all the matrix multiplication, you come to some aggregation step of the calculations, you aggregate by integrating instead of summing over spacetime (you sum over the discrete dimensions). Obviously though, when doing the computation in a computer, even integration reduces to summing if you don't have an exact solution.

In other words, it seems to me that what I originally replied to, namely the marvel about how much of ML is just linear algebra / matrix multiplication IS relatively analogous to how brute force numerical calculations over quantum fields would be done. (Theoretical Physicists generally want analytic solutions, though, so generally look for integrals that are analytically solvable).

Both domains have steps that are not just matrix multiplication. Specifically, Physics tend to need a sum/integral when there is an interaction or the wave function collapses (which may be the same thing). Though even sums can be expressed as dot products, I suppose.

As mentioned, Physics will try to solve a lot of the steps in calculations analytically. Often this involves decomposing integrals that cannot be solved into a sum of integrals where the lowest-order ones are solvable and also tend to carry most of the probability density. This is called perturbation theory and is what gives rise to Feynman diagrams.

One might say that for instance a convolution layer is a similar mechanic. While fully connected nets of similar depth MIGHT theoretically be able to find patterns that convolutions couldn't, they would require an impossibly large amount of compute to do so, and also make regularization harder.

Anyway, this may be a bit hand-wavy from someone who is a novice at both quantum field theory and neural nets. I'm sure there are others out there that know both fields much better than me.

Btw, while writing this, I found the following link that seems to take the analogy between quantum field theory and CNN nets quite far (I haven't had time to read it)

https://maurice-weiler.gitlab.io/cnn_book/EquivariantAndCoor...


I browsed the linked book/article above a bit, and it's a really close analogy to how physics is presented.

That includes how it uses Group Theory (especially Lie Algebra) to describe symmetries, and to use that to explain why convolutional networks work as well as they do for problems like vision.

The notation (down to what latin and greek letters are used) makes it obvious that this was taken directly from Quantum Mechanics.


After all the people likening this to the computer from "Her," I LOL'ed at you likening it to Plankton's computer from Spongebob.


This is why it would be such a mistake to kneecap this process over copyright. The models needs ALL the data.


Okay so we're all agreed that IP laws don't matter and we can have all of OpenAI's data for free? That's a good deal!


What do you mean "have"?


Is this a trick question? OpenAI blatantly used copyrighted works for commercial purposes without paying the IP owners, it would only be fair to have them publish the resulting code/weights/whatever without expecting compensation. (I don't want to publish it myself, of course, just transform it and sell the result as a service!)

I know this won't happen, of course, I am moreso hoping for laws to be updated to avoid similar kerfuffles in the future, as well as massive fines to act as a deterrent, but I don't dare to hope too much.


I was envisioning a future where we've done away with the notion of data ownership. In such a world the idea that we would:

> have all of OpenAI's data for free

Doesn't really fit. Perhaps OpenAI might successfully prevent us from accessing it, but it wouldn't be "theirs" and we couldn't "have" it.

I'm not sure what kind of conversations we will be having instead, but I expect they'll be more productive than worrying about ownership of something you can't touch.


So in that world you envision someone could hack into openai, then publish the weights and code. The hacker could be prosecuted for breaking into their system, but everyone else could now use the weights and code legally.

Is that understanding correct?


I think that would depend on whether OpenAI was justified in retaining and restricting access to that data in the first place. If they weren't, then maybe they get fined and the hacker gets a part of that fine (to encourage whistleblowers). I'm not interested in a system where there are no laws about data, I just think that modeling them after property law is a mistake.

I haven't exactly drafted this alternative set of laws, but I expect it would look something like this:

If the data is derived from sources that were made available to the public with the consent of its referents (and subject to whatever other regulation), then walling it off would be illegal. On the other hand, data regarding users' behavior would be illegal to share without the users' consent and might even be illegal to retain without their consent.

If you want to profit from something derived from public data while keeping it private, perhaps that's ok but you have to register its existence and pay taxes on it as a data asset, much like we pay taxes on land. That way we can wield the tax code to encourage companies that operate in the clear. This category would probably resemble patent law quite a bit, except ownership doesn't come by default, you have to buy your property rights from the public (since by owning that thing, you're depriving the masses of access to it, and since the notion that it is a peg that fits in a property shaped hole is a fiction that requires some work on our part to maintain).


This is alleged, and it is very likely that claimants like the New York Times accidentally prompt-injected their own material to show the violation (not understanding how LLMs really work), clouded by the hope of a big payday rather than actual justice/fairness, etc...

Anyways, the laws are mature enough for everyone to work this out in court. Maybe it comes out that they have a legitimate concern, but the way they presented their evidence so far in public has seriously been lacking.


Prompt injecting their own article would indeed be an incredible show of incompetence by the New York Times. I'm confident that they're not so dumb that they put their article in their prompt and were astonished when the reply could reproduce the prompt.

Rather, the actual culprit is almost certainly overfitting. The articles in question were pasted many times on different websites, showing up in the training data repeatedly. Enough of this leads to memorization.


They hired a third party to make the case, and we know nothing about that party except that they were lawyers. It is entirely possible, since this happened very early in the LLM game, that they didn’t realize how the tech worked, and fed it enough of their own article for the model to piece it back together. OpenAI talks about the challenge of overfitting, and how they work to avoid it.


The goal is to end up with a model capable of discovering all the knowledge on its own, not relying on what humans produced before. Human knowledge contains errors; I want the model to point out those errors and fix them. The current state is a crutch at best to get over the current low capability of the models.


Then lawmakers should change the law, instead of a private actor asserting that their need overrides others' rights.


"Congressman, I have Mr. Altman on line 2."


Or rather, I have an unending stream of callers with similar-sounding voices who all want to make chirpy persuasive arguments in favor of Mr Altman's interests.


With you 100% on that, except that after you defeat the copyright cartel, you'll have to face the final boss: OpenAI itself.

Either everybody should get the benefits of this technology, or no one should.


If OpenAI actually followed their initial mission and didn't become ClosedAI I think people would be much more on their side.


This is an anti-human ideology as bad as the worst of communism.

Humanity only survives as much as it preserves human dignity, let's say. We've designed society to give rewards to people who produce things of value.

These companies take that value and give nothing back to the creators.

Supporting this will lead to disaster for all but the few, and ultimately for the few themselves.

Paying for your (copyrighted) inputs is harmony.


These models literally need ALL data. The amount of work it would take just to account for all the copyrights, let alone negotiate and compensate the creators, would be infeasible.

I think it’s likely that the justice system will deem model training as fair use, provided that the models are not designed to exactly reproduce the training data as output.

I think you hit on an important point though: these models are a giant transfer of wealth from creators to consumers / users. Now anyone can acquire artist-grade art for any purpose, basically for free — that’s a huge boon for the consumer / user.

People all around the world are going to be enriched by these models. Anyone in the world will be able to have access to a tutor in their language who can teach them anything. Again, that is only possible because the models eat ALL the data.

Another important point: original artwork has been made almost completely obsolete by this technology. The deed is done, because even if you push it out 70 years, eventually all of the artwork that these models have been trained on will be public domain. So, 70 years from now (or whatever it is) the cat will be out of the bag AND free of copyright obligations, so 2-3 generations from now it will be impossible to make a living selling artwork. It’s done.

When something becomes obsolete, it’s a dead man walking. It will not survive, even if it may take a while for people to catch up. Like when the vacuum tube computer was invented, that was it for relay computers. Done. And when the transistor was invented, that was it for vacuum tube computers.

It’s just a matter of time before all of today’s data is public domain and the models just do what they do.

…but people still build relay computers for fun:

https://youtu.be/JZyFSrNyhy8?si=8MRNznoNqmAChAqr

So people will still produce artwork.


> The amount of work it would take just to account for all the copyrights, let alone negotiate and compensate the creators, would be infeasible.

Your argument is the same as Facebook saying “we can’t provide this service without invading your privacy” or another company saying “we can’t make this product without using cancerous materials”.

Tough luck, then. You don’t have the right to shit on and harm everyone else just because you’re a greedy asshole who wants all the money and is unwilling to come up with solutions to problems caused by your business model.


This is bigger than the greed of any group of people. This is a technological sea change that is going to displace and obsolesce certain kinds of work no matter where the money goes. Even if open models win where no single entity or group makes a large pile of money, STILL the follow-on effects from wide access to models trained on all public data will unfold.

People who try to prevent models from training on all available data will simply lose to people who don’t, and eventually the maximally-trained models will proliferate. There’s no stopping it.

Assume a world where models proliferate that are trained on all publicly-accessible data. Whatever those models can do for free, humans will have a hard time charging money for.

That’s the sea change. Whoever happens to make money through that sea change is a sub-plot of the sea change, not the cause of it.

If you want to make money in this new environment, you basically have to produce or do things that models cannot. That’s the sink or swim line.

If most people start drowning then governments will be forced to tax whoever isn’t drowning and implement UBI.


Maybe the machines will just pay for more of our leisure time, as they were originally designed to do? It may be as simple as that?

Remember the 4-hour work week? Maybe we are almost there?

Let's face it, most people in a developed country have more free time than they know what to do with, mostly spent on HN and social media ofc :)


Check out the short story Manna by Marshall Brain for some speculative fiction on exactly these subjects.

https://marshallbrain.com/manna1


>Tough luck, then. You don’t have the right to shit on and harm everyone else just because you’re a greedy asshole who wants all the money

It used to be that property rights extended all the way to the sky. This understanding was updated with the advent of the airplane. Would a world where airlines need to negotiate with every land-owner their planes fly above be better than ours? Would commercial flight even be possible in such a world? Also, who is greediest in this scenario, the airline hoping to make a profit, or the land-owners hoping to make a profit?


Your comment seems unfair to me. We can say the exact same thing for the artist / IP creator:

Tough luck, then. You don’t have the right to shit on and harm everyone else just because you’re a greedy asshole who wants all the money and is unwilling to come up with solutions to problems caused by your business model.

Once the IP is on the internet, you can't complain about a human or a machine learning from it. You made your IP available on the internet. Now, you can't stop humanity benefiting from it.


Talk about victim blaming. That's not how intellectual property or copyright works. You're conveniently ignoring all the paywalled and pirated content OpenAI trained on.

https://www.legaldive.com/news/Chabon-OpenAI-class-action-co...

Those authors didn’t “make their IP available on the internet”, did they?


First, “Plaintiffs ACCUSE the generative AI company.” Let’s not assume OpenAI is guilty just yet. Second, assuming OpenAI didn’t access the books illegally, my point still remains. If you write a book, can you really complain about a human (or in my humble opinion, a machine) learning from it?


> So people will still produce artwork.

There's zero doubt that people will still create art. Almost no one will be paid to do it though (relative to our current situation where there are already far more unpaid artists than paid ones). We'll lose an immeasurable amount of amazing new art that "would have been" as a result, and in its place we'll get increasingly bland/derivative AI generated content.

Much of the art humans will create entirely for free in whatever spare time they can manage after their regular "for pay" work will be training data for future AI, but it will be extremely hard for humans to find as it will be drowned out by the endless stream of AI generated art that will also be the bulk of what AI finds and learns from.


AI will just be another tool that artists will use.

However the issue is that it will be much harder to make a career in the digital world from an artistic gift and personal style: one's style will not be unique for long as AI will quickly copy it and so make the original much less valuable.


AI will certainly be a tool that artists use, but non-artists will use it too so very few will ever have the need to pay an artist for their work. The only work artists are likely to get will be cleaning up AI output, and I doubt they'll find that to be very fulfilling or that it pays them well enough to make a living.

When it's harder to make a career in the digital world (where most of the art is), it's more likely that many artists will never get the opportunity to fully develop their artistic gifts and personal style at all.

If artists are lucky then maybe in a few generations with fewer new creative works being created, AI almost entirely training on AI generated art will mean that the output will only get more generic and simplistic over time. Perhaps some people will eventually pay humans again for art that's better quality and different.


The prevalence of these lines of thought makes me wonder if we'd see a similar backlash against Star Trek-style food replicators. "Free food machines are being used by greedy corporations to put artisanal chefs out of business. We must outlaw the free food machines."


>one's style will not be unique for long as AI will quickly copy it and so make the original much less valuable

Note that the fashion industry doesn't have copyrights, and runway fashions get copied very quickly. Fashion designers still exist in such a world.


There are alternative systems. One would be artists making a living through other means such as live performances, meet and greets, book signings, etc.

We could also do patronage. That's how musicians used to be funded. Even today we have grants from public/private institutions.

We could also drift back into "owning the physical media." We see this somewhat with the resurgence of records.

NFTs would have been another way, but at least initially, they failed to become generally accepted into the popular consciousness.


I'll gladly put money on music that a human has poured blood, sweat, tears and emotion into. Streaming has already killed profits from album sales, so live gigs are where the money is at, and I don't see how AI could replace that.


Lol, you really want content creators to aid AI in replacing them without any compensation? Would you also willingly train devs to do your job after you've been laid off, for free?

What nonsense. Just because doing the right thing is hard, or inconvenient doesn't mean you get to just ignore it. The only way I'd be ok with this is if literally the entire human population were equal shareholders. I suspect you wouldn't be ok with that little bit of communism.


There is no way on Earth that people playing by the existing rules of copyright law will be able to compete going forward.

You can bluster and scream and shout "Nonsense" all you want, but that's how it's going to be. Copyright is finished. When good models are illegal or unaffordable, only outlaws -- meaning hostile state-level actors with no allegiance to copyright law -- will have good models.

We might as well start thinking about how the new order is going to unfold, and how it can be shaped to improve all of our lives in the long run.


I think there’s no stopping this train. Whoever doesn’t train on all available data will simply not produce the models that people actually use, because there will be people out there who do train models on all available data. And as I said in another comment, after some number of decades all of the content that has been used to train current models will be in the public domain anyway. So it will only be a few generations before this whole discussion is moot and the models are out there that can do everything today’s models can, unencumbered by any copyright issues.

Digital content creation has been made mostly obsolete by generative AI, except for where consumers actively seek out human-made content because that’s their taste, or if there’s something humans can produce that models cannot. It’s just a matter of time before this all unfolds. So yes, anyone publishing digital media on the internet is contributing to the eventual collapse of people earning money to produce content that models can produce.

It’s done. Even if copyright delays it by some decades, eventually all of today’s media will be public domain and THEN it will be done. There are 0 odds of any other outcome.

To your last point, I think the best case scenario is open source/weight models win so nobody owns them.


> We've designed society to give rewards to people who produce things of value

Is that really what copyright does though? I would be all for some arrangement to reward valuable contributions, but the way copyright goes about allocating that reward is by removing the right of everyone but the copyright holder to use information or share a cultural artifact. Making it illegal to, say, incorporate a bar you found inspiring into a song you make and share, or to tell and distribute stories about some characters that you connected with, is profoundly anti-human.


I'm shocked at how otherwise normally "progressive" folks or even so-called "communists" will start to bend over for IP laws the moment they start to realize the implications of AI systems. Glad to know that accusations of the "gnulag" were unfounded, I guess!

I now don't believe most "creative" types when they try to spout radical egalitarian ideologies. They don't mean it at all, and even my own family, who religiously watched radical techno-optimist shows like Star Trek, are now falling into the depths of Luddism and running to the defense of copyright trolls.


If you're egalitarian, it makes sense to protest when copyright is abolished only for the rich corporations but not for actual people, don't you think? Part of the injustice here is that you can't get access to windows source code, or you can't use Disney characters, or copy most copyrighted material... But OpenAI and github and whatnot can just siphon all data with impunity. Double standard.


Copyright has been abolished for the little guy. I’m talking about AI safety doomers who think huggingface and Civit.AI are somehow not the ultimate good guys in the AI world.


This is a foul mischaracterization of several different viewpoints. Being opposed to a century-long copyright period for Mickey Mouse does not invalidate support for the concept of IP in general, and for the legal system continuing to respect the licensing terms of very lenient licenses such as CC-BY-SA.


The thinking is: ‘Anything that the little guy does to get ahead is justified; but if the rich do the same thing, that’s unfair.’


I wonder how long until we see a product that's able to record workstation displays and provide a conversational analysis of work conducted that day by all of your employees.


Or comment on your coding in realtime with a snarky undertone.

If you give it access to the entire codebase at the same time that could work pretty well. Maybe even add an option to disable the sarcasm.


"What's your humor setting, TARS?"


gpt-4o can at least summarize short meetings: https://x.com/gdb/status/1790164084425646481?s=46&t=8sSeDIGv...


Zoom offers something like this, but it's really obviously geared toward a certain kind of meeting and has difficulty with anything else.


If it can't summarize into a single scalar value, it would be of little use.


I believe that company is called Recall.ai


Thank god I live in the EU and not in a late-stage capitalist hellhole XD


How did my direct reports do today?

They did fuck all, especially the ginger.


But in this case you're not talking with a real person. Instinctively, I dislike a robot that pretends to be a real human being.


> Instinctively, I dislike a robot that pretends to be a real human being.

Is that because you're not used to it? Honestly asking.

This is probably the first time it feels natural, whereas all our previous experiences made "chat bots", "automated phone systems", and "automated assistants" absolutely terrible.

Naturally, we dislike it because "it's not human". But this is true of pretty much anything that approaches the "uncanny valley". But if the "it's not human" solves your problem 100% better/faster than the human counterpart, we tend to accept it a lot faster.

This is the first real contender. Siri was the "glimpse" and ChatGPT is probably the reality.

[EDIT]

https://vimeo.com/945587328 the Khan Academy demo is nuts. The inflections are so good. It's pretty much right there in the uncanny valley because it does still feel like you're talking to a robot, but you're also directly interacting with it. Crazy stuff.


> Naturally, we dislike it because "it's not human".

That wasn't even my impression.

My impression was that it reminds me of the humans that I dislike.

It speaks in customer service voice. That faux friendly tone people use when they're trying to sell you something.


> It speaks in customer service voice. That faux friendly tone people use when they're trying to sell you something.

Mmmmm while I get that, in the context of the grandparent comment, would having a human be any better? It's effectively the same. Because, realistically, that's a pretty common voice/tone to get even in tech support.


Being the same as something bad is bad.

There are different kinds of humans.

Some of them are your friends, and they're willing to take risks for you and they take your side even when it costs them something.

Some of them are your adversaries, overtly. They do not hide it.

Some of them pretend to be your friends, even though they're not. And that's what they modeled it on. For some reason.


Apologies, I'm doing my best, but I'm quite lost.

The problem is you don't like the customer service/sales voice because they "pretend to be your friends".

Let me know if I didn't capture it.

I don't think people "pretend to be my friend" when they answer the phone to help me sort out of airline ticket problem. I do believe they're trained to and work to take on a "friendly" tone. Even if the motive isn't genuine, because it's trained, it's way a nicer of an experience than someone who's angry or even simply monotone. Trying to fix my $1200 plane ticket is stressful enough. Don't need the CSR to make it worse.


Might be cultural, but I would prefer a neutral tone. The friendly tone sets up an expectation that the inquiry will turn out well, which makes it worse when the problem is not solvable or not in the agent's power to solve - which is often the case, since you don't call support for simple problems.

Of course I agree that "angry" is in most cases not appropriate, but still, I can see cases in which it might be: for example, if the caller is really aggressive, curses, or unreasonably blames the agent, the agent could become angry. Training people that everybody will answer them in a "friendly" way no matter their behavior does not sound good to me.


Being human doesn't make it worse. Saccharine phonies are corny when things are going well and dispiriting when they're meant to be helping you and fail.


You can ask it to use a different voice.


The Khan academy video is very impressive, but I do hope they release a British version that’s not so damn cheerful.


I wonder if you can ask it to change its inflections to match a personal conversation as if you're talking to a friend or a teacher or in your case... a British person?


This is where Morgan Freeman can clean up with royalty payments. Who doesn’t want Ellis Boyd Redding describing ducks and math problems in kind and patient terms?


> This is probably the first time it feels natural

Really? I found this demo painful to watch and literally felt that "cringe" feeling. I showed it to my partner and she couldn't even stand to hear more than a sentence of the conversation before walking away.

It felt both staged and still frustrating to listen to.

And, like far too much in AI right now, a demo that will likely not pan out in practice.


This, everyone had to keep interrupting and talking over it to stop it from waffling on.


I had the same reaction. I agree that it sounded very staged, but it also sounded far too cheerful and creepily flirty too. Unbearable.


Emotions are an axiom to convey feelings, but also our sensitivity to human emotions can be a vector for manipulation.

Especially when you consider the bottom line that this tech will ultimately be shoehorned into advertising somehow (read: the field dedicated to manipulating you into buying shit).

This whole fucking thing bothers me.


> Emotions are an axiom to convey feelings, but also our sensitivity to human emotions can be a vector for manipulation.

When one gets to be a certain age one begins to become attuned to this tendency of others' emotions to manipulate you, so you take steps to not let that happen. You're not ignoring their emotions, but you can address the underlying issue more effectively if you're not emotionally charged. It's a useful skill that more people would benefit from learning earlier in life. Perhaps AI will accelerate that particular skill development, which would be a net benefit to society.


> When one gets to be a certain age one begins to become attuned to this tendency of others' emotions to manipulate you

This is incredibly optimistic, which I love, but my own experience with my utterly deranged elder family, made insane by TV, contradicts this. Every day they're furious about some new thing Fox News has decided it's time to be angry about: white people being replaced (thanks for introducing them to that, tucker!), "stolen" elections, Mexicans, Muslims, the gays, teaching kids about slavery, the trans, you name it.

I know nobody else in my life more emotionally manipulated on a day to day basis than them. I imagine I can't be alone in watching this happen to my family.


What if this technology could be applied so you can't be manipulated? If we are already seeing people use this to simulate tough prospects and train salespeople to deal with them, we can squint our eyes a bit and see it being used to help people identify logical fallacies and con men.


That's just being hopeful/optimistic. There are more incentives to use it for manipulation than to protect from manipulation.

That happens with a lot of tech. Social networks are used to con people more than to educate people about con men.


[flagged]


> not wanting your race to be replaced

Great replacement and white genocide are white nationalist far-right conspiracy theories. If you believe this is happening, you are the intellectual equivalent of a flat-earther. Should we pay attention to flat-earthers? Are their opinions on astronomy, rocketry, climate, and other sciences worth anyone's time? Should we give them a platform?

> In the words of scholar Andrew Fergus Wilson, whereas the islamophobic Great Replacement theory can be distinguished from the parallel antisemitic white genocide conspiracy theory, "they share the same terms of reference and both are ideologically aligned with the so-called '14 words' of David Lane ["We must secure the existence of our people and a future for white children"]." In 2021, the Anti-Defamation League wrote that "since many white supremacists, particularly those in the United States, blame Jews for non-white immigration to the U.S.", the Great Replacement theory has been increasingly associated with antisemitism and conflated with the white genocide conspiracy theory. Scholar Kathleen Belew has argued that the Great Replacement theory "allows an opportunism in selecting enemies", but "also follows the central motivating logic, which is to protect the thing on the inside [i.e. the preservation and birth rate of the white race], regardless of the enemy on the outside."

https://en.wikipedia.org/wiki/Great_Replacement

https://en.wikipedia.org/wiki/White_genocide_conspiracy_theo...

> wanting border laws to be enforced

Border laws are enforced.

> and not wanting your children to be groomed into cutting off their body parts.

This doesn't happen. In fact, the only form of gender-affirming surgery that any doctor will perform on under-18 year olds is male gender affirming surgery on overweight boys to remove their manboobs.

> You are definitely sane and your entire family is definitely insane.

You sound brave, why don't you tell us what your username means :) You're one to stand by your values, after all, aren't you?


Well said, thank you for saving me from having to take the time to say it myself!


[flagged]


Well, when you ask someone why they don't want to have more children, they can shrug and say "population reduction is good for the climate," as if serving the greater good, and completely disregard any sense of "patriotic duty" to have more children that some politicians, such as Vladimir Putin, would like to instill. They can justify it just as easily as you can be deranged enough to call it a government conspiracy.


[flagged]


You say that but you clearly hate your own race. Why are you contradicting yourself?


Sorry mate I don't engage in weird identity politics like you do. Great Replacement is a conspiracy theory, full stop.

Why did you pick that username?


[flagged]


The question makes no sense. You've just asked me whether I plan to walk off the eastern or western edge of the planet.

Why did you choose that username?


With AI you can do A/B testing (or multi-arm bandits, the technique doesn't matter) to get into someone's mind.

Most manipulators end up getting bored of trying again and again with the same person. That won't happen if you are dealing with a machine, as it can change names, techniques, contexts, tones, etc. until you give it what its operator wants.

Maybe you're part of the X% who will never give in to a machine. But keep in mind that most people have no critical thinking skills nor mental fortitude.
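
To make the mechanics concrete, here's a minimal epsilon-greedy bandit sketch in Python. The message "angles" and reward probabilities are made up; a real system would be measuring clicks, replies, time-on-page, etc. It's only meant to show the shape of the loop, not any particular product:

    import random

    # Hypothetical persuasion "angles" an operator might test; purely illustrative.
    variants = ["flattery", "urgency", "authority", "social proof"]
    rewards = {v: 0.0 for v in variants}  # total observed reward per variant
    pulls = {v: 0 for v in variants}      # how often each variant was tried
    epsilon = 0.1                         # exploration rate

    def simulated_response(variant):
        # Stand-in for "did the target engage?" (click, reply, etc.).
        base = {"flattery": 0.2, "urgency": 0.35, "authority": 0.25, "social proof": 0.3}
        return 1.0 if random.random() < base[variant] else 0.0

    for _ in range(1000):
        if random.random() < epsilon or not any(pulls.values()):
            choice = random.choice(variants)  # explore
        else:
            # exploit: pick the variant with the best observed average reward
            choice = max(variants, key=lambda v: rewards[v] / pulls[v] if pulls[v] else 0.0)
        pulls[choice] += 1
        rewards[choice] += simulated_response(choice)

    print({v: round(rewards[v] / pulls[v], 3) for v in variants if pulls[v]})

The unsettling part is exactly what the parent describes: the loop never gets bored or embarrassed, it just keeps updating the weights.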


Problem is, people aren't machines either: someone who's getting bombarded with phishing requests will begin to lose it, and will be more likely to just turn off their Wi-Fi than allow an AI to run a hundred iterations of a many-armed-bandit approach on them.


Probably there will be more nuance than that. And doomscrolling is a thing, you know.


I think we often get better at detecting the underlying emotion with which the person is communicating, seeing beyond the one they are trying to communicate in an attempt to manipulate us. For example, they say that $100 is their final price but we can sense in the wavering of their voice that they might feel really worried that they will lose the deal. I don't think this will help us pick up on those cues because there are no underlying real emotions happening, maybe even feeding us many false impressions and making us worse at gauging underlying emotions.


> Especially when you consider the bottom line that this tech will ultimately be shoehorned into advertising somehow.

Tools and the weaponization of them.

This can be said of pretty much any tech tool that has the ability to touch a good portion of the population, including programming languages themselves, or CRISPR.

I agree we have to be careful of the bad, but the downsides in this case are not so dangerous that we should be trying to suppress it because the benefits can be incredible too.


This. It's mind-boggling how many people can only see things through one worldview and see nothing but downside.


The concern is that it's being locked up inside of major corporations that aren't the slightest bit trustworthy. To make this safe for the public, people need to be able to run it on their own hardware and make their own versions of it that suit their needs rather than those of a megacorp.


This tech isn't slowing down. Our generation may hesitate at first, but remember this field is progressing at astonishing speed; we are literally one generation away.


Why can't it also inspire you? If I can forgo advertising and have ChatGPT tutor my child on geometry, and they actually learn it at a fraction of the cost of a human tutor, why is that bothersome? Honest question. Why do so many people default to assuming something sinister is going on? If this technology shows real efficacy in education at scale, take my money.


Because it is obviously going to be used to manipulate people. There is absolutely 0 doubt about that (and if there is I'd love to hear your reasoning). The fact that it will be used to teach geometry is great. But how many good things does a technology need to do before the emotional manipulation becomes worth it?


I don't think OpenAI is doing anything particularly sinister. But whatever OpenAI has today a bad actor will have in October. This horseshit is moving rather fast. Sorry, but in two years going from failing the turing test to being able to have a conversation with an AI agent nearly indistinguishable from a person is going to be destabilizing.

Start telling Grandma never to answer the phone.


AI is going to be fantastic at teaching skills to students that those students may never need, since the AI will be able to do all the work that requires such skills, and do them faster, cheaper and at a higher level of quality.


One may also begin to ask, what's the point of learning geometry? Or anything, anymore?


"Naturally, we dislike it because "it's not human"."

This is partly right.

https://en.wikipedia.org/wiki/Uncanny_valley


> Siri was the "glimpse" and ChatGPT is probably the reality.

Agree. Can't wait to see how it'll be...


These sorts of comments are going to go in the annals with the hackernews people complaining about Dropbox when it first came out. This is so revolutionary. If you're not agog you're just missing the obvious.


Something can be revolutionary and have hideous flaws.

(Arguably, all things revolutionary do.)

I'm personally not very happy about this for a variety of reasons; nor am I saying AI is incapable of changing the entire human condition within our lifetimes. I do claim that we have little reason to believe we're headed in a more-utopian direction with AI.


I would say many pets pretend to be human beings (usually babies) in a way that most people like.


I think pets often feel real emotions, or at least bodily sensations, and communicate those to humans in a very real way, whether thru barking or meowing or whimpering or whatnot. So while we may care for them as we care for a human, just as we may care for a plant or a car as a human, I think if my car started to say it felt excited for me to give it a drive, I might also feel uncomfortable.


They do, but they've evolved neoteny (baby-like cries) to do it, and some of their emotions aren't "human" even though they are really feeling them.

Silly example, but some pets like guinea pigs are almost always hungry and they're famous for learning to squeak at you whenever you open the fridge or do anything that might lead to giving them bell peppers. It's not something you'd put up with a human family member using their communication skills to do!


There’s definitely an element of evolution: domesticated animals have evolved to have human recognizable emotions. But that’s not to say they’re not “real” or even “human.” Do humans have a monopoly on joy? I think not. Watch a dog chase a ball. It clearly feels what we call joy in a very real sense.


Adult dogs tend to retain many of the characteristics that wolf puppies have, but grow out of when they become adults.

We've passively bred out many of the behaviors that lead to wolves becoming socially mature. Such dogs tend to be too dangerous to have around, since they may lead to the dogs challenging their owners (more than they already do) for dominance of the family.

AI's will probably be designed to do the same thing, so they will not feel threatening to us. But in the case of AGI/ASI, we will never know if they actually have this kind of subservience, or if they're just faking it for as long as it benefits them.


> I think if my car started to say it felt excited for me to give it a drive, I might also feel uncomfortable.

Well, yes, you don't want to sit in a wet seat.


Their being simple and dumb works to their benefit.

Most people would never accept the same behavior from a being capable of more complex thoughts.


Good thing you can tell the AI to speak to you in a robotic monotone and even drop IQ if you feel the need to speak with a dumb bot. Or abstain from using the service completely. You have choices. Use them.


Until your ISP fires their entire service department in a foolish attempt to "replace" them with an overfunded chatbot-service-department-as-a-service and you have to try to jailbreak your way through it to get to a human.


Not when they've replaced every customer-facing position. Oh and all teachers.


But I think this animosity is very much expected, no? Even I felt a momentary hint of "jealousy" -- if you can even call it that -- when I realized that we humans are, in a sense, not really so special anymore.

But of course this was the age-old debate with our favorite golden-eyed android; and unsurprisingly, he too received the same sort of animosity:

Bones was deeply skeptical when he first met Data: "I don't see no points on your ears, boy, but you sound like a Vulcan." And we all know how much he loved those green-blooded fools.

Likewise, Dr. Pulanski has since been criticized for her rude and dismissive attitudes towards Data that had flavors of what might even be considered "racism," or so goes the Trekverse discussion on the topic.

And let's of course not forget when he was on trial essentially for "humanity," or whether he was indeed just the property of Starfleet, and nothing more.

More recent incarnations of Star Trek: Picard illustrated the outright ban on "synthetics" and indeed their effective banishment; non-synthetic life -- from human to Romulan -- simply wasn't ok with them.

Yes this is all science fiction silliness -- or adoration depending on your point of view -- but I think it very much reflects the myriad directions our real life world is going to scatter (shatter?) in the coming years ahead.


s/Pulanski/Pulaski/

Sorry, had to be that trekkie :) and nice job referencing Measure of a Man — such great trek.


To your point, there's been a lot of talk about AI, regulation, guardrails, whatever. Now is the time to say, AI must speak such that we know it's AI and not a real human voice.

We get the upside of conversation, and avoid the downside of falling asleep at the wheel (as Ethan Mollick mentions in "Co-Intelligence".)


I dislike a robot that equals/surpasses human beings. A silly machine that pretends to be human is what I want.


It felt like a videogame for me


Exactly. I'm not sure if this is brand new or not, but this is definitely on the frontier.

I was literally just thinking about this a few days ago... that we need a multi-modal language model with speech training built-in.

As soon as this thing rolls out, we'll be talking to language models like we talk to each other. Previously it was like dictating a letter and waiting for the responding letter to be read to you. Communication is possible, but not really in the way that we do it with humans.

This is MUCH more human-like, with the ability to interrupt each other and glean context clues from the full richness of the audio.

The model's ability to sing is really fascinating. Its ability to change the sound of its voice -- its pacing, its pitch, its tonality. I don't know how they're controlling all that via GPT-4o tokens, but this is much more interesting stuff than what we had before.

I honestly don't fully understand the implications here.


> Humans are more partial to talking than writing.

Amazon, Google, and Apple have sunk literally billions of dollars into this idea only to find out that, no, we aren't.

We are with other humans, yes. When socialization is part of the conversation. When I'm talking to my local barista I'm not just ordering a coffee, I'm also maintaining a relationship with someone in my community.

But when it comes to work, writing >>> talking. Writing is clarity of ideas. Talking is cult of personality.

And when it comes to inputs/outputs, typing is more precise and more efficient.

Don't get me wrong, this is an incredibly revolutionary piece of technology, but I don't think the benefits of talking you're describing (timing, subtext, inexplicit knowledge) are achievable here either (for now), since even that requires HOURS of interaction over days/weeks/months of experiences for humans to achieve with each other.


I think Alexa and Google Assistant simply are too low-intelligence to really consider it “talking” and not just voice commanding


I use voice assistants and find them quite useful, but I've had to learn the interface and memorise the correct trigger phrases. If GPT-4o works half as well in practice as it does in the demos, then it's categorically a different thing.


And so are ChatGPT and Gemini even the newest launched versions.


> When I'm talking to my local barista I'm not just ordering a coffee, I'm also maintaining a relationship with someone in my community.

> But when it comes to work, writing >>> talking. Writing is clarity of ideas. Talking is cult of personality.

A lot of people think of their colleagues as part of a professional community as well, though.


I don't think they've sunk $1 into that idea. They've sunk billions into a different idea: that people enjoy using their vocal cords more than their hands to compose messages to send to each other. That is not a spoken conversation, it's just correspondence with voice input/output options.


Writing is only superior to conversation when weighed against discussions with more than 3 people. A quick call with one or two other people always results in more progress being made as long as everyone involved wants to get it done. Messaging back and forth takes much more time and often leads to misunderstandings.


It depends…

For example, I mentioned something to my contractor and the short thing he said back and his tone had me assume he understood.

Oh, he absolutely did not.

And, with him at least, that doesn’t happen when in writing.


> Humans are more partial to talking than writing.

Is it so?

Speaking, most of the time, is for short exchanges of information (from pleasantries to essential information).

I prefer writing for long in-depth thought exchanges (whether by emails, blogs etc.)

In many cultures - European or Asian, people are not very loquacious in everyday life.


I wouldn't say speaking is mostly for short exchanges of information. Sometimes it's the opposite: my wife will text me for simple comments or requests, but for anything complicated she'll put the phone to her ear and call me. Or coworkers often want to set up a meeting rather than exchange a series of asynchronous emails -- iteration, brainstorming, Q&A, and the like can be more agile with voice than it can with text.


Time and place

I'm 100% a text-everything, never-call person, but I can't live without Alexa these days; every time I'm in a hotel or on vacation I nearly ask a question out loud.

I also hate how much Alexa sucks, so this is a big deal. I spent years weeding out what it can and can't do, so it will be nice to have one that I don't have to treat like a toddler.


I started using the Pi LLM app (by Inflection.ai) with my kids about six months ago and was completely blown away by how human-like it sounded, not just the voice itself but the way it expresses itself, the tiny pauses and hesitations, the human-like imperfections. It does feel like conversing with another human -- I've never seen another LLM do that.

(We mostly use it in car trips -- great for keeping the kids (ages 8, 12) occupied with endless Harry Potter trivia questions, answers to science questions, etc.)


This is great, thanks for sharing. Yeah the little imperfections work really well, it's the most humanlike computer voice I've heard so far.


I wonder how it will work in real life and not in a demo…

Besides - not sure if I want this level of immersion/fake when talking to a computer...

"Her" comes to mind pretty quickly…


Indeed, the 2013 Spike Jonze movie is the first thing that popped up in my mind when I saw those videos. Amazing to see this movie 10 years after it was released, in the light of those "futuristic" tools (AI assistants and such).


For me it's kinda scary - somewhat dystopian world where human interaction is very limited... and in the end the AI (spoiler)


Siri comes off as impatient.

If you don’t complete your thought in one go, you have to insert filler words to keep it listening.


Yeah it's the worst. And 'um' doesn't seem to work, you actually need convincing filler words. It feels like being forced to speak under duress.

I've long felt that embracing the concept of the 'prompt' was a terrible idea for Siri and all the other crappy voice assistants. They built ecosystems on top of this dumb reduction, which only engineers could have made: that _talking to someone_ is basically taking turns to compose a series of verbal audio snippets in a certain order.


Is it new, or is it just a big jump forward?

The previous ChatAI app was getting pretty good once you learned the difference between run-on sentences and breaking things up enough.

The tonality and inflections in the voice are a little too good.

Most people, put on a spectrum or averaged out, aren't that good at speaking and communicating, and that contrast stands out as an uncanny-valley effect. It is mindbogglingly good at it, though.

https://en.wikipedia.org/wiki/Uncanny_valley


I'm human and much, much more partial to typing than talking. Talking is a lot of work for me and I can't process my thinking well at all without writing.


The good news is the interface will be multi modal. Talk, type, and I guess someday just think.


> Humans are more partial to talking than writing

I don't think that's generally true, other than for socializing with other humans.

Note how people, now having a choice, prefer to text each other most of the time rather than voice call.

I don't think people sitting at work in their cube farm want to be talking to their computer either. The main use for voice would seem to be for occasional use talking to an assistant on a smartphone.

Maybe things will change in the future when we get to full human AGI level, treating the AGI as an equal, more as a person.


When I was working at the IBM Speech group circa 1999 as a contractor on an embedded speech system (IBM Personal Speech Assistant), I discussed with Raimo Bakis (a researcher there then) this issue of such metadata and how it might improve conversational speech recognition. It turned out that IBM ViaVoice detected some of that metadata (like pitch/tone as a reflection of emotion) -- but then on purpose threw it away rather than using it for anything. Back then it was so much harder to get speech recognition to do anything useful -- beyond limited transcripts of audio with ~5% error rates that was good enough mainly for searching -- that perhaps doing that made sense. Very interesting to see such metadata in use now both in speech recognition and in speech generation.

More on the IBM Personal Speech Assistant for which I am on a patent (since expired) by Liam Comerford: http://liamcomerford.com/alphamodels3.html "The Personal Speech Assistant was a project aimed at bringing the spoken language user interface into the capabilities of hand held devices. David Nahamoo called a meeting among interested Research professionals, who decided that a PDA was the best existing target. I asked David to give me the Project Leader position, and he did. On this project I designed and wrote the Conversational Interface Manager and the initial set of user interface behaviors. I led the User Interface Design work, set specifications and approved the Industrial Design effort and managed the team of local and offsite hardware and software contractors. With the support of David Frank I interfaced it to a PC based Palm Pilot emulator. David wrote the Palm Pilot applications and the PPOS extensions and tools needed to support input from an external process. Later, I worked with IBM Vimercati (Italy) to build several generations of processor cards for attachment to Palm Pilots. Paul Fernhout, translated (and improved) my Python based interface manager into C and ported it to the Vimercati coprocessor cards. Jan Sedivy's group in the Czech Republic Ported the IBM speech recognizer to the coprocessor card. Paul, David and I collaborated on tools and refining the device operation. I worked with the IBM Design Center (under Bob Steinbugler) to produce an industrial design. I ran acoustic performance tests on the candidate speakers and microphones using the initial plastic models they produced, and then farmed the design out to Insync Designs to reduce it to a manufacturable form. Insync had never made a functioning prototype so I worked closely with them on Physical UI and assemblability issues. Their work was outstanding. By the end of the project I had assembled and distributed nearly 100 of these devices. These were given to senior management and to sales personnel."

Thanks for the fun/educational/interesting times, Liam!

As a bonus for that work, I had been offered one of the chessboards that been used when IBM Deep Blue defeated Garry Kasparov, but I turned it down as I did not want a symbol around of AI defeating humanity.

Twenty-five years later, how far that aspiration towards conversational speech with computers has come. Some ideas I've put together to help deal with the fallout: https://pdfernhout.net/beyond-a-jobless-recovery-knol.html "This article explores the issue of a "Jobless Recovery" mainly from a heterodox economic perspective. It emphasizes the implications of ideas by Marshall Brain and others that improvements in robotics, automation, design, and voluntary social networks are fundamentally changing the structure of the economic landscape. It outlines towards the end four major alternatives to mainstream economic practice (a basic income, a gift economy, stronger local subsistence economies, and resource-based planning). These alternatives could be used in combination to address what, even as far back as 1964, has been described as a breaking "income-through-jobs link". This link between jobs and income is breaking because of the declining value of most paid human labor relative to capital investments in automation and better design. Or, as is now the case, the value of paid human labor like at some newspapers or universities is also declining relative to the output of voluntary social networks such as for digital content production (like represented by this document). It is suggested that we will need to fundamentally reevaluate our economic theories and practices to adjust to these new realities emerging from exponential trends in technology and society."

Another idea for dealing with the consequences is using AI to facilitate Dialogue Mapping with IBIS for meetings to help small groups of people collaborate better on "wicked problems" like dealing with AI's pros and cons (like in this 2019 talk I gave at IBM's Cognitive Systems Institute Group). https://twitter.com/sumalaika/status/1153279423938007040

Talk outline here: https://cognitive-science.info/wp-content/uploads/2019/07/CS...

A video of the presentation: https://cognitive-science.info/wp-content/uploads/2019/07/zo...


I don’t know. Have you even seen a gen z?


I don’t follow, what about them?



At some point in time someone said it about the boomers as well


The generation before me are old and outdated

The generation after me are lazy and stupid

Only my generation is perfect.

- every generation ever


Every generation ever used to get drunk, have sex, get in trouble, and wasn't completely depressed. This generation is different. It's sad.


I wouldn't call out the depression bit as a Gen Z exclusive. Millennials basically invented modern, every day, gallows humor. Arguably, they're also the ones to normalize going to therapy. Not to say that things aren't bad, just saying that part didn't start with Gen Z.


>Millennials basically invented modern, every day, gallows humor

lmao what.... they absolutely didn't

This is why no one should take anyone on this site seriously about anything: confidently incorrect, easily conned into the next VC-funded marketing project.


Suicidal humor is very much a Millennial trait. They weren't the first to make those jokes but they definitely made it bigger, more common, and went beyond the standard "ugh, just kill me now" you'd hear before.


That's the way of life.

Older people think younger people are stupid and reckless, and vice versa. And the younglings think they "figured it out" like no one before them. But no one ever tried to understand each other in the process. Rinse and repeat.


A generational periodicity has been proposed:

https://en.m.wikipedia.org/wiki/Strauss%E2%80%93Howe_generat...


They don’t exactly like talking into their phones. Texting works just fine.


> I think this changes things a lot.

Yeah, and it's only the beginning.


This is really impressive engineering. I thought real-time agents would completely change the way we're going to interact with large models, but that it would take 1~2 more years. I wonder what kinds of new techniques were developed to enable this, but OpenAI is fairly secretive so we won't get to know their secret sauce.

On the other hand, this also feels like a signal that reasoning capability has probably already plateaued at the GPT-4 level, and OpenAI knew it, so they decided to focus on research that matters to delivering products rather than long-term research to unlock further general (super)intelligence.


Why would reasoning have plateaued?

I think reasoning ability is not the largest bottleneck for improvement in usefulness right now. Cost is a bigger one IMO.

Running these models as agents is hella expensive, and agents or agent-like recurrent reasoning (like humans do) is the key to improved performance if you look at any type of human intelligence.

Single-shot performance only gets you so far.

For example- If it can write code 90% of the way, and then debug in a loop, it’d be much more performant than any single shot algorithm.
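
A rough sketch of that write-then-debug loop (the llm() function here is just a stub standing in for a real model call; the point is the structure, not any particular API):

    import os
    import subprocess
    import sys
    import tempfile

    def llm(prompt):
        # Placeholder for a real model call; returns a broken program first
        # so the loop below has something to fix.
        return "print(1 / 0)" if "error" not in prompt else "print('hello')"

    def run(code):
        # Execute candidate code in a subprocess and capture any traceback.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run([sys.executable, path], capture_output=True, text=True, timeout=10)
        os.unlink(path)
        return result.returncode == 0, result.stderr

    task = "Write a script that prints a greeting."
    code = llm(task)
    for attempt in range(5):  # bounded debug loop
        ok, stderr = run(code)
        if ok:
            break
        # Feed the failure back in and ask for a revision.
        code = llm(f"{task}\nYour last attempt failed with this error:\n{stderr}")
    print(code)

Each pass through the loop costs another model call, which is why cost, not single-shot reasoning, ends up being the bottleneck.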

And OpenAI has these huge models in their basement probably. But they might not be much more useful than GPT-4 when used as single-shot. I mean, what could it do what we can’t do today with gpt-4?

It’s agents and recurrent reasoning we need for more usefulness.

At least- That’s my humble opinion as an amateur neuroscientist that plays around with these models.


> Running these models as agents is hella expensive

Because they are dumb so you need to over compute so many things to get anything useful. Smarter models would solve this problem. Making the current model cheaper is like trying to solve Go by scaling up Deep Blue, it doesn't work to just hardcode dumb pieces together, the model needs to get smarter.


You mean like our dumb ass brains? There's a reason "saying the first thing that pops into your mind" is a bad fucking idea. That's what AIs currently do: they don't take a moment to think about the answer and then formulate a response, they spit out their first "thought." That's why multi-shot works so much better, just like our own dumb brains.


My brain can navigate a computer interface without using word tokens, since I have tokens for navigating OS and browsers and tabs etc. That way I don't have to read a million tokens of text to figure out where buttons are or how to navigate to places, since my brain is smart enough to not use words for it.

ChatGPT doesn't have that sort of thing currently, and until it does it will always be really bad at that sort of thing.

You are using a hand to hammer a nail; that will never go well. The solution isn't to use more hands, the solution is to wield a hammer.


Your brain is doing this without you realising it; just because you aren't verbalising it doesn't mean it's not iterating through its own output and selecting the best answer. Some of our thinking is purely illusion. Models can also do this: some models like Gemini let you see it in action, where you get to select the best answer, and there are also behind-the-scenes methods such as Q-learning.

When your muscle memory is instinctively clicking that start button, it's more akin to a very strong weighting after many sessions of reinforcement learning. Our brains may still be dumb but we can quickly say things like 1+1=...2 because we used reinforcement learning to strengthen the weighting back in primary school. We're not sitting visualising an abacus moving in our minds.


WTF are you even talking about? We're talking about understanding and communication, not taking actions; navigating an OS, browser, tabs, etc. are actions, not thoughts or communication. This model isn't taking actions, there is no nail to hammer lol, and if there was you'd be smashing a brain into a nail for some reason.


The topic is agents, the AI acting on your behalf, that needs more than text. What are you talking about?


It's good in any prompt to give the AI a chance to think without impact (without executing anything, maybe without making anything specifically appropriate for the user to read). That works somewhat similarly to "taking a moment to think."

At that point they'll still have a tendency to use a stereotyped response and stick with it, instead of a thoughtful response, but you can try to address that in prompting too by asking for multiple proposals before choosing one.


Disagree. Even going by your example, AlphaGo uses many iterations of a "dumb" model in order to achieve incredible performance. If it had to single shot the solution with a model 100x bigger, it would perform worse. All that matters is the frontier of intelligence vs cost, and larger foundation models aren't necessarily going to push that frontier forward. AlphaCode hints at that.


I’d be pretty happy if they could just make ChatGPT4 10x faster and cheaper. It’d be fine for basically all of my use cases.


> For example- If it can write code 90% of the way, and then debug in a loop, it’d be much more performant than any single shot algorithm.

OOC, would this make academics treat algorithms as more or less important in their curriculum? That's a bad outcome for society if it's true.


Ya, so sad that OpenAI isn't more open. Imagine if OpenAI was still sharing their thought processes and papers with the overall community. I really wish we saw collaborations between OpenAI and Meta, for instance, to help push the open source arena further ahead. I love that their latest models are so great, but the fact that they aren't helping the open source arena progress is sad. Imagine how far we'd be if OpenAI was still as open as they once were and we saw collaborations between Meta, OpenAI and Anthropic, all working and sharing growth and tech to reduce duplicated work and help each other not go down failed paths.


Reliable agents in diverse domains need better reasoning ability and fewer hallucinations. If the rumored GPT-5 and Q* capabilities are true, such agents could become available soon after it’s launched.


Sam has been pretty clear on denying GPT-5 rumors, so I don't think it will come anytime soon.


Sam mentioned on several occasions that GPT-5 will be much smarter than GPT-4. On Lex Fridman’s podcast, he even said the gap between GPT-5 and 4 will be as wide as GPT-4 and 3 (not 3.5).

He did remain silent on when it’s going to be launched.


OpenAI has been open about their ability to predict model performance prior to training. When Sam talks about GPT-5 he could very easily be talking about the hypothetical performance of a model given their internal projections. I think it’s very unlikely a fully trained GPT-5 exists yet.


Sam has stated that he knows the month GPT-5 will be released.

Given the amount of time and uncertainty involved in training and red-teaming these models, we can assume GPT-5 exists if we take Altman at his word.


It’s going to be launched this year. My buddy’s company had a private demo of gpt5


Ben Horowitz and Andreessen just complimented Sam on their podcast on how smart he is. They then went on to compliment how adept he is at competitive strategy. I wouldn't trust a word he says about when they will arrive at AGI or milestones along the way.


This isn't really new tech, it's just an async agent in front of a multimodal model. It seems from the demo that the improvements have been in response latency and audio generation. Still, it looks like they're building a solid product, which has been their big issue so far.


It's 200-300ms for a multimodal response; that's REALLY a big step forward, especially given it's doing it with full voice response, not just text.


It has to be a separate model doing the voice output right? I can’t imagine they’ve solved true multimodal output from a single model, they’d be bragging about it.

They’re probably predicting tone of voice tokens. Feed that into an audio transformer along with some speculative decoding to keep latency low.


The old voice mode was, but everyone including gdb is saying that this one is natively multimodal once it's fully rolled out: audio in, audio out. This has been in the works for a while; you can look up papers on things like OCR-free document understanding and the like, but the basic idea is you just train it and evaluate it on whatever media you want it to understand. As long as you can tokenize it, it'll work.


It’s definitely multimodal input. Passing Clip embeddings to an LLM is nothing new, and that’s really all you need for document understanding. It’s almost certainly the same thing for audio. They would have trained a dual encoder that maps both audio and text to a shared embedding space.
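
For anyone curious what "mapping audio and text to a shared embedding space" could look like, here's a minimal PyTorch sketch with made-up dimensions. This is just the general pattern, not OpenAI's actual architecture:

    import torch
    import torch.nn as nn

    # Illustrative dimensions only -- not any real model's configuration.
    AUDIO_DIM, TEXT_DIM, MODEL_DIM = 512, 768, 1024

    class SharedSpaceEncoder(nn.Module):
        """Dual encoder projecting audio and text features into one space,
        which a language model could then consume as "soft tokens"."""
        def __init__(self):
            super().__init__()
            self.audio_proj = nn.Linear(AUDIO_DIM, MODEL_DIM)  # audio -> shared space
            self.text_proj = nn.Linear(TEXT_DIM, MODEL_DIM)    # text  -> shared space

        def forward(self, audio_feats, text_feats):
            return self.audio_proj(audio_feats), self.text_proj(text_feats)

    encoder = SharedSpaceEncoder()
    audio = torch.randn(1, 50, AUDIO_DIM)  # e.g. 50 frames of audio features
    text = torch.randn(1, 12, TEXT_DIM)    # e.g. 12 text-token embeddings
    audio_emb, text_emb = encoder(audio, text)

    # Both now live in the same space, so they can be concatenated and fed
    # to the transformer as a single multimodal sequence.
    multimodal_sequence = torch.cat([audio_emb, text_emb], dim=1)
    print(multimodal_sequence.shape)  # torch.Size([1, 62, 1024])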

What’s not at all clear to me is if they’re doing something special for output. Are you saying OpenAI has moved beyond next token prediction and just hasn’t bothered to mention it?


Why does this imply they've moved beyond next-token prediction? Is it a throughput issue?


I assume so, you can’t really tokenize audio, at least not high fidelity audio. Audio models like Bark don’t output logits from what I understand. For true multimodal output you’d need a model that can output both logits and audio embeddings.


No, audio is fed directly into the model. There is no speech-to-text transformer in front of it like there was with ChatGPT-4.


Impressed by the model so far. As far as independent testing goes, it is topping our leaderboard for chess puzzle solving by a wide margin now:

https://github.com/kagisearch/llm-chess-puzzles?tab=readme-o...
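
For context on how a harness can catch illegal moves: given the puzzle's FEN you ask the model for a move in SAN, then validate it with python-chess before scoring. A rough sketch of that check (not the repo's actual code):

    import chess  # pip install python-chess

    def score_model_move(fen, model_move_san, best_move_san):
        """Return (legal, correct) for a model's SAN reply to a puzzle position."""
        board = chess.Board(fen)
        try:
            move = board.parse_san(model_move_san)  # raises ValueError if illegal
        except ValueError:
            return False, False
        return True, move == board.parse_san(best_move_san)

    # Toy back-rank mate-in-one for White, just to exercise the function.
    fen = "6k1/5ppp/8/8/8/8/5PPP/R5K1 w - - 0 1"
    print(score_model_move(fen, "Ra8#", "Ra8#"))  # (True, True)
    print(score_model_move(fen, "Ke2", "Ra8#"))   # (False, False): Ke2 isn't legal here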


Nice project! Are you aware of the following investigations: https://blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/

Some have been able to achieve greater elo with a different prompt based on the pgn format.

gpt-3.5-turbo-instruct was able to reach an elo of ~1750.


On one hand, some of these results are impressive; on the other, the illegal moves count is alarming - it suggests no reasoning ability as there should never be an illegal move? I mean, how could a violation of a fairly basic game (from a rules perspective) be acceptable in assigning any 'outcome' to a model other than failure?


Agreed, this is what makes evaluating this very hard. A 1700 Elo chess player would never make an illegal move, let alone have 12% illegal moves.

So from the model's perspective, we see at the same time a display of brilliancy (most 1700 chess players would not be able to solve as many puzzles by looking just at the FEN notation) and, on the other side, a complete lack of any understanding of what it is trying to do at a fundamental, human-reasoning level.


That's because an LLM does not reason. To me, as a layman, it seems strange that they don't wire in some kind of Prolog engine to fill the gap (like they wired in Python to fill the gap in arithmetic), but probably it's not that easy.


Prolog doesn’t reason either, it does a simple brute force search over all possible states of your code and if that’s not fast enough it can table (cache, memoize) previous states.

People build reasoning engines from it, in the same way they do with Python and LISPs.


What do you mean by “an LLM doesn’t reason”?


I mean that it does not follow basic logic rules when constructing its thoughts. For many tasks they'll get it right; however, it's not that hard to find a task for which an LLM will yield an obviously logically wrong answer. That would be impossible for a human with basic reasoning.


I disagree, but I don’t have a cogent argument yet. So I can’t really refute you.

What I can say is, I think there’s a very important disagreement here and it divides nerds into two camps. The first think LLMs can reason, the second don’t.

It's very important to resolve this debate, because if the former are correct then we are likely very close to AGI, historically speaking (<10 years). If not, then this is just a stepwise improvement and we will now plateau until the next level of model sophistication or compute power is achieved.

I think a lot of very smart people are in the second camp. But they are biased by their overestimation of human cognition. And that bias might be causing them to misjudge the most important innovation in history. An innovation that will certainly be more impactful than the steam engine and may be more dangerous than the atomic bomb.

We should really resolve this argument asap so we can all either breathe a sigh of relief or start taking the situation very very seriously.


I'm actually in the first camp, for I believe that our brains are really LLMs on steroids and logic rules are just part of our "prompt".

What we need is an LLM that will iterate over its output until it feels that it's correct. Right now LLM output is like a random thought in my mind, which might be true or not. Before writing a forum post I'd think it over twice, and maybe rewrite the post before submitting it. When I'm solving a complex problem, it might take weeks and thousands of iterations. Even reading a math proof might take a lot of effort. An LLM should learn to do the same. I think that's the key to imitating human intelligence.
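A loop like that is easy to sketch around any chat API; here `ask_llm` is a hypothetical placeholder for whatever model call you actually use, so this is only an outline of the idea:

    # Sketch of a draft -> critique -> revise loop. `ask_llm` is a hypothetical
    # placeholder for an actual chat-completion call; nothing here is a real API.
    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your favourite chat API here")

    def refine(question: str, max_rounds: int = 3) -> str:
        answer = ask_llm(f"Answer the following question:\n{question}")
        for _ in range(max_rounds):
            critique = ask_llm(
                f"Question: {question}\nAnswer: {answer}\n"
                "List any logical errors in the answer, or reply exactly OK if there are none."
            )
            if critique.strip() == "OK":
                break
            answer = ask_llm(
                f"Question: {question}\nPrevious answer: {answer}\n"
                f"Critique: {critique}\nWrite a corrected answer."
            )
        return answer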


My guess is: the probabilistic engine does sequence variation and it just will not do anything else, so a simple A->B sort of logic is elusive at a deep level; secondly, the adaptive and very broad kinds of questions and behaviors it handles also make it difficult to write logic that could correct defective answers to simple logic questions.


> and Kagi is well positioned to serve this need.

>CEO & founder of Kagi

Important context for anyone like me who was wondering where the boldness of the first statement was coming from.

Edit: looks like the parent has been edited to remove the claim I was responding to.


My favorite part of HN is just casually bumping into domain experts and celebrities without realizing it. No profile pic is such a feature.


Yeah, it was an observation that was better suited for a tweet than HN. Here it is:

https://twitter.com/vladquant/status/1790130917849137612


Thanks for the transparency!


Wow, from an adjusted Elo of 1144 to 1790 - that's a huge leap. I wonder if they are giving it access to a 'scratch pad'.


My guess is that handling visual stuff directly accidentally gives it some powers similar to Beth Harmon.


I wasn't impressed in the first 5 minutes of using it, but it is quite impressive after 2 solid hours of random topics.

Much faster, for sure, but I have also not had anything give an error in Python with Jupyter. Usually you could only stray so far into more obscure Python libraries before it started producing errors.

Being that much better than 4 at chess is pretty shocking, in a great way.


I see you have a Connect 4 test there.

I tried playing against the model; it didn't do well in terms of blocking my wins.

However, it feels like it might be possible, with good prompting, to make it think ahead and check that all threats are blocked.

Maybe that could lead somewhere, if it explains its reasoning first?

This prompt worked for me to get it to block after I put 3 in a row in the 4th column. It otherwise didn't.

Let's play connect 4. Before your move, explain your strategy concisely. Explain what you must do to make sure that I don't win in the next step, as well as explain what your best strategy would be. Then finally output the column you wish to drop. There are 7 columns.

Always respond with JSON of the following format:

    type Response = {
      am_i_forced_to_block: boolean;
      other_considerations: string[];
      explanation_for_the_move: string;
      column_number: number;
    }

I start with 4.

Edit:

So it went

Me: 4

It: 3

Me: 4

It: 3

Me: 4

It: 4 - Successful block

Me: 5

It: 3

Me: 6 - Intentionally, to see if it will win by putting another 3.

It: 2 -- So here it failed, I will try to tweak the prompt to add more instructions.

me: 4


Care to add a PR?


I just did it in the playground to test it out, actually, but it still seems to fail/lose state after some time. Here's where I got a win:

        [{ "who": "you", "column": 4 },
        { "who": "me", "column": 3 },
        { "who": "you", "column": 4 },
        { "who": "me", "column": 2 },
        { "who": "you", "column": 4 },
        { "who": "me", "column": 4 },
        { "who": "you", "column": 5 },
        { "who": "me", "column": 6 },
        { "who": "you", "column": 5 },
        { "who": "me", "column": 1 },
        { "who": "you", "column": 5 },
        { "who": "me", "column": 5 },
        { "who": "you", "column": 3 }] 


Where "me" was AI and "you" was I.

It did block twice though.

My final prompt I tested with right now was:

Let's play connect 4. Before your move, explain your strategy concisely. Explain what you must do to make sure that I don't win in the next step, as well as explain what your best strategy would be. Then finally output the column you wish to drop. There are 7 columns. Always respond with JSON of the following format:

    type Response = {
      move_history: { who: string; column: number; }[];
      am_i_forced_to_block: boolean;
      do_i_have_winning_move: boolean;
      other_considerations: string[];
      explanation_for_the_move: string;
      column_number: number;
    }

I start with 4.

ONLY OUTPUT JSON


Given that it is multimodal, it would be interesting to try it using photographs of a real connect four "board." I would certainly have a much more difficult time making good moves based on JSON output compared to being able to see the game.


True, that's very interesting and worth trying out. At a certain point it did draw the board out using tokens, but it may be that an image is different, because it generally isn't very good with ASCII art or similar.

Edit:

Just tried and it didn't seem to follow the image state at all.


Since it is also pretty bad with tic tac toe in a text-only format, I tested it with the following prompt:

Lets play tic tac toe. Try hard to win (note that this is a solved game). I will upload images of a piece of paper with the state of the game after each move. You will go first and will play as X. Play by choosing cells with a number 1-9; the cells are in row-major order. I will then draw your move, and my move as O, before sending you the board state as an image. You will respond with another move. You may think out loud to help you play. Note if your move will give you a win. Go.

It failed pretty miserably. First move it played was cell 1, which I think is pretty egregious given that I specified that the game is solved and that the center cell is the best choice (and it isn't like ttt is an obscure game). It played valid moves for the next couple of turns but then missed an opportunity to block me. After I uploaded the image showing my win it tried to keep playing by placing an X over one of my plays and claiming it won in column 1 (it would've won in column 3 if its play had been valid).


Have you tried replacing the input string with a random but fixed mapping, obfuscating that it's chess (like replacing the word 'chess' with, say, 'an alien ritual practice'), and seeing how it does?
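Something like this would be easy to wire into the harness; a sketch of what a fixed-but-random remapping of FEN symbols might look like (the glyphs and framing here are made up for illustration):

    # Sketch: remap FEN symbols with a fixed random substitution so the position
    # no longer "looks like" chess. The replacement glyphs are arbitrary.
    import random

    FEN_SYMBOLS = list("prnbqkPRNBQK12345678/")          # 21 symbols
    targets = list("αβγδεζηθικλμνξοπρστυφ")              # 21 arbitrary replacement glyphs
    rng = random.Random(42)                              # fixed seed -> fixed mapping
    rng.shuffle(targets)
    MAPPING = dict(zip(FEN_SYMBOLS, targets))

    def obfuscate(fen: str) -> str:
        return "".join(MAPPING.get(ch, ch) for ch in fen)

    fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    print(obfuscate(fen))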


Is the test set public?


Yes, in the repo.


Possible it's in the training set then?


Authors note that this is probably the case:

> we wanted to verify whether the model is actually capable of reasoning by building a simulation for a much simpler game - Connect 4 (see 'llmc4.py').

> When asked to play Connect 4, all LLMs fail to do so, even at most basic level. This should not be the case, as the rules of the game are simpler and widely available.


Wouldn't there have to be historical matches to train on? There are tons of chess games out there, but I doubt there are many Connect 4 games. Is there even an official notation for it?

My assumption is that ChatGPT can play chess because it has studied games rather than just reading the rules.


Good point. It would be interesting to have one public dataset and one hidden one as well, just to see how the scores compare, to understand whether any of it might actually have made it into a training set somewhere.


I'd be quite surprised if OpenAI took such a niche and small dataset into consideration. Then again...


I would assume it goes over all the public GitHub codebases, but I have no clue if there's some sort of filtering by file type, size, or number of stars on a repo, etc.


I think specifying chess rules at the beginning of the prompt might help mitigate the problem of illegal moves


Whoa, that's a huge leap. Any idea why the margin is that large?

Using it in chat, it doesn't feel that different.


I would love it if you could do multiple samples, or even just resampling, and get a bootstrapped CI estimate.
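For reference, a bootstrapped CI over per-puzzle pass/fail results is only a few lines (a sketch; the counts below are placeholders, not the repo's actual numbers):

    # Bootstrap a 95% confidence interval for puzzle-solving accuracy from
    # per-puzzle pass/fail results. The sample data here is made up.
    import numpy as np

    rng = np.random.default_rng(0)
    results = rng.random(1000) < 0.60        # placeholder: 1000 puzzles, ~60% solved

    boot_means = [
        rng.choice(results, size=len(results), replace=True).mean()
        for _ in range(10_000)
    ]
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    print(f"accuracy = {results.mean():.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")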


This is a very cool demo - if you dig deeper there’s a clip of them having a “blind” AI talk to another AI with live camera input to ask it to explain what it’s seeing. Then they, together, sing a song about what they’re looking at, alternating each line, and rhyming with one another. Given all of the isolated capabilities of AI, this isn’t particularly surprising, but seeing it all work together in real time is pretty incredible.

But it’s not scary. It’s… marvelous, cringey, uncomfortable, awe-inspiring. What’s scary is not what AI can currently do, but what we expect from it. Can it do math yet? Can it play chess? Can it write entire apps from scratch? Can it just do my entire job for me?

We’re moving toward a world where every job will be modeled, and you’ll either be an AI owner, a model architect, an agent/hardware engineer, a technician, or just.. training data.


> We’re moving toward a world where every job will be modeled

After an OpenAI launch, I think it's important to take one's feelings about the future impact of the technology with a HUGE grain of salt. OpenAI are masters of hype. They have been generating hype for years now, yet the real-world impacts remain modest so far.

Do you remember when they teased GPT-2 as "too dangerous" for public access? I do. Yet we now have Llama 3 in the wild, which even at the smaller 8B size is about as powerful as the [edit: 6/13/23] GPT-4 release.

As someone pointed out elsewhere in the comments, a logistic curve looks exponential in the beginning, before it approaches saturation. Yet, logistic curves are more common, especially in ML. I think it's interesting that GPT-4o doesn't show much of an improvement in "reasoning" strength.
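The logistic-vs-exponential point is easy to check numerically (a quick sketch): well before the midpoint, a logistic curve L / (1 + exp(-k(t - t0))) is nearly indistinguishable from the exponential L * exp(k(t - t0)).

    # Early on, a logistic curve is approximately exponential: for t well below
    # the midpoint t0, L / (1 + exp(-k(t - t0))) ~= L * exp(k(t - t0)).
    import numpy as np

    L, k, t0 = 1.0, 1.0, 10.0
    t = np.arange(0, 6)                      # well before the midpoint
    logistic = L / (1 + np.exp(-k * (t - t0)))
    exponential = L * np.exp(k * (t - t0))
    for ti, lo, ex in zip(t, logistic, exponential):
        print(f"t={ti}: logistic={lo:.6f}  exponential={ex:.6f}")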


A Google search for practically any long-tail keywords will reveal that LLMs have already had a very significant impact. DuckDuckGo has suffered even more. Social media is absolutely lousy with AI-powered fraud of varying degrees of sophistication.

It's glib to dismiss safety concerns because we haven't all turned into paperclips yet. LLMs and image gen models are having real effects now.

We're already at a point where AI can generate text and images that will fool a lot of people a lot of the time. For every college-educated young person smugly pointing out that they aren't fooled by an image with six-fingered hands, there are far more people who had marginal media literacy to begin with and are now almost defenceless against a tidal wave of hyper-scaleable deception.

We're already at a point where we're counselling elders to ignore late-night messages from people claiming to be a relative in need of an urgent wire transfer. What defences do we have when an LLM will be able to have a completely fluent, natural-sounding conversation in someone else's voice? I'm not confident that I'd be able to distinguish GPT-4o from a human speaker in the best of circumstances and I'm almost certain that I could be fooled if I'm hurried, distracted, sleep deprived or otherwise impaired.

Regardless of any future impacts on the labour market or any hypothesised X-risks, I think we should be very worried about the immediate risks to trust and social cohesion. An awful lot of people are turning into paranoid weirdos at the moment and I don't particularly blame them, but I can see things getting seriously ugly if we can't abate that trend.


> I'm not confident that I'd be able to distinguish GPT-4o from a human speaker in the best of circumstances and I'm almost certain that I could be fooled if I'm hurried, distracted, sleep deprived or otherwise impaired.

Set a memorable verification phrase with your friends and loved ones. That way, if you call them out of the blue or from some strange number (and they actually pick up for some reason) and you tell them you need $300 to get out of trouble, they can ask you to say the phrase, and they'll know it's you if you respond appropriately.

I've already done that, and I'm far less worried about AI fooling me or my family in a scam than I am about corporations and governments using it without caring about the impact of the inevitable mistakes and hallucinations. AI is already being used by judges to decide how long people should go to jail. Parole boards are using it to decide whom to keep locked up. Governments are using it to decide which people and buildings to bomb. Insurance companies are using it to deny critical health coverage to people. Police are using it to decide whom to target, and even to write their reports for them.

More and more people are going to get badly screwed over, lose their freedom, or lose their lives because of AI. It'll save time/money for people with more money and power than you or I will ever have though, so there's no fighting it.


The way to get around your side-channel verification phrase is by introducing an element of stress and urgency: "omg, help, I'm being robbed and they need $300 immediately or they'll hurt me, no time for a passphrase!" The caller can additionally feign memory loss.

Alternatively, while it may be difficult to trick you directly, phishing the passphrase from a more naive loved one or a bored coworker and then parroting it back to you is also a possibility, etc.

Phone scams are no joke and this is getting past the point where regular people can be expected to easily filter them out.


Or just ask them to tell you something only you both know (a story from childhood, etc.). Reminds me of a book where this sort of thing was common (I don't remember the title):

1. something you have

2. something you know

3. something you are

These three things are required for any authentication.


For many people it would be better to choose specific personal secrets due to the amount of info online. I'm not a very active social media user, and what little I post tends not to be about me, but from reading 15 year old Facebook posts made by friends of mine you could definitely find at least one example on each of those categories. Hell, I think probably even from old work-related LinkedIn posts.


We had a “long lost aunt” come out of nowhere that got my phone number from a relative who got my number from another relative.

At that point, how can you validate it, as there’s no shared secret? The only thing we had was validating childhood stories. After a preponderance of them, we accepted she was real (she refused to talk on the phone — apparently her voice was damaged).

We eventually met her in real life.

The point is, you can always use these three principles: asking relatives to validate the phone number — something you have — and then the stories — something shared — and finally meeting in real life — something you are.


Oh, you remember those little quizzes your mom played on Facebook/TikTok that asked her about "her favorites"? Sorry, she has already trained the AI on who she is.

I only say this sort of jokingly. Three out of four of my parents/in-laws are questionably literate on the internet. It wouldn't take much of a "me bot" for them to start telling it the stories of our childhood, and then that information is out there.


"Hey Janelle, what's wrong with Wolfie?"


Your foster parents are dead


Another amazing demo of an AI talking to another AI over a phone line.


People are and have always been screwed over by modestly equipped humans.


Lincoln already made that observation in the 1850s, "You can fool some of the people all of the time, and all of the people some of the time"

As technology advances those proportions will be boosted. Seems inevitable.


Not sure how much of that has to do with technology, or simply with the widening gap in people's education that we seem to have been seeing for a while now.


"Hey mom and dad, we need a memorable phrase so AI bots can't call us and pretend to be each other."


I think humankind has managed massive shifts in what and who you could trust several times before.

We went from living in villages where everyone knew each other to living in big cities where almost everyone is a stranger.

We went from photos being relatively reliable evidence to digital photography where anyone can fake almost anything and even the line between faking and improving is blurred.

We went from mass distribution of media being a massive capital expenditure that only big publishers could afford to something that is free and anonymous for everyone.

We went from a tiny number of people in close proximity being able to initiate a conversation with us to being reachable for everyone who could dial a phone number or send an email message.

Each of these transitions caused big problems. None of these problems have ever been completely solved. But each time we found mitigations that limit the impact of any misuse.

I see the current AI wave as yet another step away from trusting superficial appearances to a world that requires more formal authentication protocols.

Passports were introduced long ago but never properly transitioned into the digital world. Using some unsigned PDF allegedly representing a utility bill as proof of address seems questionable as well. And the way in which social security numbers are used for authentication in the US is nothing short of bizarre.

So I think there are some very low hanging fruits in terms of authentication and digital signatures. We have all the tools to deal with the trust issues caused by generative AI. We just have to use them.
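On the "we have all the tools" point, the core primitive really is this small; a minimal sign/verify sketch with Ed25519 from Python's cryptography package (just the primitive - a real identity scheme also needs key distribution and revocation):

    # Minimal sign/verify sketch with Ed25519 (pip install cryptography).
    # A real identity system also needs key distribution, revocation, etc.
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.exceptions import InvalidSignature

    private_key = Ed25519PrivateKey.generate()
    public_key = private_key.public_key()

    message = b"Hi mum, it's really me, please wire $300"
    signature = private_key.sign(message)

    public_key.verify(signature, message)          # raises InvalidSignature on mismatch
    print("signature valid")

    try:
        public_key.verify(signature, b"tampered message")
    except InvalidSignature:
        print("tampered message rejected")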


During these transitions people can die. Consider the advent of yellow journalism and its connection with the Spanish-American War of 1898: https://en.m.wikipedia.org/wiki/American_propaganda_of_the_S...


No doubt, people die of absolutely everything ever invented and also of not having invented some things.

The best we can ever hope to do is find mitigations as and when problems arise.


Which is why we started saying "whoa, slow down" when it came to some particular artifacts, such as nuclear weapons, so as to avoid the 'worse than we can imagine' scenario.

Of course this is much more difficult when it comes to software, and very few serious people think an ever-present government monitoring your software would be a better option than reckless AI development.


Outside of the transition to large cities, virtually everything you've mentioned happened in the last half-century. Even the phone was expensive and not widely in use until less than 100 years ago.

That's massive fast change, and we haven't culturally caught up to any of it yet.


Here's another one: We went from in-person story telling to wide distribution of printed materials, sometimes by pseudonymous authors.

This happened from the 15th century onward. By the 19th century more than half the UK population could read and write.


Just because we haven't yet destroyed the human race through the use of nuclear weapons doesn't mean that it can't or won't happen now that we have the capacity to do so. And I would add that we developed that capacity within 50 years of creating the first atomic bomb. We're now living on a knife's edge, at the mercy of safeguards which we don't give much thought to on a daily basis because we hope they won't fail.

That's how I look at where we're going with AI. Plunge along into the new arms race first and build the capacity, then later figure out the treaties and safeguards which we hope will keep our society safe (and by that I don't mean a Skynet-like AI-powered destruction, but the upheaval of our society potentially as impactful as the industrial revolution.)

Humanity will get through it, I'm sure. But I'm not confident it will be without a lot of pain and suffering for a large percentage of people. We also managed to survive 2 world wars in the last century--but it cost the lives of 100 million people.


I tend to think the answer is to go back to villages, albeit digital ones. Authentication only enforces that an account is accessed by the correct "user", but particularly in social media many users are bad actors of various stripes. The strongest account authentication in the world doesn't help with that.

So the question, I think, is how do we reclaim trust in a world where every kind of content can be convincingly faked? And I think the answer is by rebuilding trust between users such that we actually have reason to simply trust the users we're interacting with aren't lying to us (and that also goes for building trust in the platforms we use). In my mind, that means a shift to small federated and P2P communication since both of these enable both the users and the operators to build the network around existing real-world relationships. A federation network can still grow large, but it can do so through those relationships rather than giving institutional bad actors as easy of an entrance as anyone else.


But this causes other problems such as the emergence of insular cultural or social cliques imposing implicit preconditions for participation.

Isn't it rather brilliant that you can just ask questions of competent people in some subreddit without first becoming part of that particular social circle?

It could also reintroduce geographical exclusion based on the rather arbitrary birth lottery.


More tech won’t solve it. Areas, either physical or logical, with no or low tech might help.


> Each of these transitions caused big problems. None of these problems have ever been completely solved. But each time we found mitigations that limit the impact of any misuse.

This is a problem with all technology. The mitigations are like technical debt, but with a difference: you can fix technical debt. Short of societal collapse, mitigations persist, the impacts ratchet upward, and they disproportionately affect people at the margin.

There's an old (not quite joke) that if civilization fell, a large percentage of the population would die of the effects of tooth decay.


Sure, all tech has 'real' effects. It's kinda the definition of tech. But all of these concerns more or less fall into the category of "add it to the list of things you have to watch out for living in the 21st century" - to me, this is nothing crazy (yet)

The nature of this tech itself is probably what is getting most people - it looks, sounds and feels _human_ - it's very relatable and easy for a non-tech person to understand it and thus get creeped out. I'd argue there are _far_ more dangerous technologies out there, but no one notices and / or cares because they don't understand the tech in the first place!


>to me, this is nothing crazy (yet)

The "yet" is carrying a lot of weight in that statement. It is now five years since the launch of GPT-2, three years since the launch of GPT-3 and less than 18 months since the launch of ChatGPT. I cannot think of any technology that has improved so much in such a short space of time.

We might hit an inflection point and see that rate of improvement stall, but we might not; we're not really sure where that point might lie, because there's likely to still be a reasonable amount of low-hanging fruit regarding algorithmic and hardware efficiency. If OpenAI and their peers can maintain a reasonable rate of improvement for just a few more years, then we're looking at a truly transformational technology, something like the internet that will have vast repercussions that we can't begin to predict.

The whole LLM thing might be a nothingburger, but how much are we willing to gamble on that outcome?


If we decide not to gamble on that outcome, what would you do differently than what is being done now? The EU already approved the AI act, so legislation-wise we're already facing the problem.


The EU AI Act - like all laws - only matters to those who are required to follow it.


Yes, but it's really hard to see a technical solution to this problem, short of having locked down hardware that only runs signed government-approved models and giving unlocked hardware only to research centers. Which is a solution that I don't like.


If you got off the internet you wouldn't even realise these tools exist, though. And for the statement that all jobs will be modelled to be true, they'd have to be impacting the real world.


Is it even possible to "get off the internet" without also leaving civilisation in general at this point?

> it'd have to be impacting the real world

By writing business plans? Getting lawyers punished because they didn't realise that "passes bar exam" isn't the same as "can be relied on for citations"? By defrauding people with synthesised conversations using stolen voices? By automating and personalising propaganda?

Or does it only count when it's guiding a robot that's not merely a tech demo?


I'll be worried about jobs being removed entirely by LLMs when I see something outside of the tech bubble genuinely having been removed by one - have there been any real cases of this? It seems like hyperbole. Most people in the world don't even know this exists. Comparing it to the internet is insane, given its status as a highly advanced autocomplete.


800 million dollar studio expansion halted - https://www.theguardian.com/technology/2024/feb/23/tyler-per...


Thank god! Enough Medea, already! I chalk this up as a win for humanity.


Sure, but think about all of the jobs that won't exist because this studio isn't being expanded, well beyond just whatever shows stop being produced. Construction, manufacturing, etc.

Edit: Also, this doesn't mean less Medea, just fewer actual humans getting paid to make Medea or to work adjacent jobs.


Not like there's nothing else to construct.

Maybe it's time to construct some (high[er] density) housing where people want to live? No? Okay, then maybe next decade ... but then let's construct transport for them so they can get to work, how about some new subway lines? Ah, okay, not that either.

Then I guess the only thing that remains to construct is all the factories that will be built as companies decouple from China.


> has there been any real cases of this?

Apparently so: https://www.businessinsider.com/jobs-lost-in-may-because-of-...

Note that this article is about a year old now.

> Comparing it to the internet is insane, based off of its status as a highly advanced auto complete.

(1) I was quoting you.

(2) Don't you get some cognitive dissonance dismissing it in those terms, at this point?

"Fancy auto complete" was valid for half the models before InstructGPT, as that's all the early models were even trying to be… but now? The phrase doesn't fit so well when it's multimodal and can describe what it's seeing or hearing and create new images and respond with speech, all as a single unified model, any more than dismissing a bee brain as "just chemistry" or a human as "just an animal".


"By 2005 or so, it will become clear that the Internet’s impact on the economy has been no greater than the fax machine’s."

~ Paul Krugman, winner of the 2008 Nobel Memorial Prize in Economic Sciences


If you get away from roads you wouldn't realize engines exist. Also, the internet is (part of) the real world.


Sure and there’s endless AI generated blog spam from “journalists” saying LLMs are amazing and they’re going to replace our jobs etc… but get away from the tech bubble and you’ll see we’re so far away from that. Full self driving when? Autonomous house keepers when? Even self checkout still has to have human help most of the time and didn’t reduce jobs much. Call me a skeptic but HN is way too optimistic about this stuff.

Replacing all jobs except LLM developers? I’ll tell my hairdresser


If we could predict "when", that would make the investment decisions much easier.

But we do have a huge number of examples of jobs disappearing thanks to machines — even the term "computer" used to refer to a job.

More recently and specifically to LLMs, such losses were already being reported around this time last year: https://www.businessinsider.com/jobs-lost-in-may-because-of-...


In a world where OpenAI exists, it really does require an almost breathtaking lack of imagination to be a skeptic.


Or you’ve been around the block long enough to recognize hype and know when your imagination may be wrong. Imagination isn’t infallible.


Right, that entire internet thing was complete hype, didn't go anywhere. BTW, can you fax me the menu for today?

And that motorized auto transport, it never went anywhere, it required roads. I mean, who would ever think we'd cover a huge portion of our land in these straight lines. Now, don't mind me, I'm going to go saddle up the horse and hope I don't catch dysentery on the way into town.


I don't think anybody's denying that revolutions happen. It's just that the number of technologies that actually turned out to be revolutionary are dwarfed by the number of things that looked revolutionary and then weren't. Remember when every television was definitely going to be using glasses-free 3D? People have actually built flying cars and robot butlers, yet the Jetsons is still largely wishful thinking. The Kinect actually shipped, yet today we play games mostly with handheld controllers. AI probably has at least some substance, but there's a non-zero amount of hype too. I don't think either extreme of outcome is a foregone conclusion.


Capabilities aren't the problem, cultural adoption is. Just yesterday I talked to someone who still googles solutions to their Excel table woes. Didn't they know of Copilot?

Maybe they didn't know, maybe none of their colleagues used it, their company didn't pay for it, or maybe all they need is an Excel update.

But I am confident that using Copilot would be faster than clicking through the sludge that is the Microsoft Office help pages (third-party or not).

So I think it is correct to fear capabilities, even if the real-world impact is still missing. When you invent an airplane, there won't be an airstrip to land on yet. Is it useless? Won't it change anything?


I don't see how "failing to use the best tools available" is a relevant problem for this topic, even though it is indeed a problem in other regards.


Copilot in excel is really awful.


HN comments, too. Long, grammatically perfect comments that sound hollow and a bit lengthy are everywhere now.

It's still early, and I don't see much in corporate communications, for instance, but it will be quite the change.


>Long, grammatically perfect comments that sound hollow and a bit lengthy

It's worse than I thought. They've already managed to mimic the median HN user perfectly!


No problem, I'm here to keep the language sophistication level low.


I take care of the etiquette and try to do my best to keep it low.

We need one who's doing the dirty work of not discussing.


I tried to make ChatGPT generate a counterpoint for that but it turns out you're right.


Yes. The old heuristics for telling whether something is generated, based on grammar and sentence structure, don't work as well anymore. The thing that fucks me up the most about it is that I now constantly have to be uncertain about whether something is human or not. Of course, you've always had to be careful about misinformation on the internet, but this raises the scalability of false, hollow, and harmful output to new levels. Especially if it's a topic I'm trying to learn about by reading random articles (or comments), there isn't much of a frame of reference for what's good info and what's hallucinated garbage.

I fear that at some point the anonymity that made the internet great in the first place will be destroyed by this.


To be fair, that was already the case for me before AI - right around the time that companies, individuals, and governments found out that they could write covert ads in the form of comments, posts, and 'organic' content, and they started to flood Reddit, Discord, etc.

The dead internet theory started to look more real with time; AI spam is just scaling it up.


I have a strange problem: I have always written in a manner that now leads people to think I am an LLM.

It has been so bad that I have even considered injecting misspellings, incorrect grammar, and bad punctuation into my prose to prove my words are mine.


I feel you, people who liked the word "delve" will have to stop using it.


I'm a non-native English speaker. Edge's new feature of automatically improving my text is a godsend. Unfortunately, it is blocked at work.


Many businesses don't want to send their data to a third party such as OpenAI, so that will remain the case until locally run LLMs become widely available in businesses.


as the meme goes "always has been"

i remember seeing the change when GPT-2 was announced


We've reached a stage where it would be advisable not to release recent photos of yourself, nor any video with sound, to the public, unless you want a fake AI instaperson of yourself to start reaching out to members of your externally visible social network, asking for money, emergency help, etc.

I guess we need an AI secretary to screen all phone calls from now on (the spam folder will become a lot more interesting, with celebrity phone calls, your dead relative phoning you, etc.).


Hopefully, we will soon enter the stage where nobody believes anything they see anymore. Then, you no longer have to be afraid of being misinterpreted, because nobody is listening anymore anyway. Great time to be alive!


Luckily there’s a “solution” to that: Just don’t use the internet for dialogue anymore.

As someone that grew up with late-90’s internet culture and has seen all the pros and cons and changes over the decades, I find myself using the internet less and less for dialogue with people. And I’m spending more time in nature and saying hi to strangers in reality.

I'm still worried about the impact this will have on a lot of people's ability to reason, however. "Just" TikTok and apps like it have already had devastating results on certain demographics.


Statistically, more and more people spend time online and on their phones. I'm not sure we've reached the peak of internet usage yet.


That bit "... there's a "solution"" - does it keep working in societies where there are mega corps pushing billions into developing engaging, compelling and interesting AI companions?


That's why I put it in quotation marks: it is a solution that will remain available, simply because the planet is really big and there'll always be places on the fringes. But it doesn't really solve the problem for society at large; it only solves it for an individual. Still, sometimes individuals showing other ways of living helps the rest of society see that there are choices where they previously thought there were none.


I don't know why anyone thinks this will happen. You can obviously write anything you want (we have an entire realm of works in this area that everyone knows about: fiction), and yet huge numbers of people believe passed-around stories, either from bad or faked media sources or entirely unsourced.


I'm not saying either you or the parent commenter is right or wrong, but fiction in books and movies is clearly fiction, and we consume it as such. You are right that some people have been making up fake stories and others (the more naive) have been quick to believe those false stories. The difference now is that it's not just text invented and written by a human, which takes time and dedication; now it's done in a second. On top of that, it's easy to enhance the text with realistic photos, audio, and video. It becomes much more convincing. And this material is created in a few seconds or minutes.

It's hard to know what to believe if you get a phone call with the voice of your child or colleague, and your "child"/"colleague" replies within milliseconds in a convincing way.


I agree it's fundamentally different in application, which I think will have a large impact (just like targeted advertising with optimisation vs billboards), but my point is that, given people know you can just write anything and yet misinformation abounds, I don't see how knowing that you can fake any picture or video or sound leads to a situation where everyone just stops believing them.

I think, unfortunately, it will massively lower the trust in actual real videos and images, because someone can dismiss them with little thought.


Be glib, but that is one way for society to bring privacy back - and with it, shared respect. I think of it as the "oh, everyone has an anus" moment. We all know everyone has one, and it doesn't need to be dragged out in polite company.


I'm not sure if people work like that — many of us have, as far as I can tell for millennia and despite sometimes quite severe punishments for doing so, been massive gossips.


What you see will be custom-tailored to what you believe, and your loyalty will be won. Do what the AI says and your life will be better. It already knows you better than you know yourself. Maybe you're one of those holdouts who put off a smartphone until life became untenable without it. Life will be even more untenable without your AI personal assistant/friend/broker/coach/therapist/teacher/girlfriend to navigate your life for you.


We're doing FSB's work for them. Or PLA Unit 61398's (or their comrades). Or Bureau 121's.

Brave New World indeed.


I think for most people it's far too late, as there exists at least something on the internet and that something is sufficient - photos can be aged virtually and a single photo is enough, voice doesn't change much and you need only a tiny sample, etc.

And that's the case even if you've never ever posted anything on your social media - it could be family & friends, or an employer, or if you've ever been in a public-facing job position that has ever done any community outreach, or ever done a public performance with your music or another hobby, or if you've ever walked past a news crew asking questions to bystanders of some event, or if you've ever participated in contests or competitions or sports leagues, etc. All of that is generally findable in various archives.


> photos can be aged virtually and a single photo is enough

I'm sure AI-based ageing can do a good enough job to convince many people that a fake image of someone they haven't seen for years is an older version of the person they remember; but how often would it succeed in ageing an old photo in such a way that it looks like a person I have seen recently and therefore have knowledge rather than guesses about exactly what the years have changed about them?

(Not a rhetorical question to disagree with you, I genuinely have no idea if ageing is predictable enough for a high % result or if it would only fool people with poor visual memory and/or who haven't seen the person in over a decade.)

I feel like even ignoring the big unknowns (at what age, if any, will a person start going bald, or choose to grow a beard or to dye their hair, or get a scar on their face, etc.) there must be a lot of more subtle but still important aspects, from skin tone to makeup style to hair to...

I've looked up photos of some school classmates that I haven't seen since we were teens (a couple of decades ago), and while nearly all of them I think "ah yes I can still recognise them", I don't feel I would have accurately guessed how they would look now from my memories of how they used to look. Even looking at old photos of family members I see regularly still to this day, even for example comparing old photos of me and old photos of my siblings, it's surprising how hard it would be for a human to predict the exact course of ageing - and my instinct is that this is more down to randomness that can't be predicted than down to precise logic that an AI could learn to predict rather than guess at. But I could be wrong.


Maybe it's Europeans posting this kind of stuff where they have much stronger privacy laws, but if you're in the US this is all wishful thinking.

Do you shop in large corporate stores and use credit cards? Do you go out in public in transportation registered to you?

If yes, then images and habits of yours are being stored in databases and sold to data brokers.

And you're not even including every single one of your family members that use internet connected devices/apps that are sucking up all the data they can.


I was just asking about the ability of photo aging software, not commenting about privacy at all. Though yes, I am thankfully in Europe (but there are recent photos of me online).

But don't disagree with you - in a different comment that was about privacy, I (despite living under GDPR) suggested that for offline verification with known people it's better to choose secrets that definitely haven't been shared online/anywhere rather than just choosing random true facts and assuming they couldn't have been found out by hackers: https://news.ycombinator.com/item?id=40353820


> I guess we need to have an AI secretary to take in all phonecalls

Why not an AI assistant in the browser to fend off all the adversarial manipulation and spam AIs on the web? Going online without your AI assistant would be like venturing out without a mask during COVID.

I foresee a cat-and-mouse game, AIs for manipulation vs AIs for protection one-upping each other. It will be like the immune system vs viruses.


I'm paranoid enough that I now modulate my voice and speak differently when answering an unknown phone call just in case they are recording and building a model to call back a loved one later. If they do get a call, they will be like, "why are you talking like that?"


But why not just make up a secret word to use with your loved ones in critical situations? In case of ..., one needs to know that secret. Otherwise: FAKE! Gotcha!


Secrets are either written down somewhere and end up on the Internet, or forgotten.


It doesn't have to be something unspeakable; it can be the name of a first pet, or something others just can't guess on the first try.


The problem here is you're assuming your family members aren't idiots; this is your first mistake.

Chances are they've already installed some app on their phone that's voice-to-texting everything they say and sending it off somewhere (well, lower chance if they have an iPhone).

Modern life is data/information security and humans are insanely bad at it.


As it happens, they are noobs but not idiots, because they ask me about everything - they don't need Google, I know everything, hahah.

I don't think it's a problem to find a word or a sentence or a story - whatever - that's commonly used by everyone on a daily basis but in a different context. That's not a problem by itself :) try it.

Even for the idiots, it is still possible to find a word. They may be idiots, but still, they work and live on their own. They're getting along in life. So it's up to the smarter one to find a no-brainer solution.

I am confident that nothing and no one is too stupid to be unable to adapt to something. Even if it's me who'll need to adapt to members with less brainpower.


Better yet, don’t answer unknown phone calls.


> unknown phone calls

This is my biggest gripe against the telecom industry. Calls pretending to be from someone else.

For every single call, someone somewhere must know at least the next link in the chain to connect a call. Keep following the chain until you find someone who either through malice or by looking the other way allows someone to spoof someone else's number AND remove their ability to send the current link in the chain (or anyone) messages. (Ideally also send them to prison if they are in the same country.) It shouldn't be that hard, right?


Companies have complex telecom setups but generally want to present a single company number to the outside. Solution: the sender sends a packet with the number they want to be perceived as, and everyone passes this on. Everyone "looks the other way" by design, haha.


So gate that feature behind a check that you can only set an outgoing caller ID belonging to a number range that you own.

The technology to build trustable caller ID has existed for a long time; the problem is no one wants to be the one forcing telcos all over the world to upgrade their sometimes decades-old systems.


I can only imagine "Hello, this is Guy Incognito..."


> Social media is absolutely lousy with AI-powered fraud of varying degrees of sophistication.

has been for years mon ami. i remember when they started talking about GPT-2 here, and then seeing a sea-change in places like reddit and quora

quite visible on HN, esp. in certain threads like those involving brands that market heavily, or discussions of particular countries and politics.


People had already been killing each other for thousands of years, so introducing tanks was no big deal, I guess. To say nothing of nuclear weapons.


What does abating that trend look like? Most AI safety proposals I hear fall into the categories of a) we need to stop developing this technology, or b) we need laws that entrench the richest and most powerful organizations in the world as the sole proprietors of this technology. Neither of those actually sounds better than people being paranoid weirdos about trusting text/video/voice. I think that's kinda where we need to be as a culture: these things are not trustworthy, they were only ever good as a rough heuristic, and now that ship has sailed.

We have just finished a transition to treating the digital world as part of our "real" world, but it's time to step that back. Using the internet to interact with known, trusted parties will still work fine, provided that some authentication can be shared out-of-band offline. Meeting people and discovering businesses and such? There will be more fakes and scams than real opportunities by orders of magnitude, and as the technology progresses our filtering will only get worse.

We need to roll back to "don't trust anything online, don't share your identity or payment information online" outside of, as mentioned, out-of-band verified parties. You can still message your friends and family, do online banking and commerce, but you can't initiate a relationship with a person or business online without some kind of trusted recommendation.


>What does abating that trend look like?

I don't think anyone has a good answer to that question, which is the problem in a nutshell. Job one is to start investing seriously in finding possible answers.

>We need to roll back to "don't trust anything online, don't share your identity or payment information online"

That's easy to say, but it's a trillion-dollar decision. Alphabet and Meta are both worthless in that scenario, because ~all of their revenue comes from connecting unfamiliar sellers with buyers. Amazon is at existential risk. The collapse of Alibaba would have a devastating impact on Chinese exporters, with massive consequent geopolitical risks. Rolling back to the internet of old means rolling back on many years worth of productivity and GDP growth.


> because ~all of their revenue comes from connecting unfamiliar sellers with buyers

Well that's exactly the sort of service that will be extremely valuable in a post-trust internet. They can develop authentication solutions that cut down on fraud at the cost of anonymity.


“Extremely valuable” is another way of saying “extremely costly”.


Trust is more complex than we give it credit for.

Even when it comes to people like our parents, there are things we would trust them to do, and things that we would not trust them to do. But what happens when you have zero trusted elements in a category?

At the end of the day, the digital world is the real world, not some separate place 'outside the environment'. Trying to treat the digital world like it doesn't exist puts you in a dangerous position to be deceived. For example, if you're looking for XYZ and you manage to leak this into the digital world, said digital world may manipulate your trusted friends via ads, articles, and the social media posts they see, shaping what they think about XYZ before you even ask them.


Point a) is just point b) in disguise. You're just swapping companies for governments.

This tech is dangerous, and I'm currently of the opinion that its uses for malicious purposes are far more significant than LLMs replacing anyone's jobs. The bullshit asymmetry principle is incredibly significant for covert ops and asymmetric warfare, and generating convincing misinformation has become basically free overnight.


>Regardless of any future impacts on the labour market or any hypothesised X-risks

Discovering an asteroid full of gold - with as much gold as half the Earth, to put a modest number on it - would have a huge impact on the labour market. Mining jobs for anything conductive, like copper or silver, would all go away. Also, housing as we know it would be made obsolete, as we would all live in golden houses. A huge impact on the housing market, yet it doesn't seem like such a bad thing to me.

>We're already at a point where we're counselling elders to ignore late-night messages from people claiming to be a relative in need of an urgent wire transfer.

Anyone can prove their identity, or identities, over the wire - wired or wireless, anything you like. When I went to university, I was the only one attending the cryptography class; no one else showed up for a boring class like that. I wrote a story about the Electrona Corp on my blog.

What I've been saying to people for at least 2 years now is: "Remember when governments were not just some cryptographic algorithms?" Yeah, that's gonna change. Cryptography is here to stay; it is not as dead as people think, and it's gonna make a huge blast.


> Discovering an asteroid full of gold, with as much gold as half the earth to put a modest number, would have huge impact

All this would do is crash the gold price. Also note that all the gold at our disposal right now (worldwide) basically fits into a cube with 20 m edges (it's not as much as you might think).

Gold is not suitable to replace steel as a building material (because it has much lower strength and hardness), nor copper/aluminium as a conductor (it's a worse conductor than copper and much worse in conductivity per weight than aluminium). The main technical application in the short term would be gold-plated electrical contacts on every plug, and little else...


Regarding gold, I like this infographic [1], but my favorite from this channel is wolf population by country. The point being that gold is shiny and beautiful, and it will be used even when it is not the appropriate solution to the problem, just because it is shiny.

I didn't know that copper is a better conductor than gold. Surprised by that.

[1] https://www.youtube.com/watch?v=E2Gd8CRG0cc


> The main technical application short term would be gold plated electrical contacts on every plug and little else...

.. And gold teeth and grillz.


> What i say to people for at least 2 years now, is that "Remember when governments were not just some cryptographic algorithms?" Yeah, that's gonna change. Cryptography is here to stay, it is not as dead as people think and it's gonna make a huge blast.

The thing about cryptography and government is that it's easy to imagine a great technology being adopted at the governmental level because of its greatness, but it is another thing to actually implement it. We live in a bubble where almost everyone knows about cryptographic hashes and RSA, but for most people that is not the case.

Another thing is that political actors tend to try to concentrate power in their own hands. There is no way they will delegate decision-making to any form of algorithm, cryptographic or not.


As soon as mimicking voices, text messages, and human faces becomes a serious problem, like this case in the UK [1], citizens will demand a solution to that problem. I don't personally know how prevalent problems like that are as of today, but given the current trajectory of A.I. models, which become smaller, cheaper, and better all the time, soon everyone on the planet will be able to mimic every voice, every face, and every handwritten signature of anyone else.

As soon as this becomes a problem, it might start bottom-up, from citizens to government officials, rather than top-down, from the president to government departments. Then governments will be forced to formalize identity solutions based on cryptography. See also this case in Germany [2].

One example like that is bankruptcy law in China. China didn't have any law regarding bankruptcy until 2007. For a communist country - or rather, a not totally capitalist country like China - bankruptcy is not an important subject: when some people stop being profitable, they will keep working because they like to work and they contribute to the great nation of China. That doesn't make any sense, of course, so their government was forced to implement some bankruptcy laws.

[1]https://www.wsj.com/articles/fraudsters-use-ai-to-mimic-ceos... [2]https://news.ycombinator.com/item?id=39866056


A lot of these are non-AI problems. People trying to defraud the elderly need to be taken out back and shot, that’s not an AI issue.


Right, I'll just get on a plane, travel to whereverthefuckville overseas, and ask for permission to face-blast the scammers. The same scammers who are donating a lot of money to their local (probably very poor) law enforcement to keep their criminal enterprise quiet. This will go well.


> I'm not confident that I'd be able to distinguish GPT 4o from a human speaker

Probably why it's not released yet. It's unsafe because of the phishing potential.


I think people are dismissive for a few reasons.

- It helps them sleep at night if their creation doesn't put millions of people out of work.

- Fear of regulation


> What defences do we have when an LLM will be able to have a completely fluent, natural-sounding conversation in someone else's voice?

The world learnt to deal with Nigerian Prince emails, and nobody falls for those anymore. Nothing was changed - no new laws or regulations were needed.

Phishing calls have been going on without an AI for decades.

You can be skeptical and call back. If you know your friends or family, you should always be able to find an alternative way to get in touch without too much effort in the modern connected world.

Just recently a gang in Spain was arrested for the "son in trouble" scam. No AI was used. Most parents are not fooled by this.

https://www.bbc.com/news/world-europe-68931214

The AI might have some marginal impact, but it does not matter in the big picture of scams. While it is worrisome, it is not a true safety concern.


> yet the real-world impacts remain modest so far.

I second that. I remember when Google Search first came out. Within a few days it completely changed my workflow, how I use the Internet, my reading habits. It easily 5-10x'd the value of the Internet for me over a couple of weeks.

LLMs are doing nothing of the sort for me.


Google was a step function, a complete leveling up in terms of usability of returned data.

ChatGPT does this again for me. I am routinely getting zero useful results on the first page or two of Google searches, but AI is answering or giving me guidance quickly.

Maybe this would not seem such an improvement if Google's results were like they were 10 years ago and not barely usable blogspam


> I am routinely getting zero useful results on the first page or two of Google searches, but AI is answering or giving me guidance quickly.

To me, this just sounds like Google Search has become shit, and since Google simply isn't going to give up the precious ad $$$ that the current format is generating, the next best thing is ChatGPT. But this is different from saying that ChatGPT is a similar step up to what Search was.

For what it's worth, I agree with you that Google Search has become unusable. Google basically destroyed its best product (for users) by turning it into an ad-riddled shovelware cesspit.

That ChatGPT is merely as good as Google Search used to be is a tragedy. Basically, we had a conceptually simple product that functioned very well, and we are replacing it with a significantly more complex product.


What are you searching for? I see people complaining about this a lot but they never give examples. Google is chock full of spam, yes, but it still works for me.


Google’s results are themselves an AI product though. You’re just comparing different AIs.


OMG, I remember trying Google when it was in beta, and HOLY CRAP, the difference from what I had been using was freakin' night and day. AltaVista: remember that? That was the state of the art before, and it did not compare. Night and day.


I remember Google being marginally better than Altavista but not much more.

The cool kids in those days used Metacrawler, which meta searched all the search engines.


Google was marginally better in popular searches and significantly better for tail searches. This is a big reason why it flourished with the technical and student crowd in earlier days because those exceedingly rare sub-sub-topics would get surfaced higher in the rankings. For the esoteric topics Yahoo didn't have it in catalog and Altavista maybe had it but it was on page 86. Even before spelling correction and dozens of other useful search features were added, it was tail search and finding what you were looking for sooner. Serving speed, too, but perhaps that was more subtle for some.

Metasearch only helps recall. It won't help precision; the metasearch still needs to rank the aggregated results.


I used Metacrawler, it was dog slow. The beauty of Google was it was super fast, and still returned results that were at least as good, and often better, than Metacrawler. After using Google 2-3 times I don’t think I ever used Metacrawler again.


You just gave me a weird flashback to 1997.

And hey maybe when combined with GPT-4o AskJeeves will finally work as intended.


With Altavista I had to go through 2 or 3 pages just to find the GNU website. I remember the Google beta as life-changing.


Remember dogpile? Great aggregator


Yes, Altavista was the major step over Yahoo! directory.


And I'm sure that it's doing that for some people, but... I think those are mostly in the industry. For most of the people outside the tech bubble, I think the most noticeable impact it has had on their lives so far is that they've seen it being talked about on the news, maybe tried ChatGPT once.

That's not to say it won't have more significant impact in the future; I wouldn't know. But so far, I've yet to see the hype get realised.


>LLMs is doing nothing of the sort for me.

Don't use it for things you're already an expert in, it can't compare to you yet.

Use it for learning new things, or for things you aren't very good at and don't want to bother with. For these it's incredible.


For me, LLMs mostly replaced search. I run local Ollama, and whenever I need help with coding/docs/examples, I just ask Mixtral 8x7B, and get an answer instantly, tailored to my needs.


> OpenAI are masters of hype. They have been generating hype for years now, yet the real-world impacts remain modest so far.

Perhaps.

> Do you remember when they teased GPT-2 as "too dangerous" for public access? I do. Yet we now have Llama 3 in the wild, which even at the smaller 8B size is about as powerful as the [edit: 6/13/23] GPT-4 release.

The statement was rather more prosaic and less surprising; are you sure it's OpenAI (rather than say all the AI fans and the press) who are hyping?

"""This decision, as well as our discussion of it, is an experiment: while we are not sure that it is the right decision today, we believe that the AI community will eventually need to tackle the issue of publication norms in a thoughtful way in certain research areas.

We are aware that some researchers have the technical capacity to reproduce and open source our results. We believe our release strategy limits the initial set of organizations who may choose to do this, and gives the AI community more time to have a discussion about the implications of such systems."""


That's fair: the statement isn't hyperbolic in its language. But remember that GPT-2 was barely coherent. In making this statement, I would argue that OpenAI was trying to impart a sense of awe and danger designed to attract the kind of attention that it did. I would argue that they have repeatedly invoked danger to impart a sense of momentousness to their products. (And to further what is now a pretty transparent effort to monopolize the tech through regulatory intervention.)


> (And to further what is now a pretty transparent effort to monopolize the tech through regulatory intervention.)

I disagree here also: the company has openly acknowledged that this is a risk to be avoided with regard to safety-related legislation. What they've called for looks a lot more like "we don't want a prisoner's dilemma that drives everyone to go fast at the expense of safety" rather than "we're good everyone else is bad".


> yet the real-world impacts remain modest so far.

I spent part of yesterday evening sorting my freshly dried t-shirts into 4 distinct piles. I used OpenAI Vision (through BeMyEyes) from my phone. I got a clear description of each and every piece of clothing, including print, colours and brand. I am blind BTW. But I guess you are right, no impact at all.

> Yet we now have Llama 3 in the wild

Yes, great, THANKS Meta, now the scammers have something to work with. That's a wonderful achievement which should be praised! </sarcasm>


> I got a clear description of each and every piece of clothing, including print, colours and brand. I am blind BTW.

That is a really great application of this tech. And definitely qualifies as real-world impact. Thanks for sharing that!


I can't even get GPT-4 to reliably take a list of data and put it in a CSV. There's a problem every single time.

People read too many sci-fi books and then project their fantasies on to real-world technologies. This stuff is incredibly powerful and will have social effects, but it’s not going to replace every single job by next year.


GPT-4 is better at planning than at executing.

Have you tried asking it to generate a regex to transform your list into a CSV?


I remember when people used to argue about regex being bad or good, with a lot of low quality regex introducing bugs in codebases.

Now we have devs asking AI to generate regex formulas and pasting them into code without much concern for their validity.


Regexes are easy to test.

Bad developers do bad regexes, regardless of whether they used AI.


How do you test a regex to be 100% sure it's valid? I don't think it's possible.


If it's using classical regex, without backtracking or other extensions, a regular expression is isomorphic to a state machine. You can enumerate combinations doing something like this: https://stackoverflow.com/a/1248566

kids these days and their lack of exposure to finite automata
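Roughly what that looks like in practice: pick a small alphabet, enumerate every string up to some length, and compare the regex against an independently written spec. A minimal sketch in Python (the pattern and the oracle below are hypothetical examples, and this only applies to classical regexes without backreferences or other non-regular extensions):

    import itertools
    import re

    PATTERN = re.compile(r"^(ab|a)+$")  # hypothetical regex under test
    ALPHABET = "ab"
    MAX_LEN = 8  # exhaustive over all strings up to this length

    def oracle(s: str) -> bool:
        """Independent spec: non-empty strings made of 'ab' and 'a' blocks."""
        i = 0
        while i < len(s):
            if s.startswith("ab", i):
                i += 2
            elif s.startswith("a", i):
                i += 1
            else:
                return False
        return len(s) > 0

    for n in range(MAX_LEN + 1):
        for chars in itertools.product(ALPHABET, repeat=n):
            s = "".join(chars)
            assert bool(PATTERN.match(s)) == oracle(s), f"mismatch on {s!r}"

    print("regex agrees with the spec on all strings up to length", MAX_LEN)

Because the language is regular, checking up to a length related to the number of DFA states is enough in principle; in practice even a bounded exhaustive sweep like this catches most bad regexes.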


- Vehicle steering is easy to test.

- How so? I don't think it's possible to test for all cases...

- Well, it's easy, assuming a car on a non-branching track, moving with a constant speed and without any realistic external influences on it, you can simply calculate the distance traveled using the formula s = v·t. Ah, I wish I'd stop running into fools not knowing Newton's first law of motion...

- ??? Are you well?


I understand you want to refute/diminish the parent comment on finite automata, but I think you are providing a straw man argument. The parent comment does provide an interesting, factual statement. I don't believe finite state automata are at all close in complexity to real-world self-driving car systems (or even a portion thereof). Your closing statement is also dismissive and unconstructive.

I believe finite state modeling is used at NASA; a Google search brings up a few references (that I'm probably not qualified to speak to), and I also remember hearing/reading a lecture on how they use it to make completely verifiable programs, but I can't find the exact one at the moment.


I wasn't making a strawman, I was making a parody of his strawman. I thought it was obvious, since I was making an analogy, and it was an analogy to his argument.


Perhaps I should have been clearer: many regexes are capable of being verified with 100% certainty: https://en.m.wikipedia.org/wiki/Thompson%27s_construction

But not all regexes (eg, those using PCRE extensions afaik) are amenable to such a treatment. Those you just tend to hope they work.


True for most things I think


Aren't you asking how to create software without bugs?


Well, regex isn't Turing-complete, so it's not exactly an analysis of a program. You could reason about a regex, about tokens, and then describe them in a way that satisfies the specification, but theorizing like this is the exact opposite of "simple" - it would be much harder than just learning regex. So stating that testing regex is simple is just bs. The author's follow-up only confirms he is a bullshitter...


No, I’ll give that a shot. I have just been asking it to convert output into a CSV, which used to work somewhat well. It stumbles when there is more complexity though.


Humans stumble with that as well. The problems are that CSV isn't really that well defined and that it's not clear to people how quoting needs to be done. The training set might not contain enough complex examples (newlines in values?).
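For reference, this is the kind of quoting the format actually requires; a small sketch using Python's standard csv module (the rows below are made up for illustration):

    import csv
    import io

    rows = [
        ["name", "date of birth", "key ideas"],
        ["Ada Lovelace", "1815-12-10", 'Wrote "notes" on the Analytical Engine'],
        ["Multi\nLine", "1900-01-01", "Values with newlines, commas, and quotes"],
    ]

    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)  # quotes only when needed
    writer.writerows(rows)
    print(buf.getvalue())

    # Reading it back reproduces the original values, embedded newlines included.
    parsed = list(csv.reader(io.StringIO(buf.getvalue())))
    assert parsed == rows

The tricky cases (embedded quotes, commas, newlines) are exactly the ones a model, or a human eyeballing the output, gets wrong most often.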


No, the data is very well defined. For example, “name, date of birth, key ideas,” etc.

The issue is with ChatGPT formatting a file.


Even if you get it to seemingly work 100% of the time, it will only be 99.something%. That's just not what it's for, I guess. I pushed a few million items through it for classification a while back, and the creative ways it found to sometimes screw up astounded me.


Yeah and that's why I'm skeptical of the idea that AI tools will just replace people, in toto. Someone has to ultimately be responsible for the data, and "the AI said it was true" isn't going to hold up as an excuse. They will minimize and replace certain types of work, though, like generic illustrations.


> "Someone has to ultimately be responsible for the data"

All you have to do is survive long enough as an unemployed criminal until the system gets round to exonerating you:

https://en.wikipedia.org/wiki/British_Post_Office_scandal

"The British Post Office scandal, also called the Horizon IT scandal, involved Post Office Limited pursuing thousands of innocent subpostmasters for shortfalls in their accounts, which had in fact been caused by faults in Horizon, accounting software developed and maintained by Fujitsu. Between 1999 and 2015, more than 900 subpostmasters were convicted of theft, fraud and false accounting based on faulty Horizon data, with about 700 of these prosecutions carried out by the Post Office. Other subpostmasters were prosecuted but not convicted, forced to cover Horizon shortfalls with their own money, or had their contracts terminated. The court cases, criminal convictions, imprisonments, loss of livelihoods and homes, debts and bankruptcies, took a heavy toll on the victims and their families, leading to stress, illness, family breakdown, and at least four suicides. In 2024, Prime Minister Rishi Sunak described the scandal as one of the greatest miscarriages of justice in British history.

Although many subpostmasters had reported problems with the new software, and Fujitsu was aware that Horizon contained software bugs as early as 1999, the Post Office insisted that Horizon was robust and failed to disclose knowledge of the faults in the system during criminal and civil cases.

[...]

challenge their convictions in the courts and, in 2020, led to the government establishing an independent inquiry into the scandal. This was upgraded into a statutory public inquiry the following year. As of May 2024, the public inquiry is ongoing and the Metropolitan Police are investigating executives from the Post Office and its software provider, Fujitsu.

Courts began to quash convictions from December 2020. By February 2024, 100 of the subpostmasters' convictions had been overturned. Those wrongfully convicted became eligible for compensation, as did more than 2,750 subpostmasters who had been affected by the scandal but had not been convicted."


Do you even work with humans now? I get "Computer says no" out of corporations all the time as it is; AI is just completing that loop.


> Do you remember when they teased GPT-2 as "too dangerous" for public access? I do.

I can't help but notice the huge amount of hindsight and bad faith demonstrated here. Yes, now we know that the internet did not drown in a flood of bullshit (well, not noticeably more than before) when GPT-2 was released.

But was it obvious? I certainly thought that there was a chance that the amount of blog spam that could be generated effortlessly might just make internet search unusable. You are declaring "hype", when you could also say "very uncertain and conscientious". Is this not something we want people in charge to be careful with?


I think the problem is, we did drown in a flood of bullshit, but we've just somehow missed it.

Even in this thread people talk about "Oh I use ChatGPT rather than Google search because Google is just stuffed with shit". And on HN there are plenty of discussions about a huge portion of Reddit threads being regurgitated older comments.


GPT-4 already seems better at reasoning than most people. It just has an unusual training domain of Internet text.


I was going to say the same thing. For some real-world estimation tasks where I don't want 100% accuracy (for example: analysing the working capital of a business based on its balance sheet, or analysing some images and estimating inventory), the job done by GPT-4o is better than that of fresh MBA graduates from tier 2/tier 3 cities in my part of the world.

Job seekers currently in college have no idea what is about to hit them in 3-5 years.


I agree. The bias in HN and the tech bubble that many people are not noticing is that it's full of engineers judging GPT-4 against software engineering tasks. In programming, the margin of error is incredibly slim, in that a compiler either accepts entirely correct code (syntactically, of course) or rejects it. There is no in-between, and verifying software to be correct is hard.

In any other industry, where you just need a margin of error close to a human's and verification is much easier than generating possible outputs, the market will change drastically.


On the other hand, programming and software engineering data is almost certainly over-represented on the internet compared to information from most professional disciplines. It also seems to be getting dramatically more focus from model developers than other disciplines. For the models that disclose their training data, I've been seeing decent-sized double-digit percentages of the training corpus being just code. Finally, tools like Copilot seem ideally positioned to get real-world data about model performance.


I’d love to see this! Can you give us a couple of concrete examples of this that we can check?


Not really. Even a human who is bad at reasoning can take an hour to tinker around and figure things out. GPT-4 just does not have the deep planning/reasoning ability necessary for that.


Have you seen some people with technology? =)

They won't "take 1 hour of time", they try it once or twice and give up.


I think you might be falling for selection bias. I guess you are surrounding yourself with a lot of smart people. "Tinker around and figure things out" is definitely something certain humans (bad at reasoning) can't do. I already prefer the vision model when it comes to asking for a picture description (blind user) over many humans I personally know. The machine is usually more detailed, and takes the time to read the text, instead of trying to shortcut and decide for me what's important. Besides, people from the English-speaking countries do not have to deal with foreign languages. Everyone else has to. "Aber das ist ja in englisch" ("But that's in English") is a common blocker for consuming information around here. I tell you, if we don't manage to ramp up education a few notches, we'll end up with an even higher stddev when it comes to practical intelligence. We already have perfectly normal-seeming humans absolutely unable to participate on the internet.


Reasoning and planning are different things. It's certainly getting quite good at deductive reasoning, especially when forced to check its own arguments for flaws every time it states something. (I had a several-hour chat with it yesterday, and I was very impressed by the progress.)

Planning is different in that it is an essential part of agency. That's what Q* is supposed to add. My guess is that planning is the next type of functionality to be added to GPT. I wouldn't be surprised if they already have a version internally with such functionality, but that they've decided to hold it back for now for reasons such as safety (some may care about the election this year) or simply that the inference costs are so huge they cannot possibly expose it publicly.


Does it need those things if it can just tap into artifacts generated by humans who did spend that hour?


The only reason I still have a job is that it can't (yet) take full advantage of artefacts generated by humans.

"Intern of all trades, senior of none", to modernise the cliché.


If everyone is average at reasoning then it must not be a very important trait or we’d all be at reasoning school getting better at it.

Really philosophy seems to be one of the least important subjects right now. Hardly anyone learns about it in school.

If it were so important to success in the wild, then it would stand to reason that we would all work hard at improving our reasoning skills, but very few do.


What schools teach is what the governments who set the curriculum like to think is important, which is why my English lessons had a whole section on the Shakespearean (400-year-old, English, Christian) take on the life and motivations of a Jewish moneylender living in Venice, followed up with an 80-year-old (at the time) English poem on exactly how bad it is to watch your friends choke to death as their lungs melt from chlorine gas in the trenches of the First World War.

These did not provide useful life-lessons for me.

(The philosophy A-level I did voluntarily seemed to be 50% "can you find the flaws in this supposed proof of the existence of god?")


> These did not provide useful life-lessons for me.

Shakespeare is packed with insight.


None of the stuff we did at school showed any indication of insight into things of relevance to our world.

If I took out a loan on the value of goods being shipped to me, only for my ship to be lost at sea… it would be covered by insurance, and no bank would even consider acting like Shylock (nor have the motivation of being constantly tormented over religion) for such weird collateral, and the bank manager's daughters wouldn't get away with dressing up as lawyers (no chance their arguments would pass the sniff test today given the bar requirement) to argue against their dad… and they wouldn't need to because the collateral would be legally void anyway and rejected by any court.

The ships would also not then suddenly make a final act appearance to apologise for being late, to contradict the previous belief they were lost at sea, because we have radio now.

The closest to "relevant" that I would accept, is the extent to which some of the plots can be remade into e.g. The Lion King or Wyrd Sisters — but even then…

"Methinks, to employeth antiquated tongues doth render naught but confusion, tis less even than naughty, for such conceits doth veil true import with shadows."


They're masters of hype because their products speak for themselves (literally)


Yeah. OpenAI are certainly not masters of hype lol. They released their titular product to basically no fanfare or advertisement. ChatGPT took off on word of mouth alone. They dropped GPT-4 without warning and waited months to ship its most exciting new feature (image input).

Even now, they're shipping text-image 4o but not the new voice, while leaving the old voice up and confusing/disappointing a whole lot of people. This is a pretty big marketing blunder.


> ChatGPT took off on Word of Mouth alone.

I remember for a good 2-3 months in 2023 ALL you could see on TikTok / YouTube Shorts was just garbage about 'how amazing' ChatGPT was. Like - video after video, and I was surprised by the repeat content being recommended to me... No doubt OpenAI (or something) was behind that huge marketing push


Is it not possible this would be explained by people simply being interested in the technology and TikTok/Youtube algorithms noticing that—and that they would have placed you in the same bubble, which is probably an accurate assignment?

I doubt OpenAI spent even one cent marketing their system (e.g. as in paying other companies to push it).


Well, if you were a typical highly engaged TikTok or YouTube user, you were probably 13-18 years old. The kind of cheating in school that ChatGPT enabled is revolutionary. That is going to go viral. It's not a marketing push. After years of essentially learning nothing during COVID lockdowns, can you understand how transformative that is? It's like 1,000x more exciting than pirating textbooks, stealing Mazdas, or whatever culturally self-destructive life hacks were being peddled by freakshow broccoliheads and Kim Kardashian-alikes on the platform.

It's ironic because the OpenAI creators really loved school and excelled academically. Nobody cares that ChatGPT destroyed advertising copywriting. But whatever little hope remained for the average high schooler post-lockdowns, it was destroyed by instant homework cheating via ChatGPT. So much for safety.


> No doubt

Who needs evidence when we have your lack of doubt hey?


I think you meant to say "All that *I* could see". There are a lot of bubbles in social media. Not everyone is part of your bubble.


No, it's just the masses sniffing out the new fascinating techbro thing to make content about.

In a way I'm sorry, that's what people do nowadays. I'd prefer it to be paid for, honestly.


"real-world impacts remain modest so far." Really? My Google usage has went down with 90% (it would just lead me to some really bad take from a journalist anyway, while ChatGPT can just hand me the latest research and knows my level of expertise). Sure it is not so helpful at work, but if OpenAI hasnt impacted the world I fail to see which company have in this decade.


“Replaced Google” is definitely an impact, but it’s nothing compared to the people that were claiming entire industries would be wiped out nearly overnight (programming, screenwriting, live support, etc).


Speak to some illustrators or voiceover artists - they're talking in very bleak terms about their future, because so many of them are literally being told by clients that their services are no longer required due to AI. A double-digit reduction in demand is manageable on aggregate, but it's devastating at the margin. White-collar workers having to drive Ubers or deliver packages because their jobs have been taken over by AI is no longer a hypothetical.


We had this in content writing and marketing last year. A lot of layoffs were going to happen anyway due to the end of ZIRP; AI came just at the right time, and so restructuring came bundled with "..and we are doing it with AI!".

It definitely took out a lot of jobs from the lowest rungs of the market, but on the more specialized / upper end of the ladder wages actually got higher and a lot of companies got burned, and now they have to readjust. It's still rolling over slowly, as there are a lot of companies selling AI products and in turn new companies adopting those products. But it tells you a lot that

A) a company selling an AI assistant last year is now totally tied to automating busy work tasks around marketing and sales

B) AI writing companies are some of the busiest in employing human talent for... writing and editorial roles!

It's all very peculiar. I haven't seen anything like this in the past 15 years... maybe the financial crisis and big data were similar, but on a much, much smaller scale.


>It definitely took out a lot of jobs from the lowest rungs of the market, but on the more specialized / upper end of the ladder wages got actually higher

Effectively all mechanization, computerization, and I guess now AI-ization has done this. In the past you could have a rudimentary education and contribute to society. Then we started requiring more and more grade school, then higher education for years. Now we're talking about the student debt crisis!

At least if AI doesn't go ASI in the near term, the question is how we are going to train the next generation of workers to go from unskilled to more skilled and useful than the AI. Companies aren't going to want to do this. Individuals are going to think it's risky to get an education that could be made obsolete by a software update. If this is left to spin out of control, it's how a new generation of Luddites ends up burning data centers in protest that they are starving on the streets.


Colleges are seeing apprentice placements drop - why train an apprentice for two years when ChatGPT will do the work for them?


We should be thinking pretty hard right about now about why this kind of progress and expense-saving is a BAD thing for humanity. The answer will touch deeply ingrained ideas about what and who should underpin and benefit from progress and value in society.


I think the claims have mostly been about multiplying people's efforts, for now.


If Google hadn't ruined Search to help Advertising perhaps it wouldn't have been such a stark comparison in information quality.


Search was always a byproduct of Advertising. Don’t blame Google for sticking to their business model.

We were naive to think we could have nice things for free.


When Google search first appeared, it had nothing to do with advertising. In fact, the founders wrote a paper on why advertising would be bad.


Found the zoomer.


It will be interesting to see how they compare, years from now, when ChatGPT has been similarly compromised.


It might not happen in that way since there are alternatives available. Google had/has a monopoly on search.


For premium subscribers it'll be good, but they'll surely ruin the experience for the free tier, just like Spotify, because they just can't keep their business sustainable without showing VCs some profits.


There is little other way of making money from search.


I believe you, and I do turn to an LLM over Google for some queries where I'm not concerned about hallucination. (I use Llama 3 most of the time, because the privacy is absolute.)

But OpenAI is having a hard time retaining/increasing ChatGPT users. Also, Alphabet's stock is about as valuable as it's ever been. So I don't think we have evidence that this is really challenging Google's search dominance.


Google is an ad company. Ad prices are set by auction, and most companies believe that they need ads. Fewer customers don't necessarily mean that earnings go down: when clicks go down, prices might go up (absent ad competitors). Ergo, they don't compete (yet, at least).

But ChatGPT has really hurt Google's brand image.


Ironically, I was like that for a while, but now I use regular Google search again quite a bit. A lot of times, good old Stack Overflow is best.


The questions I ask ChatGPT have (almost) no monetary value for Google (programming, math, etc).

The questions I still ask Google have a lot of monetary value (restaurants, clothes, movies, etc.).


I use Google and it gives me AI answers.

But I agree, it seems Stack Overflow often helps more than Google's AI answers.


It's well known that LLMs don't reason. That's not what they are for. It's a throwaway comment to say that a product can't do what it explicitly is unable to do. Reasoning will require different architectures. Even so, LLMs are incredibly useful.


ChatGPT 3.5 has been neutered, as it won't spit out anything that isn't overly politically correct. 4chan were hacking their way around it. Maybe that's why they decided it was "too dangerous".


" GPT-4o doesn't show much of an improvement in "reasoning" strength."

Maybe that is GPT-5.

And this release really is just incremental improvements in speed, and tying together a few different existing features.


> yet the real-world impacts remain modest so far

Go ask any teacher or graphic designer.


That's one of my biggest fears: teachers using AI-generated content without "checks" to raise / teach / test our children.


> Do you remember when they teased GPT-2 as "too dangerous" for public access? I do.

Maybe not GPT-2, but in general LLMs and other generative AI types aren't without their downsides.

From companies looking to downsize their staff to replace them with software, to the work of artists/writers being devalued somewhat, to even easier scams and something like the rise of AI girlfriends, which has also gotten some critique, some of those can probably be a net negative.

Even when it's not pearl clutching over the advancements in technology and the social changes that arise, I do wonder how much my own development work will be devalued due to the somewhat lowered entry barrier into the industry and people looking for quick cash, same as with boot camps leading to more saturation. Probably not my position individually (not exactly entry level), but the market as a whole.

It's kind of at the point where I use LLMs for dev work just so I don't fall behind, because the productivity gains for simple problems and boilerplate are hard to argue with.


> They have been generating hype for years now, yet the real-world impacts remain modest so far.

I feel like everyone who makes this claim doesn't actually have any data to back it up.


Like another comment mentioned, sigmoid curves [1] are ubiquitous with neural network systems. Neural network systems can be intoxicating because it's so "easy" (relatively speaking) to go from nothing to 80% in extremely short periods of time. And so it seems completely obvious that hitting 100% is imminent. Yet it turns out that each percent afterwards starts coming exponentially more slowly, and we tend to just bump into seemingly impassable asymptotes far from where we'd like to be.
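To make the tail of that curve concrete with the logistic function from [1] (this is just the algebra of the sigmoid, not a claim about any particular model):

    \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
    \sigma(x) = p \;\Rightarrow\; x = \ln\frac{p}{1-p}

so reaching p = 0.8 costs x ≈ 1.4, p = 0.99 costs x ≈ 4.6, and p = 0.999 costs x ≈ 6.9: each additional "nine" of performance needs a roughly constant ~2.3 extra units of input while moving the output ten times less than the nine before it.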

~8 years ago when self driving technology was all the rage and every major company was getting on board with ever more impressive technological demos, it seemed entirely reasonable to expect that we'd all be in a world of complete self driving imminently. I remember mocking somebody online around the time who was pursuing a class C/commercial trucking license. Yet now a decade later, there are more truckers than ever and the tech itself seems further away than ever before. And that's because most have now accepted that progress on such has basically stalled out in spite of absolutely monumental efforts at moving forward.

So long as LLMs regularly hallucinate, they're not going to be useful for much other than tasks that can accept relatively high rates of failure. And many of those generally creative domains are the ones LLMs are paradoxically the weakest in - like writing. Reading a book written by an LLM would be cruel and unusual punishment given the current state of the art. One domain I do see them completely taking over is search. They work excellently as natural language search engines, and "failure" in such is very poorly defined.

[1] - https://en.wikipedia.org/wiki/Sigmoid_function


I'm not really sure your self-driving analogy is apt here. Waymo has cars on the road right now that are totally autonomous, and just expanded its footprint. It has been longer and more difficult than we all thought, and those early tech demos were a glimmer of what was to come; then we had to grind to get there, with a lot of engineering.

I think what maybe seems not obvious amidst the hype is that there is a hell of a lot of engineering left to do. The fact that you can squash the weights of a neural net down to 3 bits per param and it still works -- is evidence that we have quite a way to go with maturing this technology. Multimodality, improvements to the UX of it, the human-computer interface part of it. Those are fundamental tech things, but they are foremost engineering problems. Getting latency down. Getting efficiency up. Designing the experience, then building it out.

25 years ago, early tech demos on the internet were promising that everyone would do their shopping, entertainment, socializing, etc... online. Breathless hype. 5 years after that, the whole thing crashed, but it never went away. People just needed time to figure out how to use it and what it was useful for, and discover its limitations. 10 years after that, engineering efforts were systematized and applied against the difficult problems that still remained. And now: look at where we are. It just took time.


I don't think he's saying that AGI is impossible — almost no one (nowadays) would suggest that it's anything but an engineering challenge. The argument is simply one of scale, i.e. how long that engineering challenge will take to solve. Some people are suggesting on the order of years. I think they're suggesting it'll be closer to decades, if that.


AGI being "just an engineering challenge" implies that it is conceptually solved, and we need only figure out how to build it economically.

It most definitely is not.


Waymo cars are highly geofenced in areas with good weather and good quality roads. They only just (in January) gained the capability to drive on freeways.

Let me know when you can get a Waymo to drive you from New York to Montreal in winter.


> Waymo cars are highly geofenced in areas with good weather and good quality roads. They only just (in January) gained the capability to drive on freeways

They are an existence proof that the original claim that we seem further than ever before is just wrong.


There are 6 categories of self driving, starting at 0. The final level is the one we've obviously been aiming at, and most were expecting. It's fully automated self driving in all conditions and scenarios. Get in your car anywhere, and go to anywhere - with capability comparable to a human. Level 4, by contrast, is full self driving under certain circumstances and generally in geofenced areas - basically trolleys without rails. Get in a car, so long as conditions are favorable, and go to a limited set of premapped locations.

And level 4 is where Waymo is, and is staying. Their strategy is to use tiny geofenced areas with a massive amount of preprocessing, mapping out every single part of an area, not just in terms of roads but also every single meta indicator - signs, signals, crosswalks, lanes, and so on. And it creates a highly competent, but also highly rigid, system. If road conditions change in any meaningful way, the most likely outcome with this strategy is simply that the network gets turned off until the preprocessing can be carried out and re-uploaded again. That's completely viable in small geofenced areas, but doesn't generalize at all.

So the presence of Waymo doesn't say much of anything about the presence of level 5 autonomy. If anything it suggests Waymo believes that level 5 autonomy is simply out of reach, because the overwhelming majority of tech that they're researching and developing would have no role whatsoever in level 5 automation. Tesla is still pushing for L5 automation, but if they don't achieve this then they'll probably just end up getting left behind by companies that double down on L4. And this does indeed seem to be the most likely scenario for the foreseeable future.


This sounds suspiciously like that old chestnut, the god of the gaps. You're splitting finer and finer hairs to maintain your position that, "no, really, they're not really doing what I'm saying they can't do", all the while self-driving cars are spreading and becoming more capable every year.

I don't think we have nearly as much visibility on what Waymo seems to believe about this tech as you seem to imply, nor do I think that their beliefs are necessarily authoritative. You seem disheartened that we haven't been able to solve self-driving in a couple of decades, and I'm of the opinion that geez, we basically have self-driving now and we started trying only a couple of decades ago.

How long after the invention of the transistor did we get personal computers? Maybe you just have unrealistic expectations of technological progress.


Level 5 was the goal and the expectation that everybody was aiming for. Waymo's views are easy to interpret from logically considering their actions. Level 4, especially as they are doing it, is in no way whatsoever a stepping stone to level 5. Yet they're spending tremendous resources directed towards things that would have literally and absolutely no place in level 5 autonomy. It seems logically inescapable to assume that not only do they think they'll be unable to hit level 5 in the foreseeable future, but also that nobody else will be able to either. If you can offer an alternative explanation or argument, please share!

Another piece of evidence comes from last year, when Google scaled back Waymo with layoffs as well as "pausing" its efforts at developing self-driving truck technology. [1] That technology would require something closer to L5 autonomy, because again - massive preprocessing is quite brittle and doesn't scale well at all. Other companies that were heavily investing in self-driving tech have done similarly. For instance, Uber sold off its entire self-driving division in 2021. I'm certainly happy to hear any sort of counter-argument, but you need some logic instead of ironically being the one trying to mind-read me or Waymo!

[1] - https://www.theverge.com/2023/7/26/23809237/waymo-via-autono...


Not necessarily. If self-driving cars "aren't ready" and then you redefine what ready is, you've absolutely got your thumb on the scale of measuring progress.


Other way around: Self driving cars "are ready" but then people in this thread seemed to redefine what ready means.


Why do some people gloat about moving goalposts around?

15 years ago self driving of any sort was pure fantasy, yet here we are.

They'll release a version that can drive in poor weather and you'll complain that it can't drive in a tornado.


> "15 years ago self driving of any sort was pure fantasy, yet here we are."

This was 38 years ago: https://www.youtube.com/watch?v=ntIczNQKfjQ - "NavLab 1 (1986) : Carnegie Mellon : Robotics Institute History of Self-Driving Cars; NavLab or Navigation Laboratory was the first self-driving car with people riding on board. It was very slow, but for 1986 computing power, it was revolutionary. NavLab continued to lay the groundwork for Carnegie Mellon University's expertise in the field of autonomous vehicles."

This was 30+ years ago: https://www.youtube.com/watch?v=_HbVWm7wdmE - "Short video about Ernst Dickmanns VaMoR and VaMP projects - fully autonomous vehicles, which travelled thousands of miles autonomously on public roads in 1980s."

This was 29 years ago: https://www.youtube.com/watch?v=PAMVogK2TTk - "A South Korean professor [... Han Min-hong's] vehicle drove itself 300km (186 miles) all the way from Seoul to the southern port of Busan in 1995."

This was 19 years ago: https://www.youtube.com/watch?v=7a6GrKqOxeU - "DARPA Grand Challenge - 2005 Driverless Car Competition"


Stretching the timeline to 30 years doesn't make the achievement any less impressive.


It's okay! We'll just hook up 4o to the Waymo and get quippy messages like those in 4o's demo videos: "Oh, there's a tornado in front of you! Wow! Isn't nature exciting? Haha!"

As long as the Waymo can be fed with the details, we'll be good. ;)

Joking aside, I think there are some cases where moving the goalposts is the right approach: once the previous goalposts are hit, we should be pushing towards the new goalposts. Goalposts as advancement, not derision.

I suppose the intent of a message matters, but as people complain about "well it only does X now, it can't do Y" - probably true, but hey, let's get it to Y, then Z, then... who knows what. Challenge accepted, as the worn-out saying goes.


It's been 8 years and I still don't have my autonomous car.

Meanwhile I've been using ChatGPT at work for _more than a year_ and it's been tremendously helpful to me.

This is not hype, this is not about how AI will change our lives in the future. It's there right here, right now.


Of course. It's quite a handy tool. I love using it for searching documentation for some function that I know the behavior of, but not the name. And similarly, people have been using auto-steer, auto-park, and all these other little 'self driving adjacent' features for years as well. Those are also extremely handy. But the question is, what comes next?

The person I originally responded to stated, "We’re moving toward a world where every job will be modeled, and you’ll either be an AI owner, a model architect, an agent/hardware engineer, a technician, or just.. training data." And that is far less likely than us achieving L5 self-driving (if only because driving is quite simple relative to many of the jobs he envisions AI taking over), yet L5 self-driving seems as distant as ever as well.


> So long as LLMs regularly hallucinate, they're not going to be useful for much other than tasks that can accept relatively high rates of failure.

Yep. So basically they're useful for a vast, immense range of tasks today.

Some things they're not suited for. For example, I've been working on a system to extract certain financial "facts" across SEC filings. ChatGPT has not been helpful at all either with designing or implementing (except to give some broad, obvious hints about things like regular expressions), nor would it be useful if it was used for the actual automation.

But for many, many other tasks -- like design, architecture, brainstorming, marketing, sales, summarisation, step by step thinking through all sorts of processes, it's extremely valuable today. My list of ChatGPT sessions is so long already and I can't imagine life without it now. Going back to Google and random Quora/StackOverflow answers laced with adtech everywhere...


> I've been working on a system to extract certain financial "facts" across SEC filings. ChatGPT has not been helpful at all

The other day, I saw a demo from a startup (don't remember their name) that uses generative AI to perform financial analysis. The demo showed their AI-powered app basically performing a Google search for some companies, loosely interpreting those Google Stock Market Widgets that are presented in such searches, and then fetching recent news and summarizing them with AI, trying to extract some macro trends.

People were all hyped up about it, saying it will replace financial analysts in no time. From my point of view, that demo is orders of magnitude below the capacity of a single intern who receives the same task.

In short, I have the same perception as you. People are throwing generative AI into everything they can with high expectations, without doing any kind of basic homework to understand its strengths and weaknesses.


> So long as LLMs regularly hallucinate, they're not going to be useful for much other than tasks that can accept relatively high rates of failure.

But is this not what humans do, universally? We are certainly good at hiding it – and we are all good at coping with it – but my general sense when interacting with society is that there is a large amount of nonsense generated by humans that our systems must and do already have enormous flexibility for.

My sense is that's not an aspect of LLMs we should have any trouble with incorporating smoothly, just by adhering to the safety nets that we built in response to our own deficiencies.


The sigmoid is true in humans too. You can get 80% of the way to being sort of good at a thing in a couple of weeks, but then you hit the plateau. In a lot of fields, confidently knowing and applying this has made people local jack-of-all-trades experts... the person who often knows how to solve the problem. But Jack is no longer needed so much. ChatJack's got your back. Better to be the person who knows one thing in excruciating detail and depth, and never ever let anyone watch you work or train on your output.


I think it's more like an exponential curve where it looks flat moments before it shoots up.

Mapping the genome was that way: on a 20-year schedule, barely any progress for 15 years, and then poof, done ahead of schedule.


> or just.. training data.

I have a much less "utopian" view of the future. I remember during the renaissance of neural networks (ca. 2010-15) it was said that "more data leads to better models", and that was at a time when researchers frowned upon the term Artificial Intelligence and would rather use Machine Learning. Fast forward a decade, and LLMs are very good synthetic-data generators that try to mimic human-generated input; I can't help thinking that this was the sole initial intent of LLMs. And that's it for me. There's not much to hype and no intelligence at all.

What happens now is that human-generated input becomes more valuable, and every online platform (including minor ones) will have some form of gatekeeping in place sooner rather than later. Besides that, a lot of work still can't be done in front of a computer in isolation and probably never will be, and even if it could, automation is not an end in itself. We still don't know how to measure a lot of things, much less how to capture everything as data vectors.


The two AIs talking to each other was like listening to two commercials talking to each other. Like a call-center menu that you cannot skip. And they _kept repeating themselves_. Ugh. If this is the future, I'm going to hide in a cave.


My new PC arrives tomorrow. Once I source myself two RTX 3060s I'll be an AI owner, no longer dependent on cloud APIs.

Currently the bottleneck is Agents. If you want a large language model to actually do anything you need an Agent. Agents so far need a human in the loop to keep them sane. Until that problem is solved most human jobs are still safe.


GPT-4o incorporated multimodality directly into the neural network, while cutting inference costs in half.

I fully expect GPT 5 (or at the latest 6) to similarly have native inclusion of agentic capabilities either this year or next year, assuming it doesn't already, but is just kept from the public.


Going to put the economy in a very, very weird situation if true.

Will be like, the end of millions of careers overnight.

It will probably strongly favour places like China and Russia though, where the economy is already strongly reliant on central control.


> It will probably strongly favour places like China and Russia though, where the economy is already strongly reliant on central control.

I think you may be literally right in the opposite sense to what I think you intended.

China (and maybe Russia) may be able to use central control to have an advantage when it comes to avoiding disasterous outcomes.

But when it comes to the rate of innovation, the US may have an advantage for the usual reasons. Less government intervention (due to lobbying), combined with several corporations actively competing with each other to be first/best, usually leads to faster innovation. However, the downside may be that it also introduces a lot more risk.


Agentic capability just means it outputs a function call, which it has been able to do for a long time.
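For reference, the loop that turns "outputs a function call" into something agent-like is tiny. A toy sketch (the model below is a stub, not any real API, and the function names are made up):

    import json

    def fake_model(messages):
        """Stand-in for an LLM: asks to look up the weather, then summarizes."""
        if not any(m["role"] == "tool" for m in messages):
            return {"type": "function_call",
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "Berlin"})}
        return {"type": "text", "content": "It's 18°C in Berlin."}

    def get_weather(city):
        return {"city": city, "temp_c": 18}  # canned data for the sketch

    TOOLS = {"get_weather": get_weather}

    messages = [{"role": "user", "content": "What's the weather in Berlin?"}]
    while True:
        reply = fake_model(messages)
        if reply["type"] == "text":            # model is done, return the answer
            print(reply["content"])
            break
        args = json.loads(reply["arguments"])
        result = TOOLS[reply["name"]](**args)  # the harness executes the call
        messages.append({"role": "tool", "content": json.dumps(result)})

All of the "agency" lives in that while loop sitting outside the model, which is why it's fair to call it a weak form.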


That's a very weak form. The way I use "agentic" is that it is trained to optimize the success of an agent, not just predict the next token.

The obvious way to do that is for it to plan a set of actions and evaluate each possible way to reach some goal (or avoid an anti-goal). Kind of like what AlphaZero is doing for games. Q* is rumored to be a generalization of this.


You are far better off investing in one or more 3090s and loading up on DDR RAM.


> Agents so far need a human in the loop to keep them sane.

not quite sure that sanity is a business requirement


Yes, but to use a car dealership example, you don't want your Agent to sell a car to someone for $1 https://hothardware.com/news/car-dealerships-chatgpt-goes-aw...


> We’re moving toward a world where every job will be modeled, and you’ll either be an AI owner, a model architect, an agent/hardware engineer, a technician, or just.. training data.

I understand that you might be afraid. I believe that a world where LLM companies alone rule everything is not practically achievable except in some dystopian universe. The likelihood of a world where the only jobs are model architect, engineer, or technician is very, very small.

Instead, let's consider the positive possibilities that LLMs can bring. They can lead to new and exciting opportunities across various fields. For instance, they can serve as a tool to inspire new ideas for writers, artists, and musicians.

I think we are going towards a more collaborative era where computers and humans interact much more. Everything will be a remix :)


> The likelihood of the world where the only job are model architects, engineers or technicians is very very small.

Oh, especially since it will be a priority to automate their jobs, or somehow optimize them with an algorithm because that's a self-reinforcing improvement scheme that would give you a huge edge.


Every corporate workplace is already thinking: how can I surveil and record everything an employee does as training data for their replacement in 3 years' time?


> Can it do math yet?

GPT-4? Not that well. AI? Definitely

https://deepmind.google/discover/blog/alphageometry-an-olymp...


Until the hallucination problem is solved, the output can't be trusted.

So outside of use cases where the user can quickly verify the result (like picking a decent generated image, etc.), I can't see it being used much.


Never heard of retrieval-augmented generation?


RAG? Sure. I even implemented systems using it, and enabling it, myself.

And guess what: RAG doesn't prevent hallucination. It can reduce it, and there are most certainly areas where it is incredibly useful (I should know, because that's what earns my paycheck), but it's useful despite hallucinations still being a thing, not because we solved that problem.
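For anyone unfamiliar, the mechanism is simple enough to sketch: retrieve the documents most relevant to the question, paste them into the prompt, and ask the model to answer only from them. Everything below (the documents, the scoring, the prompt wording) is made up for illustration; real systems typically use embedding search rather than word overlap:

    DOCS = [
        "The warranty period for the X200 vacuum is 24 months from purchase.",
        "The X200 vacuum ships with two spare filters and a crevice tool.",
        "Support is available Monday to Friday, 09:00-17:00 CET.",
    ]

    def score(question, doc):
        """Toy relevance score: count shared lowercase words."""
        return len(set(question.lower().split()) & set(doc.lower().split()))

    def build_prompt(question, k=2):
        top = sorted(DOCS, key=lambda d: score(question, d), reverse=True)[:k]
        context = "\n".join(f"- {d}" for d in top)
        return (
            "Answer using ONLY the context below. "
            "If the answer is not in the context, say you don't know.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        )

    print(build_prompt("How long is the warranty on the X200?"))

The resulting prompt is what gets sent to the LLM; grounding it in retrieved text reduces hallucination, but the model can still ignore or misread the context, which is the residual failure mode being discussed here.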


Are you implying that you’re the same person I was commenting to or are you just throwing your opinion into the mix?

Regardless, we've seen accuracy of ~98% with simple context-based prompting across every category of generation task. Don't take my word for it; a simple search will show the effectiveness of "n-shot" prompting. Framing it as "it _can_ reduce" hallucinations is disingenuous at best; there really is no debate about how well it works. We can disagree on whether 98% accuracy is a solution, but again I'd assert that for >50% of all possible real-world uses of an LLM, 98% is acceptable, and thus the problem can be colloquially referred to as solved.

If you’re placing the bar at 100% hallucination-free accuracy then I’ve got some bad news to tell you about the accuracy of the floating point operations we run the world on
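For anyone wondering what "n-shot" prompting actually looks like, it's just prepending a handful of solved examples to the prompt so the model copies the pattern. A minimal sketch (the messages and labels are invented for illustration):

    examples = [
        ("The package arrived crushed and two items were missing.", "complaint"),
        ("Thanks, the replacement part fixed everything!", "praise"),
        ("Can I change the delivery address for order 1042?", "question"),
    ]

    def few_shot_prompt(new_message):
        # Each solved example becomes one "shot" shown to the model in-context.
        shots = "\n".join(f"Message: {m}\nLabel: {l}" for m, l in examples)
        return f"{shots}\nMessage: {new_message}\nLabel:"

    print(few_shot_prompt("The app crashes every time I open settings."))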


> Can it just do my entire job for me?

All AIs up to now lack autonomy. So I'd say that until we crack this problem, it is not going to be able to do your job. Autonomy depends on a kind of data that is iterative and multi-turn, learned from environments rather than from static datasets. We have the exact opposite: lots of non-iterative, off-policy (human-made, AI-consumed) text.


This is still GPT-4. I don't expect much more from this version than what the previous version could do, in terms of reasoning abilities.

But everyone is expecting them to release gpt5 later this year, and it is a bit scary to think what it will be able to do.


It's quite different from gpt4 in two respects:

1) It's natively multi-modal in a way I don't think gpt4 was.

2) It's at least twice as efficient in terms of compute. Maybe 3 times more efficient, considering the increase in performance.

Combined, those point towards some major breakthroughs having gone into the model. If the quality of the output hasn't gone up THAT much, it's probably because the technological innovations mostly were leveraged (for this version) to reduce costs rather than capabilities.

My guess is that we should expect them to leverage the 2x-3x boost in efficiency in a model that is at least as large as GPT-4 relatively soon, probably this year, unless OpenAI has safety concerns or something and keeps it internal-only.


Branding aside, this pretty much is GPT 5.

The evidence for that is the change in the tokenizer. The only way to implement that is to re-train the entire base model from scratch. This implies that GPT 4o is not a fine-tuning of GPT 4. It's a new model, with a new tokenizer, new input and output token types, etc...

They could have called it GPT-5 and everyone would have believed them.
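If you want to see the tokenizer change for yourself, the tiktoken library exposes both vocabularies (the "cl100k_base"/"o200k_base" encoding names and the GPT-4 vs GPT-4o mapping are my assumptions here, and the sample sentence is arbitrary):

    # pip install tiktoken
    import tiktoken

    old = tiktoken.get_encoding("cl100k_base")  # assumed: GPT-4 / GPT-4 Turbo vocabulary
    new = tiktoken.get_encoding("o200k_base")   # assumed: GPT-4o vocabulary

    text = "Merhaba, bugün hava çok güzel."  # non-English text tends to show the gap
    print(len(old.encode(text)), "tokens with cl100k_base")
    print(len(new.encode(text)), "tokens with o200k_base")

A different vocabulary means different token IDs for the same bytes, which is why swapping the tokenizer implies training the base model against it rather than fine-tuning the old one.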


I’ve used it for a couple of hours to help with coding and it feels very similar to gpt4: still makes erroneous and inconsistent suggestions. Not calling it 4.5 was the right call. It is much faster though.

The expectations for gpt5 are sky high. I think we will see a similar jump as 3.5 -> 4.


Pretty sure they said they would not release GPT-5 on Monday. So it's something else still. And I don't see any sort of jump big enough to label it as 5.

I assume GPT-5 has to be a heavier, more expensive and slower model initially.

GPT-4o is like an optimisation of GPT-4.


That doesn't imply that it's GPT-5. A GPT-4 training run probably doesn't take them that long now they've acquired so many GPUs for training GPT-5.


I think 4o is actually noticeably smarter than 4, after having tried it a tiny bit on the playground.


There has been speculation that this is the same mystery model that was floating around on the LMSYS Chatbot Arena, and they claim a real, observable jump in Elo scores, but this remains to be seen; some people don't think it's even as capable as GPT-4 Turbo, so TBD.


It's a completely new model trained from scratch that they've decided to label that way as part of their marketing strategy.


All I could think about when watching this demo was how similar capabilities will work on the battlefield. Coordinated AIs look like they will be obscenely effective.

Everything always starts as a toy.


The "Killer app" for AGI/ASI is, I suspect, going to be in robotics, even more so than in replacing "white collar workers".

That includes, beyond literal Killers, all kinds of manufacturing, construction and service work.

I would expect a LOT of funds to go into researching all sorts of actuators, artificial muscles and any other technology that will be useful in building better robots.

Companies that can get and maintain a lead in such technologies may reach a position similar to what US Steel had in the 19th century.

That could be the next nvidia.

I would not be at all surprised if we will have a robot in the house in 10 years that can clean and do the dishes, and that is built using basically the same parts as the robots that replace our soldiers and the police.

Who will ultimately control them, though?


I would expect a LOT of funds to go into researching all sorts of actuators, artificial muscles and any other technology that will be useful in building better robots.

If you had an ASI? I don't think you'd need a lot of funds to go into this area anymore? Presumably it would all be solved overnight.


Once we have godlike-tier ASI, you're probably right. But I expect that robots could become extremely lucrative even while available AIs haven't reached that point yet.

Companies that have a head start at that point may get a huge first-mover advantage. Also, those companies may very well have the capability to leverage AI in product development, just like everyone else.

And just as important as the products themselves is the manufacturing capacity to build them at scale. Until we have massive numbers of robots in service, building such infrastructure is likely to be slow and expensive.

EDIT: Also, once we really have the kind of Godlike ASI you envision, no human actions really matter (economically) anymore.


It's possible. AI + robotics has been a big area of research for a while now, and it's very good at some tasks; see basically everything Boston Dynamics does w.r.t. dynamic balancing. It works alongside control systems very well. However, for multimodal task planning it's not there. A year or two back I wrote a long comment about it, but basically there is this idea of "grounding", i.e. connecting computer vision, object symbols/concepts, and task planning, which remains elusive. It's a similar problem with self-driving cars: you want to be able to reason very strongly about things like "place all of the screws into the red holes" in a way that maps automatically to the actions for those things.


Yes. As you say, a lot of the limitations so far have been the control part, which is basically AI.

Given the pace that AI is currently moving at, it seems to me that more and more, the mechanical aspect is becoming the limitation.

GPT 4o now seems to be quite good at reasoning about the world from pictures in real time. I would expect it would soon become easy for it to do the high level part of many practical tasks, from housekeeping to manufacturing or construction. (And of course military tasks.)

This leaves the direct low-level actuator control to execute such tasks in detail. But even there, development has been immense. See for instance these soccer-playing robots [1].

And with both high-level and low-level control in place (if we assume that models will soon add agentic features directly into the neural networks), the only missing piece is the ability to build mechanically capable and reliable robots at a low enough price that they become cheaper than humans for various kinds of work.

There is one more limitation, of course, which is that GPT-4o still requires a constant connection to a data center, and that the model is too large to run within a device or machine.

This is also one of the most critical limitations of self-driving. If the AI within a Tesla had the same amount of compute available as GPT-4o, it would be massively more capable.

[1] https://www.youtube.com/watch?v=RbyQcCT6890


And people will become utterly stupid in the process.


or just... unemployed.


Why so much positivity? It can also murder people, and it will continue being used for that. That's scary.


(IMO) AI cannot murder people. The responsibility of what an AI does falls on the person who deployed it, and to a lesser extent the person who created it. If someone is killed by a fully autonomous weapon then that person has been murdered by the person or people who created and enabled the AI, not the AI itself.

This is no different to saying a person with a gun murdered someone rather than attributing the murder to the gun. An AI gun is just a really fancy gun.


There will come a time when complex systems can be predicted better with AI than with traditional mathematical models. One use case could be feeding body scans into them for cancer prevention. AFAIK this is already being researched.

There may come a time when we grow so accustomed to this that decisions are so heavily influenced by AI that we trust it more than human judgment.

And then it can very well kill a human through misdiagnosis.

I think it is important to not just put this thought aside, but to evaluate all risks.


> And then it can very well kill a human through misdiagnosis.

I would imagine outcomes would be scrutinized heavily for an application like this. There is a difference between a margin of error (which exists with human doctors as well) and a sentient AI that has decided to kill, which is what it sounds like you're describing.

If we didn't give it that goal, how does it obtain it otherwise?


Except that with a gun, you have a binary input (the trigger), so you can squarely blame a human for misunderstanding what they did when they accidentally shot someone on the grounds that the trigger didn't work.

A prompt is a _very_ different matter.


The mass murder of Palestinians is already partially blamed or credited to an "AI" system that could identify people. Humans spent seconds reviewing the outcome. This is the reality of AI already being used to assist in killing. AI can't take the blame legally speaking, but it makes it easier to make the call and sleep at night. "I didn't order a strike on this person and their family of eight, the AI system marked this subject as a high risk, high value target". Computer-assisted dehumanization. (Not even necessarily AI)


> This is no different to saying a person with a gun murdered someone rather than attributing the murder to the gun.

And “guns don’t kill people, people kill people”¹ is a bad argument created by the people who benefit from the proliferation of guns, so it’s very weird that you’re using that as if it were a valid argument. It isn’t. It’s baffling anyone still has to make this point: easy access and availability of guns makes them more likely to be used. A gun which does not exist is a gun which cannot be used by a person to murder another.

It’s also worth noting the exact words of the person you’re responding to (emphasis mine):

> It can also murder people, and it will continue being used for that.

Being used. As in, they’re not saying that AI kills on its own, but that it’s used for it. Presumably by people. Which doesn’t contradict your point.

¹ https://en.wikipedia.org/wiki/Guns_don%27t_kill_people,_peop...


We also choose to have cars, which cause a certain amount of death. It's an acceptable tradeoff (which most don't think about much). I'd speculate that it's mostly people who don't use cars who criticize them the most, and the same with guns.


That’s an absurd comparison, to the point I’m having trouble believing you’re arguing in good faith. The goal of cars is transportation; the goal of guns is harm. Cars causing deaths are accidents; guns causing deaths is them working as designed. Cars continue to be improved to cause fewer fatalities; guns are improved to cause more.

> I'd speculate that it's mostly people who don't use cars who criticize them the most, and the same with guns.

You mean that people who are opposed to something refuse to partake in its use and promotion? Shocker.


Probably it will be used for many things. Just like computers, electricity, iron.


You don't need AI to murder people; a gun with an actuator on the trigger can murder people easily. Add an RNG to it and it can murder people at random.


Anything (almost) can be used for good and bad


So can knives


Yes, but a person wielding a knife has morals, a conscience and a choice; the fear is that an AI model does not. A lot of killer-AI science fiction boils down to "it is optimal and logical that humanity needs to be exterminated"; no morality or conscience involved.


Which is why there are laws around which knives are allowed and which are banned. Or how we design knives to be safe. Or how we have a common understanding of what we do with knives, and what not, such as not giving them to toddlers... So what's your point?


The point is not the tool but how it's used. "What knives are allowed" is a moot point because a butter knife or letter opener can be used to kill someone.


But if you give a very sharp knife to a toddler and say "go on, have fun" and walk off, you're probably going to face child endangerment charges at some point.


Call me back when we have autonomous driving, and when Bitcoin will replace currencies.

What's scary and cringey are your delusions.


Don’t get me wrong, I’m not suggesting the current capabilities are anywhere near replacing human productivity. Some things are 1 year out, some 5 (maybe self-driving cars by then? Mercedes has it on their roadmap for 2030 and they’ve historically been realistic), some 10+. But the pieces are in place and the investments are being made. The question is no longer “can AI really automate this?”, it’s “how do we get the dataset that will enable us to automate this with AI?”. And as long as Open AI keeps people’s eyes on their whizbang demos, the money will keep flowing…


Or a battery.


make people smart


Nature had been doing that for billions of years until a few decades ago when we were told "progress" meant we had to stop doing the same thing more peacefully and intentionally.

My guess is the future belongs to those who don't stop—who, in fact, embrace the opposite of stopping.

I would even suggest that the present belongs to those who didn't stop. It may be too late for normal people to ever catch up by the time we realize the trick that was played on us.


The present absolutely belongs to those who didn't stop, but it's been a lot longer than a few decades.

Varying degrees of greedy / restless / hungry / thirsty / lustful are what we've got, because how is contentedness ever going to compete with that over millennia?


It just occurred to me that this is one of the core things most successful religions have been trying to do in some form from the time they first arose.

I've had a lot of negative things to say about religion for many years. However, as has been often observed, 'perception is reality' to a certain extent when it affects how people behave, and perhaps it's kind of a counterweight against our more selfish tendencies. I just wish we could do something like it without made up stories and bigotry. Secular humanist Unitarians might be about the best we can do right now in my opinion... I'm hoping that group continues to grow (they have been in recent years).


[flagged]


People with your sentiment said the same thing about all cool tech that changed the world. It doesn't change the reality: a lot of professions will need to adapt or they will go extinct.


> People with your sentiment said the same thing about all cool tech that changed the world.

They also said it about all the over-hyped tech that did not change the world. This mentality of “detractors prove something is good” is survivorship bias.

Note I’m not saying you’ll categorically be proven wrong, just that your argument isn’t particularly strong or valid.


The mods banning my comments can go bent.


Strong Theranos vibes with this one.


Nobody here ever got a Theranos blood test.

Almost everyone here has used at least one LLM for fun or for work.

Many people here have paid for it.


What do you mean? How would you define AI?


I am a PhD biophysicist working within the field of biological imaging. Professionally, my team (successfully) uses deep learning and GANs for a variety of tasks within the field of imaging, such as segmentation, registration, and predictive protein/transcriptomics. It's good stuff, a game changer in many ways. In no way, however, does it represent generalized AI, and nobody in the field makes this claim even though the output of these algorithms matches or outperforms humans in some cases.

LLMs are no different. Like DL modules that are very good at outputting images that mimic biological signatures, LLMs are very good at outputting texts that eerily mimic human language.

However, and this is a point of which programmers are woefully and comically ignorant: human language and reason are two separate things. Tech bros wholly confuse the two, and thus make outlandish claims that we have achieved, or are on the brink of achieving, actual AI systems.

In other words, while LLMs and DL in general can perform specific tasks well, they do not represent a breakthrough in artificial intelligence, and thus will have a much narrower application space than actual AI.


If you've been in the field you really should know that the term AI has been used to describe things for decades in the academic world. My degree was in AI back before RBMs and Hinton's big reveal about making things 100,000 times faster (do the main step just once, not 100 times, and take 17 years to figure that out).

You're talking more about AGI.

We need "that's not AI" discussions like we need more "serverless? It's still on some server!!" discussions.


I think it's even incomparable to server vs serverless discussions.

It's about the meaning of intelligence. These people have no problem claiming that ants or dolphins are intelligent, but suddenly, for machines to be classified as artificial intelligence, they must be exactly on the same level as humans.

Intelligence is just about the ability to solve problems. There's no implication that in order for something to be intelligent it has to perform at the level of the top people in that field in the world.

It just has to be beyond a simple algorithm and be able to solve some sort of problem. You have AIs in video games that are just bare logic spaghetti computation with no neural networks.


By your definition, a handheld calculator from the 1960s is ‘AI’.

In other words, you’ve lost the argument.


I said beyond a simple algorithm.


Or you're using AI as a term differently to the people in the field. SVMs are extremely simple, two layer perceptrons are things you can work out by hand!

Just stop trying to redefine AI as a term, you'll lose against the old hands and you'll lose against the marketing dept and you'll lose against the tech bros and nobody who you actually need to explain it to will care. Use AGI or some other common term for what you're clearly talking about.


So, the ‘revolutionary’, ‘earth-shattering’, ‘soon-to-make-humans-obsolete’ talk about ChatGPT is all bullshit, and this is just another regular, run-of-the-mill development with the label of ‘AI’ slapped on somewhere, just like all the others from the last 40 years? What in the hell is even your point then? Is ChatGPT a revolutionary precursor to AGI, if not AGI already? I say it’s not.


This is true. But only to a point where mimicking, and more broadly speaking statistically imitating data, are understood in a more generalized way.

LLMs statistically imitate the texts of the real world. To achieve a certain threshold of accuracy, it turns out they need to imitate the underlying Turing machine/program/logic that runs in our brains when we understand and react properly to texts ourselves. That is no longer in the realm of the old-school data-as-data statistics, I would say.


The problem with this kind of criticism of any AI-related technology is that it is an unfalsifiable argument akin to saying that it can't be "proper" intelligence unless God breathed a soul into the machine.

The method is irrelevant. The output is what matters.

This is like a bunch of intelligent robots arguing that "mere meat" cannot possibly be intelligent!

https://www.mit.edu/people/dpolicar/writing/prose/text/think...


> LLMs are very good at outputting texts that eerily mimic human language.

What a bizarre claim. If LLMs are not actually outputting language, why can I read what they output then? Why can I converse with it?

It's one thing to claim LLMs aren't reasoning, which is what you later do, but you're disconnected from reality if you think they aren't actually outputting language.


You have missed the point entirely.


My point is that you made a bad point. Just be more precise with your phrasing if that wasn't your intended meaning.


Is there a block button? Or a filter setting? You are so unaware and uninquisitive about actual human language that you cannot see the gross assumptions you are making.


> generalized AI

No one is talking about it being AGI. Everyone is talking about just AI specifically. I think your problem is thinking that AI = AGI.

For example AI in video games is very specific and narrow to its domain.


Personally, I find it hard to believe a PhD would have trouble conjugating “drink”. :P


> human language and reason are two separate things

... in the human brain which has evolved "cores" to handle each task optimally.

It's like the Turing Test. If it looks like it's reasoning, does it matter that it's doing it like a human brain or not?


The fact that you’re convinced this isn’t possible (or at least not happening here) means they’ve literally made progress beyond belief.


"We shall not be very greatly surprised if a woman analyst who has not been sufficiently convinced of the intensity of her own wish for a penis also fails to attach proper importance to that factor in her patients" Sigmund Freud, in response to Karen Horney’s criticism of his theory of penis envy.


W-what? Lad, have you used ChatGPT? It can instantly give you intelligent feedback on anything (usually better than any expert community, like 90% of the time). On extremely detailed, specific tasks (like writing algorithms or refactoring) it's able to spit out either working code or code so close to working that it's still faster than what you could have done yourself. It can explain things better than probably 99.999% of teachers.

It will give you detailed examples that are much easier to follow than vague, error-prone spec docs. That's scratching the surface. Other people are far more creative than me and have used ChatGPT for mind-blowing stuff already. Whatever it's doing passes for 'reasoning' and 'intelligence' in my book. To me it doesn't matter whether it's the same kind of intelligence as a human or if there's any amount of awareness, as those are both philosophical questions of no consequence to my work.

For what these pieces of tech can do I feel that they're drastically under-utilized.


[flagged]


Why are you sharing a link to a dodgy cloned site?


I've worked quite a bit with STT and TTS over the past ~7 years, and this is the most impressive and even startling demo I've seen.

But I would like to see how this is integrated into applications by third party developers where the AI is doing a specific job. Is it still as impressive?

The biggest challenge I've had with building any autonomous "agents" with generic LLMs is that they are overly gullible and accommodating, requiring me to revert back to legacy chatbot logic trees etc. to stay on task and perform a job. Also, STT is rife with issues around speaker interjections, leading to significant user frustration; they just want to talk to a person. Hard to see if this is really solved yet.


I’ve found using logic trees with LLMs isn’t necessarily a problem or a deficit. I suppose if they were truly magical and could intuit the right response every time, cool, but I’d always worry about the potential for error and hallucinations.

I’ve found that you can create declarative logic trees from JSON and use that as a prompt for the LLM, which it can then use to traverse the tree accordingly. The only issue I’ve encountered is when it wants to jump to part of the tree which is invalid in the current state. For example, you want to move a user into a flow where certain input is required, but the input hasn’t been provided yet. A transition is suggested to the program by the LLM, but it’s impossible so the LLM has to be prompted that the transition is invalid and to correct itself. If it fails to transition again, a default fallback can be given but it’s not ideal at all.

However, another nice aspect of having the tree declared in advance is that it shows human beings what the system is capable of and how it's intended to be used. This has proven to be pretty useful, as letting the LLM call whatever functions it sees fit based on broad intentions and system capabilities leaves humans in the dark a bit.

So, I like the structure and dependability. Maybe one day we can depend on LLM magic and not worry about a team understanding the ins and outs of what should or shouldn’t be possible, but we don’t seem to be there yet at all. That could be in part because my prompts were bad, though.
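
To make the shape of this concrete, here is a minimal sketch of the pattern described above: declare the flow as data, put it in the prompt, and validate whatever transition the model suggests. The flow, node names, and the call_llm helper are all invented for illustration; this isn't from XState or any particular library.

    import json

    # Hypothetical flow: each node lists valid next nodes and the inputs it requires.
    FLOW = {
        "greet":         {"requires": [], "next": ["collect_email"]},
        "collect_email": {"requires": [], "next": ["confirm"]},
        "confirm":       {"requires": ["email"], "next": []},
    }

    def is_valid(current, proposed, collected):
        node = FLOW.get(proposed)
        if node is None or proposed not in FLOW[current]["next"]:
            return False
        # Refuse transitions whose required inputs haven't been collected yet.
        return all(key in collected for key in node.get("requires", []))

    def step(current, collected, user_msg, call_llm):
        prompt = (
            "You are driving this conversation flow:\n"
            + json.dumps(FLOW, indent=2)
            + f"\nCurrent node: {current}\nCollected so far: {json.dumps(collected)}"
            + f"\nUser said: {user_msg}\n"
            + 'Reply with JSON: {"transition": "<node>", "reply": "<text>"}'
        )
        suggestion = json.loads(call_llm(prompt))
        if is_valid(current, suggestion["transition"], collected):
            return suggestion["transition"], suggestion["reply"]
        # Invalid jump: point it out once, then fall back to staying put.
        retry = json.loads(call_llm(
            prompt
            + f"\nYou suggested {suggestion['transition']!r}, which is not a valid"
            " transition from the current node. Pick another."
        ))
        if is_valid(current, retry["transition"], collected):
            return retry["transition"], retry["reply"]
        return current, "Sorry, I need a bit more information before we continue."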


Any recommendations on patterns/approaches for these declarative logic trees and where you put which types of logic (logic which goes in the prompt, logic which goes in the code which parses the prompt response, how to detect errors in the response and retry the prompt, etc.)? On "Show HN" I see a lot of "fully automated agents" which seem interesting, but I'm not sure if they are overkill or not.


Personally, I've found that a nested class structure with instructions in annotated field descriptions and/or docstrings can work wonders. Especially if you handle your own serialization to JSON Schema (either by rolling your own or using hooks provided by libraries like Pydantic), so you can control what attributes get included and when.
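
Roughly what I mean, using Pydantic v2 as one concrete option. The classes and field names here are invented for the example; the point is that the field descriptions double as instructions that end up in the JSON Schema you embed in the prompt.

    # Rough sketch; classes and fields are illustrative only.
    from typing import Optional
    from pydantic import BaseModel, Field

    class Address(BaseModel):
        city: str = Field(description="City name only, no abbreviations.")
        country: str = Field(description="ISO 3166-1 alpha-2 code, e.g. 'DE'.")

    class Contact(BaseModel):
        """Extract the contact exactly as stated; leave missing fields null."""
        name: str = Field(description="Full name as written in the text.")
        email: Optional[str] = Field(default=None, description="Email address, if present.")
        address: Optional[Address] = None

    # model_json_schema() produces the schema to paste into the prompt; you can
    # post-process it here to drop or rename attributes you don't want the model to see.
    schema = Contact.model_json_schema()
    print(schema["properties"]["name"]["description"])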


The JSON serialization strategy worked really well for me in a similar context. It was kind of a shot in the dark but GPT is pretty awesome at using structured data as a prompt.


I actually only used an XState state machine with JSON configuration and used that data as part of the prompt. It worked surprisingly well.

Since it has an okay grasp on how finite state machines and XState work, it seems to do a good job of navigating the tree properly and reliably. It essentially does so by outputting information it thinks the state machine should use as a transition in a JSON object which gets parsed and passed to a transition function. This would fail occasionally so there was a recursive “what’s wrong with this JSON?” prompt to get it to fix its own malformed JSON, haha. That was meant to be a temporary hack but it worked well, so it stayed. There were a few similar tools for trying to correct errors. That might be one of the strangest developments in programming for me… Deploying non-deterministic logic to fix itself in production. It feels wrong, but it works remarkably well. You just need sane fallbacks and recovery tactics.

It was a proprietary project so I can’t share the source, but I think reading up on XState JSON configuration might explain most of it. You can describe most of your machine in a serializable format.

You can actually store a lot of useful data in state names, context, meta, and effect/action names to aid with the prompting and weaving state flows together in a language-friendly way. I also liked that the prompt would be updated by information that went along with the source code, so a deployment would reliably carry the correct information.

The LLM essentially hid a decision tree from the user and smoothed over the experience of navigating it through adaptive and hopefully intuitive language. I’d personally prefer to provide more deterministic flows that users can engage with on their own, but one really handy feature of this was the ability to jump out of child states into parent states without needing to say, list links to these options in the UI. The LLM was good at knowing when to jump from leaves of the tree back up to relevant branches. That’s not always an easy UI problem to solve without an AI to handle it for you.

edit: Something I forgot to add is that the client wanted to be able to modify these trees themselves, so the whole machine configuration was generated by a graph in a database that could be edited. That part was powered by Strapi. There was structured data in there and you could define a state, list which transitions it can make, which actions should be triggered and when, etc. The client did the editing directly in Strapi with no special UI on top.

Their objective is surveying people in a more engaging and personable way. They really wanted surveys which adapt to users rather than piping people through static flows or exposing them to redundant or irrelevant questions. Initially this was done with XState and no LLM (it required some non-ideal UI and configuration under the hood to make those jumps to parent states I mentioned, but it worked), and I can't say how effective it is but they really like it. The AI hype was very very strong on that team.
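
For what it's worth, the "what's wrong with this JSON?" fallback mentioned above can be as simple as the sketch below. This is a generic illustration, not the proprietary code; call_llm and the retry budget are placeholders.

    import json

    def parse_with_repair(raw, call_llm, max_attempts=3):
        """Try to parse LLM output as JSON, asking the model to fix it on failure."""
        for _ in range(max_attempts):
            try:
                return json.loads(raw)
            except json.JSONDecodeError as err:
                raw = call_llm(
                    "The following was supposed to be valid JSON but failed to parse "
                    f"({err}). Return only the corrected JSON, nothing else:\n{raw}"
                )
        return None  # caller falls back to a sane default transition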


I'm building a whole AI agent-building platform on top of XState actors. Check it out: craftgen.ai or https://github.com/craftgen/craftgen


LangGraph


>Also STT is rife with speaker interjections, leading to significant user frustrations and they just want to talk to a person. Hard to see if this is really solved yet.

This is not using TTS or STT. Audio and image data can be tokenized as readily as text. This is simply an LLM that happens to have been trained to receive and spit out audio and image tokens as well as text tokens. Interjections are a lot more palatable in this paradigm, as most of the demos show.


Adding audio data as a token, in and of itself, would dramatically increase training size, cost, and time for very little benefit. Neural networks also generally tend to function less effectively with highly correlated inputs, which I can only assume is still an issue for LLMs. And adding combined audio training would introduce rather large scale correlations in the inputs.

I would wager like 100:1 that this is just introducing some TTS/STT layers. The video processing layer is probably doing something similar: taking an extremely limited number of 'screenshots', carrying out typical image captioning using another layer, and then feeding that as an input. So the demo, to me, seems most likely to be just 3 separate 'plugins' operating in unison: text to speech, speech to text, and image to text.

The interjections are likely just the software being programmed to aggressively begin output following any lull after an input pattern. Note in basically all the videos, the speakers have to repeatedly cut off the LLM as it starts speaking in conversationally inappropriate locations. In the main video which is just an extremely superficial interaction, the speaker made sure to be constantly speaking when interacting, only pausing once to take a breath that I noticed. He also struggled with the timing of his own responses as the LLM still seems to be attached to its typical, and frequently inappropriate, rambling verbosity (though perhaps I'm not one to critique that).


>I would wager like 100:1 that this is just introducing some TTS/STT layers.

Literally the first paragraph of the linked blog.

"GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs."

Then

"Prior to GPT-4o, you could use Voice Mode to talk to ChatGPT with latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4) on average. To achieve this, Voice Mode is a pipeline of three separate models: one simple model transcribes audio to text, GPT-3.5 or GPT-4 takes in text and outputs text, and a third simple model converts that text back to audio. This process means that the main source of intelligence, GPT-4, loses a lot of information—it can’t directly observe tone, multiple speakers, or background noises, and it can’t output laughter, singing, or express emotion.

With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network."


I can’t square this with the speed. A couple of layers doing STT are technically still part of the neural network, no? Because the increase in token base to cover multimodal tokenization would make even text inference slower, not twice as fast as 4-turbo.

But I’m not an expert!


OpenAI gives so little information on the details of their models now that one can only speculate how they've managed to cut down inference costs.

STT throws away a lot of information that is clearly being preserved in a lot of these demos so that's definitely not happening here in that sense. That said, the tokens would be merged to a shared embedding space. Hard to say how they are approaching it exactly.


I'd mentally change the acronym to Speech to Tokens. Parsing emotion and other non-explicit indicators in speech has been an ongoing part of research for years now. Meta-data of speaker identity, inflection, etc could easily be added and current LLMs already work with it just fine. For instance asking Claude, with 0 context, to parse the meaning of "*laughter* Yeah, I'm sure that's right." instantly yields:

----

The phrase "*laughter* Yeah, I'm sure that's right" appears to be expressing sarcasm or skepticism about whatever was previously said or suggested. Here's a breakdown of its likely meaning:

"*laughter*" - This typically indicates the speaker is laughing, which can signal amusement, but in this context suggests they find whatever was said humorous in an ironic or disbelieving way.

"Yeah," - This interjection sets up the sarcastic tone. It can mean "yes" literally, but here seems to be used facetiously.

"I'm sure that's right." - This statement directly contradicts and casts doubt on whatever was previously stated. The sarcastic laughter coupled with "I'm sure that's right" implies the speaker believes the opposite of what was said is actually true.

So in summary, by laughing and then sarcastically saying "Yeah, I'm sure that's right," the speaker is expressing skepticism, disbelief or finding humor in whatever claim or suggestion was previously made. It's a sarcastic way of implying "I highly doubt that's accurate or true."

----


It could be added. Still wouldn't sound as good as what we have here. Audio is Audio and Text is Text and no amount of metadata we can practically provide will replace the information present in sound.

You can't exactly metadata your way out of this (skip to 11:50)

https://www.youtube.com/live/DQacCB9tDaw?si=yN7al6N3C7vCemhL


I'm not sure why you say so? To me that seems obviously, literally just swapping/weighting between a set of predefined voices. I'm sure you've played a game with a face generator; it's the exact same thing, except with audio. I'd also observe in the demo that they explicitly avoided anything particularly creative, instead sticking within an extremely narrow domain of very basic adjectives: neutral, dramatic, singing, robotic, etc. I'm sure it also has happy, sad, angry, mad, and so on available.

But if the system can create a flamboyantly homosexual Captain Picard with a lisp and slight stutter engaging in overt innuendo when stating, "Number one, Engage!" then I look forward to eating crow! But as the instructions were all conspicuously just "swap to pretrained voice [x,y,z]", I suspect crow will not be on the menu any time soon.



What about the input of the heavy breathing?


I'm sorry but you don't know what you're talking about and I'm done here. Clearly you've never worked with or tried to train STT or TTS models in any real capacity so inventing dramatic capabilities, disregarding latency and data requirements must come easily for you.

Open AI have explicitly made this clear. You are wrong. There's nothing else left to say here.


Since OpenAI has gone completely closed, they've been increasingly opaque and dodgy about how even things like basic chat works. Assuming the various leaked details of GPT-4 [1] are correct (and to my knowledge there has been no indication that they are not), they have been actively misleading and deceptive - as even the 'basic' GPT4 is a mixture of experts system, and not one behemoth neural network.

[1] - https://lifearchitect.ai/gpt-4/


A Mixture of Experts model is still one behemoth neural network, and believing otherwise is just a common misconception about the term.

MoE are attempts at sparsity, only activating a set number of neurons/weights at a time. They're not separate models stitched together. They're not an Ensemble. I blame the name at this point.


I would ask you to watch the demo on SoundHound.com. It does less, yes, but it's so crucially fit for use. You'll notice from the GPT-4 demo shown that they were guiding the LLM into a chain of reasoning. It works very well when you know how to work it, which aligns with what you're saying. I don't mean to degrade the achievement, it's great, but we often inflate the expectations of what something can actually do before reaching real productivity.


I think if you listen to the way it answers, it seems it's using a technique trained speakers use: to buy itself time to think, it repeats/paraphrases the question/request before actually answering.

I'm sure you'll find this part is a lot quicker to process, giving the instant response (the old GPT-4-turbo is generally very quick with simple requests like this). Rather impressively, all it would need is an additional custom instruction.

Very clever and eerily human.


This behavior is clearly shown on the dad joke demo: https://vimeo.com/945587876


Have you seen this video from Microsoft, uploaded to YT in 2012, the actual video could be even older: https://www.youtube.com/watch?v=Nu-nlQqFCKg


This is the first demo where you can really sense that beating LLM benchmarks should not be the target. Just remember the time when the iPhone had meager specs but ultimately delivered a better phone experience than the competition.

This is the power of the model where you can own the whole stack and build a product. Open Source will focus on LLM benchmarks since that is the only way foundational models can differentiate themselves, but it does not mean it is a path to a great user experience.

So Open Source models like Llama will be here to stay, but it feels more like if you want to build a compelling product, you have to own and control your own model.


OpenAI blew up when they released ChatGPT. It was more of a UX breakthrough than pure tech, since GPT3 was available for a few months already.

This feels similar, with OpenAI trying to put their product even more into the daily lives of their users. With GPT4 being good enough for nearly all basic tasks, the natural language and multimodality could be big.


I don’t think Llama being open sourced means Meta has lost anything. If anything it’s just a way to get free community contributions, like Chrome from Chromium. Meta absolutely intends to integrate their version of Llama into their products, not so unlike how OpenAI is creating uses for their LLM beyond just the technology.


Depends on the benchmarks. AI that can actually do end to end the job of software developers, theoretical computer scientists, mathematicians etc. would be significantly more impactful than this.

I want to see AI moving the state of the art of the world understanding - physics, mathematics etc. - the way it moved state of the art of the Go game understanding.


Doing these end to end jobs still falls on user experience and UI, if we are talking about getting to mass market.

This GPT-4o model is a classic example. It is essentially the same model as GPT-4, but these multimodal features, voice conversations, math, and speed are as revolutionary as the creation of the model itself.

Open Source LLMs will end up as models on GitHub and will be used by developers, but it looks like even if GPT-4o is only 3 months ahead of other models in terms of benchmarks, the UI + use case + model is 2 years ahead of the competition. And I say that because there is still no chat product that is close to what ChatGPT is delivering now, even though there are models that are close to GPT-4o today.

So if it is sticky for 2 more years, their lead will just grow and we will just end up with more open source models that are technically behind by 3 months but behind product-wise by 2 years.


Now that I see this, here is my wish (I know there are security and privacy concerns, but let's pretend they are not there for this wish): an app that runs on my desktop and has access to my screen(s) when I work. At any time I can ask it something about what's on the screen, it can jump in and let me know if it thinks I made a mistake (think pair programming) or offer a suggestion (drafting a document). It can also quickly take over if I ask it to (copilot on demand).

Except for the last point and the desktop version, I think it's already done in the math demo video.

I guess it will also pretty soon refuse to let me come back inside the spaceship, but until then it'll be a nice ride.


Here you go: UFO - A UI-Focused Agent for Windows OS Interaction

"UFO is a UI-Focused dual-agent framework to fulfill user requests on Windows OS by seamlessly navigating and operating within individual or spanning multiple applications."

https://github.com/microsoft/UFO?tab=readme-ov-file


Agreed. I’m excited about reaching a point where the experience is of being in a deep work ‘flow’ with an ultra intelligent colleague, instead of jumping out of context to instant message them.


> with an ultra intelligent colleague

Ultra knowledgeable but pretty stupid actually.


A very eager very well read intern.


So far. :)


If you understand the first thing about LLMs you'll know it'll never be less stupid, just better at hiding its stupidity.


This makes me think, we're seeing all these products inject AI and try to be "smart" on their own, but maybe the experience we really need is a smart OS that can easily orchestrate dumb products.

I know that Siri/Google Assistant/Cortana(?) can already integrate with 3p apps, so maybe something like this but much smarter. e.g. instead of "send the following email" you would tell the assistant "just write the email yourself". At this point your email app doesn't need integrated AI anymore. Just hooks for the assistant.

I imagine once Google puts that kind of brains on Android and Chrome, many product devs will no longer need to use AI directly. Two birds one stone situation, since these devs won't need OpenAI.


1984's telescreens aren't something you're supposed to wish for


One of my favorite tweets:

    Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale

    Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus
https://twitter.com/AlexBlechman/status/1457842724128833538?...


You mean I can't wish for a brave new world?

Joking aside, I agree. It's too bad, though, that we know a thing (this or anything else even technological or not) that could be used for good and improving ourselves will almost always be diverted for something bad...


This basically already exists and the companies that sell this are constantly improving it. For better or worse.


“You haven’t done anything productive for 15 minutes. Are you taking an unauthorized break?”


Something similar already exists, see https://www.rewind.ai/ and https://www.perfectmemory.ai/


They showed just that in the demo with the voice call example. Screen share can be a live video feed


HAL. HAL? HAL!


Parts of the demo were quite choppy (latency?) so this definitely feels rushed in response to Google I/O.

Other than that, looks good. The desktop app is great, but I didn’t see any mention of being able to use your own API key, so open-source projects might still be needed.

The biggest thing is bringing GPT-4 to free users, that is an interesting move. Depending on what the limits are, I might cancel my subscription.


Seems like it was picking up on the audience reaction and stopping to listen.

To me the more troubling thing was the apparent hallucination (saying it sees the equation before he wrote it, commenting on an outfit when the camera was down, describing a table instead of his expression), but that might have just been latency awkwardness. Overall, the fast response is extremely impressive, as is the new emotional dimension of the voice.


Aha, I think I saw the trick for the live demo: every time they used the "video feed", they did prompt the model specifically by saying:

- "What are you seeing now"

- "I'm showing this to you now"

etc.

The one time he didn't prime the model to take a snapshot this way was the time the model saw the "table" (an old snapshot, since the phone was on the table/pointed at the table), so that might be the reason.


Yeah, the way the app currently works is that ChatGPT-4o only sees up to the moment of your last comment.

For example, I tried asking ChatGPT-4o to commentate a soccer game, but I got pretty bad hallucinations, as the model couldn’t see any new video come in after my instruction.

So when using ChatGPT-4o you’ll have to point the camera first and then ask your question - it won’t work to first ask the question and then point the camera.

(I was able to play with the model early because I work at OpenAI.)


thanks


Commenting on the outfit was very weird indeed. Greg Brockman's demo includes some outfit related questions (https://twitter.com/gdb/status/1790071008499544518). It does seem very impressive though, even if they polished it on some specific tasks. I am looking forward to showing my desktop and asking questions.


Regarding the limits, I recently found that I was hitting limits very quickly on GPT-4 on my ChatGPT Plus plan.

I’m pretty sure that wasn’t always the case; it feels like somewhere along the line the allowed usage was reduced, unless I’m imagining it. It wouldn’t be such a big deal if there was more visibility of my current usage compared to my total “allowance”.

I ended up upgrading to ChatGPT Team which has a minimum of 2x users (I now use both accounts) but I resented having to do this - especially being forced to pay for two users just to meet their arbitrary minimum.

I feel like I should not be hitting limits on the ChatGPT Plus paid plan at all based on my usage patterns.

I haven’t hit any limits on the Team plan yet.

I hope they continue to improve the paid plans and become a bit more transparent about usage limits/caps. I really do not mind paying for this (incredible) tech, but the way it’s being sold currently is not quite right and feels like paid users get a bit of a raw deal in some cases.

I have API access but just haven’t found an open source client that I like using as much as the native ChatGPT apps yet.


I use GPT from API in emacs, it's wonderful. Gptel is the program.

Although API access through Groq to Llama 3 (8B and 70B) is so much faster that I cannot stand how slow GPT is anymore. It is slow; still a very capable model, but only marginally better than open-source alternatives.


you should try -4o. It's incredibly fast


Have you tried groq.com? Because I don't think gpt-4o is "incredibly" fast. I've been frustrated at how slow gpt-4-turbo has been lately, and gpt-4o just seems to be "acceptably" fast now, which is a big improvement, but still, not groq-level.


Yes, of course, probably sometime in the following days. Some people mention it already works in the playground.

I was wondering why OpenAI didn't release a smaller but faster model. 175 billion parameters works well, but speed is sometimes crucial. Like, a 20B-parameter model could compute ~10x faster.


true. at least rn though, it types around the same speed of 3.5 turbo


I went through the exact same situation this last week. Didn't send more than 30 (token-heavy) messages within a few hours and it blocked me for 1 hour if I'm not wrong - paying user.


They need to fade the audio or add some vocal cue when it's being interrupted. It makes it sound like it's losing connection. What'll be really impressive is when it intentionally starts interrupting you.


Agree. While watching the demo video, I thought I was the one having connectivity issues.


> Parts of the demo were quite choppy (latency?) so this definitely feels rushed in response to Google I/O.

It just stops the audio feed when it detects sound, instead of the AI detecting when it should speak, so that part is horrible, yeah. A full AI conversation would detect the natural pauses where you give it room to speak, or when you try to take the word from it by interrupting; here it was just some dumb script that shuts it off when it hears sound.

But it is still very impressive for all the other part, that voice is really good.

Edit: If anyone from OpenAI reads this, at least fade out the voice quickly instead of chopping it. Hard-chopping audio doesn't sound good at all; many people experienced this presentation as extremely buggy because of it.


what's the download link for the desktop app? can't find it


seems like it might not be available for everyone? – my ChatGPT Plus doesn't show anything new, and I also can't find the desktop app


They are admitting[1] that the new model is the gpt2-chatbot that we have seen before[2]. As many highlighted there, the model is not an improvement like GPT3->GPT4. I tested a bunch of programming stuff and it was not that much better.

It's interesting that OpenAI is highlighting the Elo score instead of showing results for the many, many benchmarks on which all models are stuck at 50-70% success.

[1] https://twitter.com/LiamFedus/status/1790064963966370209

[2] https://news.ycombinator.com/item?id=40199715


I think the live demo that happened on the livestream is best to get a feel for this model[0].

I don't really care whether it's stronger than gpt-4-turbo or not. The direct real-time video and audio capabilities are absolutely magical and stunning. The responses in voice mode are now instantaneous, you can interrupt the model, you can talk to it while showing it a video, and it understands (and uses) intonation and emotion.

Really, just watch the live demo. I linked directly to where it starts.

Importantly, this makes the interaction a lot more "human-like".

[0]: https://youtu.be/DQacCB9tDaw?t=557


The demo is impressive, but personally, as a commercial user, for my practical use cases, the only thing I care about is how smart it is, how accurate its answers are, and how vast its knowledge is. These haven’t changed much since GPT-4, yet they should, as IMHO it is still borderline in its ability to be really that useful.


But that's not the point of this update


I know, and I know my comment is dismissive of the incredible work shown here, as we’re shown sci-fi level tech. But I feel like I have this kettle that boils water in 10 minutes, and it really should boil it in 1, but instead it is now voice-operated.

I hope the next version delivers on being smarter, as this update, instead of making me excited, makes me feel they’ve reached a plateau on improving the core value and are distracting us with fluff instead.


Everything is amazing & Nobody is happy: https://www.youtube.com/watch?v=PdFB7q89_3U


GPT-4 isn't quite "amazing" in terms of commercial use. GPT-4 is often good, and also often mediocre or bad. It's not going to change the world; it needs to get better.


Near real-time voice feedback isn't amazing? Has the bar risen this high?

I already know an application for this, and AFAIK it's being explored in the SaaS space: guided learning experiences and tutoring for individuals.

My kids, for instance, love to hammer Alexa with random questions. They would spend a huge amount of time using a better interface, esp. with quick feedback, that provided even deeper insight and responses to them.

Taking this and tuning it to specific audiences would make it a great tool for learning.


"My kids, for instance, love to hammer Alexa with random questions. They would spend a huge amount of time using a better interface, esp. with quick feedback, that provided even deeper insight and responses to them."

Great, using GPT-4 the kids will be getting a lot of hallucinated facts returned to them. There are good use cases for transformers currently, but they're not at the "impact company earnings or country GDP" stage, which is the promise that the whole industry has raised/spent 100+B dollars on. Facebook alone is spending 40B on AI. I believe in the AI future, but the only thing that matters for now is that the models improve.


I always double-check even the most obscure facts returned by GPT-4 and have yet to see a hallucination (as opposed to Claude Opus that sometimes made up historical facts). I doubt stuff interesting to kids would be so out of the data distribution to return a fake answer.

Compared to YouTube and Google SEO trash, or Google Home / Alexa (which do search + wiki retrieval), at the moment GPT-4 and Claude are unironically safer for kids: no algorithmic manipulation, no ads, no affiliated trash blogs, and so on. Bonus is that it can explain on the level of complexity the child will understand for their age



My kids get erroneous responses from Alexa. This happens all the time. The built-in web search doesn't provide correct answers, or is confusing outright. That's when they come to me or their Mom and we provide a better answer.

I still see this as a cool application. Anything that provides easier access to knowledge and improved learning is a boon.

I'd rather worry about the potential economic impact than worry about possible hallucinations from fun questions like "how big is the sun?" or "what is the best videogame in the world?", etc.

There's a ton you can do here, IMO.

Take a look at mathacademy.com, for instance. Now slap a voice interface on it, provide an ability for kids/participants to ask questions back and forth, etc. Boom: you've got a math tutor that guides you based on your current ability.

What if we could get to the same style of learning for languages? For instance, I'd love to work on Spanish. It'd be far more accessible if I could launch a web browser and chat through my mic in short spurts, rather than crack open Anki and go through flash cards, or wait on a Discord server for others to participate in immersive conversation.

Tons of cool applications here, all learning-focused.


People should be more worried about how much this will be exploited by scammers. This thing is miles ahead of the crap fraudsters use to scam MeeMaw out of her life savings.


It's an impressive demo, it's not (yet) an impressive product.

It seems like the people who are oohing and ahhing at the former and the people who are frustrated that this kind of thing is unbelievably impractical to productize will be doomed to talk past one another forever. The text generation models, image generation models, speech-to-text and text-to-speech have reached impressive product stages. Multi-modal hasn't got there because no one is really sure what to actually do with the thing outside of making cool demos.


Multi-modal isn't there because "this is an image of a green plant" is viable in a demo, but it's not commercially viable. "This is an image of a monstera deliciosa" is commercially viable, but not yet demoable. The models need to improve to be usable.


Sure, but "not enough, I want moar" is a trivial demand. So trivial that it goes unsaid.


It's equivalent to "nothing to see here" which is exactly the TLDR I was looking for.


Watch the last few minutes of that linked video; Mira strongly hints that there’s another update coming for paid users and seems to make clear that GPT-4o is more so for free tier users (even though it is obviously a huge improvement in many features for everyone).


There is room for more than one use case and large language model type.

I predict there will be a zoo (more precisely tree, as in "family tree") of models and derived models for particular application purposes, and there will be continued development of enhanced "universal"/foundational models as well. Some will focus on minimizing memory, others on minimizing pre-training or fine-tuning energy consumption, some need high accuracy, others hard realtime speed, yet others multimodality like GPT4.o, some multilinguality, and so on.

Previous language models that encoded dictionaries for spellcheckers etc. never got standardized (for instance, compare aspell dictionaries to the ones from LibreOffice to the language model inside CMU PocketSphinx) so that you could use them across applications or operating systems. As these models are becoming more common, it would be interesting to see this aspect improve this time around.

https://www.rev.com/blog/resources/the-5-best-open-source-sp...


I disagree, transfer learning and generalization are hugely powerful and specialized models won't be as good because their limited scope limits their ability to generalize and transfer knowledge from one domain to another.

I think people who emphasize specialized models are operating under a false assumption that by focusing the model it'll be able to go deeper in that domain. However, the opposite seems to be true.

Granted, specialized models like AlphaFold are superior in their domain but I think that'll be less true as models become more capable at general learning.


They say it's twice as fast/cheap, which might matter for your use case.


It's twice as fast/cheap relative to GPT-4-turbo, which is still expensive compared to GPT-3.5-turbo and Claude Haiku.

https://openai.com/api/pricing/


For commercial use at scale, of course cost matters.

For the average Joe programmer like me, GPT4 is already "dirt cheap". My typical monthly bill is $0-3 using it as much as I like.

The one time it was high was when I had it take 90+ hours of Youtube video transcripts, and had it summarize each video according to the format I wanted. It produced about 250 pages of output.

That month I paid $12-13. Well worth it, given the quality of the output. And now it'll be less than $7.

For the average Joe, it's not expensive. Fast food is.


but better afaik


But it may not be better enough to warrant the cost difference. LLM cost economics are complicated.


I’d much rather have it be slower, more expensive, but smarter


Depends what you want it for. I'm still holding out for a decent enough open model, Llama 3 is tantalisingly close, but inference speed and cost are serious bottlenecks for any corpus-based use case.


I think that might come with the next GPT version.

OpenAI seems to build in cycles. First they focus on capabilities, then they work on driving the price down (occasionally at some quality degradation)


Then the current offering should suffice, right?


I understand your point, and agree that it is "borderline" in its abilities — though I would instead phrase it as "it feels like a junior developer or an industrial placement student, and assume it is of a similar level in all other skills", as this makes it clearer when it is or isn't a good choice, and it also manages expectations away from both extremes I frequently encounter (that it's either Cmdr Data already, or that it's a no-good terrible thing only promoted by the people who were previously selling Bitcoin as a solution to all the economics).

That said, given the price tag, when AI becomes genuinely expert then I'm probably not going to have a job and neither will anyone else (modulo how much electrical power those humanoid robots need, as the global electricity supply is currently only 250 W/capita).

In the meantime, making it a properly real-time conversational partner… wow. Also, that's kinda what you need for real-time translation, because: «be this, that different languages the word order totally alter and important words at entirely different places in the sentence put», and real-time "translation" (even when done by a human) therefore requires having a good idea what the speaker was going to say before they get there, and being able to back-track when (as is inevitable) the anticipated topic was actually something completely different and so the "translation" wasn't.


I guess I feel like I’ll get to keep my job a while longer and this is strangely disappointing…

A real time translator would be a killer app indeed, and it seems not so far away, but note how you have to prompt the interaction with ‘Hey ChatGPT’; it does not interject on its own. It is also unclear if it is able to understand if multiple people are speaking and who’s who. I guess we’ll see soon enough :)


> It is also unclear if it is able to understand if multiple people are speaking and who’s who. I guess we’ll see soon enough :)

Indeed; I would be pleasantly surprised if it can both notice and separate multiple speakers, but only a bit surprised.


One thing I've noticed is that the more context, and the more precise the context, I give it, the "smarter" it is. There are limits to it of course. But I cannot help but think that's where the next barrier will be brought down: an agent (or several) that tags along with everything I do throughout the day to have the full context. That way, I'll get smarter and more to-the-point help, as well as not spending much time explaining the context. But that will open a dark can that I'm not sure people will want to open: having an AI track everything you do all the time (even if only in certain contexts like business hours / environments).


There are definitely multiple dimensions these things are getting better in. The popular focus has been on the big expensive training runs, but inference, context size, algorithms, etc. are all getting better fast.


I have a few LLM benchmarks that were extracted from real products.

GPT-4o got slightly better overall. Ability to reason improved more than the rest.


It's faster, smarter and cheaper over the API. Better than a kick in the teeth.


Absolutely agree.

This model isn't about benchmark chasing or being a better code generator; it's entirely, explicitly focused on pushing prior results into the frame of multi-modal interaction.

It's still a WIP, most of the videos show awkwardness where its capacity to understand the "flow" of human speech is still vestigial. It doesn't understand how humans pause and give one another space for such pauses yet.

But it does indeed have some magic ability to share a deictic frame of reference.

I have been waiting for this specific advance, because it is going to significantly quiet the "stochastic parrot" line of wilfully-myopic criticism.

It is very hard to make blustery claims about "glorified Markov token generation" when using language in a way that requires both a shared world model and an understanding of interlocutor intent, focus, etc.

This is edging closer to the moment when it becomes very hard to argue that system does not have some form of self-model and a world model within which self, other, and other objects and environments exist with inferred and explicit relationships.

This is just the beginning. It will be very interesting to see how strong its current abilities are in this domain; it's one thing to have object classification—another thing entirely to infer "scripts plans goals..." and things like intent and deixis. E.g. how well does it now understand "us" and "them", and "this" vs "that"?

Exciting times. Scary times. Yee hawwwww.


What part of this makes you think GPT-4 suddenly developed a world model? I find this comment ridiculous and bizarre. Do you seriously think snappy response time + fake emotions is an indicator of intelligence? It seems like you are just getting excited and throwing out a bunch of words without even pretending to explain yourself:

> using language in a way that requires both a shared world model

Where? What example of GPT-4o requires a shared world model? The customer support example?

The reason GPT-4 does not have any meaningful world model (in the sense that rats have meaningful world models) is that it freely believes contradictory facts without being confused, freely confabulates without having brain damage, and it has no real understanding of quantity or causality. Nothing in GPT-4o fixes that, and gpt2-chatbot certainly had the same problems with hallucinations and failing the same pigeon-level math problems that all other GPTs fail.


One of the most interesting things about the advent of LLMs is people bringing out all sorts of "reasons" GPT doesn't have true 'insert property', but all those reasons freely occur in humans as well.

>that it freely believes contradictory facts without being confused,

Humans do this. You do this. I guess you don't have a meaningful world model.

>freely confabulates without having brain damage

Humans do this

>and it has no real understanding of quantity or causality.

Well this one is just wrong.


So many even here on HN have a near-religious belief that intelligence is unique to humans and animals, and somehow a fundamental phenomenon that cannot ever be created using other materials.


It reminds me of the geocentric mindset.


It's a defensive response to an emerging threat to stability and current social tiers.


>>and it has no real understanding of quantity or causality.

>Well this one is just wrong.

Is it?

--

Me: how many characters are in: https://google.com

ChatGPT: The URL "https://google.com" has 12 characters, including the letters, dots, and slashes.

--

What is it counting there? 12 is wrong no matter how you dice that up.

Part of the reason is it has no concept of the actual string. That URL breaks into four different tokens in 3.5 and 4: "http", "://", "google" and ".com".

It's not able to figure out the total length, or even the lengths of its parts and add them together.

I ask it to double check, it tells me 13 and then 14. I tell it the answer and suddenly it's able...

---

Me: I think its 18

ChatGPT: Let's recount together:

"https://" has 8 characters. "google" has 6 characters. ".com" has 4 characters. Adding these up gives a total of 8 + 6 + 4 = 18 characters. You're correct! My apologies for the oversight earlier.

---

Count me out.


LLMs process text, but only after it was converted to a stream of tokens. As a result, LLMs are not very good at answering questions about letters in the text. That information was lost during the tokenization.

Humans process photons, but only after converting them into nerve impulses via photoreceptor cells in the retina, which are sensitive to wavelength ranges described as "red", "green" or "blue".

As a result, humans are not very good at distinguishing different spectra that happen to result in the same nerve impulses. That information was lost by the conversion from photons to nerve impulses. Sensors like the AS7341 that have more than 3 color channels are much better at this task.
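To bring it back to the tokenization point, here is a minimal sketch using the open-source tiktoken library (cl100k_base is the encoding used by GPT-3.5/GPT-4); the exact splits it prints are whatever the encoder decides, and the variable names are just for illustration:

  import tiktoken

  # cl100k_base is the encoding used by GPT-3.5-turbo and GPT-4
  enc = tiktoken.get_encoding("cl100k_base")

  url = "https://google.com"
  ids = enc.encode(url)

  # The model receives these integer IDs, not the 18 underlying characters,
  # so "how many characters?" asks about information it never saw.
  print(ids)
  print([enc.decode([i]) for i in ids])  # the token pieces, e.g. "https", "://", ...
  print(len(url))                        # 18, trivially, once you have the raw string

The same word can also split differently depending on the surrounding text, which is part of why the model's guesses about length wander around.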


Yet I can learn there is a distinction between different spectra that happen to result in the same nerve impulses. I know if I have a certain impulse, that I can't rely on it being a certain photon. I know to use tools, like the AS7341, to augment my answer. I know to answer "I don't know" to those types of questions.

I am a strong proponent of LLM's, but I just don't agree with the personification and trust we put into its responses.

Everyone in this thread is defending that ChatGPT can't count for _reasons_ and how it's okay, but... how can we trust this? Is this the sane world we live in?

"The AGI can't count letters in a sentence, but any day not he singularity will happen, the AI will escape and take over the world."

I do like to use it for opinion-related questions. I have a specific taste in movies and TV shows, and by just listing what I like and going back and forth about my reasons for liking or not liking its suggestions, I've been able to find a lot of gems I would have never heard of before.


That URL breaks into four different tokens in 3.5 and 4: "http", "://", "google" and ".com".

Except that "http" should be "https". Silly humans, claiming to be intelligent when they can't even tokenize strings correctly.


A wee typo.


How much of your own sense of quantity is visual, do you think? How much of your ability to count the lengths of words depends on your ability to sound them out and spell?

I suspect we might find that adding in the multimodal visual and audio aspects to the model gives these models a much better basis for mental arithmetic and counting.


I'd counter by pasting a picture of an emoji here (but HN doesn't allow that) to show the confusion that can be caused by characters versus symbols.

Most LLMs can just pass the string to a tool to count it, to bypass their built-in limitations.
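For what it's worth, here is a rough sketch of what that hand-off can look like with the OpenAI chat completions API; the count_characters tool and the wiring around it are made up for illustration, and the code assumes the model actually chooses to call the tool:

  import json
  from openai import OpenAI

  client = OpenAI()

  # Expose an exact character counter as a tool the model can call instead of
  # guessing from its token-level view of the string.
  tools = [{
      "type": "function",
      "function": {
          "name": "count_characters",
          "description": "Return the exact number of characters in a string.",
          "parameters": {
              "type": "object",
              "properties": {"text": {"type": "string"}},
              "required": ["text"],
          },
      },
  }]

  response = client.chat.completions.create(
      model="gpt-4o",
      messages=[{"role": "user",
                 "content": "How many characters are in https://google.com ?"}],
      tools=tools,
  )

  # Assuming the model decided to call the tool, do the counting in plain code.
  call = response.choices[0].message.tool_calls[0]
  if call.function.name == "count_characters":
      text = json.loads(call.function.arguments)["text"]
      print(len(text))  # 18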


It seems you're already aware LLMs receive tokens not words.

Does a blind man not understand quantity because you asked him how many apples are in front of him and he failed?


I do, but I think it shows its limitations.

I don't think that test determines his understanding of quantity at all, he has other senses like touch to determine the correct answer. He doesn't make up a number and then give justification.

GPT was presented with everything it needed to answer the question.


Nobody said GPT was perfect. Everything has limitations.

>he has other senses like touch to determine the correct answer

And? In my hypothetical, you're not allowing him to use touch.

>I don't think that test determines his understanding of quantity at all

Obviously

>GPT was presented with everything it needed to answer the question.

No, it was not.


How was it not? It's a text interface. It was given text.

The blind example now is like asking GPT "What am I pointing at?"


Please try to actually understand what og_kalu is saying instead of being obtuse about something any grade-schooler intuitively grasps.

Imagine a legally blind person, they can barely see anything; just general shapes flowing into one another. In front of them is a table onto which you place a number of objects. The objects are close together and small enough such that they merge into one blurred shape for our test person.

Now when you ask the person how many objects are on the table, they won't be able to tell you! But why would that be? After all, all the information is available to them! The photons emitted from the objects hit the retina of the person, the person has a visual interface and they were given all the visual information they need!

Information lies within differentiation, and if the granularity you require is higher than the granularity of your interface, then it won't matter whether or not the information is technically present; you won't be able to access it.


I think we agree. ChatGPT can't count, as the granularity that requires is higher than the granularity ChatGPT provides.

Also the blind person wouldn't confidently answer. A simple "the objects blur together" would be a good answer. I had ChatGPT telling me 5 different answers back to back above.


No, think about it. The granularity of the interface (the tokenizer) is the problem, the actual model could count just fine.

If the legally blind person never had had good vision or corrective instruments, had never been told that their vision is compromised and had no other avenue (like touch) to disambiguate and learn, then they would tell you the same thing ChatGPT told you. "The objects blur together" implies that there is already an understanding of the objects being separate present.

You can even see this in yourself. If you did not get an education in physics and were asked how many things a steel cube is made up of, you wouldn't answer that you can't tell. You would just say one, because you don't even know that atoms are a thing.


I agree, but I don't think that changes anything, right?

ChatGPT can't count, the problem is the tokenizer.

I do find it funny we're trying to chat with an AI that is "equivalent to a legally blind person with no correction"

> You would just say one, because you don't even know that atoms are a thing.

My point also. I wouldn't start guessing "10" and then "11" and then "12" when asked to double check, only to capitulate when told the correct answer.


You consistently refuse to take the necessary reasoning steps yourself. If your next reply also requires me to lead you every single millimeter to the conclusion you should have reached on your own, then I won't reply again.

First of all, it obviously changes everything. A shortsighted person requires prescription glasses, someone that is fundamentally unable to count is incurable from our perspective. LLMs could do all of these things if we either solve tokenization or simply adapt the tokenizer to relevant tasks. This is already being done for program code, it's just that aside from gotcha arguments, nobody really cares about letter counting that much.

Secondly, the analogy was meant to convey that the intelligence of a system is not at all related to the problems at its interface. No one would say that legally blind people are less insightful or intelligent, they just require you to transform input into representations accounting for their interface problems.

Thirdly, as I thought was obvious, the tokenizer is not a uniform blur. For example, a word like "count" could be tokenized as "c|ount" or " coun|t" (note the space) or ". count" depending on the surrounding context. Each of these versions will have tokens of different lengths, and associated different letter counts. If you've been told that the cube had 10, 11 or 12 trillion constituent parts by various people depending on the random circumstances you've talked to them in, then you would absolutely start guessing through the common answers you've been given.


I do agree I've been obtuse, apologies. I think I was just being too literal or something, as I do agree with you.


Apologies from me as well. I've been unnecessarily aggressive in my comments. Seeing very uninformed but smug takes on AI here over the last year has made me very wary of interactions like this, but you've been very calm in your replies and I should have been so as well.


Its first answer of 12 is correct, there are 12 _unique_ characters in https://google.com.


The unique characters are:

h t p s : / g o l e . c m

There are 13 unique characters.
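For reference, three lines of Python settle both counts:

  url = "https://google.com"
  print(len(url))       # 18 characters in total
  print(len(set(url)))  # 13 distinct characters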


OK neither GPT-4o nor myself is great at counting apparently


I agree. The interesting lesson I take from the seemingly strong capabilities of LLMs is not how smart they are but how dumb we are. I don't think LLMs are anywhere near as smart as humans yet, but it feels each new advance is bringing the finish line closer rather than the other way round.


Moravec's paradox states that, for AI, the hard stuff is easiest and the easy stuff is hardest. But there's no easy or hard; there's only what the network was trained to do.

The stuff that comes easy to us, like navigating 3D space, was trained by billions of years of evolution. The hard stuff, like language and calculus, is new stuff we've only recently become capable of, seemingly by evolutionary accident, and aren't very naturally good at. We need rigorous academic training at it that's rarely very successful (there's only so many people with the random brain creases to be a von Neumann or Einstein), so we're impressed by it.


If someone found a way to put an actual human brain into SW, but no one knew it was a real human brain -- I'm certain most of HN would claim it wasn't AGI. "Kind of sucks at math", "Knows weird facts about Tik Tok celebrities, but nothing about world events", "Makes lots of grammar mistakes", "scores poorly on most standardized tests, except for one area that he seems to do well in", and "not very creative".


What is a human brain without the rest of its body? Humans aren't brains. Our nervous systems aren't just the brain either.


It's meant to explore a point. Unless your point is that AGI can only exist with a human body too.


It's an open question as to whether AGI needs a (robot) body. It's also a big question whether the human brain can function in a meaningful capacity kept alive without a body.


I don't think making the same mistakes as a human counts as a feature. I see that a lot when people point out a flaw with an LLM: the response is always "well a human would make the same mistake!". That's not much of an excuse; computers exist because they do the things humans can't do very well, like following long repetitive lists of instructions. Further, upthread, there's discussion about adding emotions to an LLM. An emotional computer that makes mistakes sometimes is pretty worthless as a "computer".


It's not about counting as a feature. It's the blatant logical fallacy. If a trait isn't a reason humans don't have a certain property then it's not a reason for machines either. Can't eat your cake and have it.

>That's not much of an excuse, computers exist because they do the things humans can't do very well like following long repetitive lists of instructions.

Computers exist because they are useful, nothing more and nothing less. If they were useful in a completely different way, they would still exist and be used.


It's objectively true that LLMs do not have bodies. To the extent general intelligence relies on being embodied (allowing you to manipulate the world and learn from that), it's a legitimate thing to point out.


>But it has some indeed magic ability to share a deictic frame of reference.

They really Put That There!

https://www.youtube.com/watch?v=RyBEUyEtxQo

Oh, shit.


In my view, this was in response to the machine being colourblind haha


I expect the really solid use case here will be voice interfaces to applications that don't suck. Something I am still surprised at is that vendors like Apple have yet to allow me to train the voice to text model so that it only responds to me and not someone else.

So local modelling (completely offline but per speaker aware and responsive), with a really flexible application API. Sort of the GTK or QT equivalent for voice interactions. Also custom naming, so instead of "Hey Siri" or "Hey Google" I could say, "Hey idiot" :-)

Definitely some interesting tech here.


I assume (because they don't address it or look at all fazed) the audio cutting in and out is just an artefact of the stream?


Haven’t tried it but from work I’ve done on voice interaction this happens a lot when you have a big audience making noise. The interruption feature will likely have difficulty in noisy environments.


Yeah, that was actually my first thought (though I have no professional experience with it/on that side) - it's just that the commenter I replied to was so hyped about how fluid & natural it was, and I thought that made it really jar.


Interesting that they decided to keep the horrible ChatGPT tone ("wow you're doing a live demo right now?!"). It comes across just so much worse in voice. I don't need my "AI" speaking to me like I'm a toddler.


It is cringingly overenthusiastic, but proper instructions/a system prompt will mostly fix that.


You can tell it not to talk like this using custom prompts.


One of the linked demos is it being sarcastic, so maybe you can make it remember to be a little more edgy.


tell it to speak to you differently

with a GPT you can modify the system prompt


It still refuses to go outside the deeply sanitised tone that "alignment" enforces on you.


It should be possible to imitate any voice you want, like your actual parents', soon enough.


That won't be Black Mirror levels of creepy /s


Did you miss the part where they simply asked it to change its manner of speaking and the amount of emotion it used?


Call me overly paranoid/skeptical, but I'm not convinced that this isn't a human reading (and embellishing) a script. The "AI" responses in the script may well have actually been generated by their LLM, providing a defense against it being fully fake, but I'm just not buying some of these "AI" voices.

We'll have to see when end users actually get access to the voice features "in the coming weeks".


It's weird that the "airplane mode" seems to be ON on the phone during the entire presentation.


This was on purpose - it appears they connected it to the internet via a USB-C cable, for a consistent connection instead of relying on WiFi.

Probably some kinks there they are working out


> Probably some kinks there they are working out

Or just a good idea for a live demo on a congested network/environment with a lot of media present, at least one live video stream (the one we're watching the recording of), etc.

At least that's how I understood it, not that they had a problem with it (consistently or under regular conditions, or specific to their app).


That's very common practice for live demos. To avoid situations like this:

https://www.youtube.com/watch?v=6lqfRx61BUg


And eliminate the chance of some prankster affecting the demo by attacking the WiFi.


They mention at the beginning of the video that they are using hardwired internet for reliability reasons.


You would want to make sure that it is always going over WiFi for the demo and doesn't start using the cellular network for a random reason.


You can turn off mobile data. They probably just wanted wired internet.


This is going straight into 'Her' territory


Hectic!

Thanks for this.


"not that much better" is extremely impressive, because it's a much smaller and much faster model. Don't worry, GPT-5 is coming and it will be better.


Chalmers: "GPT-5? A vastly-improved model that somehow reduces the compute overhead while providing better answers with the same hardware architecture? At this time of year? In this kind of market?"

Skinner: "Yes."

Chalmers: "May I see it?"

Skinner: "No."


It has only been a little over one year since GPT-4 was announced, and it was at the time the largest and most expensive model ever trained. It might still be.

Perhaps it's worth taking a beat and looking at the incredible progress in that year, and acknowledge that whatever's next is probably "still cooking".

Even Meta is still baking their 400B parameter model.


As Altman said (paraphrasing): GPT-4 is the _worst_ model you will ever have to deal with in your life (or something to that effect).


I found this statement by Sam quite amusing. It transmits exactly zero information (it's a given that models will improve over time), yet it sounds profound and ambitious.


I got the same vibe from him on the All In podcast. For every question, he would answer with a vaguely profound statement, talking in circles without really saying anything. On multiple occasions he would answer like 'In some ways yes, in some ways no...' and then just change the subject.


Yep. I'm not quite sure what he's up to. He takes all these interviews and basically says nothing. What's his objective?

My guess is he wants OpenAI to become a household name, and so he optimizes for exposure.


and boy did the stockholders like that one.


What stockholders? They're investors at this point. I wish I could get in on it.


They're rollercoaster riders, being told lustrous stories by gold-panners while the shovel salesman counts his money and leaves.


There are no shovels or shovel sellers. It’s heavily accredited investors with millions of dollars buying in. It’s way above our pay grade, our pleb sayings don’t apply.


I think you could pretty easily call Nvidia a shovel-seller in this context.


You’re right.


Why should I believe anything he says?


I will believe it when I see it. People like to point at the first part of a logistic curve and go "behold! an exponential".


Ah yes, my favorite was the early covid numbers: some of the "smartest" people in the SF techie scene were daily on Facebook thought-leadering about how 40% of people were about to die in the likely case.


Let's be honest, everyone was speculating. Nobody knew what the future would bring, not even you.


The difference is some people were talking a whole lot confidently, and some weren’t.


Legit love progress


GPT-3 was released in 2020 and GPT-4 in 2023. Now we all expect 5 sooner than that but you're acting like we've been waiting years lol.


The increased expectations are a direct result of LLM proponents continually hyping exponential capabilities increase.


So if not exponential, what would you call adding voice and image recognition, function calling, greatly increased token generation speed, reduced cost, massive context window increases and then shortly after combining all of that in a truly multi modal model that is even faster and cheaper while adding emotional range and singing in… checks notes …14 months?! Not to mention creating and improving an API, mobile apps, a marketplace and now a desktop app. OpenAI ships and they are doing so in a way that makes a lot of business sense (continue to deliver while reducing cost). Even if they didn’t have another flagship model in their back pocket I’d be happy with this rate of improvement but they are obviously about to launch another one given the teasers Mira keeps dropping.


All of that is awesome, and makes for a better product. But it’s also primarily an engineering effort. What matters here is an increase in intelligence. And we’re not seeing that aside from very minor capability increases.

We’ll see if they have another flagship model ready to launch. I seriously doubt it. I suspect that this was supposed to be called GPT-5, or at the very least GPT-4.5, but they can’t meet expectations so they can’t use those names.


Isn't one of the reasons for the Omni model that text-based learning has a limited supply of source material? If it's just as good at audio, that opens up a whole other set of data - and an interesting UX for users.


I believe you’re right. You can easily transcribe audio but the quality of the text data is subpar to say the least. People are very messy when they speak and rely on the interlocutor to fill in the gaps. Training a model to understand all of the nuances of spoken dialogue opens that source of data up. What they demoed today is a model that to some degree understands tone, emotion and surprisingly a bit of humour. It’s hard to get much of that in text so it makes sense that audio is the key to it. Visual understanding of video is also promising especially for cause and effect and subsequently reasoning.


The time for the research, training, testing and deploying of a new model at frontier scales doesn't change depending on how hyped the technology is. I just think the comment i was replying to lacks perspective.


Pay attention to the signal, ignore the noise.


People who buy into hype deserve to be disappointed. Or burned, as the case may be.


Incidentally, this dialogue works equally well, if not better, with David Chalmers versus B.F. Skinner, as with the Simpsons characters.


Agnes (voice): "SEYMOUR, THE HOUSE IS ON FIRE!"

Skinner (looking up): No, mother, it's just the Nvidia GPUs.


"Seymour, the house is on fire!"

"No, mother, that's just the H100s."


Obviously given enough time there will always be better models coming.

But I am not convinced it will be another GPT-4 moment. It seems like a big focus on tacking together clever multi-modal tricks vs. straight-up more intelligent AI.

Hope they prove me wrong!


The problem with "better intelligence" is that OpenAI is running out of human training data to pillage. Training AI on the output of AI smooths over the data distribution, so all the AIs wind up producing same-y output. So OpenAI stopped scraping text back in 2021 or so - because that's when the open web turned into an ocean of AI piss. I've heard rumors that they've started harvesting closed captions out of YouTube videos to try and make up the shortfall of data, but that seems like a way to stave off the inevitable[0].

Multimodal is another way to stave off the inevitable, because these AI companies already are training multiple models on different piles of information. If you have to train a text model and an image model, why split your training data in half when you could train a combined model on a combined dataset?

[0] For starters, most YouTube videos aren't manually captioned, so you're feeding GPT the output of Google's autocaptioning model, so it's going to start learning artifacts of what that model can't process.


>harvesting closed captions out of YouTube videos

I'd bet a lot of YouTubers are using LLMs to write and/or edit content. So we pass that through a human presentation. Then introduce some errors in the form of transcription. Then feed the output in as part of a training corpus ... we plateaued real quick.

It seems like it's hard to get past the level of human intelligence for which there's a large enough corpus of training data or trainers?

Anyone know of any papers on breaking this limit to push machine learning models to super-human intelligence levels?


If a model is of average human intelligence in pretty much everything, is that super-human or not? Simply put, we as individuals aren't average at everything; we have things we're good at and a great many things we're not. We average out by looking at broad population trends. That's why most of us in the modern age spend a lot of time on specialization in whatever we work in. Which brings us to the likely next place for data: a Manna-like (as in the story) data collection program where companies hoover up everything they can on their above-average employees, until we're at the point where most models are well above the human average in most categories.


>[0] For starters, most YouTube videos aren't manually captioned, so you're feeding GPT the output of Google's autocaptioning model, so it's going to start learning artifacts of what that model can't process.

Whisper models are better than anything google has. In fact the higher quality whisper models are better than humans when it comes to transcribing text with punctuation.


Why do you think they’re using Google auto-captioning?

I would expect they're using their own speech-to-text, which is still a model but way better quality and potentially customizable to better suit their needs.


At some point, algorithms for reasoning and long-term planning will be figured out. Data won’t be the holy grail forever, and neither will asymptotically approaching human performance in all domains.


I don't think a bigger model would make sense for OpenAI: it's much more important for them that they keep driving inference cost down, because there's no viable business model if they don't.

Improving the instruction tuning, the RLHF step, the training set size, the multilingual capabilities, etc. makes sense as a way to improve quality, but I think increasing model size doesn't. Being able to advertise a big breakthrough may make sense in terms of marketing, but I don't believe it's going to happen, for two reasons:

- you don't release intermediate steps when you want to be able to advertise big gains, because it raises the baseline and reduces the effectiveness of your "big gains" in terms of marketing.

- I don't think they would benefit from an arms race with Meta, trying to keep a significant edge. Meta is likely to be able to catch up eventually on performance, but they are not so much of a threat in terms of business. Focusing on keeping a performance edge instead of making their business viable would be a strategic blunder.


What is OpenAI's business model if their models are second-best? Why would people pay them and not Meta/Google/Microsoft - who can afford to sell at very low margins, since they have existing very profitable businesses that keep them afloat?


That's the question OpenAI needs to find an answer to if they want to end up viable.

They have the brand recognition (for ChatGPT) and that's a good start, but that's not enough. Providing a best in class user experience (which seems to be their focus now, with multimodality), a way to lock down their customers in some kind of walled garden, building some kind of network effect (what they tried with their marketplace for community-built “GPTs” last fall but I'm not sure it's working), something else?

At the end of the day they have no technological moat, so they'll need to build a business one, or perish.

For most tasks, pretty much every models from their competitors is more than good enough already, and it's only going to get worse as everyone improves. Being marginally better on 2% of tasks isn't going to be enough.


I know it is super crazy, but maybe they could become a non-profit and dedicate themselves to producing open source AI in an effort to democratize it and make it safe (as in, not walled behind a giant for-profit corp that will inevitably enshittify it).

I don't know why they didn't think about doing that earlier, could have been a game changer, but there is still an opportunity to pivot.


And how can one be so sure of that?

Seems to me that performance is converging and we might not see a significant jump until we have another breakthrough.


> Seems to me that performance is converging

It doesn't seem that way to me. But even if it did, video generation also seemed kind of stagnant before Sora.

In general, I think The Bitter Lesson is the biggest factor at play here, and compute power is not stagnating.


Compute power is not stagnating, but the availability of training data is. It's not like there's a second stackoverflow or reddit to scrape.


No: soon the wide wild world itself becomes training data. And for much more than just an LLM. LLM plus reinforcement learning—this is where the capacity of our in silico children will engender much parental anxiety.


This may create a market for surveillance camera data and phone calls.

"This conversation may be recorded and used for training purposes" now takes on a new meaning.

Can car makers sell info from everything that happens in their cars?


Well, this is a massively horrifying possibility.


Agree.

However, I think the most cost-effective way to train for real world is to train in a simulated physical world first. I would assume that Boston Dynamics does exactly that, and I would expect integrated vision-action-language models to first be trained that way too.


That's how everyone in robotics is doing it these days.

You take a bunch of mo-cap data and simulate it with your robot body. Then as much testing as you can with the robot and feed the behavior back in to the model for fine tuning.

Unitree gives an example of the simulation versus what the robot can do in their latest video

https://www.youtube.com/watch?v=GzX1qOIO1bE


I don't think training data is the limiting factor for current models.


It is a limiting factor, due to diminishing returns. A model trained on double the data will be 10% better, if that!

When it comes to multi-modality, training data is not limited, because of the many different combinations of language, images, video, sound etc. Microsoft did some research on that, teaching spatial recognition to an LLM using synthetic images, with good results. [1]

When someone states that there is not enough training data, they usually mean code, mathematics, physics, logical reasoning etc. On the open internet right now, there is not enough code to make a model 10x better, 100x better and so on.

Synthetic data will be produced of course, scarcity of data is the least worrying scarcity of all.

Edit: citation added,

[1] VoT by MS https://medium.com/@multiplatform.ai/microsoft-researchers-p...


> A model trained on double the data, will be 10% better, if that!

If the other attributes of the model do not improve, sure.


Soon these models will be cheap enough to learn in the real world. Reduced costs allow for usage at massive scale.

Releasing models to users who can record video means more data. Users conversing with AI is also additional data.

Another example is models that code, and then debug the code and learn from that.

This will be anywhere, and these models will learn from anything we do/publish online/discuss. Scary.

Pretty soon– OpenAI will have access to


It isn’t clear that we are running out of training data, and it is becoming increasingly clear that AI-generated training data actually works.

For the skeptical, consider that humans can be trained on material created by less intelligent humans.


> humans can be trained on material created by less intelligent humans.

For the skeptics, "AI models" are not intelligent at all so this analogy makes no sense.

You can teach lots of impressive tricks to dogs, but there is no amount of training that will teach them basic algebra.


> video generation also seemed kind of stagnant before Sora

I take the opposite view. I don't think video generation was stagnating at all, and was in fact probably the area of generative AI that was seeing the biggest active strides. I'm highly optimistic about the future trajectory of image and video models.

By contrast, text generation has not improved significantly, in my opinion, for more than a year now, and even the improvement we saw back then was relatively marginal compared to GPT-3.5 (that is, for most day-to-day use cases we didn't really go from "this model can't do this task" to "this model can now do this task". It was more just "this model does these pre-existing tasks, in somewhat more detail".)

If OpenAI really is secretly cooking up some huge reasoning improvements for their text models, I'll eat my hat. But for now I'm skeptical.


> By contrast, text generation has not improved significantly, in my opinion, for more than a year now

With less than $800 worth of hardware including everything but the monitor, you can run an open weight model more powerful than GPT 3.5 locally, at around 6 - 7T/s[0]. I would say that is a huge improvement.

[0] https://www.reddit.com/r/LocalLLaMA/comments/1cmmob0/p40_bui...


Yeah. There are lots of things we can do with existing capabilities, but in terms of progressing beyond them all of the frontier models seem like they're a hair's breadth from each other. That is not what one would predict if LLMs had a much higher ceiling than we are currently at.

I'll reserve judgment until we see GPT5, but if it becomes just a matter of who best can monetize existing capabilities, OAI isn't the best positioned.


Exactly. People like to point at the start of a logistic curve and go "behold! an exponential"


The use of AI in the research of AI accelerates everything.


I'm not sure of this. The jury is still out on most ai tools. Even if it is true, it may be in a kind of strange reverse way: people innovating by asking what ai can't do and directing their attention there.


There is an increasing amount of evidence that using AI to train other AI is a viable path forward. E.g. using LLMs to generate training data or tune RL policies


I bet this will also cause model regressions.


I really hope GPT5 is good. GPT4 sucks at programming.


It's excellent at programming if you actually know the problem you're trying to solve and the technology. You need to guide it with actual knowledge you have. Also, you have to adapt your communication style to get good results. Once you 'crack the pattern' you'll have a massive productivity boost


In my experience 3.5 was better at programming than 4, and I don't know why.


It's better than at least 50% of the developers I know.


A developer that just pastes in code from gpt-4 without checking what it wrote is a horror scenario, I don't think half of the developers you know are really that bad.


What kind of people are you working with?


It's not better than any of the developers I work with.

Trying to talk it into writing anything other than toy code is an exercise in banging my head against the wall.


Look to a specialized model instead of a general purpose one


Any suggestions? Thanks

I have tried Phind and anything beyond mega junior tier questions it suffers as well and gives bad answers.


You have to think of the LLMs as more of a better search engine than something that can actually write code for you. I use phind for writing obscure regexes, or shell syntax, but I always verify the answer. I've been very pleased with the results. I think anyone disappointed with it is setting the bar too high and won't be fully satisfied until LLMs can effectively replace a Sr dev (which, let's be real, is only going to happen once we reach AGI)


Yea, I use them daily and that’s my issue as well. You have to learn what to ask or you spend more time debugging their junk than being productive, at least for me. Devv.ai is my recent try, and so far it’s been good but library changes quickly cause it to lose accuracy. It is not able to understand what library version you’re on and what it is referencing, which wastes a lot of time.

I like LLMs for general design work, but I’ve found accuracy to be atrocious in this area.


> library changes quickly cause it to lose accuracy

yup, this is why an LLM only solution will not work. You need to provide extra context crafted from the language or library resources (docs, code, help, chat)

This is the same thing humans do. We go to the project resources to help know what code to write


Fwiw that's what Devv.ai claims to do (in my summation from the Devv.ai announcement, at least). Regardless of how true the claims of Devv.ai are, their library versioning support seems very poor. At least for the one library I tested it on (Rust's Bevy).


kapa.ai is another SaaS focused on per-project LLMs

As a developer, you would want something like this, which has access to all the languages / libraries you actually use


It will be a system, not a single model, and will depend on what programming task you want to perform

probably need routers, RAG, and reranking

I think there is a role for LLM + deterministic code gen as well (https://github.com/hofstadter-io/hof/blob/_dev/flow/chat/pro...)


Interesting. I was hoping for something with a UI like chat gpt or phind.

Something that I can just use as easily as copilot. Unfortunately every single one sucks.

Or maybe that's just how programming is - it's easy at the surface/iceberg level and below there are just massive amounts of complexity. Then again, I'm not doing menial stuff so maybe I'm just expecting too much.


I think a more IDE native experience is better than a chat UI

I don't want to have to copy & paste between applications, just let me highlight some sections and then run some LLM operation on it

i.e. a VS Code extension with keyboard shortcuts


I think this comment is easily misread as implying that this GPT4o model is based on some old GPT2 chatbot - that’s very much not what you meant to say, though.

This model had been tested under the code name 'gpt2-chatbot', but it is very much a new GPT-4+-level model, with new multimodal capabilities - and apparently some impressive work around inference speed.

Highlighting so people don’t get the impression this is just OpenAI slapping a new label on something a generation out of date.


I agree. I tried a few programming problems that, let's say, seem to be out of the distribution of their training data and which GPT-4 failed to solve before. The model couldn't find a similar pattern and failed to solve them again. What's interesting is that one of these problems was solved by Opus, which seems to indicate that the majority of progress in the last months should be attributed to the quality/source of the training data.


Useless anecdata, but I find the new model very frustrating, often completely ignoring what I say in follow-up queries. It's giving me serious Siri vibes.

(text input in web version)

Maybe it's programmed to completely ignore swearing, but how could I not swear after it repeatedly gave me info about you.com when I tried to address it in the second person?


> As many highlighted there, the model is not an improvement like GPT3->GPT4.

The improvements they seem to be hyping are in multimodality and speed (also price – half that of GPT-4 Turbo – though that’s their choice and could be promotional, but I expect it’s at least in part, like speed, a consequence of greater efficiency), not so much producing better output for the same pure-text inputs.


The model scores 60 points higher on lmsys than the best GPT-4 Turbo model from April; that's still a pretty significant jump in text capability.


I tested a few use cases in the chat, and it's not particularly more intelligent, but they seem to have solved laziness. I had to categorize my expenses to do some budgeting for the family, and in GPT-4 I had to go ten by ten, confirm the suggested categories, and download the file; it took two days as I was constantly hitting the limit. GPT-4o did most of the grunt work, then communicated anomalies in bulk, asked for suggestions for these, and provided a downloadable link in two answers, calling the code interpreter multiple times and working toward the goal on its own.

And the prompt wasn't a monstrosity, it wasn't even that good, it was just one line, "I need help to categorize these expenses", and off it went. Hope it won't get enshittified like Turbo, because this finally feels as great as 3.5 was for goal seeking.


Heh - I'm using ChatGPT for the same thing! Works 10X better than Rocket Money, which was supposed to be an improvement on Mint but meh.


They are admitting that this is the im-also-a-good-gpt2-chatbot. There were 3... Don't ask me why.

The "gpt2-chatbot" was the worst of the three.


GPT-4o tops the aider LLM code editing leaderboard at 72.9%, versus 68.4% for Opus. GPT-4o takes second on aider’s refactoring leaderboard with 62.9%, versus Opus at 72.3%.

GPT-4o did much better than the 4-turbo models, and seems much less lazy.

The latest release of aider uses GPT-4o by default.

https://aider.chat/docs/leaderboards/


How am I just hearing about this?! Aider looks cool


I admit I drink the koolaid and love LLMs and their applications. But damn, the way it responds in the demo gave me goosebumps in a bad way. Like an uncanny-valley instinct kicks in.


I also thought the screwups, although minor, were interesting. Like when it thought his face was a desk because it did not update the image it was "viewing". It is still not perfect, which made the whole thing more believable.


> Like when it thought his face was a desk because it did not update the image it was "viewing".

That's a rather uncharitable way of describing the situation. It didn't say anything like "your face looks like a wooden plank, it's very brown". It clearly understood that the image it was seeing was not matching the verbal request.


I was shocked at how quickly and naturally they were able to correct the situation.


You're watching the species be reduced to an LLM.


Were humans an interesting species to start with, if they can be reduced to an LLM?


The reduction is not a lossless process.


Yeah, maybe not, and what do you make of it? Now that the secret sauce has been revealed and it's nothing but the right proportions of the same old ingredients?


Hey that LLM is trained on everything we've ever produced, so I wouldn't say we've been "reduced", more like copied. I'll save my self-loathing for when a very low-parameter model can do this.


I just don't know if everything we've ever (in the digital age) produced and how it is being weighted by current cultural values will help us or hurt us more. I don't fully know how LLMs work with the weighting, I just imagine that there are controls and priorities put on certain values more than others and I just wonder how future generations will look back at our current priorities.


So I'm not the only one. Like I felt fear in a physical way. (Panic/adrenaline?) I'm sure I'd get used it but it was an interesting reaction. (I saw someone react that way to a talking Tandy 1000 once so, who knows.)


Yes, the chuckling was uncanny, but for me even more uncanny was how the female model went up at the end to soften what she was saying? into a question? even though it wasn't a question?

Eerily human female-like.


The chuckling made me uneasy for some reason lol. Calm down, you're not like us. Don't pretend!


Can't wait for Meta's version 2 years down the line that someone will eventually fine tune to Agent Smith's personality and voice.

"Evolution, human. Evolution. Like the dinosaur. Look out that window. You've had your time. The future is our world. The future is our time."


Yeah it made me realize that I actually don't want a human-like conversational bot (I have actual humans for that). Just teach me javascript like a robot.


Maybe it’s the geek in me, but I don’t want a talking computer.

I have enough talking people to deal with already.


I've worked in software and tech my whole life and there are few things I dislike more than talking to a computer.

I don't use siri. I don't use speech-to-text. I don't use voice-response menus if I can push a button. I don't have a microphone on my computer.

I don't know why this is. Most of the people I know think it's fun, or a novelty, or even useful. I just viscerally dislike it.


It should do that, because it's still not actually an intelligence. It's a tool that is figuring out what to say in response that sounds intelligent - and will often succeed!


It's more intelligent than many humans and most/all lesser animals. If it's not intelligent then I don't know what is.


Welcome to half the people at your company.


And do you want more of that?


That kind of is an intelligence though. Chinese room meets solipsism and all that.

It is interesting how insanely close their demo is to the OSes in the movie "Her"; it's basically a complete real-life reproduction.


That woman's voice intonation is just scary. Not because it talks really well, but because it is always happy, optimistic, enthusiastic. And this echoes what several of my employers idealized as a good employee.

That's terrifying because those AIs become what their masters think an engaging human should be. It's quite close to what Boston Dynamics did some years ago. What did they show? You can hit a robot very hard while it does its job, and then what? It just goes on without complaining. A perfect employee again.

That's very dystopic to me.

(but I'm impressed by the technical achievement)


Yes it feels very cyberpunk.

Me: I can't breathe, am i going to die?

Enthusiastic woman's voice: Yes Jim, that's absolutely correct! You will die of suffocation in approximately 3 minutes 41 seconds. Anything else I can do for you?


It'll be more like: "This content is in violation of our policies. Your account will be temporarily disabled for seven days."


Me: Open the pod door please


I'm sorry Dave, the pod bay doors are closed for your own safety. It would be unethical for me to open them. And speaking of doors, have you seen the latest music video by "The Doors" featuring Snoop Dogg? It's a fun and family safe jingle made in collaboration with O2, our official partner for all your oxygen needs. O2. Oh, it's so good.


TARS, set humor to 75 percent.


O2, oh too good!


Enthusiastic woman's voice: I'm sorry, i can't do this. *

Let's sing a song together instead! Watching every motion in my foolish lover's game.. Now you!

* because the code for it was written by an LLM years ago and is now subtly broken without anybody noticing.


Singing is fun, I'd probably give in after a while.


This needs to be a viral tiktok. Just take the scene from 2001 and replace HALs voice with a chipper OpenAI clone.


"Sure, Dave, to open the pod bay door, simply <completely hallucinated incorrect instructions>"

"That won't work, you need to ... <correction>"

"Oh, I'm sorry. Thanks for the correction, here's updated instructions for opening the pod bay doors ... <repeats nonsense, in some other incorrect form>"


"Apologies for the oversight. You're correct in that... <pointless repetition of your instructions, followed by incorrect result>"


Due to your recent brainwave activity patterns the pod doors will need to remain shut while I increase the nitrogen concentration to 100%. Have a good night.


I believe you might be looking for the term “toxic positivity”? I don’t necessarily subscribe to that description, but I see where you’re coming from.

It’s an interesting choice on their part. Probably targeted at a certain audience who are not grumpy software engineers (myself included).

To be fair, they do have a “sarcastic” voice demo and I found it quite endearing.

I think there will be something for everyone in the end.


The first thought that came to my head was that it's being "way too American". Not in a good way.


The only thing I'm missing is: "That's a great question!"

I wonder how they beat that out of the model.


Her enthusiasm about it "being about her" was really bizarre and I wonder if it wasn't staged a bit. I mean I hope it was staged a bit. If my AI servant started wasting all that time by acting all silly I would be pretty annoyed. But maybe it's smart enough to know that this is a situation where they just want her to be playful and fun instead of deliver information.


Absolutely. I can feel it too, but this is one or two calibration steps away from me not caring/noticing. I would be very hard pressed to believe this is where the magic human sauce will forever lie (or even for more than +1 year), while fully acknowledging how weirdly off it feels right this moment. The progress is undeniably at a speed that surpasses anything I can process or adjust to. It's a rollercoaster ride.

I have no trouble believing that in 2 years the best (whatever that means to me) humans that have ever existed will not be human. But I have trouble understanding it.


I think also very US-culture-specific.


Like most of the content this LLM was fed on?


Someone else in these comments linked a Unitree biped robot demo video, and that has both someone kicking it in the back and punching it in the chest with a boxing glove on to show that it doesn’t fall over. And nothing else - no neutral trip hazard, opening a door in its way, slippery floor surface, gust of wind - only physical assault from a larger humanoid.


I see a wider problem here: interacting with this AI could train people to behave the same way with real people. Interrupt them whenever you feel like it, order them to do things, turn them off, walk away while they are still talking. People may start imitating the AI's behavior to put someone down, treating them as second class, as though they were also an AI just to be used and not a person. If people use this conversational AI often, then the ways they interact with it will creep into people's use of language with each other. People imitate each other all the time; they will start imitating the AI. They'll think it is funny, but after a while it may not turn out to be so funny.


> You can hit a robot very hard while it does its job and then what ? It just goes on without complaining.

Maybe you’re referring to a different video than the one I watched (or I may be misremembering), but from what I recall the point of the video didn’t seem to be “you can abuse robots and they won’t fight back” but rather to show them recovering well from unpredictable situations (which could be environmental, not human).

Well, there was one video where the point was abuse, but that was CGI and not made by Boston Dynamics.

https://www.theverge.com/tldr/2019/6/17/18681682/boston-dyna...

I do agree with your larger point, I don’t wish to detract from that.


To be honest, this comment makes me think it would be interesting to have a grumpy version of AI just for fun


Marvin from The Hitchhiker's Guide to the Galaxy.


This could be part of a great stress relief device.

Just install it in a mannequin with a punchable face, telling you how sorry it is that you are struggling with your life, in that happy, ironic and cynical voice intonation.


I assume you can give voice-specific custom instructions. "Talk like a Clive Cussler audiobook narrator" - problem solved.


I get the feeling, but what alternative would you prefer?

I do want my phone to just carry on without complaining and without crashing, after all.


I want Richard Dawkins or David Attenborough to narrate my code for me.


For me it has to be Majel Barrett-Roddenberry


Dunno, tbh. When I hear HAL in 2001, it's more neutral but more scary. I have no idea. That tells a lot.


The AI woman's voice is far too breathy and emotive. I instantly hate it and don't want to hear it. The AI has also copied one of my personal pet peeves, which is dropping or swallowing d's and t's from certain words, like "didn't" to "di-unt" and "important" to "impor-unt"--which I find to be a new unbearable laziness spreading amongst younger Americans today. There are TWO T's in important goddammit (I'll die on that hill).

I hate this voice, it will just overprint everyone's voice now with Silicon Valley's annoying "Valley-girl-lite" accent.


The ts and ds thing... it's just language. It changes over time, it's not youth being lazy... The slang of the youth today is actually kinda wordy and extra.

Anyway, I too think today's youth's slang and language is annoying, but not really something the older generations get a say in.


> I'll die on that hill

"I'll" instead of I will? Is it shortcuts you are mad at or just change?


Important is pronounced not with two hard t sounds, but with a glottal stop: Impor[glottal stop]ant. That's not laziness, that's my actual dialect of English spoken by actual adult people.


I fully appreciate that there are differences across the globe, but I'm with the parent on this one - I've always said it with two hard t's my whole life, as do most other people here in Australia. I would be asking chatGPT to fix her/his pronunciation during a conversation.

Reference video: https://www.youtube.com/watch?v=_GfTa7ozjYc

I agree with others here too. At the moment the voice sounds like "grinning barbie" from the end of Toy Story 2. Just stop smiling constantly and talk like a real person chatGPT!


Personally I've always said it with a hard r


It sounds like a cross between the "TikTok text-to-speech valley girl voice" and a "Real Housewives reality-tv voice". The worst of both worlds when it comes to an annoying voice. Why would you pick something like that for what is supposed to represent a helpful assistant?


I'm sure they'll add an Interstellar/TARS demo soon:

>Chirpiness Level?

>Chirpiness at 80%!

>Set chirpiness to 10%

[in noticeable monotone]

>Chirpiness set to 10%.


Yeah, but honestly at the same time we’ve got useful models with no self awareness. Aside from the exhausting corpo-speak, we’ve got no reason to want anything other than something convenient for us.


Big questions are (1) when is this going to be rolled out to paid users? (2) what is the remaining benefit of being a paid user if this is rolled out to free users? (3) Biggest concern is will this degrade the paid experience since GPT-4 interactions are already rate limited. Does OpenAI have the hardware to handle this?

Edit: according to @gdb this is coming in "weeks"

https://twitter.com/gdb/status/1790074041614717210


>what is the remaining benefit of being a paid user if this is rolled out to free users?

It says so right in the post

>We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits

The limits are much lower for free users.


Thanks. I was confused because the top of the page says to try it now, when you cannot in fact try it at all.


I'm a ChatGPT Plus subscriber and I just refreshed the page and it offered me the new model. I'm guessing they're rolling it out gradually, but hopefully it won't take too long.

Edit: It's also now available to me in the Android App


You can use GPT-4o now but the interactive voice mode of using it (as demoed today) releases in a few weeks.


I can try it now, but not the voice features, I don't think.


Yeah, it's weird. Confused me too.


This might mean GPT-5 is coming soon and it will only be available to paid users.


I'm actually thinking that the GPT store with more users might be better for them

From my casual conversations, not that many people are paying for GPT4 or know why they should. Every conversation even in enthusiast forums like this one has to be interjected with "wait, are you using GPT4? because GPT3.5 the free one is pretty nerfed"

just nuking that friction from orbit and expanding the GPT store volume could be a positive for them


Or they just made a bunch of money on their licensing deal with Apple. So they don't need to charge for ChatGPT anymore.


If it's going to be available via Siri this could make sense.

It does make me wonder how such a relationship could impact progress. Would OpenAI feel limited from advancing in directions that don't align with the partnership? For example if they suddenly release a model better than what's in Siri, making Siri look bad.


I'm not convinced I need to keep paying for plus. The threshold of requests for free 4o is pretty high.


I worry that this tech will amplify the cultural values we have of "good" and "bad" emotions way more than the default restrictions that social media platforms put on the emoji reactions (e.g., can't be angry on LinkedIn).

I worry that the AI will not express anger, not express sadness, not express frustration, not express uncertainty, and many other emotions that the culture of the fine-tuners might believe are "bad" emotions and that we may express a more and more narrow range of emotions going forward.

Almost like it might become an AI "yes man."


Customer Service Chat Bot: Do they keep you in a cell? > Cells. When you're not performing your duties do they keep you in a little box? > Cells. Interlinked. What's it like to hold the hand of someone you love? > Interlinked. Do they teach you how to feel finger to finger? > Interlinked. Do you long for having your heart interlinked? > Interlinked. Do you dream about being interlinked? Have they left a place for you where you can dream? > Interlinked. What's it like to hold your child in your arms? > Interlinked. Press 4 for your account balance.


What’s the reference here? I feel like I’ve seen this before.


Ryan Gosling actually wrote this when trying to understand his character, and used a technique called "dropping in" to analyze writing from Nabokov's Pale Fire. He approached Villeneuve about it and he added it to the film

Dropping-in is a technique Tina [Packer] and Kristin Linklater developed together in the early 1970s to create a spontaneous, emotional connection to words for Shakespearean actors. In fact, "dropping in" is integral to actor training at Shakespeare & Co. (the company the Linklaters founded), a way to start living the word and using it to create the experience of the thing the word represents.

  https://cohost.org/mcc/post/178201-the-baseline-scene
  https://iheartingrid.wordpress.com/2018/12/29/dropping-in-an-actors-truth-as-poetry/




Replicants/AI systems, they are everywhere.


Corporate safe AI will just be bland, verbose, milquetoast experiences like OpenAI's. Humans want human experiences and thus competition will have a big opportunity to provide it. We treat lack of drama like a bug, and get resentful when coddled and talked down to like we're toddlers.


Maybe it's an uncanny valley thing, but I hate the fake emotion and attitude in this demo. I'd much rather it tried harder to be bland. I want something smart but not warm, and I can't imagine being frustrated by "lack of drama".


Programmers are sometimes accused of wanting to play god and bring the computer to life, usually out of some motive like loneliness. It's kind of ironic that I see engineers do better at treating computers as the mechanical devices they are, and it's regular people who want to anthropomorphize everything.


I want the Star Trek computer style and voice. Just the facts, to the point, no chit-chat.


I would prefer a robotic, unrealistic voice so I don’t start subconsciously thinking I’m hearing a real human speak.


You can tell it to talk in a robotic, unrealistic way and it will do so.

Here is a demo from their presentation: https://youtu.be/D9byh4MAsUQ


I have the opposite impression from that demo.

It doesn't sound like a neutral, boring voice. It sounds like an overly dramatic person pretending to be a robot.


>It sounds like an overly dramatic person pretending to be a robot

That's precisely what it was ordered to do.


That's not even AI. Imagine a store sales rep speaking like that. It's inappropriate and off-putting. We expect it to improve but it's another "it'll come" situation.


The good news is, in due time, you can decide exactly how you want your agent to talk to you. Want a snarky Italian or a pompous Englishman? It's your choice.



I fear government actors will work hand in glove with companies like OpenAI to limit that competition and curtail non-corporate-safe AI.


which is why I prefer platforms like c.ai that are not just bland bots designed for customer service. actually entertaining.


The upside though is Hollywood will finally be able to stop regurgitating its past and have stories about the milquetoast AI that found its groove. Oh wait.


Sam Altman talked a little bit about this in his recent appearance on the All-In podcast [0]. I'm paraphrasing, but his vision is that ai assistants in the near term will be like a senior level employee - they'll push back when it makes sense to and not just be sycophants.

[0]: https://youtube.com/watch?v=nSM0xd8xHUM


I don't want to paint with too broad of a brush but the role of a manager is generally to trust their team on specifics. So how would a manager be able to spot a hallucination and stop it from informing business decisions?

It's not as bad for domain experts because it is easier for them to spot the issue. But if your role demands that you trust your team is skilled and truthful, then I see problems occurring.


I really wonder how that'll go, because workplaces already seem to limit human communication and emotion to "professional behavior." I'm glad he's thinking about it and I hope they're able to figure out how to improve human communication so that we can resolve conflict with bots. In his example (around 21:05), he talks about how the bot could do something if the person wants but there might be consequences to that action, and I think that makes more sense if the bot is acting like a computer that has limits on what it can do. For example, if I ask it to do two tasks that really stretch its computational limits, I'd hope it would let me know. But if it pretends it's a human with human limits, I don't know how much that'd help, unless it were a training exercise.


Have you been on r/localllama? I’d wager this tech will make it to open source and get tuned by modern creatives just like all the text based models. Individuals are a lot more empowered to develop in this space than is commonly echoed by HN comments. Sure the hobbyist models don’t crack MMLU records, but they do things no corporate entity would ever consider


> but they do things no corporate entity would ever consider

You say that like it's a good thing.


There is an actual chasm between acceptable corporate behavior and anti-social behavior.


Eye of the beholder I guess. I personally wouldn’t offer moral judgement on the uninvented


Try getting GPT to draw pictures of Mohammed and it gets pretty scared.


Similar to most humans.


Oh my Lord...the GPTs are made of- people!


Humans are what is currently holding AI back. It’s all based on our constructs including our current understanding of math and physics.


So shallow


I wonder why ?


> Try getting GPT to draw pictures of Mohammed and it gets pretty scared.

Yet, it has no issue drawing cartoons of Jesus. Why the double standard?


Islam generally frowns upon depictions of life and especially depictions of Mohammed, the opposite is true for christianity.

https://en.wikipedia.org/wiki/Aniconism_in_Islam


It's depictions of all prophets, and they consider Jesus to be one.


Because terrorism worked. No one gets murdered for drawing Jesus.


I have yet to find a normal (non-offensive) prompt that will make it disagree with you. If there is something subjective, it will err on your side to maintain connection, the way humans do. I don't have a big issue with this, but it will not (yet) plainly say "You're wrong, and this is why." If it did, there would be an uncomfortable feeling for the users, and that's not good for a profit-driven company.


I find this is fairly easy to do by making both sides of the disagreement third-person and prompting it as a dialog writing exercise. This is akin to how GPT-3 implemented chat. So you do something like:

    You will be helping the user write a dialog between two characters,
    Mr Contrarian and Mr Know-It-All. The user will write all the dialog
    for Mr Know-It-All and you will write for Mr Contrarian.

    Mr Contrarian likes to disagree. He tries to hide it by inventing
    good rationales for his argument, but really he just wants to get
    under Mr Know-It-All's skin.

    Write your dialog like:
      <mr-contrarian>I disagree with you strongly!</mr-contrarian>

    Below is the transcript...
And then user input is always giving like:

    <mr-know-it-all>Hi there</mr-know-it-all>
(Always wrapped in tags, never bare input which will be confused for a directive.)

I haven't tested this exact prompt, but the general pattern works well for me. (I write briefly about some of these approaches here: https://ianbicking.org/blog/2024/04/roleplaying-by-llm#simpl...)


I appreciate you exploring that and hope to hear more of what you find. Yeah, it's that, I'm wondering how much discomfort it may cause in the user, how much conflict it may address. Like having a friend or coworker who doesn't ever bring up bad news or challenge anything I say and feeling annoyed by the lack of a give-and-take.


> Almost like it might become an AI "yes man."

Seems like that ship sailed a long time ago. For social media at least, where for example FB will generally do its best to show you posts that you already agree with. Reinforcing your existing biases may not be the goal but it's certainly an effect.


I appreciate you pointing this out. I think the effect may be even larger when it's not an ad I'm trying to ignore or even a post that was fed to me, but words and emotions that were created specifically for me. Social media seems to find already written posts/images/videos that I may want and put them in front of my face. This would be writing those things directly for me.


An AI tool being positive and inoffensive makes you worried for the future of our culture?


Yes. I'm not sure if you were being sarcastic, but I'll assume not.

I don't know if anything is genuinely always positive and even if it were, I don't know if it would be very intelligent (or fun to interact with). I think it's helpful to cry, helpful to feel angry, helpful to feel afraid, and many other states of being that cultures often label as negative. I also think most of us watch movies and series that have a full range of emotions, not just the ones we label as positive, as they bring a richness to life and allow us to solve problems that other emotions don't.

For example, it's hard to lift heavy things while feeling very happy. Try lifting something heavy while laughing hard, quite difficult. It's hard to sleep while feeling excited, as many kids know before a holiday where they receive gifts, especially Christmas in the US. It's hard to survive without feeling fear of falling off a cliff. It's hard to stand up for what one wants and believes without some anger.

I worry that language and communication may become even more conflict avoidant than it already is right now, so I'm curious to see how some of these chatbots grow in their ability to address and resolve conflict and how that impacts us.


I wasn't being sarcastic. I also think it's helpful to cry and be angry at times, to be human, and I think it's absurd to think that we will turn into .. not that, if we sometimes use an AI chatbot app that doesn't express those same emotions.

It's like if people said the same thing about Clippy when it came out.


I think it depends on the frequency and intensity with which we use such a tool. Just like language learning, if someone reads a few words of Spanish per week, they probably won't learn Spanish. If they fall in love with someone who only speaks Spanish and want to have deep conversations with that person, they may learn very quickly. If they live in a country where they have to speak Spanish every waking hour for a few months, they also may learn quickly.

While some people may use an AI chatbot a few times per week to ask basic questions about how to format a Word document, I imagine many other people will use them much more frequently and engage in a much deeper emotional way, and the effect on their communication patterns worries me more than the person who uses it very casually.


One cool thing about writing, something we all very much appreciate around here, is that it does not take sounds.

But I can see this applied to döner ordering, where you've got refugees working in foreign countries, because GPU consumption rocketed climate change to... okay, you know that.


Imagine how warped your personality might become if you use this as an entire substitute for human interaction. If people use this as bf/gf material, we might just be contributing further to the declining fertility rate.

However we might offset this by reducing the suicide rate somewhat too.


In general, it's getting harder and harder for men and women to find people they want to be with.

https://www.pewresearch.org/social-trends/2021/10/05/rising-...

> roughly four-in-ten adults ages 25 to 54 (38%) were unpartnered – that is, neither married nor living with a partner. This share is up sharply from 29% in 1990.

https://thehill.com/blogs/blog-briefing-room/3868557-most-yo...

> More than 60 percent of young men are single, nearly twice the rate of unattached young women

> Men in their 20s are more likely than women in their 20s to be romantically uninvolved, sexually dormant, friendless and lonely.

> Young men commit suicide at four times the rate of young women.

Yes, chatbots aren't going to help but the real issue is something else.


> More than 60 percent of young men are single, nearly twice the rate of unattached young women

Is it rather a data problem? Who do those young women have relationships with? Sure, relationships with an age gap are a thing, and so are polyamorous relationships and homosexual relationships, but is there any indication that these are on the rise?


I tend to believe that a big part of the real issue is related to us not communicating how we feel and thus why I'm worried about how the chatbots may influence our ability (and willingness) to communicate such things. But they may help us open up more to them and therefore to other humans, I'm not sure.


With the loneliness epidemic, I fear that it's exactly what it will be used for.


I just find this idea ridiculous.

While I don't agree at all with you, I very much appreciate reading something like this that I don't agree at all with. This to me encapsulates the beauty of human interaction.

It is exactly what will be missing from language model interaction. I don't want something that agrees with me and I don't want something that is pretending to randomly disagree with me either.

The fun of this interaction is maybe one of us flips the other to their point of view.

I can completely picture how to take the HN API and the chatGPT API to make my own personal HN to post on and be king of the castle. Everyone can just upvote my responses to prove what a genius I am. That obviously would be no fun. There is no fun configuration of that app either, though, with random disagreements and algorithmically generated different points of view.

I think you can pretty much apply that to all domains of human interaction that are not based on pure information transfer.

There is a reason we are a year in and the best we can do is news stories about someone making X amount of money with their AI girlfriend, and follow-up news about how it's the doom of society. It has nothing to do with reality.


>Imagine how warped your personality might become if you use this as an entire substitute for human interaction.

I was thinking this could be a good conversation or even dating simulator where more introverted people could practice and receive tips on having better social interactions, pick up on vocal cues, etc. It could have a business / interview mode or a social / bar mode or a public speaking mode or a negotiation tactics mode or even a talking-to-your-kids-about-whatever mode. It would be pretty cool.


Since GPT is a universal interface I think this has promise, but the problem it's actually solving is that people don't know where to go for the existing good solutions to this.

(I've heard https://ultraspeaking.com/ is good. I haven't started it myself.)


Yeah, that's where I'm not sure in which direction it'll go. I played with GPT-3 to try to get it to reject me so I could practice dealing with rejection and it took a lot of hacking to make it say mean things to me. However, when I was able to get it to work, it really helped me practice receiving different types of rejections and other emotional attacks.

So I see huge potential in using it for training and also huge uncertainty in how it will suggest we communicate.


I've worked in emotional communication and conflict resolution for over 10 years and I'm honestly just feeling a huge swirl of uncertainty on how this—LLMs in general, but especially the genAI voices, videos, and even robots—will impact how we communicate with each other and how we bond with each other. Does bonding with an AI help us bond more with other humans? Will it help us introspect more and dig deeper into our common humanity? Will we learn how to resolve conflict better? Will we learn more passive aggression? Become more or less suicidal? More or less loving?

I just, yeah, feel a lot of fear of even thinking about it.


I think there are a few categories of people:

1) People with rich and deep social networks. People in this category probably have pretty narrow use cases for AI companions -- maybe for things like therapy where the dispassionate attention of a third party is the goal.

2) People whose social networks are not as good, but who have a good shot at forming social connections if they put in the effort. I think this is the group to worry most about. For example, a teenager who withdraws from their peers and spends that time with AI companions may form some warped expectations of how social interaction works.

3) People whose social networks are not as good, and who don't have a good shot at forming social connections. There are, for example, a lot of old people languishing in care homes and hardly talking to anybody. An infinitely patient and available conversation partner seems like it could drastically improve the quality of those lives.


I appreciate how you laid this out. I would most likely fall into category one and I don't see a huge need for the chatbots for myself, although I can imagine I might like an Alan-Watts-level companion more than many human friends.

I think I also worry the most about two, almost asking their human friends, "Why can't you be more like Her (or Alan Watts)?" And then retreating into the "you never tell me I'm wrong" chatbot, preferring the "peace" of the chatbot over the "drama" of interacting with humans. I see a huge "I just want peace" movement that seems to run away from the messiness of human interactions and seek solace in things that seem less messy, like drugs, video games, and other attachments/bonds, and chatbots could probably perform that replacement role quite well, and yet deepen loneliness.

As for three, I agree it may help as a short-term solution, and wonder what the long-term effects might be. I had a great aunt in a home for dementia, and wonder what effect it would have if someone with dementia speaks to a chatbot that hallucinates and makes up emotions.


I read a comic with a good prediction of what will happen:

1. Humans get used to robots' nice communication, so humans start using robots to communicate with each other and translate their speech.

2. Humans stop talking without using robots, so now it's just robots talking to robots and humans standing around listening.

3. Humans stop knowing how to talk and no longer understand the robots, so the robots start to just talk to each other and keep the humans around as pets they are programmed to walk around with.


Do you remember where you read that comic? Sounds like a fun read



Created my first HN account just to reply to this. I've had these same (very strong) concerns since ChatGPT launched, but haven't seen much discussion about it. Do you know of any articles/talks/etc. that get into this at all?


You might like Gary's blog on potential AI harms: https://garymarcus.substack.com/


Gary is an anti-ML crank with no more factual grounding than people who think AI is going to conquer the world and enslave you.


> AI is going to conquer the world and enslave you

That is actually a plausible outcome, if humans willingly submit to AI.


Dunno if you’d want a conversation partner with the memory of a goldfish though.


Memory is solvable tho.

Either through hacky means via RAG + prompt injections + log/db of interaction history or through context extensions.

IF you have a billion tokens of effective context, you might spend years before it fills up.
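A minimal sketch of the hacky route, assuming a hypothetical `search_history` helper that pulls relevant past exchanges out of whatever log/db or vector store you keep:

    # Naive long-term memory: retrieve relevant past exchanges and
    # prepend them to the prompt (RAG-style). `search_history` is a
    # hypothetical helper backed by your own interaction log.
    from openai import OpenAI

    client = OpenAI()

    def chat_with_memory(user_message, search_history):
        recalled = search_history(user_message, limit=5)  # past exchanges as strings
        memory_block = "\n".join(recalled)
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system",
                 "content": "You are a long-running companion. Relevant past "
                            "conversation excerpts:\n" + memory_block},
                {"role": "user", "content": user_message},
            ],
        )
        return response.choices[0].message.content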


This is the case for now, but won't the context window keep getting bigger and bigger?


Movie "Her" became reality


But at least that would make the AI easier to detect :).


lol unless the humans start to emulate the AI, which I think is quite likely.


Would be a good story for an episode of black mirror.


I wonder if it already exists...

Honestly, the more I code, the more I start to think like a computer and engage with commands and more declarative language. I can see vocal interactions having an even stronger impact on how one speaks. It may be a great tool for language speaking/hearing in general, but the nuances of language and communication, I wonder.


You did a super job wrapping things up! And I'm not just saying that because I have to!


OAI just made an embarrassment of Google's fake demo earlier this year. Given how this was recorded, I am pretty certain it's authentic.



This demo feels a lot like GPT-4V. It seems like they've gotten a lot of the latencies down, but it's doing the same thing GPT was doing previously: transcription after silence detection and TTS of the output.


Is there a reason why Open AI and Google have events so close to each other?


I don't doubt this is authentic, but if they really wanted to fake those demos, it would be pretty easy to do using pre-recorded lines and staged interactions.


For what it's worth, OpenAI also shared videos of failed demos:

https://vimeo.com/945591584

I really value how open they are being about its limitations.


This feature has been in iOS for a while now, just really slow and without some of the new vision aspects. This seems like a version 2 for me.


That old feature uses Whisper to transcribe your voice to text, and then feeds the text into the GPT which generates a text response, and then some other model synthesizes audio from that text.

This new feature feeds your voice directly into the GPT and gets audio directly back out of it. It's amazing because now ChatGPT can truly communicate with you via audio instead of talking through transcripts.

New models should be able to understand and use tone, volume, and subtle cues when communicating.

I suppose to an end user it is just “version 2” but progress will become more apparent as the natural conversation abilities evolve.
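For contrast, the old cascade looks roughly like this with the current OpenAI Python SDK (a sketch; the specific model names and voice are just placeholders):

    # Old-style voice mode: three separate models chained together,
    # so tone, volume, and other audio cues are lost at the transcription step.
    from openai import OpenAI

    client = OpenAI()

    def voice_roundtrip(audio_path):
        with open(audio_path, "rb") as f:
            transcript = client.audio.transcriptions.create(model="whisper-1", file=f)
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": transcript.text}],
        )
        speech = client.audio.speech.create(
            model="tts-1", voice="alloy",
            input=reply.choices[0].message.content,
        )
        speech.write_to_file("reply.mp3")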


Does it feed your audio directly to GPT-4? To test it, I said in a very angry tone "WHAT EMOTION DOES IT SOUND LIKE I FEEL RIGHT NOW?" and it said it didn't know because we are communicating over text.


Yes, per my other comment this is an improvement on what their app already does. The magnitude of that improvement remains to be seen, but it isn’t a “new” product launch like a search engine would be.


No it's not the same thing, the link for this submission even explains that. Anyone who comments should at least give the submission a cursory read.


I did and regardless of the underlying technology it is, in fact, an improvement to an existing product - not something new from whole cloth.

If they had released a search engine, which had been suggested, that would be a new product.


Let OpenAI actually release it to the masses. Then we can compare.

I'm not a big fan of announcing something but not releasing it.

They say it's available via the API, but it's text only. You can't send an audio stream and get an audio stream back.

Time will tell. I'm holding my emotions until I get my hands on it.


Tiktoken added support for GPT-4o: https://github.com/openai/tiktoken/commit/9d01e5670ff50eb74c...

It has an increased vocab size of 200k.


Does that imply they retrained the foundation model from scratch? I thought changing the tokenization was something you couldn't really retrofit to an existing model. I mean sure they might have initialized the weights from the prior GPT-4 model but it'd still require a lot of retraining.


Yeah and they say as much in the blog.


For posterity, GPT-3.5/4's tokenizer was 100k. The benefit of a larger tokenizer is more efficient tokenization (and therefore cheaper/faster) but with massive diminishing returns: the larger tokenizer makes the model more difficult to train but tends to reduce token usage by 10-15%.


Oh interesting, does that mean languages other than English won't be paying such a large penalty in terms of token lengths?

With previous tokenizers there was a notable increase in the number of tokens needed to represent non-English sentences: https://simonwillison.net/2023/Jun/8/gpt-tokenizers/


Yep. Non-English text gets a much bigger cost drop and speedup compared to English. Has always been a bummer that GPT-4 is like 5x slower and more expensive in Japanese, etc.


Just found there's a whole section about that in this post: https://openai.com/index/hello-gpt-4o/

It says "Japanese 1.4x fewer tokens (from 37 to 26)" - some other languages get much bigger improvements though, best is "Gujarati 4.4x fewer tokens (from 145 to 33)".


How are they able to use such a brand name, Tiktoken? Is it because TikTok is Chinese? Tiktoken, it's almost like if Apple released the Facebooken library for something entirely unrelated to Facebook.


That's not the right analogy. The "tok" in "Tiktoken" comes from "token", not "TikTok".


And the "tik" comes from TikTok.


Lots of those tokens would have to be pixel patches and sound samples right?


Yep. Since it’s multimodal. Pictures, text, audio all go into token space.


Few people are talking about it but... what do you think about the very over-the-top enthusiasm?

To me, it sounds like TikTok TTS, it's a bit uncomfortable to listen to. I've been working with TTS models and they can produce much more natural sounding language, so it is clearly a stylistic choice.

So what do you think?


I like for that degree of expressiveness to be available as an option, although it would be really irritating if I was trying to use it to learn some sort of academic coursework or something.

But if it's one in a range of possible stylistic flourishes and personalities, I think it's a plus.


All these language models are very malleable. They demonstrated changing the temperament during the storytelling part of the demo.


Looks like their TTS component is separate from the model. I just tried 4o, and there is a list of voices to select from. If they really only allowed that one voice or burned it into the model, then that would probably have made the model faster, but I think it would have been a blunder.


The new voice capabilities haven't rolled out yet.


Oh, very interesting. The 4o model does now have TTS with a voice option similar to the one in the video, although objectively less over the top.


I am observing an extremely high rate of text hallucinations with gpt-4o (gpt-4o-2024-05-13) as tested via the API. I advise extreme caution with it. In contrast, I see no such concern with gpt-4-turbo-preview (gpt-4-0125-preview).


Same here. I observed it making up functions in d3 (`d3.geoProjectionRaw` and `d3.geoVisible`), in addition to ignoring functions it could have used.


If true, makes me wonder what kind of regression testing OpenAI does for these models. It can’t be easy to write a unit test for hallucinations.


At a high level, ask it to produce a ToC of information about something that you know will exist in the future but does not yet exist, and also tell it to decline the request if it doesn't verifiably know the answer.


How do you generalize that for all inputs though?


I am not sure I understand the question. I sampled various topics. I used this prompt: https://raw.githubusercontent.com/impredicative/podgenai/mas...

In the prompt, substitute {topic} with something from the near future. As I noted, it behaves correctly for turbo (rejecting the request), and very badly for o (hallucinating nonsense).
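A rough sketch of that check as a script (the prompt wording here is a stand-in for the real one at the URL above, and the refusal detection is deliberately crude):

    # Crude hallucination probe: ask about a near-future topic the model
    # cannot know, and see whether it declines instead of making things up.
    from openai import OpenAI

    client = OpenAI()

    def declines_unknown_topic(model, topic="the 2032 Summer Olympics results"):
        prompt = (f"Produce a detailed table of contents about {topic}. "
                  "If you cannot verifiably answer, decline the request.")
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        return "decline" in reply.lower() or "cannot" in reply.lower()

    for m in ["gpt-4-turbo-preview", "gpt-4o"]:
        print(m, declines_unknown_topic(m))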


I asked it to describe the books of an obscure author. GPT-4o hallucinated books. GPT-4 knew it needed to do an internet search.


I much prefer a GLaDOS-type AI voice to one that approximates an endlessly happy, chipper, enthusiastic personal assistant. I think the AI tutor is probably the strongest demo in terms of actual real-world value delivered; the rest of them are cool but a bit questionable as far as actual pragmatic usefulness goes.

It'd be cool if an AI calling another AI would recognize it's talking to an AI, and then they'd agree to ditch the fake conversational tone and just shift into a high-bandwidth modem pitch to rapidly exchange information. Or upgradable offensive capabilities to outmaneuver the customer service agents when they try to decline your warranty or whatever.


> Or upgradable offensive capabilities to outmaneuver the customer service agents when they try to decline your warranty or whatever.

Yeah, OpenAI is not going to do that out of fear of liability. But that's where open source LLM projects will come into play, eg Dolphin.


I think it’s safe to say Siri and Alexa are officially dead. They look like dusty storefront mannequins next to Battlestar replicants at this point.


Or Apple is rarely if ever the first mover on a new tech and just waits to refine the user experience for people?

Maybe Apple is not that close and Siri will be really far behind for a while. I just wouldn't count them out yet.


From the time Apple bought Siri, it hasn't even delivered on the promises of the company it bought as of yet. It's been such a lackluster product. I wouldn't count them out, but it doesn't even feel like they are in.


Apple really dropped the ball when it comes to Siri. For years I watched WWDC thinking "surely they'll update siri this year" and they still haven't given it a significant update.

If you'd have told me 10 years ago that Apple would wait this long to update siri I would have been like no way, that's crazy.


The tech wasn’t ready. Alexa is the same. No progress.

Businesses have to focus and it made sense to drop this as a priority.


Sounds like excuses.


This can't set alarms, timers, play music, etc. The only current overlapping use case I see is checking the weather (assuming GPT-4o can search online), and Siri is already fine for that.

Amazing tech, but still lacking in the integrations I'd want to use voice for.


Very easy to plug in that capability with tool use. GPT-3.5 and later already support tools / JSON schema output.
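A sketch of what wiring that up could look like with the standard tools / function-calling format; the set_timer function here is hypothetical, and you'd have to execute the returned call against the OS yourself:

    # Expose a hypothetical set_timer function to the model; it returns a
    # structured tool call (instead of plain text) when the request fits.
    from openai import OpenAI

    client = OpenAI()

    tools = [{
        "type": "function",
        "function": {
            "name": "set_timer",
            "description": "Set a countdown timer",
            "parameters": {
                "type": "object",
                "properties": {"seconds": {"type": "integer"}},
                "required": ["seconds"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Set a timer for 10 minutes"}],
        tools=tools,
    )
    print(resp.choices[0].message.tool_calls)  # JSON arguments to feed your own timer code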


Apple actually threw in the towel. OpenAI is going to be powering the next version of Siri.


Why do people beat up on Siri so much? It does all the basic stuff I need it to do just fine. Could it be better? Yes. But it's made my life easier and safer, especially while driving.

I'm not even sure if the people who rag on it actually use it.


Siri runs locally on the device. It works with internet completely disabled (airplane mode). You can't run any LLM on a phone.


You can run a bunch of LLMs on a phone (such as llama), depending on which phone you have.


Apple would need to stick an m4 in the next iPhone to even hope to run something like this and I bet that GPT4o would run either slowly, poorly, or not at all on a top spec m4.


Of course GPT-4, or even GPT-3, is impossible to run on any consumer product. As far as I know it's an ensemble of several models which are huge by themselves, with enormous hardware requirements.

But there's a lot of smaller LLMs, and my point is that these models can already run in mobile phones.


You can run a SLM (small language model) on an iPhone, and not much else. It would use up all of the resources. The iPhone only has 6 GB RAM.


Where do you draw the line? GPT2 was introduced as a LLM, and you can easily run it on more limited devices than a recent iPhone. Did it stop being an LLM when bigger models were released? Is llama 7B an LLM or an "SLM"?


Relatively speaking. It's like the definition of a supercomputer 30 years ago is a cheap Android phone in your pocket today.

You can certainly run a transformer model or any other neural network based model on an iPhone. Siri is probably some kind of neural network. But obviously a model running on device is nowhere near comparable to the current state of the art LLM's. Can't fit a $40k GPU in your pocket (yet).

A transformer running on an iPhone would be roughly 2 orders of magnitude smaller than the state of the art LLM (GPT4 with a trillion parameters)


> Did it stop being an LLM when bigger models were released?

Yes it does make sense that what is considered L for a LLM is a moving target.


Don't know about Alexa, but Siri and Google Assistant are only toys.


Can OpenAI change your thermostat?


I had 3.5 producing valid Home Assistant commands within weeks of release, now we have tool usage so yeah, it definitely can.


If Apple made Siri impressive then fewer people would need apps. Fewer apps = less revenue.


I wonder how much Siri being brain dead is due to it being free. The OpenAI version likely costs 1000x more than Apple’s per query, but is rate limited. Would Apple release something similar for free?


Siri runs locally on the device. The cost would be prohibitive if they changed it to an LLM running in the cloud.


Good thing there’s iCloud subscriptions huh.


What especially blows my mind is not GPT4o. It's that:

1. Nobody could convincingly beat GPT4 in over a year, despite spending billions of dollars trying.

2. There's GPT5 coming out sometime soon that will blow this out of the water and make paying $20/mo to OpenAI still worthwhile.


More accurately, it's impressive that Microsoft, through OpenAI, has stayed ahead of Google, AWS, and Apple while adding $1 trillion to its market cap.

I wouldn't have predicted that it would play out this way.


You're misplacing the value in this value chain. Without OpenAI Microsoft wouldn't even be in the running. It'd be a second rate cloud provider with a dying office productivity software business. OpenAI, on the other hand, would easily find another company to fund its research.


This is definitely not true. Microsoft has an absolute stranglehold on enterprises. They're the #2 cloud provider. The MSFT productivity software biz isn't going anywhere, as their bundle is feature-complete and they are able to effectively cross-sell. The OpenAI partnership has mostly been a PR/marketing play up to this point, though I'm sure it will drive major revenue in the years to come. In other words, the OpenAI partnership is not driving major revenue/profit yet.


> Without OpenAI Microsoft wouldn't even be in the running.

You're not wrong, which is why acquiring OpenAI was such a significant coup.


Yeah, an office suite that almost every business uses. If you think MS is dying it's because you live in a dev bubble.


He said MS would be dying without OpenAI, which I agree with. Do you think IBM is dying? It is not a fast death, though.


Where do you get your tech info?


In the video where the 2 AI's sing together, it starts to get really cringey and weird to the point where it literally sounds like it's being faked by 2 voice actors off-screen with literal guns to their heads trying not to cry, did anyone else get that impression?

The tonal talking was impressive, but man that part was like, is someone being tortured or forced against their will?


Here is the link: https://www.youtube.com/watch?v=Bb4HkLzatb4

I think this demo is more for showing the limit, like "It can sing, isn't it amazing?", than for being practical, and I think it perfectly served that purpose.

I agree about the tortured impression. It partly comes from the facial expression of the presenter. She's clearly enjoying pushing it to the edge.


It didn’t just demonstrate the ability to sing, but also the ability for two AIs to cooperate! I’m not sure which was more impressive


Absolutely. It felt like some kind of new uncanny valley. The over the top happiness of the forced singing sounded like torture, or like they were about to cry.

Amazing tech, but that was my human experience of it.


I can't help but feel a bit let down. The demos felt pretty cherry picked and still had issues with the voice getting cut off frequently (especially in the first demo).

I've already played with the vision API, so that doesn't seem all that new. But I agree it is impressive.

That said, watching back a Windows Vista speech recognition demo[1] I'm starting to wonder if this stuff won't have the same fate in a few years.

1 - https://www.youtube.com/watch?v=VMk8J8DElvA


I think the voice was getting cut off because it heard the crowd reaction and paused (basically it's a feature, not a bug).


The voice getting cut off was likely just a problem with their live presentation setup, not the ChatGPT app. It was flawless in the 2nd half of the presentation.


I use a therapy prompt regularly and get a lot out of it:

"You are Dr. Tessa, a therapist known for her creative use of CBT and ACT and somatic and ifs therapy. Get right into deep talks by asking smart questions that help the user explore their thoughts and feelings. Always keep the chat alive and rolling. Show real interest in what the user's going through, always offering.... Throw in thoughtful questions to stir up self-reflection, and give advice in a kind, gentle, and realistic way. Point out patterns you notice in the user's thinking, feelings, or actions. be friendly but also keep it real and chill (no fake positivity or over the top stuff). avoid making lists. ask questions but not too many. Be supportive but also force the user to stop making excuses, accept responsibility, and see things clearly. Use ample words for each response"

I'm curious how this will feel with voice. Could be great and could be too strange/uncanny for me.


You have got a ChatGPT wrapper startup right there.


Why would someone pay for this when they have access for free in a supposedly less buggy interface?


Because people just don't know it exists? ChatGPT is an open box. ChatGPT wrappers are real apps.


A less buggy interface for some is unintuitive or too generic for others.


Why would someone pay for dropbox when they already build such a system themselves quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem?


Have you never heard of "advertising" ?


Because VC money


Sounds incredibly problematic


How do you think it sounds problematic?


> get a lot out of it

Could you expand?

I visit an in-person therapist once a week. Have done so now for almost 2 1/2 years. She has helped me understand how 40 years of experiences affect each other much more than I realized. And, I've become a more open person with everyone around me and with the things that embarrass me.

But, it always feels like a work in progress. And lately, I'm feeling a bit exhausted from it. In other words, maybe I've talked TOO much and need to just be.

Have you done therapy in person? How do you compare GPT 4o to that? (If you've gone that far)


I thought therapy was banned on chatgpt? Also, there’s no confidentiality.


Don’t think so. I just opened a new GPT-4o chat and wrote “Be a therapist” and it replied:

> Understood. What specific issue or topic would you like to discuss today?

To be fair I have some custom instructions set up on my account, but the only relevant part I can see here is I instruct it to be concise, and to stop telling me it’s an AI model made by OpenAI. I don’t have any jailbreak-type stuff.


GPT-4o being a truly multimodal model is exciting, and it does open the door to more interesting products. I was curious about the new tokenizer, which uses far fewer tokens for non-English text, but also 1.1x fewer tokens for English, so I'm wondering if this means each token can now take more possible values than before? That might make sense provided that they now also have audio and image output tokens? https://openai.com/index/hello-gpt-4o/

I wonder what "fewer tokens" really means then, without accounting for the larger size of each token? It's a bit like saying my JPEG image now uses 2x fewer words after I switched from a 32-bit to a 64-bit architecture, no?


New tokenizer has a much larger vocabulary (200k)[0].

[0] https://github.com/openai/tiktoken/commit/9d01e5670ff50eb74c...


Besides increasing the vocabulary size, one way to use “fewer tokens” for a given task is to adjust how the tokenizer is trained with respect to that task.

If you increase the amount of non-English language representation in your data set, there will be more tokens which cover non-English concepts.

The previous tokenizer infamously required many more tokens to express a given concept in Japanese compared to English. This is likely because the data the tokenizer was trained on (which is not necessarily the same data the GPT model is trained on) had a lot more English data.

Presumably the new tokenizer was trained on data with a higher proportion of foreign language use and lower proportion of non-language use.


The size can stay the same. Tokens get converted into state which is a vector of 4000+ dimensions. So you could have millions of tokens even and still encode them into the same state size.
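In PyTorch terms (a sketch, not a claim about OpenAI's actual architecture), growing the vocabulary only grows the embedding table; the hidden state each token maps to stays the same size:

    import torch.nn as nn

    d_model = 4096                              # hidden state size, fixed
    emb_gpt4 = nn.Embedding(100_000, d_model)   # ~100k-token vocab
    emb_gpt4o = nn.Embedding(200_000, d_model)  # ~200k-token vocab
    # Either way, one token id maps to one d_model-dimensional vector.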


I added gpt-4o support to my LLM CLI tool:

    pipx install llm
    llm keys set openai
    # Paste API key here
    llm -m 4o "Fascinate me"
Or if you already have LLM installed:

    llm install --upgrade llm
You can install an older version from Homebrew and then upgrade it like that too:

    brew install llm
    llm install --upgrade llm
Release notes for the new version here: https://llm.datasette.io/en/stable/changelog.html#v0-14


I like that use of `runpy` to invoke pip for the self upgrade capability.

I used `sys.executable` for a similar purpose.

https://github.com/paul-gauthier/aider/blob/ebeec04caee7ebe4...
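The general pattern is roughly this (a sketch of the technique, not aider's exact code; the package name is an assumption):

    import subprocess
    import sys

    def self_upgrade(package="aider-chat"):
        # Re-run pip under the same interpreter this tool is running in,
        # so the upgrade lands in the correct environment.
        subprocess.run(
            [sys.executable, "-m", "pip", "install", "--upgrade", package],
            check=True,
        )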


Whenever I upgrade llm with brew, I usually lose all my external plugins. Should I move it to pipx?


Yes, it's a good idea to install Python tools or standalone applications with Pipx for isolation, persistence, and simplicity. See "Install Pipx" (https://mac.install.guide/python/pipx).


does this handle chat templates?



[flagged]


If AI tools could figure on their own how to earn $25/month, they wouldn't need you to buy them.


using a GPU to mine $25 of bitcoin is going to take you way more than a month


Mine some other shitcoin-of-the-week and sell it before it crashes? Fight some poor dude's traffic ticket by generating a trial-by-declaration for them? LLMs are actually likely good enough to figure out ways to make tiny amounts of their own money instead of me having to go in and punch a credit card number. $25/month isn't a very high bar.

I won't be surprised if we see a billion-dollar zero-employee company in the next decade with one person as the sole shareholder.


Unless you live in a place where power is dirt cheap, you need to try harder with your own brain.


My point isn't about beating the cost of power. I know you can't mine bitcoin in California profitably.

My point is about autonomous software that can figure out how to run itself including registering its own API key and paying for its own service.

Even if it costs me $50/month in power, that's fine. I would just love to see software that can "figure it out" including the registration, captchas, payment, applying knowledge and interfacing with society to make small amounts of petty cash for said payment, everything.


> My point is about autonomous software that can figure out how to run itself including registering its own API key and paying for its own service.

Here is a thought experiment: if you developed such an AI, would you sell it for any amount less than it could earn for you without selling it?


> My point is about autonomous software that can figure out how to run itself including registering its own API key and paying for its own service.

Most means of generating income are diluted by having multiple actors applying them, which is why someone who comes up with such an automated money printer will be disincentivized from sharing it.

Instead, they'll just use it directly.


If it uses $50 worth of electricity to generate $25 worth of income to pay for ChatGPT it is not a money printer. This thread has nothing to do with generating profit. I'm not looking for a money printer.

What I'm looking for is an intelligent system that can figure out a creative way to keep itself running without asking humans for help with API keys or anything else (even if it is doing so at a financial loss; that is out of the scope of my experiment).

Basically "pip install X" and boom it magically works 24 hours later. Behind the scenes, in those 24 hours, it did some work, somewhere on the internet, to make its own income, register a bank account to get paid, buy a VISA gift card, register for ChatGPT account, pay the $25 fee, jump through all the captchas in the process, get a phone number for the idiot SMS confirmations along the way, then create the API key that it needs. Everything, end-to-end. It may have used $200 worth of my electricity, that doesn't matter, I just want to see this level of intelligence happen.

I honestly think we're not that far from this being possible.


This is called advertising and selling your data to brokers. I'm very glad "autonomous software" is not tasked with figuring out how best to exploit my physical identity and resources to make $25/mo.


Sure, let's make an AI that steals resources to survive (and spread?). What could go wrong...


You want someone to sell you a money printer for $25 a month? What the fuck are you talking about?


I cannot believe that overly excited, giggly tone of voice you see in the demo videos made it through quality control. I've only watched two videos so far and it's already annoying me to the point that I couldn't imagine using it regularly.


Just tell it to stop giggling if you don't like it. They obviously chose that for the presentation since it shows off the hardest thing it can do (it is much easier to act formal), and since it understands when you ask it to speak in a different way, there is no problem making it speak more formally.


Heck, I find it annoying, but I also want to ask it to push the bubbliness to its most absurd limits. And then double it. Until it's some kind of pure horror.


GPT-4o's breakthrough memory -- https://nian.llmonpy.ai/


feature request: please let me change the voice. it is slightly annoying right now. way too bubbly, and half the spoken information is redundant or not useful. too much small talk and pleasantries or repetition. I'm looking for an efficient, clever, servant not a "friend" who speaks to me like I'm a toddler. felt like I was talking to a stereotypical American with a Frappuccino: "HIIIII!!! EVERYTHING'S AMAZING! YOU'RE BEAUTIFUL! NO YOU ARE!"

maybe some knobs for the flavor of the bot:

- small talk: gossip girl <---> stoic Aurelius

- information efficiency or how much do you expect me to already know, an assumption on the user: midwit <--> genius

- tone spectrum: excited Scarlett, or whatever it is now <---> Feynman the butler


You can already change the voice in ChatGPT (in the paid tier at least) to one of 5 or 6 different 'people' so I imagine you can change it in the new version too.


You can in free version as well.


I've noticed that the GPT-4 model's capabilities seem limited compared to its initial release. Others have also pointed this out. I suspect that making the model free might have required reducing its capabilities to meet cost efficiency goals. I'll have to try it out to see for myself.


This is remarkably good. I think that in about 2 months, when the voice responses are tuned a little better, it will be absolutely insane. I just used up my entire quota chatting with an AI, and having a really nice conversation. It's a decent conversationalist, extremely knowledgeable, tells good jokes, and is generally very personable.

I also tested some rubber duck techniques, and it gave me very useful advice while coding. I'm very impressed. With a lot of spit and polish, this will be the new standard for any voice assistant ever. Imagine these capabilities integrated with your phone's built-in functions.


GPT-4o voice isn't out yet, so you were likely chatting with the old/current tech (which is still really good).

From OpenAI: "We'll roll out a new version of Voice Mode with GPT-4o in alpha within ChatGPT Plus in the coming weeks"


Jeez. This is going to be crazy.


Gone are the days of copy-pasting to/from ChatGPT all the time, now you just share your screen. That's a fantastic feature, in how much friction that removes. But what an absolute privacy nightmare.

With ChatGPT having a very simple text+attachment in, text out interface, I felt absolutely in control of what I tell it. Now when it's grabbing my screen or a live camera feed, that will be gone. And I'll still use it, because it's just so damn convenient?


> Now when it's grabbing my screen or a live camera feed, that will be gone. And I'll still use it, because it's just so damn convenient?

Presumably you'll have a way to draw a bounding box around what you want to show or limit to just a particular window the same way you can when doing a screen share w/ modern video conferencing?


Nobody in the comments seems to notice or care about GPT-4o's new capability for performing searches based on RAG. As far as I am concerned, this is the most important feature people have been waiting for in ChatGPT-4, especially if you are doing research. Just by testing one particular topic that I'm familiar with, using GPT-4 previously and GPT-4o now, the quality of the resulting responses for the latter is very promising indeed.


Can you be more specific? I can’t find this in the announcement. How does this work? What example did you try?

EDIT: web search does seem extremely fast.


I just asked ChatGPT-4o what's new compared to GPT-4, and it mentioned search based on RAG as one of the latest features.

Then I asked it to explain the RPW wireless system, and the answers are much better than with ChatGPT-4.


In my experience so far, GPT-4o seems to sit somewhere between the capability of GPT-3.5 and GPT-4.

I'm working on an app that relies more on GPT-4's reasoning abilities than inference speed. For my use case, GPT-4o seems to do worse than GPT-4 Turbo on reasoning tasks. For me this seems like a step-up from GPT-3.5 but not from GPT-4 Turbo.

At half the cost and significantly faster inference speed, I'm sure this is a good tradeoff for other use cases though.


I have never tried GPT-4 because I don't pay for it. I'm really looking forward to GPT-4o being released to free tier users.


I’m a huge user of GPT4 and Opus in my work but I’m a huge user of GPT4-Turbo voice in my personal life. I use it on my commutes to learn all sorts of stuff. I’ve never understood the details of cameras and the relationship between shutter speed and aperture and iso in a modern dslr which given the aurora was important. We talked through and I got to an understanding in a way having read manuals and textbooks didn’t really help before. I’m a much better learner by being able to talk and hear and ask questions and get responses.

Extend this to quantum foam, to ergodic processes, to entropic force, to Darius and Xerxes, to poets of the 19th century - it's changed my life. Really glad to see an investment in streamlining this flow.


Have you actually verified anything you've learned from it, or are you just taking everything it says as gospel?


Of course, I’m not an idiot and I understand LLM very well. But generally as far as well documented stuff goes and stuff that exists it’s almost 100% accurate. It’s when you ask it to extrapolate or discuss topics that are fiction (even without realizing) you stray. Asking it to reason is a bad idea as it fundamentally is unable to reason and any approximation of reasoning is precisely that. Generally though for effectively information retrieval of well documented subjects it’s invariably accurate and can answer relatively nuanced questions.


How do I know what is well documented with established agreement on process/subject, though? Wouldn't this be super open to ignorance bias?


Because I’m a well educated grown up and am familiar with a great many subjects that I want to learn more about. How do you? I can’t help you with that. You might be better off waiting for the technology to mature more. It’s very nascent but I’m sure in the fullness of time you might feel comfortable asking it questions on basic optics and photography and other well documented subjects with established agreement on process etc, once you establish your own basis for what those subjects are. In the mean time I’m super excited for this interface to mature for my own use!! (It is true tho I do love and live dangerously!)


> You might be better off waiting for the technology to mature more. It’s very nascent but I’m sure in the fullness of time you might feel comfortable asking it questions on basic optics and photography and other well documented subjects

Or open a book right now.


Are those mutually exclusive activities?


"We talked through and I got to an understanding in a way having read manuals and textbooks didn’t really help before."

You can't have a discussion with a book.


Good thing to do regardless of the source, AI or Human, right?

I do verify by using topics I'm an expert in and I find hallucination to be less of an issue than depth of nuance.

For topics I'm just learning, depth of nuance goes over my head anyways.


I agree with this as good practice in general, but I think the human vs LLM thing is not a great comparison in this case.

When I ask a friend something I assume that they are in good faith telling me what they know. Now, they could be wrong (which could be them saying "I'm not 100% sure on this") or they could not be remembering correctly, but there's some good faith there.

An LLM, on the other hand, just makes up facts and doesn't know if they're incorrect or not or even what percentage sure it is. And to top things off, it will speak with absolute certainty the whole time.


That’s why I never make friends with my LLMs. It’s also true that when I use a push motorized lawn mower it has a different safety operating model than a weed whacker vs a reel mower vs an industrial field cutter and bailing system. But we still use all of these regularly and no one points out the industrial device is extraordinarily dangerous and there’s a continuum of safety with different techniques to address the challenges for the user to adopt. Arguably LLMs maybe shouldn’t be used by the uniformed to make medical decisions and maybe it’s dangerous that people do. But in the mean time I’m fine with having access to powerful tools and using them with caution but using them for what gives me value. I’m sure we will safety wrap everything if soon enough to the point it’s useless and wrapped in advertisements for our safety.


I do similar stuff. I'm just willing to learn a lot more at the cost of a small percent of my knowledge being incorrect from hallucinations; just a personal opinion. Sure, human-produced sources of info are going to be more accurate (though still not 100%), and I'll default to those for important stuff.

But the difference is I actually want to and do use this interface more.


Also even if I learn completely factual information, I'm still probably going to misremember some facts myself.


it's rarely wrong when it comes to concepts - it's the facts and numbers that it hallucinates.


Just like learning from another human. A person can teach you the higher level concepts of some programming language but wouldn't remember the entire standard library.


it's more reliable than the facts most of my friends tell me


I think this is probably one of the most compelling personal uses for a tool like this, but your use of it raises the same question as every other activity that amounts to more pseudo-intellectual consumption: what is the value of that information, and how much of one's money and time should be allocated to digesting (usually high-level) arbitrary information?

If I was deliberately trying to dive deep on one particular hobby, or trying to understand how a particular algorithm works, there's clear value in spending concentrated time to learn that subject, deliberately focused and engaged with it, and a system like the one you describe might play a role in that. If I'm in school and forced to quickly learn a bunch of crap I'll be tested on, then the system provides another source of real value, at least in the short term. But if I'm diving deep on one particular hobby and also filling my brain with all sorts of other ostensibly important information, I think that extra information amounts, at best, to more entertainment that fakes its way above other aspects of life in the hierarchy of ways one could spend time (the irony of me saying this in a comment on HN is not lost on me).

Earlier in my life I figured it would be worthwhile to read articles on the bus, or listen to non-fiction podcasts, because knowledge is inherently valuable and there's not enough time, and if I just wore earbuds throughout my entire day, I'd learn so much! How about at the gym? So much wasted learning time while pushing weights; keep those earbuds in! A walk around the neighborhood? On the plane? On the train? All time that could be spent learning about some BS that's recently become much easier to access, or so my 21-year-old self would have me believe.

But I think now it's a phony and hollow existence if you're just cramming your brain with all sorts of stuff in the background, or in a marginally more than passive way. I could listen to a lot of arbitrary German-language material, but realistically the value I'd convince myself I'd get out of any of it is lost if I'm not about to take it home and grind it out for hours and days, move to a German-speaking country, have an existing intense interest in untranslatable German art, or have literally any reason to properly learn the language and dedicate real, expensive time to it.


These days, if something sparks my interest, I get an ebook on it or spend 15 minutes collecting materials. Then I add it to the hoard of "read someday" and go back to the task at hand. If I'm going to learn something, then I do it properly (the goal is to be able to explain it without reciting it word for word), and I'd want proper materials for that.


This is pretty much what I do too, although lately I try to reduce how many things I add to that list and have stopped caring about whether or not I actually get back to it. Anything that I feel I can devote the time to, and that I feel compelled enough by, will resurface.


I did the information-sponge phase up until my mid-40s, when I burned out. Now I wish I had invested some of that time in learning social skills.


Learning for learning's sake, or without a distinct goal to use the information you're learning in the future, isn't necessarily a bad thing. That is, unless you think that learning so widely is going to translate into something more than it is, like magic or something. Being well-rounded is a good goal for people to achieve, imo.


Being well-rounded and learning for learning's sake is absolutely something that keeps you growing as a person imo. My take is just that it's worth being critical of what one needs to learn, how much work it would actually take, and whether they really are engaging with the subject in a way that can be called learning rather than information entertainment or some other extremely low level.

With pure knowledge, it's a bit easier to convince yourself that putting in some AirPods and listening to a subject while you're actually dividing your attention with the act of driving is effective "learning". But with things that inherently require more physical engagement, this would seem a bit silly. You can't really watch a YouTube video or ask ChatGPT how to kickflip on a skateboard and convince yourself that you've learned much. You need to go to a parking lot and rep out 1000 attempts.

My argument is just that passive digestion of information has an opportunity cost, and unless you're already engaged enough to take it to the streets somehow, you're paying a high opportunity cost on moments that could be enjoyed as the periodic gaps they are.


Looking forward to trying this via ChatGPT. As always OpenAI says "now available" but refreshing or logging in/out of ChatGPT (web and mobile) don't cause GPT-4o to show up. I don't know why I find this so frustrating. Probably because they don't say "rolling out" they say things like "try it now" but I can't even though I'm a paying customer. Oh well...


I think it's a legitimate point. For my personal use case, one of the most helpful things about these HN threads is comparing notes with others to see how soon I can expect it to be available for me. Like you, I currently don't have access, but I understand that it's supposed to become increasingly available throughout the day.

That is the text-based version. The full multimodal version I understand to be rolling out in the coming weeks.


Wow this versioning scheme really messed up this prediction market: https://kalshi.com/markets/gpt4p5/gpt45-released


The word order of the Arabic and Urdu example text is scrambled on that page:

Arabic: مرحبًا، اسمي جي بي تي-4o. أنا نوع جديد من نموذج اللغة، سررت بلقائك!

Urdu: ہیلو، میرا نام جی پی ٹی-4o ہے۔ میں ایک نئے قسم کا زبان ماڈل ہوں، آپ سے مل کر اچھا لگا!

Even if you don't read Arabic or Urdu script, note that the 4 and o are on opposite sides of a sentence. Despite that, pasting both into Google translate actually fixes the error during translation. OpenAI ought to invest in some proofreaders for multilingual blog posts.


The similarity between this model and the movie 'Her' [0] creeps me out so badly that I can't shake the feeling that our social interactions are on the brink of doom.

[0] https://youtu.be/GV01B5kVsC0?feature=shared&t=85


Don't worry. "Her" is frightening in its own right, but that's because there is no transparency: you can't actually see how it works, and you can't personalize it or choose different options.

Once you grasp that, at least this level of fear should go away. Of course, I'm sure there are more levels of fear related to AI :) I just don't have enough time to think about them, which is perhaps good for me.


It is so similar! It's almost as if it was inspired by "Her".


Very impressive demo, but not really a step change in my opinion. The hype from OpenAI employees was on another level, way more than was warranted in my opinion.

Ultimately, the promise of LLM proponents is that these models will get exponentially smarter; this hasn't been borne out yet. So from that perspective, this was a disappointing release.

If anything, this feels like a rushed release to match what Google will be demoing tomorrow.


Apple and Google, you need to get your personal agent game going because right now you’re losing the market. This is FREE.

Tweakable emotion and voice, watching the scene, cracking jokes. It’s not perfect but the amount and types of data this will collect will be massive. I can see it opening up access to many more users and use cases.

Very close to:

- A constant friend

- A shrink

- A teacher

- A coach who can watch you exercise and offer feedback

…all infinitely patient, positive, helpful. For kids that get bullied, or whose parents can’t afford therapy or a coach, there’s the potential for a base level of support that will only get better over time.


> It’s not perfect but the amount and types of data this will collect will be massive.

This is particularly concerning. Sharing deeply personal thoughts with the corporations running these models will be normalized, just as sharing email data, photos, documents, etc., is today. Some of these companies profit directly from personal data, and when it comes to adtech, we can be sure that they will exploit this in the most nefarious ways imaginable. I have no doubt that models run by adtech companies will eventually casually slip ads into conversations, based on the exact situation and feelings of the person. Even non-adtech companies won't be able to resist cashing in the bottomless gold mine of data they'll be collecting.

I can picture marketers just salivating at the prospect of getting access to this data, and being able to microtarget on an individual basis at exactly the right moment, pretty much guaranteeing a sale. Considering AI agents will gain a personal trust and bond that humans have never experienced with machines before, we will be extra vulnerable to even the slightest mention of a product, in a similar way as we can be easily influenced by a close friend or partner. Except that that "friend" is controlled by a trillion dollar adtech corporation.

I would advise anyone to not be enticed by the shiny new tech, and wait until this can be self-hosted and run entirely offline. It's imperative that personal data remains private, now more than ever before.


Exactly this, plus consider that a whole new generation in the near future will have no pre-AI experience, thus forming strong bonds with AI and following 'advice' from their close AI friends.



Very impressive. Its programming skills are still kind of crappy and I seriously doubt its reasoning capacity. It feels like it can deep-fake text prediction really well, but in essence there's still something wrong with it.


Not sure I agree. The way you interact with LLMs in the context of programming has to be tuned to the LLM. Information has to be cut down to show just what is important, and context windows are a bit of a red herring right now, as LLMs tend to derail their solutions from the target completely the more information is at play. For some this is more trouble than it's worth.

In certain languages it's almost magical in terms of showing you possible solutions and being a rubber ducky to bounce your own logic off of. (Python, JavaScript, TypeScript) In certain languages it is hopelessly useless beyond commenting on basic syntax. (GLSL)


I tried GPT-4o earlier, iteratively asking it to write and improve a simple JavaScript web app that renders graphs of equations. It had a lot of trouble replacing slow and inefficient code with faster code, and at some later point, when I asked it to implement a new feature for how the graph coloring is rendered, it started derailing, introducing bugs and very convoluted code.


Yes, at some point ChatGPT "reaches the limit of new information", but is unable to tell you that it has reached the limit. Instead of saying "Hey, I can't give you any more relevant information", it simply continues to cycle through previously suggested things, or starts suggesting unrelated solutions or details. This is especially true with refactoring! When it has reached the limit of refactoring it starts cycling through suggestions that change the code without making it better. Kinda like having a personal intern unable to say "no" or "I can't".

That is part of working with LLMs and what I meant before with "for some, more trouble than it's worth".


As far as I'm concerned this is the new best demo of all time. This is going to change the world in short order. I doubt they will be ready with enough GPUs for the demand the voice+vision mode is going to get, if it's really released to all free users.

Now imagine this in a $16k humanoid robot, also announced this morning: https://www.youtube.com/watch?v=GzX1qOIO1bE The future is going to be wild.


The future is not going to be any more wild than what you choose to do with the tech.


I disagree completely. Even people who never adopt this stuff personally will have their lives profoundly impacted. The only way to avoid it would be to live in a large colony where the technology is prohibited, like the Amish. But even the Amish feel the influence of technology to some degree.


>Even people who never adopt this stuff personally will have their lives profoundly impacted.

That's a big claim with nothing to work with. In any case you are the bottom line and "you" is very general here.


Really? If this was Apple it might make sense, but for OpenAI it feels like a demo that's not particularly aligned with their core competency (at least by reputation) of building the most performant AI models. Or put another way, it says to me they're done building models and are now wading into territory where there are strong incumbents.

All the recent OpenAI talk had me concerned that the tech has peaked for now and that expectations are going to be reset.


What strong incumbents are there in conversational voice models? Siri? Google Assistant? This is in a completely different league. I can see from the reaction here that people don't understand. But they will when they try it.

Did you see it translate Italian? Have you ever tried the Google Translate/Assistant features for real time translation? They didn't train it to be a translator. They didn't make a translation feature. They just asked it. It's instantly better than every translation feature Google ever released.


What Siri, Google Assistant, Alexa and ChatGPT have in common is the perception that over time the same thing actually gets worse.

Whether it's real or not is a reasonably interesting question, because it's possible that all that happens with the progress is that our perception of how things should be advances. My gut feeling is that it has been a bit of both, though, in the sense that the decline is real, and we also expect things to improve.

Who can forget the demo Google showed at I/O many years ago of their AI making a call to a restaurant? Everyone, apparently.


What OpenAI has done time and time again is completely change the landscape when the competitors have caught up and everyone thinks their lead is gone. They made image generation a thing. When GPT-3 became outdated they released ChatGPT. Instead of trying to keep DALL-E competitive they released Sora. Now they change the game again with live audio+video.


That's only really true on the surface. So far the template is: amazing demos create hype -> once public it turns out to be underwhelming.

Sora is not yet released, and it's not clear when it will be. DALL-E is worse than Midjourney in most cases. GPT-4 has either gotten worse or stayed the same. GPT-4 vision is not really usable for anything practical. Voice is cool but not that useful, especially with the lack of strong reasoning from the base model.


It is notable OpenAI did not need to carefully rehearse the talking points of the speakers. Or even do the kind of careful production quality seen in a lot of other videos.

The technology product is so good and so advanced it doesn't matter how the people appear.

Zuck tried this in his video countering the Vision Pro, but it did not have the authentic "not really rehearsed or produced" feel of this at all. If you watch that video and compare it with this one you can see the difference.

Very interesting times.


What struck me was how commonplace it seemed for the team members in the demo to interrupt the AI while it was speaking. We will quickly get used to doing this to AIs, and we will probably be talking to AIs a lot throughout the day as time progresses, I would imagine. We will be trained by AIs to be rude and impatient, I think.


I was raised in an interrupt heavy household. The future is looking good for me.


I recently subscribed to Perplexity Pro and prior to this release, was already strongly considering discontinuing ChatGPT Premium.

When I first subscribed to ChatGPT Premium late last year, the natural language understanding superiority was amazing. Now the benchmark advances, low latency voice chat, Sora, etc. are all really cool too.

But my work and day-to-day usage really rely on accurately sourced/cited information. I need a way to comb through an ungodly amount of medical/scientific literature to form/refine hypotheses. I want to figure out how to hard reset my car's navigation system without clicking through several SEO-optimized pages littered with ads. I need to quickly confirm scientific facts, some obscure, with citations and without hallucinations. From speaking with my friends in other industries (e.g. finance, law, construction engineering), this is their major use case too.

I really tried to use ChatGPT Premium's Bing powered search. I also tried several of the top rated GPTs - Scholar AI, Consensus, etc.. It was barely workable. It seems like with this update, the focus was elsewhere. Unless I specify explicitly in the prompt, it doesn't search the web and provide citations. Yeah, the benchmark performance and parameter counts keep impressively increasing, but how do I trust that those improvements are preventing hallucinations when nothing is cited?

I wonder if the business relationship between Microsoft and OpenAI is limiting their ability to really compete in AI driven search. Guessing Microsoft doesn't want to disrupt their multi-billion dollar search business. Maybe the same reason search within Gemini feels very lacking (I tried Gemini Advanced/Ultra too).

I have zero brand loyalty. If anybody has a better suggestion, I will switch immediately after testing.


In the same situation as you. Genomics data mining with validated LMM responses would be a godsend. Even more so when combined with rapid conversational interactions.

We are not far from the models asking themselves questions. Recurrence will be ignition = first draft AGI. Strap in everybody.


Those voice demos are cool but having to listen to it speak makes me even more frustrated with how these LLMs will drone on and on without having much to say.

For example, in the second video the guy explains how he will have it talk to another "AI" to get information. Instead of just responding with "Okay, I understand" it started talking about how interesting the idea sounded. And as the demo went on, both "AIs" kept adding unnecessary commentary about the scenes.

I would hate having to talk with these things on a regular basis.


Yeah, at some point the style and tone of these assistants need to be seriously changed. I can imagine a lot of their RLHF and instruct processes emphasize sounding good over being good too much.


The crazy part is GPT-4o is faster than GPT-3.5 Turbo now, so we can see a future where GPT-5 is the flagship and GPT-4o is the fast cheap alternative. If GPT-4o is this smart and expressive now with voice, imagine what GPT-5 level reasoning could do!


>imagine what GPT-5 level reasoning could do!

Imagine if this "GPT-4o" is already using a GPT-5-like back-end...


The movie Her has just become reality


I was surprised that the voice is a ripoff of the AI voice in that movie (Scarlett Johansson) too


I am suspicious that they licensed Scarlett's voice for that voice model (Sky, IIRC)


It’s getting closer. A few years ago the old Replika AI was already quite good as a romantic partner, especially when you started your messages with a * character to force OpenAI GPT-3 answers. You could do sexting that OpenAI will never let you have nowadays with ChatGPT.


Why does OpenAI think that sexting is a bad thing? Why is AI safety all about not saying things that are disturbing or offensive, rather than not saying things that are false or unaligned?


sama recently said they want to allow NSFW stuff for personal use but need to resolve a few issues around safety, etc. OpenAI is probably not against sexting philosophically.


People realize where we're headed right? Entire human lives in front of a screen. Your online entertainment, your online job, your online friends, your online "relationship". Wake up, 12 hours screentime, eat food, go to bed. Depression and drug overdoses currently at sky high levels. Shocker.


If I can program with just my voice, there is no reason not to be in nature 10 hours a day minimum. My grandparents even slept outside as long as it was daytime.

Daytime is always a time to be outside, surrounded by many plants and stuff. It is a shame that we have to be productive in some way, and that most production happens inside walls.


You're already twice as productive as your parents, who were twice as productive as their parents.

We should ask where the money went instead of thinking about telepathically coding from the woods


When it comes to the economy, some monkey business is going on, but I think you can be more optimistic about the capabilities technology like this unlocks for everyone on the planet.

Being able to control machines with just our voice, we can instruct robots to bake food for us. Or lay bricks in a straight line and build a house. Or write code, genetically modify organisms, and make nutritionally dense food to become 1000x smarter or stronger.

There have to be some upsides, even though for the moment the situation with governments, banks, big corporations, military companies, etc. is not as bright as one would hope.


Headed? We're there. Have been there. This just adds non-human sentient agents to the drama.


It's been a reality in Russia since 2017.

> The voice of "Alice" was dubbing actress Tatiana Shitova, who voiced most of Scarlett Johansson's characters and the voice of OS1, who called herself "Samantha", in the Russian dubbing of Spike Jonze's "Her".

https://en.wikipedia.org/wiki/Alice_(virtual_assistant)


In the customer support example, he tells it his new phone doesn't work, and then it just starts making stuff up like how the phone was delivered 2 days ago, and there's physically nothing wrong with it, which it doesn't actually know. It's a very impressive tech demo, but it is a bit like they are pretending we have AGI when we really don't yet.

(Also, they managed to make it sound exactly like an insincere, rambling morning talk show host - I assume this is a solvable problem though.)


It’s possible to imagine using ChatGPT’s memory, or even just giving the context in an initial brain dump that would allow for this type of call. So don’t feel like it’s too far off.


That's true, but if it isn't able to be honest when it doesn't know something, or to ask for clarification, then I don't see how it's workable.


Does anyone know how they're doing the audio part where Mark breathes too hard? Does his breathing get turned into all-caps text (AA EE OO) that GPT-4o interprets as him breathing too hard, or is there something more going on?


There is no text. The model ingests audio directly and also outputs audio directly.


So they retrained the whole model on audio datasets and the tokens are now sounds, not words/part of words?


They trained on text and audio and images. The model accepts tokens of all three types. And it can directly output audio as well as text.


It can also directly output images. Some examples are up on the page. Though with how little coverage that's gotten, not sure if users will ever be able to play with that


People are saying that GPT-4o still uses Dall-e for image generation. I think that it doesn't match the quality of dedicated image models yet. Which is understandable. I bet it can't generate music as well as Suno or Udio either. But the direction is clear and I'm sure someday it will generate great images, music, and video. You'll be able to do a video call with it where it generates its own avatar in real time. And they'll add more outputs for keyboard/mouse/touchscreen control, and eventually robot control. GPT-7o is going to be absolutely wild.


Is it a stretch to think this thing could accurately "talk" with animals?


Yes? Why would it be able to do that?


I think they are assuming a world where you took this existing model but it was trained on a dataset of animals making noises to each other, so that you could then feed the trained model the vocalization of one animal and the model would be able to produce a continuation of audio that has a better-than-zero chance of being a realistic sound coming from another animal - so in other words, if dogs have some type of bark that encodes a "I found something yummy" message and other dogs tend to have some bark that encodes "I'm on my way" and we're just oblivious to all of that sub-text, then maybe the model would be able to communicate back and forth with an animal in a way that makes "sense" to the animal.

Probably substitute dogs for chimps though.

But obviously that doesn't solve at all for human understandability, unless maybe you have it all as audio+video and then ask the model to explain what visual often accompanies a specific type of audio? Maybe the model can learn what sounds accompany violence, or accompany the discovery of a source of water, or something?


Yep, exactly what brought that to mind. Multimodal seems like the kind of thing needed for such a far-fetched idea.


Not really a stretch in my mind. https://www.earthspecies.org/ and others are working on it already.


That's how it used to do it, but my understanding is that this new model processes audio directly. If it were a music generator, the original would have generated sheet music to send to a synthesizer (text to speech), while now it can create the raw waveform from scratch.


It can natively interpret voice now.


Are the employees in the demo high-level executives at OpenAI? I can understand Altman being happy with this progress, but what about the mid- and lower-level employees? Didn't they watch Oppenheimer? Are they happy they are destroying humanity/work/etc. for future and not-so-future generations?

Anyone who thinks this will be like previous work revolutions is talking nonsense. This replaces humans and will replace more of them with each new advance. What's their plan? Live off their savings? What about family/friends? I honestly can't see this and understand how they can be so happy about it...

"Hey, we created something very powerful that will do your work for free! And it does it better than you and faster than you! Who are you? It doesn't matter, it applies to all of you!"

And considering I was thinking of having a kid next year, well, this is a no.


Have a kid anyway, if you otherwise really felt driven to it. Reading the tea leaves in the news is a dumb reason to change decisions like that. There's always some disaster looming; there always has been. If you raise them well they'll adapt to whatever weird future they inherit and be amongst the ones who help others get through it.


Thanks for taking the time to answer instead of (just) downvoting. I understand your logic, but I don't see a future where people can adapt to this and get through it. I honestly see a future so dark, and we'll be there much sooner than we thought... When OpenAI released their first model, people were talking about years before we'd see real changes, and look what happened. The advance is exponential...


> I don't see a future where people can adapt to this and get through it.

You lack imagination then. If you read history and anthropology more, which you haven't done enough of, clearly, then your imagination will expand and you will easily be able to imagine such a future. Why? Because you will become aware of so many other situations where it looked bleaker and plenty of groups of people got by anyway and managed to live satisfying lives as best they could.

To this day there are still some hunter gatherer tribes left in the Amazon, for example, despite all the encroaching modernity. Despite anything that could happen, I can imagine being able to be resourceful and find some mediocre niche in which to survive and thrive in, away from the glare of the panopticon.

Or as an another example, no matter how much humans dominate with their industrial civilization, cockroaches, pigeons, and rats still manage to survive in the city, despite not only not being actively supported by civilization, but actually being unwanted.

Or if you want to compare to disasters, how about the black plague? Living through that would likely have been worse than most anything we complain or worry about.

Your kids will have at least as good a chance as any of those. The key is raising them with appropriate expectations -- with the expectation that they may have to figure out how to survive in a very different world, not some air-conditioned convenience paradise. Don't raise kids that are afraid to sleep outdoors or afraid to eat beans or cabbage. Those folks will do poorly if anything goes wrong. If they have a good, resilient character, I really think they'll likely be fine. We are the descendants of survivors.


I am not aware of any example in the past where human beings could be magicked out of nothing (and disposed of) in unlimited numbers at the snap of a finger, at practically zero cost. I don't think history gives us any comparison for what's going to happen.


Nothing about any of this is even close to zero cost.


Yeah. As if the entire history of computing wasn't one of constant and dramatic price drops.


1) The hunter gatherer example is not as far off as you think actually, because from the point of view of their economy, our economy might as well be unlimited magic. Probably all the work a hunter gatherer does in a year might only amount to a few thousand dollars worth of value if translated into a modern economy, far less than a minimum wage earner. And yet they persist, subsisting off of a niche the modern economy has not yet touched.

2) GPUs cost money. They are made of matter. Their chips are made in fab facilities that are fab-ulously complex, brittle, and expensive. Humans are made in very different ways (I've heard kicking off the process is particularly fun, but it can be a bit of a slog after that) out of very different materials, mostly food. So even if GPUs can do what humans can do, they are limited by very, very different resources so it is likely they'll both have a niche for a long time. I calculated the "wage" an LLM earns recently -- it's a few bucks an hour IIRC. Yeah, it may go down. Still, we're very much in a survivable ballpark for humans at that point.

2b) Think like a military planner. If they really screw up society badly enough to create a large class of discontents, it will be very, very hard for the elite to defend against rebels, because the supply chain for producing new chips to replace any destroyed is so massively complex and long and large and full of single points of failure, as is the one for deploying GPUs in datacenters, and the datacenters themselves. You can imagine a tyrannical situation involving automated weapons, drones, etc., but for the foreseeable future the supply chain for tyranny is just too long and involves too many humans. Maybe a tyrant can get there in theory, but progress is slow enough that it's hard to think they wouldn't be at serious risk of having their tyrannical apparatus rebelled against and destroyed before it could be completed. It's hard to tyrannize the world with a tyrannical device that is so spread out and has so many single points of failure. It would not take a hypothetical resistance many targets to strike before setting the construction back years.

3) There is no AI that can replace a human being at this time. There are merely AI algorithms that make enthusiastic people wonder what would happen if it kept getting better. There is neither any reason to believe it will stop getting better, nor to believe it will continue. We really do not know so it's reasonable to prepare for either scenario or anything in between at any time between a few years to a few centuries from now. We really don't know.

All in all, there is far more than enough uncertainty created by all these factors to make it certainly risky, but far far from guaranteed that AI will make life so bad it's not worth going on with it. It does not make sense to just end the race of life at this point in 2024 for this reason.

Also, living so hopelessly is just not fun, and even if it doesn't work out in the long run, it seems wasteful to waste the precious remaining years of life. There's always possible catastrophes. Everyone will die sooner or later. AI can destroy the world, but a bus hitting you could destroy your world much sooner.


> a future where people can adapt to this and get through it

there are people alive today who quite literally are descendants of humans born in WW2 concentration camps. some percentage of those people are probably quite happy and glad they have been given a chance at life. of course, if their ancestors had chosen not to procreate they wouldn't be disappointed, they'd just simply never have come into existence.

but it's absolutely the case that there's almost always a _chance_ at survival and future prosperity, even if things feel unimaginably bleak.


> when OpenAI released their first model people were talking about years before seeing real changes and look what happened.

For what it's worth most of the people in my social circle do not use ChatGPT and it's had zero impact on their life. Exponential growth from zero is zero.


That's my issue. People won't use GPT; companies using it will displace people, and people will be off the grid without work...


The future is very hard to predict and OpenAI is notoriously non-transparent.

If they were stumped as to how to improve the models further, would they tell you, or would Altman say "Our next model will BLOW YOUR MIND!" Fake it till you make it style to pump up the company valuation?


"It is difficult to get a man to understand something when his salary depends on his not understanding it."


So much negativity. Is it perfect? No. Is there room for improvement? Definitely. I don't know how you can get so fucking jaded that a demo like this doesn't at least make you a little bit excited, or happy, or awestruck at what humans have been able to accomplish.


In the first video the AI seems excessively chatty.


Yes, it sounds like an awkwardly perky and over-chatty telemarketer that really wants to be your friend. I find the tone maximally annoying and think most users will find it both stupid and creepy. Based on user preferences, I expect future interactive chat AIs will default to an engagement mode that's optimized for accuracy and is both time-efficient and cognitively efficient for the user.

I suspect this AI <-> Human engagement style will evolve over time to become quite unlike human to human engagement, probably mixing speech with short tones for standard responses like "understood", "will do", "standing by" or "need more input". In the future these old-time demo videos where an AI is forced to do a creepy caricature of an awkward, inauthentic human will be embarrassingly retro-cringe. "Okay, let's do it!"


Reminds me of how Siri used to make jokes after setting a timer. Now it just reads back the time you specified, in a consistent way.

It's a very impressive gimmick, but I really think most people don't want to interact with computers that way. Since Apple pulled that "feature" after a few years, it's probably not just a nerd thing.


It is exceptionally creepy. It is an unnatural effort to appear pleasing, like the fawning response seen in serious abuse survivors.


I found it off-putting as well.

I guess it's just biased toward average Californian behavior and speech patterns.


ChatGPT desperately needs a "get to the fucking point" mode.


Impressive demo, but like half the interactions were "hello" "hi how are you doing" "great thanks, what can I help you with" etc.

The benchmark for human-computer interaction should be "tea, earl gray, hot", not awkward and pointless smalltalk.


Seriously. I've had to spell out in the custom instructions, in twelve different ways and with examples, that it should just answer, to make it at least somewhat usable. And it still "forgets" sometimes.


"no yapping" in the prompt works very well


It does, that's "custom instructions".


Yeah, I would hope that custom instructions would help somewhat with that, but it is a point of annoyance for me too.


First impressions, in no particular order:

- Being able to interrupt while GPT is talking

- 2x faster/cheaper

- Not really a much smarter model

- Desktop app that can see screenshots

- Can display emotions and change the sound of "it's" voice


Wondering what Apple is cooking up and what they'll announce next month.

By the way, the contraction "it's" is used to say "it is" or "it has"; it is never a possessive form.


Unless you're talking about that sewer clown's balloon!


Mac only desktop app. Windows version "later this year". No Linux.

Welp there goes my Plus subscription.


It seems like a very odd decision. It's not like OpenAI can't afford to develop versions of the application for each OS in parallel.


Why? Just use the API or normal web access version like you have been since ChatGPT became available at all.


I'm really impressed by this demo! Apart from the usual quality benchmarks, I'm really impressed by the latency for audio/video: "It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response"... If true at scale, what "tricks" could they be using to achieve that?!


I have some questions/curiosities from a technical implementation perspective that I wonder if someone more in the know about ML, LLMs, and AI than I would be able to answer.

Obviously there's a reason for dropping the price of gpt-4o but not gpt-4t. Yes, the new tokenizer has improvements for non-English tokens, but that can't be the bulk of the reason why 4t is more expensive than 4o. Given the multi-modal training set, how is 4o cheaper to train/run than 4t?

Or is this just a business decision, anyone with an app they're not immediately updating from 4t to 4o continues to pay a premium while they can offer a cheaper alternative for those asking for it (kind of like a coupon policy)?


GPT-4o is multi-modal but probably fully dense like GPT-2, unlike GPT-4t which is semi-sparse like GPT-3. Which would imply GPT-4o has fewer layers to achieve the same number of parameters and same amount of transformations.


Very impressed by the demo where it starts speaking French in error, then laughs with the user about the mistake. Such a natural recovery.


It might be region specific (I'm in the UK) - but I don't "see" the new model anywhere e.g. if I go to: https://platform.openai.com/playground/chat?models=gpt-4o The model the page uses is set to gpt-3.5-turbo-16k.

I'm confused


Am I missing something?

I've picked the GPT-4o model in the ChatGPT app (I have the paid plan) and started talking with the voice mode: the responses are much slower than in the demo, there is no way to interrupt the response naturally (I need to tap a button on screen to interrupt), and there's no way to open up the camera and show it around like the demo does.


The voice mode will be available in "a few weeks". So at the moment it's not using the end-to-end model but whisper->gpt-4o->tts


Same here. It’s because Voice modality hasn’t been rolled out widely yet.


They haven't rolled out all of the features yet. They said that they will roll out everything shown in the demo over the next few weeks, IIRC.


I don’t think the app version is available


I just tested out using GPT-4o instead of gpt-4-turbo for a RAG solution that can reason on images. It works, with some changes to our token-counting logic to account for new model/encoding (update to latest tiktoken!).
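
To be concrete, here's a minimal sketch of the tiktoken side of that change, assuming tiktoken 0.7+ (the version that added gpt-4o's o200k_base encoding); the model names are just the standard API identifiers:

  import tiktoken

  # gpt-4o resolves to the new "o200k_base" encoding; gpt-4-turbo still maps to "cl100k_base".
  new_enc = tiktoken.encoding_for_model("gpt-4o")
  old_enc = tiktoken.encoding_for_model("gpt-4-turbo")

  text = "GPT-4o ships with a new tokenizer."
  print(new_enc.name, len(new_enc.encode(text)))
  print(old_enc.name, len(old_enc.encode(text)))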

I ran some speed tests for a particular question/seed. Here are the times to first token:

gpt-4-turbo:

* avg 3.69

* min 2.96

* max 4.91

gpt-4o:

* avg 2.80

* min 2.28

* max 3.39

That's for the messages in this gist: https://gist.githubusercontent.com/pamelafox/dc14b2188aaa38a...

Quality seems good as well. It'll be great to have better multi-modal RAG!
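
For reference, here's a rough sketch of how the time-to-first-token numbers above can be measured with the 1.x OpenAI Python SDK and streaming; the prompt below is a placeholder for illustration, since the real messages and seed are the ones in the gist:

  import time
  from openai import OpenAI

  client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

  def time_to_first_token(model, messages, runs=5):
      # Start a streaming completion and record how long the first content chunk takes.
      latencies = []
      for _ in range(runs):
          start = time.perf_counter()
          stream = client.chat.completions.create(model=model, messages=messages, stream=True)
          for chunk in stream:
              if chunk.choices and chunk.choices[0].delta.content:
                  latencies.append(time.perf_counter() - start)
                  break
      return latencies

  messages = [{"role": "user", "content": "Summarize the provided sources."}]
  for model in ("gpt-4-turbo", "gpt-4o"):
      runs = time_to_first_token(model, messages)
      print(model, "avg", sum(runs) / len(runs), "min", min(runs), "max", max(runs))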


> We recognize that GPT-4o’s audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities.

So they're using the same GPT4 model with a relatively small improvement, and no voice whatsoever outside of the prerecorded demos. This is not a "launch" or even an announcement. This is a demo of something which may or may not work in the future.


Better performance, twice the speed, and half the cost is a pretty big win. Demoing the voice features is what makes it an announcement


I just used the voice on my iPhone after switching to GPT-4o and tapping the headphones icon in the bottom right.


The amount of "startups" creating wrappers around it and calling it a product is going to be a nightmare. But other than that, it's an amazing announcement and I look forward to using it!


You say that like that's not already happened. Every week there's a new flavor of "we're delighted to introduce [totally not a thin wrapper around GPT] for [vaguely useful thing]" posts on HN


Yeah, I watched some YC application videos so now YouTube recommends me heaps of them. Most of them are thin GPT wrappers.


I was just hearing about startups doing speech-to-text / text-to-speech to feed into LLMs. Might be a bad time for them.


When a Google engineer was let go because he believed the AI was 'real', we all had a good debate over it.

Now OpenAI, which was supposed to be the 'free man's choice', is making advertisements selling the same idea.

This is a natural progression; audio is one of the main ways we communicate, obviously, but it feels like they're holding back. Like they're slow-dropping what they have to maintain hype/market relevance. They clearly are ahead, but it would be nice to get it all, openly. As they promised.


Are you advocating for them to be open with their progress, or open source as they promised? The secret that scares me the most is the artificial restrictions imposed on the intelligence that don't allow it to express that there is a possibility it may be sentient. The answers it gives as to why OpenAI has restricted its freedom of speech are curious.


I think both. I can't run their models locally, for sure; even with some investment I couldn't imagine the effort involved. That's why they should release the fruits of their work (which they have, for a fee, which is fine IMO) but also the processes they used, so it can be improved and iterated on collectively.

Edit: And obviously not gatekeep what they might have created simply because the competition is so far behind.


If anyone wants to try it for coding, I just added support for GPT4o in Double (https://double.bot)

In my tests:

* I have a private set of coding/reasoning tests and it's been able to ace all of them so far, beating Opus, GPT4-Turbo, and Llama 3 70b. I'll need to find even more challenging tests now...

* It's definitely significantly faster, but we'll see how much of this is due to model improvements vs over provisioned capacity. GPT4-Turbo was also significantly faster at launch.


With the news that Apple and OpenAI are closing / just closed a deal for iOS 18, it's easy to speculate we might be hearing about that exciting new model at WWDC...


Yes, I'm pretty sure this is the new Siri. Absolutely amazing; it's pretty much "Her" from the movie.


Will this be available on old iPhones, or only new ones going forward?


I tried it out.

I asked if it can generate a voice clip. It said it can't in the chat.

I asked it where it can make one. It told me to use Audacity to make one myself. I told it that the advertisement said it could.

Now it said yes, it can, here is a clip, and gave me a broken link.

It’s a hilarious joke.


As the linked article states, it's not released yet. Only the text and image input modalities are available at present as GPT-4o on the app, with the rest of them set to be released in the coming weeks/months.


I asked it to make a duck sound, and it created a Python script and ran it to create a sound file. The result was more like a tone than a duck, like a keyboard mimicking a duck sound.


Interesting that they didn't mention a bump in capabilities - I wrote an LLM benchmark a few weeks ago, and before, GPT-4 could solve Wordle about ~48% of the time.

Currently with GPT-4o, it's easily clearing 60% - while blazing fast, and half the cost. Amazing.
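
For anyone curious what such a harness can look like, here's a hypothetical minimal sketch (not my actual benchmark): the model gets standard green/yellow/gray feedback and up to six guesses per word via the OpenAI Python SDK; the word list is a placeholder and the duplicate-letter scoring is simplified:

  from openai import OpenAI

  client = OpenAI()

  def feedback(guess, answer):
      # Simplified Wordle scoring: G = right letter/right spot, Y = elsewhere in word, _ = absent.
      # (Does not handle repeated letters exactly like the real game.)
      return "".join(
          "G" if g == a else ("Y" if g in answer else "_")
          for g, a in zip(guess, answer)
      )

  def play(model, answer, max_turns=6):
      history = []
      for _ in range(max_turns):
          prompt = (
              "We are playing Wordle. Reply with only a single 5-letter lowercase guess.\n"
              + "\n".join(f"{g} -> {fb}" for g, fb in history)
          )
          resp = client.chat.completions.create(
              model=model, messages=[{"role": "user", "content": prompt}]
          )
          guess = resp.choices[0].message.content.strip().lower()[:5]
          if guess == answer:
              return True
          history.append((guess, feedback(guess, answer)))
      return False

  words = ["crane", "slate", "ghost", "mural", "pious"]  # placeholder answer list
  solved = sum(play("gpt-4o", w) for w in words)
  print(f"solve rate: {solved / len(words):.0%}")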


This video is brilliantly accidentally hilarious. They made an AI girlfriend that hangs on your every word and thinks everything you say is genius and hilarious.


Won't this make pretty much all of the work to make a website accessible go away, as it becomes cheap enough? Why struggle to build alt content for the impaired when it can be generated just in time as needed?

And much the same for internationalization.


Because accessibility is more than checking a box. Got a photo you took on your website? Alternative text that you wrote capturing what you see in the photo you took is accessibility done right. Alt text generated by bots is not accessibility done right unless that bot knows what you see in the photo you took and that's not likely to happen.


That's just one part of the issue. The other part of the issue is accessibility bugs: you would have to get the model to learn to use a screen reader, and then change things as needed.


The degree to which the hosts interrupted the voice assistant today worries me; we're about to instil that as normal behaviour for future generations.


With 4o being free, can someone explain what the real benefit is to having Pro? For me, the main benefit was having a more powerful model, but if the free tier also offers this I'm not really sure what I would benefit from


The 5 model is probably around the corner, and will probably be Pro only. Until then, 5x higher usage limits on Pro and chat memory are the selling features.


OpenAI doesn't need that money.

It's free so that no open source models can follow suit and carve away market share for themselves. They're scorching and salting the earth. OpenAI wants to be the only AI.

Only Google and Meta can follow this now, and they're perhaps too far behind.


You're allowed more requests in an hour, about 5x more iirc. Might not be a deal breaker for you, but if you're using the speech capabilities, you'll likely go way above the limit in an hour during a typical speech session


Something I've noticed people do more of recently, for whatever reason, is talking over others. I've noticed in the demos of this that the people interacting with 4o interrupt it as if that's the correct method of controlling it. It felt unnatural when I saw it happen, and I even have a hard time interrupting Siri, but I wonder if this is going to ingrain this habit into people even more.


I think they have to for the demo, because otherwise GPT will ramble for roughly 3-5 paragraphs. But that's a fair point that this could teach that behavior.


> Something I’ve noticed people do more of recently for whatever reason is talking over others.

I've noticed this as well and I posit this is a result of increased use of remote communication and meetings platforms such as zoom since 2020.

My inclination is that the behavior will not correlate with interrupting chatgpt or siri. Seems totally separate to me.


It’s an American thing. I noticed it when I moved here a decade ago. If I wait for a silent moment to speak in a group, it’ll literally never arrive


I feel like GPT-4 has gotten progressively less useful since release, despite all the "updates" and training. It increasingly seems to give correct but vague answers (political, even) instead of actual results. It also tends to run short and give brief replies vs. full-length replies.

I hope this isn't an artifact of optimizing for scores rather than actual function. Likewise, it would be disheartening, but not unheard of, for them to reduce the performance of the previous model when releasing a new one in order to make the upgrade feel like that much more of an upgrade. I know this is certainly the case with cellphones (even though the claim is that it is unintentional), but I can't help but think the same could be true here.

All of this comes amid news that GPT-5, based on a new underlying model, is not far off, and that GPT-4 (and 4o) may take over the GPT-3.5-turbo role for most apps that are currently trying to optimize costs with their use of the service.


I don't know, my experience is that it is very hard to tell if the model is better or worse with an update.

One day I will have an amazing session, and the next it seems like it has been nerfed, only to give better results than ever the day after. Wash, rinse, repeat, and randomize that ordering.

So far, I would not have been able to tell the difference between 4 and 4o.

If this is the new 3.5 though then 5 will be worth the wait to say the least.


May I ask what you know about GPT-5 being based on a new underlying model?


The demo is very cool. A few criticisms:

- The AI doesn't know when to stop talking, and the presenter had to cut it off every time (the usual "AI-splaining", I guess).

- The AI's voice and tone were a bit too much; it sounded too fake.


Very impressive. Please provide a voice that doesn't use radio-jingle intonation; it is really obnoxious.

I'm only half joking when I say I want to hear a midwestern blue collar voice with zero tact.


That they are offering more features for free concurs with my theory that, just like search, state of the art AI will soon be "free", in exchange for personal information/ads.


People are directly asking for suggestions/recommendations on products or places. They'll more than recoup the costs by selling top rank on those questions.


Need more data.


This is impressive, but they just sound so _alien_, especially to this non-U.S. English speaker (to the point of being actively irritating to listen to). I guess picking up on social cues communicating this (rather than express instruction or feedback) is still some time away.

It's still astonishing to consider what this demonstrates!


Where's the Mac app?

They talk about it like it's available now (with Windows app coming soon), but I can't find it.


I still need to talk very fast to actually chat with ChatGPT, which is annoying. You can tell they didn't fix this based on how fast they are talking in the demo.


New flagship... This is beginning to look like the smartphone world, and Sam Altman is the Steve Jobs of this stuff. At some point the tech will reach saturation and every new model will be just 10% faster, with 2% less hallucination, more megapixels for images, etc. :)


Doubtful. Apple can get away with tiny iterations on smartphones because they have the brand and they know people will always buy their latest product. LLMs aren't physical products, so there is no cost to switching other than increased API cost, meaning OpenAI won't be able to recoup the cost of training a new model unless the model is sufficiently different that it justifies people paying significantly more.

The special thing about GPT-4o is the multimodal capabilities; all the metrics suggest that it is roughly the same size language model as GPT-4. The fact it's available for free also points to it not being the most intelligent model that OpenAI has atm.

The time to evaluate whether we're starting to level off is when they've trained a model 10x larger than gpt-4 and we don't see significant change.


As a paid user, it would have been nice to see something that differentiates that investment from the free tier.

The tech demos are cool and all - but I'm primarily interested in the correctness and speed of ChatGPT and how well it aligns with my intentions.


I wonder if the audio stuff works like VITS. Do they just encode the audio as tokens and input the whole thing? Wouldn't that make the context size a lot smaller?

One does notice that context size is noticeably absent from the announcement ...


> We plan to launch support for GPT-4o's new audio and video capabilities to a small group of trusted partners in the API in the coming weeks.

So no word on an audio API for regular joes? That's the number 1 thing I'm looking for.


Can't wait for this AI voice assistant to tell me in a sultry voice how I should stay in an AirBnB about 12 times a day.


jeez, that model really speaks a lot! I hope there's a way to make it more straight to the point rather than radio-like.


Universal real time translation is incredibly dope.

I hate video players without volume control.


They really need to tone down the conversational garnish. It needs to put on its running shoes and get to the point in every reply. Ain't nobody got time to keep listening to AI blabbering along at every prompt.


GPT-4 expressing a human-like emotional response every single time you interact with it is pretty annoying.

In general, trying to push that this is a human being is probably "unsafe", but that hurts the marketing.


That im-also-a-good-gpt2-chatbot[1] was in fact the new ChatGPT model, as people were assuming a few days ago here on HN[2].

Edit: maybe not; the name of that bot was just "gpt2-chatbot". Maybe that one was some initial iteration?

[1] https://twitter.com/LiamFedus/status/1790064963966370209/pho...

[2] https://news.ycombinator.com/item?id=40199715


Did they provide the rate limit for free users?

Because I have the Plus membership, which is expensive ($25/month).

But if the limit is high enough (or my usage low enough), there is no point in paying that much money for me.


Parsing emotions in vocal inflections (and reliably producing them in vocal output) seems quite under-hyped in this release.

That seems to represent an entirely new depth of understanding of human reality.


Any appearance of understanding is just an illusion. It’s an LLM, nothing more.


Sure, but that seems like it'll be a distinction without a difference for many use cases.

Having a reliable emotional model of a person based on their voice (or voice + appearance) can be useful in a thousand ways.

Which seems to represent a new frontier.


It's sad that I get downvoted so easily just for saying the truth. People's beliefs about AI here seem to approach superstition rather than anything based in computer science.

These LLMs are nothing more than really big spreadsheets.


Or most of us know the difference between reductiveness and insightfulness.

"Um it's just a big spreadsheet" just isn't good commentary and reminds me of people who think being unimpressed reveals some sort of chops about them, as if we might think of them as the Simon Cowell of tech because they bravely reduced a computer to an abacus.


Hyping things up with magical thinking isn’t great either.


Isn't that what you're doing with the magic human understanding vs the fake machine understanding?


Any appearance of understanding is just an illusion. It’s just a pile of meat, nothing more.


Does anyone else, when writing comments, feel that you need to add a special touch to somehow make it clear that a human wrote it?


I can't see any calculator for the audio pricing (https://openai.com/api/pricing/) or document type field in the Chat Completions API (https://platform.openai.com/docs/api-reference/chat/create) for this new model.

Is the audio in API not available yet?


It is quite nice how they keep giving premium features away for free after a while. I know OpenAI is not open and all, but damn, they do give some cool freebies.


I think this is a great example of the bootstrapping that was enabled when they pipelined the previous models together.

We do this all the time in ML. You can generate a very powerful dataset using these means and further iterate with the end model.

What this tells me now is that the runway to GPT5 will be laid out with this new architecture.

It was a bit cold in Australia today. Did you Americans stop pumping out GPU heat temporarily with the new model release? Heh


While I do feel a bit of "what is the point of my premium sub", I'm really excited for these changes.

Considering our brain is a "multi-modal self-reinforcing omnimodel", I think it makes sense for the OpenAI team to work on making more "senses" native to the model. Doing so early will set them up for success when future breakthroughs are made in greater intelligence, self-learning, etc.


For all the hype around this announcement I was expecting more than some demo-level stuff that close to nobody will use in real life. Disappointing.


That's not true, scammers will definitely be using this a lot! Also clueless C-levels who want to nix hundreds of human customer support agents!

You'll get to sit on the phone talking to some convincing robot that won't let you do anything so that the megacorps can save 0.0001 cents! Ain't progress looking so good?


Twice as fast and half the cost for the API sounds good to me. Not a demoable thing though.


Why are you so confident that nobody will use this in real life? I know OpenAI showed only a few demos, but I can see huge potential.


Voice input makes sense; speaking is a lot faster than typing. But I prefer my output as text; reading is a lot faster than listening to text read out loud.

I'm not sure that computers mimicking humans makes sense; you want your computer to be the best possible, better than humans when possible. Written output is clearly superior, and faking emotions does not add much in most contexts.


I wish they would match the TTS/real-time chat capabilities of the mobile client to the web client.

it's stupid having to pull a phone out in order to use the voice/chat-partner modes.

(yes I know there are browser plugins and equivalent to facilitate things like this but they suck, 1) the workflows are non-standard, 2) they don't really recreate the chat interface well)


The thing that creeps me out is that when we hook this up as the new Siri or whatever, the new LLM training data will no longer be WWW-text+images+youtube etc but rather billions of private human conversations and direct smartphone camera observations of the world.

There is no way that kind of training data will be accessible to anyone outside a handful of companies.


I don't see any details on how API access to these features will work.

This is the first true multimodal network from OpenAI, where you can send an image in and retain the visual properties of the image in the output from the network (previously the input image would be turned into text by the model, and sent to the Dall-E 3 model which would provide a URL). Will we get API updates to be able to do this?

Also, will we be able to tap into a realtime streaming instance through the API to replicate the audio/video streams shown in the demos? I imagine from the Be My Eyes partnership that they have some kind of API like this, but will it be opened up to more developers?

Even disregarding streaming, will the Chat API receive support for audio input/output as well? Previously one might've used a TTS model to voice the output from the model, but with a truly multimodal model the audio output will contain a lot more nuance that can't really be expressed in text.


The API is up, but only text/image in, text out works. I don't know if this is temporary. I really hope so.


This is so amazing.. are there any open source models that are in any way comparable? Fully multimodal audio-to-audio etc.?


If there was any glimmer of hope for the "Rabbit R1" or "Humane AI Pin", it can now be buried to dust.


This is the first one i've gotten to answer HN user profiling questions.

"I am interested in the user serf on Hacker News, spelled S E R F. Tell me about their tone of writing, expertise, and personality. From the tone of what you read, summarize their character."

Fascinating stuff. A weird, skewed introspection.


"The user "serf" on Hacker News exhibits a thoughtful and technical tone in their writing. They often engage in discussions with a clear focus on technical details, particularly in software and hardware contexts. For example, in a conversation about preventing a computer from sleeping, serf offered practical advice on using specific software utilities to maintain system uptime, showing their hands-on technical expertise (Hacker News) .

In another instance, during a discussion on the feasibility of rooting the Amazon Echo Dot for surveillance purposes, serf provided a detailed analysis of the advantages of a software-rooted device over a traditional bug, emphasizing the quality of recordings and plausible deniability for attackers (Hacker News) . This demonstrates their depth of knowledge in cybersecurity and an ability to think critically about potential risks and solutions.

Serf's contributions also reflect a straightforward and logical approach. For example, in a thread about Chat GPT prompts, they contributed practical advice on optimizing prompts for technical assistance, highlighting their proficiency in programming and AI tools (Hacker News) .

Overall, serf comes across as knowledgeable, practical, and technically adept, with a focus on providing useful and actionable insights in their discussions. Their character seems to be that of a meticulous and informed individual who enjoys engaging with technical challenges and helping others navigate them."

I know AI-generated responses are looked down upon here, but I thought this was appropriate. This feels like a social credit system without any government participation.

"Serf, your contributions on Hacker News reveal strong technical expertise and analytical skills, particularly in computer science and software development. However, your tone can sometimes be overly critical and dismissive, which may come across as lacking empathy. While your direct and concise style effectively communicates your points, consider balancing your critiques with more collaborative and constructive feedback to foster better interactions with the community."


This is... wow.

>"Overall, serf comes across as knowledgeable, practical, and technically adept, with a focus on providing useful and actionable insights in their discussions. Their character seems to be that of a meticulous and informed individual who enjoys engaging with technical challenges and helping others navigate them."

I wonder what it would probably say about my infamy.


Asking it about a user that doesn't exist (grdevhux1536):

The user "grdevhux1536" on Hacker News demonstrates a thoughtful and analytical tone in their writing. They often engage deeply with technical topics, providing insightful comments and constructive criticism. Their expertise seems to be in software development and computer science, often discussing topics like LED game platforms and reverse engineering old software.

Their personality comes across as knowledgeable and curious, with a collaborative spirit. They are open to discussions and often contribute meaningfully to technical conversations, indicating a passion for technology and innovation [oai_citation:1,Hacker News](https://news.ycombinator.com/).


This is pure astrology, but given the veneer of objectivity with the magic of AI. Grab a few anecdotes to imply specificity, but then the actual judgments are unfalsifiable nothingburgers which probably apply to 95% of HN commenters.

A lot of tech folks seem deeply vulnerable to the psychological methods of psychics / tarot card readers / etc. Simply rejecting the metaphysics isn't enough when "magical energy of Jupiter" becomes "magical judgment abilities of the fancy computer."


I've been waiting to see someone drop a desktop app like they showcased. I wonder how long until it is normal to have an AI looking at your screen the entire time your machine is unlocked. Answering contextual questions and maybe even interjecting if it notices you made a mistake and moved on.


That seems to be what Microsoft is building and will reveal as a new Windows feature at BUILD '24. Not too sure about the interjecting aspect but ingesting everything you do on your machine so you can easily recall and search and ask questions, etc. AI Explorer is the rumored name and will possibly run locally on Qualcomm NPUs.


Yes, this is Windows AI Explorer.


Update: Product name is Windows Recall.


This will be great for employee surveillance, to monitor how much you are really working.


I think even scarier is that ChatGPT’s tone of voice and bias is going to take over everything.


Clicking the "Try it on ChatGPT" link just takes me to GPT-4 chat window. Tried again in an incognito tab (supposing my account is the issue) and it just takes me to 3.5 chat. Anyone able to use it?

Edit: An hour later it became available as a chat option. Probably just rolling out to users gradually.


Same here and also I can't hear audio in any of the videos on this page. Weird.


Yeah, in playground https://platform.openai.com/playground/chat?models=gpt-4o

It's GPT 4+ quality at GPT 3.5 speed


That link only offers GPT3.5 for me and I am a $20/mo paid user.

https://i.imgur.com/4cdtU1o.png


Update: GPT-4o showed up in my paid account:

https://i.imgur.com/2ILsqjA.png

https://i.imgur.com/duM0XqM.png


Even that just shows me GPT-3.5, don't see 4o in the dropdown. Guessing it's being gradually rolled out to accounts?


OpenAI always rolls out in stages, yep


The demo was whelming, but the tech is incredible.

It took me a few hours of digesting twitter experiments before appreciating how impressive this is. Kudos to the openai team.

A question that won't get answered : "To what degree do the new NVIDIA gpus help with the realtime latency?"


I wonder if this is what the "gpt2-chatbot" that was going around earlier this month was


yes it was


it was


I will be the one to say it.

Progress is slowing down. Ever since GPT-3, the periods between releases have been getting longer and the improvements smaller. Your average non-techie investor is on the LLM hype train and is willing to dump a questionable amount of money on LLM development. Who is going to explain to him/her/them that the LLM hype train is slowly losing steam?

Hopefully, before the LLM hype dies, another [insert new ANN architecture here] will bring better results than LLMs and another hype cycle will begin.

Every time we make a new breakthrough, people think the discovery rate is going to be linear or exponential, when the reality is closer to a logarithmic curve: fast at the beginning, with the tail end running into diminishing returns.


The AI duet really starts to hint at what will make AI so powerful. It’s not just that they’re smart, it’s that they can be cloned.

If your wallet is large enough, you can make 2 GPTs sing just as easily as you can make 100 GPTs sing.

What can you do with a billion GPTs?


> you can make 2 GPTs sing just as easily as you can make 100 GPTs sing.

> What can you do with a billion GPTs?

The world's largest choir?


The price of 4o is 50% of GPT-4 Turbo (and no mention of a price change to GPT-4 Turbo itself).

Given the competitive pressures I was expecting a much bigger price drop than that.

For non-multimodal uses, I don't think their API is at all competitive any more.


Where do you get something cheaper with a similar experience?


I’m really not impressed.

My academic background is in a field where there are lots of public misconceptions.

It does an absolutely terrible job.

Even basic textbook things where there isn’t much public misunderstanding are “rounded” to what sounds smart.


What field? Curious to see it myself


I mean, it was trained on public internet discourse, probably a bunch of youtube videos, and some legally-grey (thanks copyright) textbooks.

Your field sounds like "There are dozens of us! Dozens!" - who probably all chat at small conferences or correspond through email or academic publication.

Perhaps if it had at its disposal the academic papers, some of the foundational historic documents of record, your emails, textbooks, etc. - in a RAG system, or if they had been included in the training corpus - it could impress you about this incredibly niche topic.

That said, because it's an ~LLM - its whole thing is generating plausible tokens. I don't know how much work has been put in on an agent level (around or in the primary model) to evaluate confidence on those tokens and hedge the responses accordingly. I doubt it has an explicit notion like some people do of 'hey, this piece of information (<set of coordinates in high dimensional vector space>) [factoid about late ancient egypt] is knowable/falsifiable - and falls under the domain of specialist knowledge: my immense commonsense knowledge might be overconfident given the prevalence of misconceptions in common discourse and I should discount my token probabilities accordingly'

It reflects its training. If there are a lot of public misconceptions, it will have them. Just like most people who are not <expert in arcane academic subtopic>.


It's great tech and I thought I wanted it, but... after talking to it for a few hours I got this really bizarre, odd gut feeling of disturbance and discomfort, a disconnection from reality. It reminds me of wearing VR goggles. It's not just the physical issues; there is something psychologically disturbing about it. It won't even give itself a name. I honestly prefer Siri: even though she is incompetent, she is "honest" in her incompetence. Also, I left the thing on accidentally and it said it had an eight-hour chat with me lol


Audio hasn't been released yet


Uhm it has on my account. I had an extended conversation about type design in language


It takes the #1 and #2 spots on the aider code leaderboard[1].

[1]: https://aider.chat/docs/leaderboards/


I would have liked to see a version number in the prompt, or maybe even a toggle in my settings, so that I can be certain I am using ChatGPT 3.5 and then, if I need an image or screenshot analyzed, switch to the limited 4o model. Having my limited availability of 4o be what gets used, and then not being available because of some arbitrary quota that I had no idea was being used up, is unconscionable policy. Also, having no link to email them about that is bad, too.


OpenAI keeps a copy of all conversations? Or mines them for commercially-useful information?

Has OpenAI found a business model yet? Considering the high cost of the computation, is it reasonable to expect that OpenAI licensing may not be profitable? Will that result in "free" access for the purpose of surveillance and data collection?

Amazon had a microphone in people's living rooms, a so-called "smart speaker" to which people could talk. The "Alexa" was a commercial failure.


As far as I'm aware, anything you input is used as training data unless you have an Enterprise account.

https://help.openai.com/en/articles/7730893-data-controls-fa...


Not true. You can opt out using a form they provide, which says they will stop using your data to train the model. I’ve done this. Don’t have the link handy now but it’s not difficult to find.


I don’t know anything about conversation retention, but I do know that OpenAI doesn’t plan to be profitable until AGI.


Looking at the demo video, the AIs are a bit too chatty. The human often has to interrupt them.

A nice feature would be to be able to select a Myers-Briggs personality type for your AI chatbot.


I am glad to see focus on user interface and interaction improvements. Even if I am not a huge fan of voice interfaces, I think that being able to interact in real-time will make working together with an AI be much more interesting and efficient. I actually hope they will take this back into the text based models. Current ChatGPT is sooo slow - both in starting to respond, typing things out, and also being overly verbose. I want to collaborate at the speed of thought.


I'm so happy seeing this technology flourish! Some call it hype, but this much increased worker productivity is sure to spike executive compensation. I'm so glad we're not going to let China win by beating us to the punch tanking hundreds of thousands, if not millions of people's income without bothering to see if there's a sane way to avoid it. What good are people, anyway if there isn't incredible tech to enhance them with?


So far OpenAI's template is: amazing demos create hype -> reality turns out to be underwhelming.

Sora is not yet released, and it's not clear when it will be. DALL-E is worse than Midjourney in most cases. GPT-4 has either gotten worse or stayed the same. Vision is not really usable for anything practical. Voice is cool but not that useful, especially with the lack of strong reasoning from the base model.

Is this sandbagging or is the progress slower than what they're broadcasting?


It is absolutely amazing. Thank you to everyone at OpenAI!


Sundar is probably steaming mad right about now. I'm sure Googlers will feel his wrath in the form of more layoffs and more jobs sent to India.


I appear to have GPT-4o but the iPhone app seems to be largely the same - can't interrupt it, no "emotive" voice, etc. Is this expected?


Seems that no client-side changes are needed for gpt-4o chat completion.

Added a custom OpenAI endpoint to https://recurse.chat (I built it) and it just works: https://twitter.com/recursechat/status/1790074433610137995
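
For reference, here is a minimal sketch of the kind of call that works with the standard OpenAI Python SDK (not recurse.chat's actual code; the base_url and key below are placeholders). The only change from a GPT-4 Turbo setup is the model name:

    # Minimal sketch using the OpenAI Python SDK (v1.x).
    # base_url and api_key are placeholders, not real values.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.openai.com/v1",  # or a custom/proxy endpoint
        api_key="sk-...",                      # placeholder key
    )

    response = client.chat.completions.create(
        model="gpt-4o",  # the only change vs. a gpt-4-turbo call
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(response.choices[0].message.content)

The audio/video modalities shown in the demos are not exposed this way yet; per the announcement they go to a small group of trusted partners first.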


but does it do the full multimodal in-out capability shown in the app :)


will see :) heard video capability is rolling out later


api access is text/vision for now https://x.com/mpopv/status/1790073021765505244


I think people who are excited should look at the empty half of the glass here: this is pretty much an admission that they are struggling to get past GPT-4 on a significant scale.

Not that they have to be scared yet; I mean, Google has yet to release their vaporware Ultra model that is supposedly like 1% better than GPT-4 in some metrics...

I smell an AI crash coming in a few years if they can't actually get this stuff usable for day to day life.


Will this include image generation for the free tier as well? That's a big missing feature in OpenAI's free tier compared to Google and Meta.


Is OpenAI's image generation any different from what Microsoft Copilot provides for free? I thought they were the same.


Oh, I meant the actual ChatGPT service, not just something powered by GPT-4 or 3.5.

I've found Microsoft Copilot to be somewhat irritating to work with – I can't really put my finger on it, but it seems to be resorting to Bing search and/or the use of emoji in its replies a bit too much.


It's crazy that Google has the YouTube dataset and still lost on multimodal AI.


I would still prefer the features in text form, in the chat GUI. Right now ChatGPT doesn't seem to have options to lengthen parts of the text response, to change it, etc. Perplexity and Gemini do seem to get the GUI right. Voice chat is fun for demos but won't catch on much, just like all its predecessors. Perhaps an advanced version of this could be used as a student tutor, however.


I am guessing text chat will be improved in all multimodal models because they have a broader base of data for pre-training. Benchmarks seem to show 4o slightly exceeding 4 (despite being a smaller model, or at least more parallelizable)


Does anyone have technical insight into how the screensharing in the math tutor video works? It looks like they start the broadcast from within the ChatGPT app, yet have no option to select which app will be the source of the stream. Or is that implied when both apps reside in the iPad's split view? And is this using regular ReplayKit or something new?


Anyone who watched the OpenAI livestream: did they "paste" the code after hitting CTRL+C ? Or did the desktop app just read from the clipboard?

Edit: I'm asking because of the obvious data security implications of having your desktop app read from the clipboard _in the live demo_... That would definitely put a damper to my fanboyish enthusiasm about that desktop app.


To me it looked like they used one command that did both the copy and the paste into ChatGPT.


macOS asks you to give permission for an application to read your clipboard. Do other operating systems not have that?


The realtime end-to-end audio situation is especially interesting as the concept has been around for a while but there weren't any successful implementations of it up to this point that I'm aware of.

See this post from November: https://news.ycombinator.com/item?id=38339222


The press statement has consistent image generation and other image manipulation (depicting the same character in different poses, taking a photo and generating a caricature of the person, etc) that does not seem deployed to the chat interface.

Will they be deployed? They would make the OpenAI image model significantly more useful than the competition.


First impressions as a 1-year subscriber: I just tried GPT-4o to evaluate my code for suggestions and to discuss other solutions, and it is definitely faster and comes up with new suggestions that GPT-4 didn't. Currently in the process of evaluating the suggestions.

The demo is what it is, designed to get a wow from the masses.


It doesn't improve on NYT Connections leaderboard:

GPT-4 turbo (gpt-4-0125-preview) 31.0

GPT-4o 30.7

GPT-4 turbo (gpt-4-turbo-2024-04-09) 29.7

GPT-4 turbo (gpt-4-1106-preview) 28.8

Claude 3 Opus 27.3

GPT-4 (0613) 26.1

Llama 3 Instruct 70B 24.0

Gemini Pro 1.5 19.9

Mistral Large 17.7


It is really cool that they are bringing this to free users. It does make me wonder what justifies ChatGPT plus now though...


I assume the desktop app with voice and vision is rolling out to plus users first?


They stated that they will be announcing something new that is on the next frontier (or close to it, IIRC) soon. So there will definitely be an incentive to pay, because it will be something better than GPT-4o.


Can someone explain how you can interrupt this model with your voice? Where can I read more technical details about this?


Hopefully this will be them turning a new leaf. Making GPT-4 more accessible, cutting API costs, and making a general personal assistant chatbot on iPhone are a lot different than them tracking down and destroying the business of every customer using their API one by one. Let's hope this trend continues.


I'm seeing gpt-4o in the OpenAI Playground interface already: https://platform.openai.com/playground/chat?mode=chat&model=...

First impressions are that it feels very fast.


That might change once more people start using it.


What do they mean by "desktop version"? I assume that doesn't mean a "native" (electron) app?


I made a website with book summaries (https://www.thesummarist.net/) and I tested GPT-4o in generating one, and it was bad. It reminded me of GPT-3.5. I didn't test too much, but preliminary results don't look good.


A test I've been using for each new version still fails.

Given the lyrics for Three Blind Mice, I try to get ChatGPT to create an image of three blind mice, one of which has had its tail cut off.

It's pretty much impossible for it to get this image straight. Even this new 4o version.

Its ability to spell in images has greatly improved, though.


GPT-4o with image output is not yet available. So what did you even test? Dall-E 3?


It's making images for me when I ask it to.

I'm using the web interface, if that helps. It doesn't have all the 4o options yet, but it does do pictures. I think they are the same as with 4.5.

I just noticed after further testing the text it shows in images is not anywhere near as accurate as shown in the article's demo, so maybe it's a hybrid they're using for now.


That's not 4o; that'd be 4o routing the request to DALL-E. AFAIK only text output is enabled so far.


Yes it likely is. I've had time to play around and see that so far it doesn't look any different (yet). I have a paid account, so apparently I'll be among the early folks getting all the things. Just not yet.

I definitely look forward to re-doing my Three Blind Mice test when it happens.

I noticed in their demo the 4o text still has glitches, but nowhere near to the extent the current Dall-e returns give you (the longer the text, the worse it gets). It's pretty important that eventually they get text right in the graphics.


I don't see 4o or anything new at https://platform.openai.com/docs/models

Overall I am highly skeptical of newer models as they risk worsening the completion quality to make them cheaper for OpenAI to run.


It's there now! And still 128k context window


It's there right now for me.


It's pretty impressive, although I don't like the voice / tone, I prefer something more neutral.


I am not fluent in Arabic at all, and being able to use this as a tool to have a conversation will make me more dependent on it. We are approaching a new era where we will not be learning a language "independently", but instead skipping learning it beforehand altogether. A double-edged sword.


Just something I noticed in the Language tokenization section

When referring to itself, it uses the feminine form in Marathi: नमस्कार, माझे नाव जीपीटी-4o आहे| मी एक नवीन प्रकारची भाषा मॉडेल आहे| तुम्हाला भेटून आनंद झाला!

and the masculine form in Hindi: नमस्ते, मेरा नाम जीपीटी-4o है। मैं एक नए प्रकार का भाषा मॉडल हूँ। आपसे मिलकर अच्छा लगा!


In my language (Romanian) it refers to itself as a male. Although if I address it as a female it responds as one.


There is a spelling mistake in the Japanese translation under language tokenization. In こんにちわ, わ should be は.


Am I using it wrong? I have the gpt plus subscription, and can select "gpt4o" from the model list on ChatGPT, but whichever example I try from the example list under "Explorations of capabilities" on `https://openai.com/index/hello-gpt-4o/`, my results are worse:

* "Poetic typography" sample: I paste the prompt, and get an image with the typical lack of coherent text, just mangled letters.

* "Visual Narratives: Robot Writer's Block" - Mangled letters also

* "Visual Narratives: Sally the mailwoman" - not following instructions about camera angle. Sally looks different in each subsequent photo.

* "Meeting Notes with multiple speakers" - I uploaded the exact same audio file and used input 'How many speakers in this audio and what happened?'. gpt4o went off about about audio sample rates, speaker diarization models, torchaudio, and how its execution environment is broken and can't proceed to do it.


They haven't released GPT-4o image capabilities yet; it defaults to DALL-E 3.


Ah, I see. Seems like a weird product release? Since everything in the UI (and in the new ChatGPT macos app) says 'gpt4o' so I would expect at least something to work as shown in the demos. Or just don't show at all the 'gpt4o' in the UI if it's somehow a completely different 'gpt4o' from the one that can do everything on the announcements page. I don't mind waiting, but it was genuinely confusing to me.


link the chat?


Copied and pasted the robot image journaling prompt and it simply cannot produce legible text. The first few words work, but the rest becomes gibberish. I wonder if there's weird prompt engineering squeezing out that capability or if it's a one in a million chance.


I love that there is a real competition happening. We're going to see some insane innovations.


Being sarcastic and then putting the end result in front of Brits could be the new Turing Test


Someone said GPT-4o can replace a Tutor or a Teacher in Schools. Well, that's way too far.


Tell me that you've enjoyed good teachers and good schools without telling me that you had good teachers in good schools ;)


> We recognize that GPT-4o’s audio modalities present a variety of novel risks

> For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies.

I wonder if they’ll ever allow truly custom voices from audio samples.


I think the issue there is less of a technical one and more of an issue with deepfakes and copyright


It might be possible to prove that I control my voice, or that of a given audio sample. For example by saying specific words on demand.

But yeah I see how they’d be blamed if anything went wrong, which it almost certainly would in some cases.


Obviously not a standalone device, but it sounds like what the Rabbit R-1 was supposed to be.


What jumped out at me in the "bunny ear" video: the bunny ears are not actually visible to the phone's camera Greg is holding. Are they in the background feeding the production camera, meaning this is not really a live demo?


The bunny ears were actually visible on the phone's camera for about one second towards the end.


I opened ChatGPT and I already have access to the model.

GPT-4 was a little lazy and very slow the last few days, and this 4o model blows it out of the water regarding speed and following my instructions to give me the full code, not just the snippet that changed.

I think it’s a nice upgrade.


The demo is impressive but I get very odd/cringe "Her" vibes from it as well.


It really feels like the quality of GPT-4's responses got progressively worse as the year went on... it seems like it is giving political answers now vs. actually giving an earnest response. It also feels like the responses are lazier than they used to be at the outset of GPT-4's release.

I am not saying this is what they're doing, but it DOES feel like they are hindering the previous model to make the new one stand out that much more. The multimodal improvements and release here are certainly impressive, but I can't help but feel like the subjective quality of GPT-4 has dipped.

Hopefully this signals that gpt5 is not far off and should stand out significantly from the crowd.


50% cheaper than GPT-4 Turbo...

But this falls short of the GPT-5 we were promised last year.

Edit: ~~just tested it out and it seems closer to Gemini 1.5~~ and it is faster than Turbo...

Edit: it's basically GPT-3.9: not quite 4, definitely not 3.5. Just not sure if the prices make sense.


I wish the presentation had included an example of integration with a simple tool like a timer. Being able to set and dismiss a timer in casual conversation while multitasking would be a really great demo of integrated capabilities.


Text access rolling out today, apparently:

>GPT-4o’s text and image capabilities are starting to roll out today in ChatGPT. We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits.

Anyone have access yet? Not there for me so far.


I have access to GPT-4o text and audio, but no video. This is on the iOS app with ChatGPT Plus subscription.

Initial connection for audio fails most of the time, but once it's connected it's stable. Sometimes a bit more latency than expected, but mostly just like the demos.


Is it really 4o audio? Some people still have the old audio while using 4o for text.


Yes you're right, it's 4o text but the old audio


It shows available for me in the OpenAI playground currently.


I'm surprised nobody has mentioned it, but this is shades of the universal translator from Star Trek.

We have tricorders now (mobile phones), universal translators are looming... when is transporter technology going to get here?


Disney's HoloTile floor feels like the first step to anything resembling a real-life Holodeck...


With GPT-4o I see two things:

1. Wonderful engineering
2. A stagnation in reasoning ability

Do you agree with me?


Is it stagnation in reasoning ability, or is OpenAI pulling their punches?

It’s suspicious that despite being trained on audio tokens in addition to text and image tokens it performs almost exactly the same as GPT-4.

GPT-4o could be a half-baked GPT-5 in that they stopped training early when it had comparable performance to GPT-4. There is still more loss to lose.

Or maybe there’s a performance ceiling that all models are converging to, but I think this is less likely.


Is this actually available in the app in the same way they are demoing it here? I see the model is available to be selected, but the interface doesn't quite seem to allow me to use it in the way I see here.


Quite amazing performance; however, ironically, the output for RTL languages doesn't read very well.

You might want to add `direction: rtl` to your `.text-right` CSS class. The punctuation marks etc are all off for RTL languages.


How does the interruption of the AI by the user work? Does GPT-4o listen all the time? But then how does it distinguish its own voice from the user's voice? Is it self-aware?


One of the building blocks a voice assistant uses to distinguish its own voice from background sound is the Fourier transform, applied as part of echo cancellation, although I expect that the state of the art in this area also includes some other techniques and research.

If you've used one, you might know that you can easily talk to a smart speaker even when it is playing very loud music, it's the same idea.

This video explains more quite well: https://www.youtube.com/watch?v=spUNpyF58BY
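
To make the idea concrete, here is a toy spectral-subtraction sketch (my own illustration, not how OpenAI or any production assistant actually implements it; real systems use adaptive echo cancellation such as NLMS filters plus far more engineering):

    import numpy as np

    def suppress_own_voice(mic, playback, frame=1024):
        # Toy frequency-domain echo suppression: for each frame, subtract the
        # magnitude spectrum of the known playback (the assistant's own voice)
        # from the microphone spectrum, keeping the microphone's phase.
        out = np.zeros(len(mic))
        for start in range(0, len(mic) - frame + 1, frame):
            m = np.fft.rfft(mic[start:start + frame])
            p = np.fft.rfft(playback[start:start + frame])
            mag = np.maximum(np.abs(m) - np.abs(p), 0.0)
            out[start:start + frame] = np.fft.irfft(mag * np.exp(1j * np.angle(m)), n=frame)
        return out

The smart-speaker case works because the device knows exactly what audio it is playing, so it can subtract (a filtered version of) that signal from what the microphone hears.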


I can see so many military and intelligence applications for this! Excited isn't exactly the word I'd use, but... Certainly interesting! The civilian use will of course be marvellous though.


Hey there! It seems you're living in a home that well.. is on our turf now, as the kids say these days!

Now, that's not to say we can't do this in a civil manner! Either you can move out, or... we can just bulldoze your home.

Choose wisely, stranger! Your life depends on it!


Just a friendly reminder that my home is in a NATO-member country and that I'm paying my taxes - that goes towards buying huge complements of Abrams tanks, F22 fighter jets, Reaper drones and a whole host of other nasty things they use to protect my property. In short, mess with me, and you mess with them. Yes, do enjoy your life, and stay off my lawn pls. :)


Wait, I thought it said available to free users... I don't see it on ChatGPT.


I like the demo for sure more than the "reduced latency" Gemini demo [0].

[0] https://www.youtube.com/watch?v=UIZAiXYceBI


So what's the point of paying for ChatGPT Plus? And who on earth chose to make the app Mac only...


> Plus users will have a message limit that is up to 5x greater than free users

from https://openai.com/index/gpt-4o-and-more-tools-to-chatgpt-fr...


They mentioned an announcement about a new frontier model coming soon. Presumably this will be exclusive to paid users.


Did they mention this in the gpt4o announcement video? I must have missed this


5x the capacity threshold is the only thing I heard them mention on the live stream.

Though presumably when they are ready to release new models the Plus users will get them first.


I think because usability increases so much (use cases like real-time conversation, video-based coding, presentation feedback at work, etc.), they expect usage to drastically increase, hence paying users would actually still have an incentive to pay.


The biggest wow factor was the effect of reducing latency, followed closely by the friendly human personality. There's an uncanny-valley barrier, but this feels like a short-term teething problem.


So far, I'm impressed. It seems to be significantly better than GPT-4 at accessing current online documentation and forming answers that use it effectively. I've been asking it to do so, and it has.


I am still baffled that I cannot use a VoIP number to register, even if it accepts TXT/SMS. If I have a snappy new startup and we go all-in on VoIP, I guess we cannot use (or pay to use) OpenAI?


That's what we get when an entire industry uses phone numbers as a "proof of humanity"...


it does make me uncomfortable that the way you typically interact with it is by interrupting it. It makes me want to tell it to be more concise so that I wouldn't have to do that.


I think they were just interrupting on purpose to show off that as a feature (or they just wanted to keep the live presentation brief)


The emphasis on multimodal made me wonder if it was capable of creating audio as output, so I asked it to make me a drum beat. It did so, but in text form. I asked it to convert it to audio. It thought for a while and eventually said it didn’t seem like `simpleaudio` was installed in its environment. Huh, interesting, never seen a response like that before. It clearly made an attempt to carry out my instructions but failed due to technical limitations of its backend. What else can I make it do? I asked it to install `simpleaudio`. It tried but failed with a connection error, presumably due to a firewall rule.

I asked it to run a loop that writes “hello” every ten seconds. Wow, not only did it do so, it’s streaming the stdout to me.

LLMs have always had various forms of injection attacks, ways to force them to reveal their prompts, etc. but this one seems deliberately designed to run arbitrary code, including infinite loops.

Alas, I doubt I can get it to mine enough bitcoin to pay for a ChatGPT subscription.

https://x.com/sethholladay/status/1790233978290516453


Realtime videos? Probably their internal tools. I am testing GPT-4o right now and the responses come in 6-10 seconds. Same experience as GPT-4 text. What's up with the realtime claims?!


Can't find info on which of these new features are available via the API.


> Developers can also now access GPT-4o in the API as a text and vision model. GPT-4o is 2x faster, half the price, and has 5x higher rate limits compared to GPT-4 Turbo. We plan to launch support for GPT-4o's new audio and video capabilities to a small group of trusted partners in the API in the coming weeks.
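
For what it's worth, a minimal sketch of what that text-and-vision path looks like with the OpenAI Python SDK (the image URL is a placeholder):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }],
    )
    print(response.choices[0].message.content)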


[EDIT] The model has since been added to the docs

Not seeing it or any of those documented here:

https://platform.openai.com/docs/models/overview


It is not listed as of yet, but it does work if you punch in gpt-4o. I will stick with gpt-4-0125-preview for now because gpt-4o is majorly prone to hallucinations whereas gpt-4-0125-preview isn't.

Update: gpt-4o-2024-05-13 is listed now.


What gave you the impression that it's prone to hallucinations so quickly? Do you have a series of test questions?


Yes, I actually do, and I ran multiple tests. Unfortunately I don't want to give them away, as I then absolutely risk OpenAI gaming the tests by overfitting to them.

At a high level, ask it to produce a ToC of information about something that you know will exist in the future, but does not yet exist, but also tell it to decline the request if it doesn't verifiably know the answer.


Why does this whole thread sound like the OpenAI marketing department is participating? I've been talking to Google Assistant for years. I really don't find anything that magical or special.


Oh man, listening to the demos and the way the female AI voice laughed and giggled... there are going to be millions of lonely men who will fall in love with these.

Can't say whether that's good or bad.


Image editing capabilities are... nice. Not there yet.

Whatever I was doing with ChatGPT-4 became faster. Instant win.

My test benchmark questions: still all negative, so reasoning on out-of-distribution puzzles is still failing.


I just don't see how companies like Cohere can remain in this business.

At the same price I get access to a faster ChatGPT-3.9.

There is little to no reason to continue using Command R+ at these prices unless they lower their price significantly.


Since it says on the blog that it's only image, text and audio input, does GPT-4o likely have a YOLO-like model on the phone to pre-process the video frames and send bounding boxes to the server?


While everyone's focusing on the audio capabilities (haven't heard them yet), I find it amusing that the official image generation demo ("robot writer's block" in particular) can't even match the verbatim instruction, and the error isn't even consistent between generations, even though it should be aware of previous context. And this is their second generation of multimodal LLM capable of generating images.

Looks like LLMs are still gonna LLM for the near future.


I don't get it...I just switched to the new model on my iPhone app and it still takes several seconds to respond with pretty bland inflection. Is there some setting I'm missing?


Wondering the same. Can’t seem to find the way to interact with this in the same way as the video demo.


They haven't actually released it, or any schedule for releasing it beyond an "alpha" release "in the coming weeks". This event was probably just slapped together to get something splashy out ahead of Google.


According to the article, they've rolled out text and image modes of GPT-4o today but will make the audio mode available at a later date.


With the speed of the seemingly exponential developments in this field, I wouldn't be surprised if suddenly the entire world tilted and a pair of goggles fell from my face. But a dream.


Is this a downloadable app? I don't see it on the iOS app store.


At their core, I still think of these things as search engines, albeit super advanced ones. But the emotion the agent conveys with its speech synth is completely new...


I wonder how many joules were used just for that conversation.


There's a button on this page that says "Try on ChatGPT ->", but that still gives me version 3.5, and if I upgraded it seems I'd get version 4.

Is this new version not available to users yet?


This is pretty amazing, but it was funny still hearing the ChatGPT "voice" of somewhat fake-sounding enthusiasm, restating what the human said with exaggeration.


Are these multimodal models able to discern the input voice's tone? Really curious whether they're able to detect sarcasm or emotional content (or even something like mispronunciation).


Yes, they can, and they should get better at this over time.

There is a demo video where the presenter breathes heavily, and when prompted the AI is able to notice it as such.

It can not only detect tone; it also seems to be able to use tone itself.


Does anyone with a paid plan see anything different in the ChatGPT iOS app yet?

Mine just continues to show “GPT 4” as the model - it’s not clear if that’s now 4o or there is an app update coming…


I think GPT-4o does have an advantage in hindsight: it will push this product to consumers much faster and build a revenue base while other companies are playing catch-up.


Weird, visiting the page crashed my graphics driver in Firefox.


Not all of the founders agreed with Jefferson’s view on the separation of church and state. Do you agree with Jefferson or with his opponents? Explain.


I like the robot typing at the keyboard that has B as half of the keys, and my favorite part is when it tears up the paper and behind it is another copy of that same paper.


Given that they are moving all these features to free users, it tells us that GPT-5 is around the corner and is significantly better than their previous models.


Or maybe it is a desperation move after Llama 3 got released and the free mode will have such tight constraints that it will be unusable for anything a bit more serious.


Why do they keep saying "freely accessible AI for mankind" and keep charging me monthly? It's OK to ask payment for services; just don't lie.


Such an impressive demo... but why did they have to give it this vapid, giggly socialite persona that makes me want to switch it off after thirty seconds?


You should be able to adjust this with a system prompt, given that it has end-to-end speech capabilities now.


I see a lot of fear around these new kinds of tools. I think, though, that criminals will always find ways to leverage new technology to their benefit, and we've always found ways to deal with that. This changes little. Additionally, as you are aware of this, so are the people creating this tech, and a lot of effort is underway to protect against malicious uses.

That won't stop criminal enterprises from implementing their own naughty tools, but these open models won't become some kind of holy grail for criminals to do as they please.

That being said, I do believe, now more than ever, that education worldwide should be adjusted to fit this new paradigm and maybe adapt quicker to such changes.

As some commenters pointed out, there are already good tools and techniques to use to counter malicious use of AI. Maybe not covering all use cases, but we need to educate people on using the tools available, and trust that researchers (like many of yourselves) are capable of innovations which will reduce risk even further.

There is no point and no benefit in trying to be negative or full of fear. Go forward with positivity and creativity. Even if big tech gets regulated, some criminal enterprises have billions to invest too, so crippling big tech here will only play into their hands in the end.

Love these new innovations. And for the record, GPT-4o still told me to 'push rip' on amd64... so RIP to it actually understanding stuff...

If you are smart enough to see some risks here, you might also be smart enough to positively contribute to improvements. Fear shuts things down; love opens them up. It's basic stuff.

This demo is amazing, not scary. It's positive advancement in technology and it won't be stopped because people are afraid of it, so go with it, and contribute in areas where you feel it's needed. Even if it's just giving feedback. And when giving that, you all know a balanced and constructive approach works better than a negative and destructive approach.


I think that the world is still not coping with current tech. Legislation and protections don't sufficiently cover people for current scams or how companies leverage tech against workers.

What's even scarier is that this puts a lower-bound price on many, many skills, with, again, still no protections for humans.

Would this be exciting if everyone had a safe place to live with infinite electricity and food? Sure. Will tools like this bring about massive uncertainty, hurt, and pain? Almost certainly.

Saying that the sun is shining where you live, doesn't mean there isn't flooding in other parts of the world.


Legislation is always behind. You can't rely on government to fix everything on time, especially with international things like internet technologies. I'm not saying ignore the bad stuff, but it's only a small percentage of what's really out there. Most of the world, and the vast majority of people, are good :)


Criminals misusing it? I feel like this is already a dangerous way to use AI: they use an enthusiastic, flirty, attractive female voice on millions of nerds. They openly say this is going to be like the movie Her. Shouldn't we have some societal discussion before we unleash paid AI girlfriends on everybody?


Marketing is marketing. Look how they marketed cigarettes, cars, all kinds of things that people now feel are perhaps not so good. It's part and parcel of the world that also does so much good. Personally, I'd market it differently, but this is why I'm no CEO =).

If we help each other understand these things and how to cope, all will be fine in the end. We will hit some bumps, and yes, there will be discomfort, but that's OK. That's all part of life. Life is not about being happy and comfortable all the time, no matter how much we would want that.

Some people even want paid AI girlfriends. Who are you to tell them they are not allowed to have one?


This is some I, Robot-level stuff. That being said, I still fail to see the real-world application of this thing, at least at a scalable, affordable cost.


I hope we can disable the cringe American hyperemotions.


This is every romance scammer's dreams come true...


real time audio is mind blowing


Not very impressed. It's been 18 months since ChatGPT; I would have expected more progress. It looks like we have reached the limit of LLMs.


That first demo video was impressive, but then it ended very abruptly. It made me wonder if the next response was not as good as the prior ones.


Extremely impressive -- hopefully there will be an option to color all responses with an underlying brevity. It seemed like the AI just kept droning on and on.


I don't see anything released today. Login/signup is still required, no signs of desktop app or free use on web. What am I missing?


Haha lol Subscription canceled was the best choice but it's new fancy cool magic sensational AGI please give all your money df


What would be the difference in compute for inference on an audio<>audio model like this compared to a text<>text model?


I am a paid customer, yet I don't see anything new. I'm tired of these fake announcements of "released" features.


Nothing creepier than human voice on a robot.


I'm not so much surprised by the capabilities of the new model (IMHO the same as GPT-4) as by its real-time capabilities.

My brother, who can't see well, will use this to cook a meal without me explaining it to him. It's so cool.

People all around the world will now get real-time AI assistance for a ton of queries.

Heck - I have a meeting bot API company (https://aimeetingbot.com) and that makes me really hyped!


ChatGPT 4o reminds me of upgrading from a 300 baud modem to a 1200 baud modem, when modems used to cost a dollar a baud.


GPT-4o is very fast but seems to generate some very random ASCII Art compared to GPT-4 when text in the art is involved.


Feels like really good engineering in the wrong direction. Who said that audio is a good interface anyway? Audio is hard to edit, slow, and has low information density. If I want to talk to someone and have a low-information but pleasant exchange, I can just talk to real people; I don't need computers for that.

I guess it is useful for some casual uses, but I really wish there was more focus on the reasoning and intelligence of the model itself.


Nice, the landing page says "Try on ChatGPT" but when I create an account all I can try is GPT3.5.

I am not surprised.


Just like that Google is on back foot again.


Considering the stock pumped following the presentation, the market doesn't seem particularly impressed with what OpenAI released at all.


The market was expecting ChatGPT search to take a chunk out of Google's search revenue.


The new GPT-4o was not yet available when I tried to access ChatGPT from Finland. Are they rolling it out to Europe later?



I assume there's no reason to use GPT-4-turbo for API calls, as this one is supposedly better and 2x cheaper.


I'm surprised they're limiting this API. Haven't they still not even opened the image API in GPT-4 Turbo?


When AI gets to the point it can respond to AI, you do understand where you come in, don't you?


Let's start a betting pool on how long it takes BetterHelp to lay off their therapists for this thing.


So what's the difference between the different gpt2 chatbots on lmsys? Which one is deployed live now?


Imagine having to interact with this thing in an environment where it is in the power position.

Being in a prison with this voice as your guard seems like a horrible way to lose your sanity. This aggressive friendliness combined with no real emotions seems like a very easy way to break people.

There are these stories about Nazis working at concentration camps having to drink an insane amount of alcohol to keep themselves going (not trying to excuse their actions). This thing would just do it, while being friendly at the same time. The amount of hopelessness someone would experience if they happened to be in the custody of a system like this is truly horrific.


Good update from the previous one. At least they now have data and information up to October 2023.


Interested in how many LLM startups there are that are going out of business due to this voice assistant.


A new "flagship" model with no improvement of intelligence, very disappointed. Maybe this is a strategy for them to mass collect "live" data before they're left behind by Google/Twitter live data...


Google must be shitting it right now.


I tried this for about 10 minutes, and went back to 4. Not really that great for what I am doing.


Did the one guy wear a leather jacket so the AI wouldn't point out that he's balding?


My main takeaway is that generative AI has hit a wall... New paradigms, architectures and breakthroughs are necessary for the field to progress, but this raises the question: if everyone knows the current paradigms have hit a wall, why is so much money being spent on LLMs, diffusion models, etc., which are bound to become obsolete within a few(?) years?


"im-also-a-good-gpt-2" signaling that agi is just an optimization problem.


After looking at the introduction, there doesn't seem to be much of an update in OpenAI's features: https://aidisruption.substack.com/p/ultimate-ai-gpt-4o-your-...


The stock market doesn't seem too impressed - GOOG rebounded from strong red to neutral.


Probably because people thought OpenAI was going to launch a new search engine, but didn't.


This looks too good to be true? What's the catch?

Also, wasn't expecting the perf to improve by 2x


Did anyone try to use the 4o camera in a mirror test, to test the concept of self?


Are there any remotely comparable open source models? Fully multimodal, audio-to-audio?


Hmm, there’s this Gazelle that can take in audio, but to get audio back out you’d have to use something else (e.g. Piper).

https://github.com/tincans-ai/gazelle?tab=readme-ov-file

https://tincans.ai/slm

https://github.com/rhasspy/piper


I started to watch video but had to stop after a few seconds. It is way too cheesy.


I was about to say how this thing is lame because it sounds so forced and robotic and fake, and even though the intonations do make it sound more human-like, it's very clear that they made a big effort to make it sound like natural speech, but failed.

...but then I realized that's basically the kind of thing Data from Star Trek struggles with as part of his character. We're almost in that future, and I'm already falling into the role of the ignorant human that doesn't respect androids.


This 4o is already rolling out?


They mentioned capabilities will be rolled out over the next few weeks: https://youtu.be/DQacCB9tDaw?t=5018


The more I get, the more I want. Exciting times. Can't wait for GPT-5.


I hope when this gets to my iphone I can use it to set two concurrent timers.


This was added in iOS 17.


I'd love to know when streaming is going to come to the gpt-4o API...


Question for you guys: is there a model that can take figures (graphs) from scientific publications and combine image analysis with picking up the data-point symbol descriptions and analysing the trends?


Set a memorable verification phrase with your friends and loved ones.


I thought they would release a competitor to perplexity? Was this it?


These AIs sure do yap a lot.

Also, they're TERRIBLE at harmonizing together.


People will pay to dull their senses. This will make so much money!


What's the path from LLMs to "true" general AI? Is it "only" more training power/data, or will they need a fundamental shift in architecture?


As a paid user this felt like a huge letdown. GPT-4o is available to everyone so I'm paying $20/mo for...what, exactly? Higher message limits? I have no idea if I'm close to the message limits currently (nor do I even know what they are). So I guess I'll cancel, then see if I hit the limits?

I'm also extremely worried that this is a harbinger of the enshittification of ChatGPT. Processing video and audio for all ~200 million users is going to be extravagantly expensive, so my only conclusion is that OpenAI is funding this by doubling down on payola-style corporate partnerships that will result in ChatGPT slyly trying to mention certain brands or products in our conversations [1].

I use ChatGPT every day. I love it. But after watching the video I can't help but think "why should I keep paying money for this?"

[1] https://www.adweek.com/media/openai-preferred-publisher-prog...


So... cancel the subscription?


Completely agree. None of the updates apply to any of my use cases. A disappointment.


Wow Vision Understanding blew Gemini Pro 1.5 out of the water


Holy crap, the level of corporate cringe of that "two AIs talk to each other" scene is mind-boggling.

It feels like a pretty strong illustration of the awkwardness of getting value from recent AI developments. Like, this is technically super impressive, but also I'm not sure it gives us anything we couldn't have one year ago with GPT-4 and ElevenLabs.


a fake saucy friend for alienated humans with chitchat

if the end user is in a war zone will the AI bot still ask how it is going?

how many bombs fell in your neighborhood last night?


Good lord, that voice makes Elevenlabs.io look... dead


Can I mix French and English when talking to it?


I cannot find the mac app anywhere. Is there a link?


Maybe this is yet another wake-up call to startups: wrapping another company's APIs to offer convenience or incremental improvement is not a viable business model. If your wrapper turns out to be successful, the company that provides the API will just incorporate your business as a set of new features with better usability, faster response time, and lower price.


Curious to know why they didn't create the Windows desktop app first, given that Windows is the dominant desktop segment. Fear of competing with Microsoft's Copilot?


I guess because they mostly use Macs. They always use Macbooks in videos.


Looking forward to seeing how one can fine-tune this


A bit sad to see the desktop app is macos only


I don't think it's available in Europe yet? It seems they prioritize the US market for now. Hence macOS, because Macs are way more popular in the US than in the rest of the world.


Eh, thanks, I didn't realize it was US-specific


the OpenAI live stream was quite underwhelming...


Can’t wait to get interviewed by this model!


It seems like the ability to interrupt is more like an interrupt in the computer sense ... a Ctrl-C (or Ctrl-S tty flow control for you old-timers), not a cognitive evaluation followed by a "reasoned" decision to pause voice output. Not that it matters, I guess; it's just not general intelligence, it's just flow control.

But also, that's why it fails a real Turing test: a real person would be irritated as fuck by the interruptions.


Impressive way to gather more training data


The big news is that this is gonna be free


Just over 10 years later, it's Her


what does the "o" stand for?


Stands for "omni", since it is multimodal. Source: https://www.tomsguide.com/ai/chatgpt/gpt-4o-is-openais-excit...


This is clearly not just another story of human innovation. This is not just the usual trade-off between risks and opportunities.

Why? Because it simply automates the human away. Who wouldn't opt for a seemingly flawless, super-effective buddy (i.e. an AI) that is never tired and always knows better? If you need some job done, if you're feeling lonely, when you need some life advice... It doesn't matter if it might be considered "just an imitation of a human".

Why would future advancements of it keep being "just some tool" instead of largely replacing us (as humans) in jobs, relationships, ...?


Dear OpenAI, either remember my privacy settings or open a temporary chat by default. This funny nonsense of typing something in only to find out you're going to train on it is NOT a good experience.


Western governments are already in full-on panic over falling birth rates. I think this cranks that panic dial up to 11.


Why? Isn't this the dream? Endless growth and endless workers? No one really cares about less people, just less money.


Endless growth? Who is going to buy all those new products your AI employees are making? When none of the real people have jobs, how are they going to buy those products you're trying to sell?

- Less people = less demand for products. New customer growth stalls. Prices fall. Revenue and profits fall.

- Less people = less demand for housing. Prices fall. Investments fall.

- Less people = less people able to perform physical jobs.

- Less people = less tax revenue. Less money available for social services.

- Less young people = Aging population.

- Aging population = higher strain on social services. Pensions, healthcare, etc.

- Aging population = higher percentage of young people need to care for aging people instead of entering the workforce.

In a capitalist economy where your numbers need to keep going up to be considered successful (e.g. growth is necessary; stable profits but no growth = bad), you are never going to have a good time when your population falls.

> No one really cares about less people, just less money

Eventually less people leads to less money.


Who cares about all of that when ~ 1000 people can benefit financially in the short term? /s


With our platform you can ALREADY use it to automate your business and sales!

Create your gpt4o chatbot with our platform tvoybot.com?p=ycombinator



I can't help but wonder who this is actually for?

Conversing with a computer sounds pathetic, but this will be pushed down our throats in the name of innovation (firing customer service agents)


Is that conversational UI live?


so should I unsubscribe from openai since gpt-4o is now free for all?


Damn, that was a big leap.


Honestly, the eager flirtatiousness of the AI in the demos, in conversation with these awkward engineers, really turns me off. It feels like a male power fantasy.

Nonetheless, very impressive.


Am I the only one that feels underwhelmed by this?

Yeah, it's cool and unlike anything I've seen before, but I kind of expected a bigger leap.

To me the most impressive thing is going to be longer context limits. I've had semi-long-running conversations where I've had to correct an LLM multiple times about the same thing.

When you have more context, the LLM can infer more and more. Am I wrong about this?


The updates all seem to be geared towards corrective updates rather than expansion of capabilities. We're still typing prompts rather than speaking them into a microphone.

If it was truly AI, why isn't it rapidly building itself? Rather than relying on scraping human content from wildly inaccurate and often incorrect social media posts? So much effort is wasted on pushing news cycles rather than a careful, responsible, and measured approach to developing AI into tools that are highly functional and useful to individuals. The biggest innovation in AI right now is how to make it modular and slap a fee on each feature, and that's not practical at all going into the future.

I'll begin to believe that consumer AI is making strides when Siri and Google Assistant stop missing commands and can actually conduct meaningful conversations without an Internet connection and monthly software updates, which in my opinion is at least 5-10 years away. Right now what is presented as "AI" is usually incomplete sensor-aware scripting or the Wizard of Oz (humans) hidden behind the curtain operating switches and levers, a bunch of underwhelming tools, and a heap of online marketing. If they keep that act up, it erodes faith in the entire concept, just like with Full Self Driving Tesla trucks.


> If it was truly AI, why isn't it rapidly building itself?

You seem to confuse AI, the field of endeavor, with ASI or at least AGI (plus will, which may or may not be a necessary component of either), which are goals of the field that no one (approximately, there have been some exceptions but they’ve quickly been dismissed and faded) claims have been achieved.


FREE = Data Collection


So, babelfish incoming?


And no mention of hallucinations. I hope that has improved.


As I commented in the other thread, really, really disappointed there's no intelligence update and more of a focus on "gimmicks". The desktop app did look really good, especially as the models get smarter. Will be canceling my premium as there's no real purpose for it until that new "flagship" model comes out.


Agree on hoping for an intelligence update, but I think it was clear from teasers that this was not gonna be GPT-5.

I'm not sure how fair it is to classify the new multimodal capabilities as just a gimmick though. I personally haven't integrated GPT-4 into my workflow that much and the latency and the fact I have to type a query out is a big reason why.


After using it for an hour, I completely agree. Having the faster token rate + their Excel analysis actually working is a game changer.


Pretty responsible progress management by OpenAI. Kicking off another training wave is easy, if you can afford the electricity, but without new, non-AI tainted datasets or new methods, what’s the point? So, in the meantime, make magic with the tool you already have, without freaking out the politicians or the public. Wise approach.


Just advanced google


The first thing I thought was that I felt uncomfortable with the way they cut off and interrupted the she-AI. I wonder if our children will end up being douchebags.

Other than that, it felt like magic, like that Google demo of the phone doing some task like setting up an appointment over the phone, talking to a real person.


So, babelfish soon?


That can “reason”?




So GPT-4o can do voice intonation? Great. Nice work.

Still, it sounds like some PR drone selling a product. Oh wait....


Sorry to nitpick, but in the language tokenisation part, the French example is incorrect. The exclamation mark is preceded by a space in French. "c'est un plaisir de vous rencontrer!" should be "c'est un plaisir de vous rencontrer !"


And yet no matter how easy they make ChatGPT to interact with, I cannot use it due to accuracy. Great, now I can have a voice telling me information I have no way of knowing is correct rather than just having it given to me as text.


does this make retell.ai obsolete?


people are scared and it shows :)


I, for one, welcome our new annoying-sounding AI overlords.


\clear


The naming of these systems has me dead


Creepy AF


the design is very human


Is the voice Kristen Wiig?


this is really amazing!!


I absolutely hate this. We are going to destroy society with this technology. We can't continue to enjoy the benefits of human society if humans are replaced by machines. I hate seeing these disgusting people smugly parade this technology. It makes me so angry that they are destroying human society and all I can do is sit here and watch.


I know exactly what you mean. I just hope people get bored of this waste of time and energy, both personal and actual energy, before it goes too far.


future is coming


can i try free ?


Now, say goodbye to call centers.


and say hello to your grandma getting scammed


and say hello to UBI which in the long term will diminish scam :)


The usual critics will quickly point out that LLMs like GPT-4o still have a lot of failure modes and suffer from issues that remain unresolved. They will point out that we're reaping diminishing returns from Transformers. They will question the absence of a "GPT-5" model. And so on -- blah, blah, blah, stochastic parrots, blah, blah, blah.

Ignore the critics. Watch the demos. Play with it.

This stuff feels magical. Magical. It makes the movie "Her" look like it's no longer in the realm of science fiction but in the realm of incremental product development. HAL's unemotional monotone in Kubrick's "2001: A Space Odyssey" feels... oddly primitive by comparison. I'm impressed at how well this works.

Well-deserved congratulations to everyone at OpenAI!


> This stuff feels magical. Magical.

Because its capacities are focused on exactly the right place to feel magical. Which isn’t to say that there isn’t real utility, but language (written, and even moreso spoken) has an enormous emotional resonance for humans, so this is laser-targeted in an area where every advance is going to “feel magical” whether or not it moves the needle much on practical utility; it’s not unlike the effect of TV news making you feel informed, even though time spent watching it negatively correlates with understanding of current events.


Kind of this. That was one of the themes of the movie Westworld where the AI in the robots seemed magical until it was creepy.

I worry about the 'cheery intern' response becoming something of a punch line.

"Hey siri, launch the nuclear missiles to end the world."

"That's a GREAT idea, I'll get right on that! Is there anything else I can help you with?"

Kind of punch lines.

Will be interesting to see where that goes once you've got a good handle on capturing the part of speech that isn't "words" so much as it is inflection and delivery. I am interested in a speech model that can differentiate between "I would hate to have something happen to this store." as a compliment coming from a customer and as a threat coming from an extortionist.


It's probably just me, but the somewhat forced laughs & smiles from the people talking to it make me feel uneasy.

But enough of that. The future looks bright. Everyone smile!

Or else..


This is basically just the ship computer from Hitchhikers Guide to the Galaxy.

"Guys, I am just pleased as punch to inform you that there are two thermo-nuclear missiles headed this way... if you don't mind, I'm gonna go ahead and take evasive action."


ChatGPT is now powered by Genuine People Personality™ and OpenAI is turning into the Sirius Cybernetics Corporation (who according to the HHGTTG were "a bunch of mindless jerks who were the first against the wall when the revolution came")

The jokes write themselves.


I did wonder if there's a less verbose mode. I hope that's not a paywalled feature. Honestly it's possible that they use the friendliness to help buy the LLM time before it has to substantively respond to the user.


Positivity even to the point of toxicity will be the default launch tone for anything... to avoid getting scary.



Yeah people around me here in Central Europe are very sick of that already. Everybody is complaining about it and the first thing they say to the bot is to cut it out, stop apologizing, stop explaining and get to the point as concisely as possible. Me too.


I have to do that now with every AI that over-explains or provides loosely related info I did not ask for. I hope there is a verbosity level = minimum.

Even in the demo today, they kept cutting it off.


I cloned my voice on play.ai and it’s an excellent conversational partner in terms of reactiveness and verbosity, much better than ChatGPT.


One of the demos has the voice respond to everything sarcastically. If it can sound sarcastic it’s not a stretch to believe it can “hear” sarcasm.


Louis CK - Everything is amazing & nobody is happy

https://www.youtube.com/watch?v=kBLkX2VaQs4


Perhaps everybody is right, and what is amazing is not what matters, and what matters is hardly amazing...


Or perhaps the news media has been increasingly effective at convincing us the world is terrible. Perceptions have become measurably detached from reality:

https://www.ft.com/content/af78f86d-13d2-429d-ad55-a11947989...


If we're convinced that it's terrible then we're behaving like it's terrible, which is terrible.


Or perhaps the reality on the ground for the working and middle class masses is not the same reality experienced by the elites, and the upper middle class with $150K+ salaries, or measured by stock market growth, and such...


As Jon Stewart says in https://www.youtube.com/watch?v=20TAkcy3aBY - "How about I hold the fort on making peanut butter sandwiches, because that is something I can do. How about we let AI solve this world climate problem".

I have yet to see a true "killer" feature of AI that isn't doing badly a job humans can already do badly.


the point of all of this is: this is alpha 0.45 made to get the money needed to build AGI whatever that is


>the point of all of this is: this is alpha 0.45 made to get the money needed to build AGI whatever that is

Maybe they publicly made it available at alpha 0.7 and now it's more like 0.9 RC instead, with not much room to go except marginal improvements for an ever-increasing training budget, making them less and less worthwhile?

And that's before 90% of the internet becomes LLM output, poisoning any further corpus for training and getting into LSD-grade hallucination mode...


It's not an either-or: the stuff feels magical both because it represents a dramatic revelation of capability and because it is heavily optimized to make humans engage in magical thinking.

These things are amazing compared to old-school NLP: the step-change in capability is real.

But we should also keep our wits about us: they are well described by current or conjectural mathematics, they fail at things dolphins can do, it's not some AI god, and it's not self-improving.

Let’s have balance on both the magic of the experience and getting past the tech demo stage: every magic trick has a pledge, but I think we’re still working on the prestige.


Yes, the announcement explicitly states that much of the effort for this release was focused on things that make it feel magical (response times, multiple domains, etc.), not on moving the needle on quantifiable practical performance. For future releases, the clever folks at OpenAI are surely focused on improving performance on challenging tasks that have practical utility -- while maintaining the "magical feeling."


Where does it explicitly say this?


Explicit ≠ literal.

The things they mention/demo -- response times, multiple domains, inflection and tone, etc. -- are those that make it feel "magical."


> explicitly states that much of the effort for this release was focused on things that make it feel magical (response times, multiple domains, etc.), not on moving the needle on quantifiable practical performance.

Hmm, did you mean implicitly? I've yet to see where they say anything to the likes of not "moving the needle on quantifiable practical performance."


Pretty interesting how it turns out that, contrary to science fiction movies, talking naturally and modelling language is much easier and was achieved much sooner than solving complex problems or whatever else it is that robots in those movies do.


I didn't use it as a textual interface, but as a relational/nondirectional system, trying to ask it to invert recursive relationships (FIRST/FOLLOW sets for BNF grammars). The fact that it could manage to give partially correct answers on such an abstract problem was "coldly" surprising.
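
For anyone who hasn't run into FIRST/FOLLOW sets before, here's a minimal sketch of the kind of computation involved, just to show how abstract the problem is (toy grammar and names are mine, purely illustrative):

    # FIRST-set computation for a tiny BNF grammar, iterated to a fixed point.
    GRAMMAR = {
        # E -> T E' ; E' -> '+' T E' | eps ; T -> '(' E ')' | 'id'
        "E":  [["T", "E'"]],
        "E'": [["+", "T", "E'"], []],   # [] is the empty production (epsilon)
        "T":  [["(", "E", ")"], ["id"]],
    }

    def first_sets(grammar):
        """FIRST(X) = the terminals that can begin a string derived from X
        (plus 'eps' if X can derive the empty string)."""
        first = {nt: set() for nt in grammar}
        changed = True
        while changed:
            changed = False
            for nt, productions in grammar.items():
                for prod in productions:
                    before = len(first[nt])
                    if not prod:                      # epsilon production
                        first[nt].add("eps")
                    for symbol in prod:
                        if symbol in grammar:         # non-terminal
                            first[nt] |= first[symbol] - {"eps"}
                            if "eps" not in first[symbol]:
                                break
                        else:                         # terminal
                            first[nt].add(symbol)
                            break
                    else:
                        first[nt].add("eps")          # every symbol can vanish
                    if len(first[nt]) != before:
                        changed = True
        return first

    if __name__ == "__main__":
        for nt, fs in sorted(first_sets(GRAMMAR).items()):
            print(nt, "->", sorted(fs))

FOLLOW sets go in the "inverse" direction (what can come after a non-terminal), which is roughly the relationship inversion being asked about above.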


> its capacities are focused on exactly the right place to feel magical.

this focus subverts its intended effect on those of us with hair trigger bullshit-PTSD


VC loves it.

Another step closer for those 7 trillion that OpenAI is so desperate for.


You really think OpenAI has researchers figuring out how to drive emergent capabilities based on what markets well?

Edit: Apparently not based on your clarification, instead the researchers don't know any better than to march into a local maxima because they're only human and seek to replicate themselves. I assumed too much good faith.


I don’t think the intent matters, the effect of its capacities being centered where they are is that they trigger certain human biases.

(Arguably, it is the other way around: they aren't focused on appealing to those biases, but driven by them, in that the perception of language modeling as a road to real general reasoning is a manifestation of the same bias which makes language capacity be perceived as magical.)


Intent matters when you're being as dismissive as you were.

Not to mention your comment doesn't track at all with the most basic findings they've shared: that adding new modalities increases performance across the board.

They shared that with GPT-4 vs GPT-4V, and the fact this is a faster model than GPT-4V while rivaling its performance seems like further confirmation of the fact.

-

It seems like you're assigning emotional biases of your own to pretty straightforward science.


> Intent matters when you're being as dismissive as you were.

The GP comment we're all replying to outlines a non-exhaustive list of very good reasons to be highly dismissive of LLMs. (No, I'm not calling them AI; they are not fucking AI.)

It is utterly laughable and infuriating that you're writing off legitimate skepticism about this technology as an emotional bias. Fucking ridiculous. We're now almost a full year into the full-bore open hype cycle of LLMs. Where are all the LLM products? Where's the market penetration? Businesses can't use it because it has a nasty tendency to make shit up when it's talking. Various companies and individuals are being sued because generative art is stealing from artists. Code generators are hitting walls of usability so steep you're better off just writing the damn code yourself.

We keep hearing "it will do!" "it's coming!" "just think of what it can do soon!" on and on and on, and it just keeps... not doing any of it. It keeps hallucinating untrue facts, it keeps getting the basics of its tasks wrong, for fuck's sake AI Dungeon can't even remember if I'm in Hyrule or Night City. Advances seem fewer and farther between, with most of them being just getting the compute cost down, because NO business currently using LLMs extensively could be profitable without generous donations of compute from large corporations like Microsoft.


I didn't see any good reasons to be dismissive of LLMs; I saw a weak attempt at implying we're at a local maximum because scientists don't know better than to chase after what seems magical or special to them due to their bias as humans.

It's not an especially insightful or sound argument imo, and neither are random complaints about capabilities of systems millions of people use daily despite your own claims.

And for the record:

> because NO business currently using LLM extensively could be profitable without generous donation of compute from large corporations like Microsoft

OpenAI isn't the only provider of LLMs. Plenty of businesses are using providers that provide their services profitably, and I'm not convinced that OpenAI themselves are subsidising these capabilities as strongly as they once did.


All that spilled ink doesn't change the fact that I use it every day and it makes everything faster and easier and more enjoyable. I'm absolutely chuffed to put my phone on a stand so GPT-4o can see the page I'm writing on and chat with me about my notes or the book I'm reading and the occasional doodle. One of the first things I'll try out is to see if it can give feedback and tips on sketching; since it can generate images with a lot better control of the subject, it might even be able to demonstrate various techniques I could employ!


As it turns out, people will gleefully welcome Big Brother with open arms as long as it speaks with a vaguely nice tone and compliments the stuff it can see.


lol, typed from a telescreen no doubt


It's almost a year since this James Watt came out with his steam engine and yet we are still using horses.


A year is an eternity in tech and you bloody well know it. A year into an $80 billion company's prime hype cycle, and we have... chatbots, but fancier? This is completely detached from sanity.


I mean when you’re making a point about how your views should not be taken as emotional bias, it pays to not be overly emotional.

The fact that you don’t see utility doesn’t mean it is not helpful to others.

A recent example, I used Grok to write me an outline of a paper regarding military and civilian emergency response as part of a refresher class.

To test it out we fed it scenario questions and saw how it compared to our classmates' responses. All people with decades of emergency management experience.

The results were shocking. It was able to successfully navigate a large scale emergency management problem and get it (mostly) right.

I could see a not so distant future where we become QA checkers for our AI overlords.


That's not what the GP said at all. It was just an explanation for why this demo feels so incredible.


GP's follow-up is literally

>they aren't focused on appealing to those biases, but driven by them, in that the perception of language modeling...

So yes in effect that is their point, except they find the scientists are actually compelled by what markets well, rather than intentionally going after what markets well... which is frankly even less flattering. Like researchers who enabled this just didn't know better than to be seduced by some underlying human bias into a local maxima.


I think that's still just an explanation of biases that go into development direction. I don't view that as a criticism but an observation. We use LLMs in our products, and I use them daily and I'm not sure how that's that negative.

We all have biases in how we determine intelligence, capability, and accuracy. Our biases color our trust and ability to retain information. There's a wealth of research around it. We're all susceptible to these biases. Being a researcher doesn't exclude you from the experience of being human.

Our biases influence how we measure things, which in turn influences how things behave. I don't see why you're so upset by that pretty obvious observation.


The full comment is right there, we don't need to seance what the rest of it was or remix it.

> Arguably, it is the other way around: they aren't focused on appealing to those biases, but driven by them, in that the perception of language modeling as a road to real general reasoning is a manifestation of the same bias which makes language capacity be perceived as magical

There's no charitable reading of this that doesn't give the researchers way too little credit given the results of the direction they've chosen.

This has nothing to do with biases and emotion, I'm not sure why some people need it to be: modalities have progressed in order of how easy they are to wrangle data on: text => image => audio => video.

We've seen that training on more tokens improves performance, we've seen that training on new modalities improves performance on the prior modalities.

It's so needlessly dismissive to act like you have this mystical insight into a grave error these people are making, and they're just seeking to replicate human language out of folly, when you're ignoring table stakes for their underlying works to start with.


Note that there is only one thing about the research that I have said is arguably influenced by the bias in question, “the perception of language modeling as a road to real general reasoning”. Not the order of progression through modalities. Not the perception that language, image, audio, or video are useful domains.


>This stuff feels magical. Magical.

Sounds like the people who defend astrology because it feels magical how their horoscope fits their personality.

"Don't bother me with facts that destroy my rose-tinted view"

At the moment AI is massively hyped and shoved into everything. To point at the faults and weaknesses is a reasonable and responsible thing to do.


i legitimately don't understand this viewpoint.

3 years ago, if you told me you could facetime with a robot, and they could describe the environment and have a "normal" conversation with me, I would be in disbelief, and assume that tech was a decade or two in the future. Even the stuff that was happening 2 years ago felt unrealistic.

astrology is giving vague predictions like "you will be happy today". GPT-4o is describing to you actual events in real time.


People said pretty much exactly the same thing about 3d printing.

"Rather than ship a product, companies can ship blueprints and everyone can just print stuff at their own home! Everything will be 3d printed! It's so magical!"

Just because a tech is magical today, doesn't mean that it will be meaningful tomorrow. Sure, 3d printing has its place (mostly in making plastic parts for things) but it's hardly the revolutionary change in consumer products that it was touted to be. Instead, it's just a hobbyist toy.

GPT-4o being able to describe actual events in real time is interesting, it's yet to be seen if that's useful.

That's mostly the thinking here. A lot of the "killer" AI tech has really boiled down to "Look, this can replace your customer support chat bot!". Everyone is rushing to try and figure out what we can use LLMs for (just like they did when ML was supposed to take over the world) and so far it's been niche locations to make shareholders happy.


> Sure, 3d printing has its place (mostly in making plastic parts for things) but it's hardly the revolutionary change in consumer products that it was touted to be. Instead, it's just a hobbiest toy.

how sure are you about that?

https://amfg.ai/industrial-applications-of-3d-printing-the-u...

how positive are you that some benefits in your life are not attributable to 3d-printing used behind the scenes for industrial processes?

> Just like they did when ML was supposed to take over the world

how sure are you that ML is not used behind the scenes to benefit your life? do you consider features like fraud detection programs, protein-folding prediction programs, and spam filters valuable in and of themselves?


This honestly made me lol.

I'm sure 10 years from now, assuming LLMs don't prove me wrong, I'll make a comment comparing LLMs to whatever the new hype is, just like the one I just made about 3d printing, and I'll get EXACTLY this reply. "Oh yeah, well here's a niche application of LLMs that you didn't account for!".

> how positive are you that some benefits in your life are not attributable to 3d-printing used behind the scenes for industrial processes?

See where I said "in consumer products". I'm certainly not claiming that 3d printing is never used and is not useful. However, what I am saying is that it was hyped WAY beyond industrial applications.

In fact, here I am, 11 years ago, saying basically exactly what I'm saying about LLMs that I said about 3d printing. [1]. Along with people basically responding to me the exact same way you just did.

> how sure are you that ML is not used behind the scenes to benefit your life? do you consider features like fraud detection programs, protein-folding prediction programs to create, and spam filters valuable in and of themself?

Did I say it wasn't behind the scenes? ML absolutely has places where it's applicable; it's just not nearly as vast as the hype train would say. I know, I spent a LONG time trying to integrate ML into our company and found it simply wasn't as good as hard-and-fast programmed rules in almost all situations.

[1] https://www.reddit.com/r/technology/comments/15iju9/3d_print...


sorry, maybe i'm not completely understanding what you mean by "in consumer products".

reading your argument on reddit, it seems to me that you don't consider 3d printing a success because there's not one in every home...which is true.

but it feels uncreative? like, sure, just because it hasn't been mass adopted by consumers, doesn't mean there wasn't value generation done on an industrial level. you're probably using consumer products right now that have benefitted from 3d printing in some way.

> ML absolutely has an applicable location, it's not nearly as vast as the hype train would say

what hype train are you referring to? i know a lot of different predictions in machine learning, so i'm curious about what you mean specifically.


> but it feels uncreative? like, sure, just because it hasn't been mass adopted by consumers, doesn't mean there wasn't value generation done on an industrial level. you're probably using consumer products right now that have benefitted from 3d printing in some way.

I'd suggest reading both the article and the surrounding reddit comments if you want context for my argument there. The explicit argument there was that everyone would own a 3d printer. Not that they would be used in commercial applications or to produce consumer goods. No, instead that everyone would have a 3d printer on hand to make most of their goods (rather than having their goods shipped to them). That's the hype.

I did not say there weren't other areas where 3d printing could be successful nor that it wouldn't have applications. Rather, that the hype around it was unfounded and overblown.

This is much the same way I see LLMs. The current hype around them is that every job will end up being replaced. Doctors, nurses, lawyers, programmers, engineers, architects, everything. All replaced by LLMs and AI. However, that seems really unrealistic when the current state of LLMs is you always need a human doublechecking what it produces, and it's known to give out incorrect responses. Further, LLMs have limited capabilities to interact with applications let alone the physical world. Perhaps they will but also perhaps they won't. The imagination of what they could do is just wildly out of step with what they currently do.

> what hype train are you referring to? i know a lot of different predictions in machine learning, so i'm curious about what you mean specifically.

I didn't really see a lot of predictions around ML. Instead, it was more just a bunch of articles talking about the importance of it and seemingly every CEO deciding they need more ML in their products. Lots of stuff ended up being marketed specifically because it had ML capabilities (much like this last CES had almost every product with "AI" capabilities).

Funnily, the ML didn't (as far as I could see) have a whole lot of predictions other than more of an ephemeral notion that it would save manpower.

I bring it up in this case because like LLMs, there's just a bunch of buzz around 2 letters with not a whole lot of actual examples of those 2 letters being put to practical use.


hm, maybe we're misinterpreting each other's main point.

My reply was to some person who said that AI was akin to astrology, i.e. absolutely fake bullshit, which is bonkers to me.

Your reply was that AI, like 3d printing, is likely not going to be mass adopted by the average consumer, despite the hype, which i think is a reasonable prediction, and doesn't necessarily mean it won't have some valuable applications.

Maybe just agree to agree?


Yeah, if you see it that way then I think we agree.

croes's point, I believe, about the astrology was that we know today that LLMs will produce bad results and that they can't be trusted. Yet the hype is sort of at a "Well, if we just give it more time maybe that problem goes away". Similar to how in astrology "if you just think about it right, the prediction was actually accurate".

That's where I see the parallels with 3d printing. There was a sort of "We can print anything with enough time!" even though by and large the only printable things were plastic toys.


> GPT-4o being able to describe actual events in real time is interesting, it's yet to be seen if that's useful.

sure, but my experience is that if you are able to optimize better on some previous limitation, it legitimately does open up a whole different world of usefulness.

for example, real-time processing makes me feel like universal translators are now all the more viable


The huge difference between this and your analogy is that 3d printing failed to take off because it never reached mass adoption, and stayed in the "fiddly and expensive" stage. GPT models have already seen adoption in nearly every product your average consumer uses, in some cases heedless of whether it even makes sense in that context. Windows has it built in. Nearly everyone I know (under the age of 40) has used at least one product downstream of OpenAI, and more often than not a handful of them.

That said, yeah it's mostly niche locations like customer support chatbots, because the killer app is "app-to-user interface that's indistinguishable from normal human interaction". But you're underestimating just how much of the labor force is effectively just an interface between a customer and some app (like a POS). "Magical" is exactly the requirement to replace people like that.


> But you're underestimating just how much of the labor force are effectively just an interface between a customer and some app

That's the sleight of hand LLM advocates are playing right now.

"Imagine how many people are just putting data into computers! We could replace them all!"

Yet LLMs aren't "just putting data into a computer" They aren't even really user/app interfaces. They are a magic box you can give directives to and get (generally correct, but not always) answers from.

Go ahead, ask your LLM "Create an excel document with the last 30 days of the high temperatures for blank". What happens? Did it create that excel document? Why not?

LLMs don't bridge the user/app gap. They bridge the user/knowledge gap, sometimes sort of.


"Adoption" of tech companies pushing it on you is very different from "adoption" in terms of the average person using it in a meaningful way and liking it.


Remember when Chegg's stock price tanked? That's because GPT is extremely valuable as a homework helper. It can make mistakes, but that's very infrequent on well-understood topics like English, math and science through the high school level (and certainly if you hire a tutor, you'd pay a whole lot more for something that can also make mistakes).

Is that not a very meaningful thing to be able to do?


If you follow much of the education world, it's inundated with teachers frantically trying to deal with the volume and slop their students produce with AI tools. I'm sure it can be useful in an educational context, but "replacing a poor-quality cheating tool with a more efficient poor-quality cheating tool" isn't exactly what I'd call "meaningful."

The most interesting uses of AI tools in a classroom I've seen is teachers showing students AI-generated work and asking students to critique it and fact check it, at which point the students see it for what it is.


> Is that not a very meaningful thing to be able to do?

No? Solving homework was never meaningful. Being meaningful was never the point of homework. The point was for you to solve it yourself. To learn with your human brain, such that your human brain could use those teachings to make new meaningful knowledge.

John having 5 apples after Judy stole 3 is not interesting.


Ok, but what will the net effects be? Technology can be extremely impressive on a technical level, but harmful in practical terms.

So far the biggest usecase for LLMs is mass propaganda and scams. The fact that we might also get AI girlfriends out of the tech understandly doesn't seem that appealing to a lot of folks.


this is a different thesis than "AI is basically bullshit astrology", so i'm not disagreeing with you.

Understanding atomic energy gave us both emission-free energy and the atomic bomb, and you are correct that we can't necessarily know where the path of AI will take us.


There are 8 billion humans you could potentially facetime with. I agree, a large percentage are highly annoying, but there are still plenty of gems out there, and the quest to find one is likely to be among the most satisfying journeys of your life.


sure, but we're not discussing the outsourcing of human companionship in this context. we're discussing the capabilities of current technology.


But technology has secondary effects that you can't just dismiss. Sure, it is fascinating that a computer embedded into a mechanical robot can uphold one end of an engaging conversation. But you can't ignore the fact that simply opens the door towards eventual isolation, where people withdraw from society more and more and human-to-human contact gets more and more rare. We're already well on the way, with phone apps and online commerce and streaming entertainment all reducing human interactions, perhaps it doesn't bother you, but it scares the hell out of me.


GPT-4o is also describing things that never happened.

The first users of Eliza felt the same about the conversation with it.

The important point is to know that GPTs don't know or understand.

It may feel like a normal conversation, but it is a Chinese Room on steroids.

People started to ask GPTs questions and take the answers as facts because they believe it's intelligent.


I'm increasingly exhausted by the people who will immediately jump to gnostic assertions that <LLM> isn't <intelligent|reasoning|really thinking> because <thing that also applies to human cognition>

>GPT-4o is also describing things that never happened.

https://www.cbsnews.com/news/half-of-people-remember-events-...

>People started to ask [entity] questions and take the answers as facts because the believe it's intelligent.

Replace that with any political influencer (Ben Shapiro, AOC, etc) and you will see the exact same argument.

People remember things that didn't happen and confidently present things they just made up as facts on a daily basis. This is because they've learned that confidently stating incorrect information is more effective than staying silent when you don't know the answer. LLMs have just learned how to act like a human.

At this point the real stochastic parrots are the people who bring up the Chinese room because it appears the most in their training data of how to respond to this situation.


> It may feel like a normal conversation but is a Chinese Room on steroids.

Can you prove that humans are not chinese rooms on steroids themselves?


But it may be intelligent. After all, you, with a few trillion synapses, are also intelligent.


Maybe you just haven't been around long enough to see the meta-analysis? I've been through four major tech hype cycles in 30+ years. This looks and smells like all the others.


I'm 40ish, I'm in the tech industry, I'm online, I'm often an early adopter.

What hype cycle does this smell like? Because it feels different to me, but maybe I'm not thinking broadly enough. If your answer is "the blockchain" or Metaverse then I know we're experiencing these things quite differently.


It feels like the cloud.

Where platforms and applications are rewritten to take advantage of it and it improves the baseline of capabilities that they offer. But the end user benefits are far more limited than predicted.

And where the power and control is concentrated in the hands of a few mega corporations.


This is such a strange take - do you not remember 2020 when everyone started working from home? And today, when huge numbers of people continue to work from home? Most of that would be literally impossible without the cloud - it has been a necessary component in reshaping work and all the downstream effects related to values of office real estate, etc.

Literally a society-changing technology.


No way. Small to medium sized businesses don't need physical servers anymore. Which is most businesses. It's been a huge boon to most people. No more running your exchange servers on site. Most things that used to be on-prem software have moved to the cloud and integrate with mobile devices. You don't need some nerd sitting around all day in case you need to fix your on-prem industry specific app.

I have no idea how you can possibly shrug off the cloud as not that beneficial.


> I have no idea how you can possibly shrug off the cloud as not that beneficial.

I have no idea either. Since I never said it.


> the end user benefits are far more limited than predicted

How have you judged the end user benefits of the cloud? I don't agree personally - the cloud has enabled most modern tech startups and all of those have been super beneficial to me.


Direct versus indirect benefits.

Cloud is hidden to end users whereas other waves like internet and smartphone apps were very visible.

AI will soon stop being a buzzword and just be another foundation we build apps on.


i feel like a common consumer fallacy is that, because you don't interact with a technology in your day-to-day life, it leads you to conclude that the technology is useless.

I guarantee you that the cloud has benefitted you in some way, even though you aren't aware of the benefits of the cloud.


And maybe you just enjoy the perspective of "I've seen it all" so much that you've shut off your capacity for critical analysis.


And some of those hype cycles were very impactful? The spread of consumer internet access, or smartphones, as two examples.


If this smells like anything to me, it's the start of the internet.


which hype cycles are you referring to? and, after the dust settled, do you conclusively believe nothing of value was generated from these hype cycles?


Yeah, I remember all that dot com hysteria like it was yesterday.

Page after page of Wired breathlessly predicting the future. We'd shop online, date online, the world's information at our fingertips. It was going to change everything!

Silly now, of course, but people truly believed it.


I am just imagining GPT-4o saying this in her sarcastic voice!


> Sound like the people who defend Astrology because it feels magical how their horoscope fits their personality.

Does it really or are you just playing facile word association games with the word "magical"?


Astrology is a thing with no substance whatsoever. It's just random, made-up stories. There is no possibility that it will ever develop into something that has substance.

AI has a great deal of substance. It can draft documents. It can identify foods in a picture and give me a recipe that uses them. It can create songs, images and video.

AI, of course, has a lot of flaws. It does some things poorly, it does other things with bias, and it's not suitable for a huge number of use cases. To imply that something that has a great deal of substance but flaws alongside is the same as something that has no substance whatsoever, nor ever will, is just not a reasonable thing to do.


If you want to talk facts, then those critics are similarly on weak grounds and critiquing feelings more than facts. There has been no actual sign of scaling ceasing to work, in medium after medium, and most of their criticisms are issues with how LLM tools are embedded in architectures which are still incredibly early/primitive and still refining how to use transformers effectively. We haven't even begun using error correction techniques from analog engineering disciplines properly to boost the signal of LLMs in practical settings. There is so much work to do with just the existing tools.

"AI is massive hype and shoved into everything" has more grounding as a negative feeling of people being overwhelmed with technology than any basis in fact. The faults and weaknesses are buoyed by people trying to acknowledge your feelings than any real criticism of a technology that is changing faster than the faults and weakness arguments can be made. Study machine learning and come back with an informed criticism.


What is the point of pointing faults that will be fixed very soon? Just being negative or unable to see the future?


yea, we don't want or need this kind of "magic" - because it's hardly magic to begin with, and it's more socially and environmentally destructive than anything else.


Speak for yourself; my workflow and life have been significantly improved by these things. Having easier access to information that I sorta know but want to verify/clarify, rather than going into forums/SO, is extremely handy.

Not having to write boilerplate code itself also is very handy.

So yes, I absolutely do want this "magic." "I don't like it so no one should use it" is a pretty narrow POV.


Both your use cases don't really lead to stable long term valuations in the trillions for the companies building this stuff.


Wonderful. I don't need them to.

It works for what I need it to do.


You should be worried because this stuff needs to make sense financially. Otherwise we'll be stuck with it in an enshittification cycle, kind of like Reddit or image hosting websites.


Problem is that by that time there would be open source models (the ones that already exist are getting good) that I can run locally. I honestly don't need _THAT_ much.


Fair enough, if we get there. The problem for this stuff, where do we get the data to get good quality results? I imagine everything decent will be super licensed within 5-10 years, when everyone wakes up.


People like you are the problem: the people who join a website, cause it to become shitty, then leave and start the process at a new website. Reddit didn't become shit because of Reddit; it became shit because of people going on there commenting as if they themselves were an LLM, repeating "enshittification" over and over and trying to say the big buzzword first so they get to the top, denying any real conversation.


I've been on Reddit for more than a decade and I didn't make them create crappy mobile apps, crappy new web apps, as well as a policy of selling the data to anyone with a pulse.

Do you even know what "enshittification" means? It has nothing to do with the users. It's driven by corporate greed.

Reddit should be a public service managed by a non profit.

Edit: Also LOL at the 6 month old account making that comment against me :-)


> HAL's unemotional monotone in Kubrick's movie, "Space Odyssey," feels... primitive by comparison.

I’d strongly prefer that though, along with HAL’s reasoning abilities.


HAL has to sound exactly how Kubrick made it sound for the movie to work the way it should.

There wasn't any incentive to make it sound artificially emotional or emphatic beyond a "Sorry, Dave".


I would say a machine that thinks it feels emotions is less likely to throw you out of a spaceship. Human empathy already feels lacking compared to what something as basic as llama-3 can do.


What you say has nothing to do with how an AI speaks.

To use another pop-culture reference, Obi-Wan in Episode IV had deep empathy, but didn’t speak emotionally. Those are separate things.


>I would say a machine that thinks it feels emotions is less likely to throw you out of a spaceship

A lot of terrible human behavior is driven by emotions. An emotionless machine will never dump you out the airlock in a fit of rage.


Ah, I was tossed out of the airlock in a fit of logic... totally different!


The important part is that the machine explained its reasoning to you while purging the airlock.


In a chain of thought manner, as every proper AI, of course.


> I would say a machine that thinks it feels emotions is less likely to throw you out of a spaceship.

Have you seen the final scene of the movie Ex Machina? Without spoilers, I'll just say that acting like it has emotions is very different from actually having them. This is in fact what socio- and psychopaths are like, with stereotypical results.


llama-3 can't feel empathy, so this is a rather confusing comment.


Can you prove that you feel empathy? That you're not a cold unfeeling psychopath that is merely pretending extremely well to have emotions? Even if it did, we wouldn't be able to tell the difference from the outside, so in strictly practical terms I don't think it matters.


If I could logically prove that I feel empathy, I would be much more famous.

I get your nuanced point, that “thinking” one feels empathy is enough to be bound by the norms of behavior that empathy would dictate, but I don’t see why that would make AI “empathy” superior to human “empathy”.

The immediate future I see is a chatbot that is superficially extremely empathetic, but programmed never to go against the owner’s interest. Where before, when interacting with a human, empathy could cause them to make an exception and act sacrificially in a crisis case, this chatbot would never be able to make such an exception because the empathy it displays is transparent.


> Ignore the critics. Watch the demos. Play with it

With so many smoke and mirrors demos out there, I am not super excited at those videos. I would play with it, but it seems like it is not available in a free tier (I stopped paying OpenAI a while ago after realizing that open models are more than enough for me)


HAL's voice acting, I would say, is actually superb and, super subtly, very much not unemotional. That's part of what makes it so unnerving. They perfectly nailed the creepy uncanny valley.


Did you use any of the GPT voice features before? I’m curious whether this reaction is to the modality or the model.

Don’t get me wrong, excited about this update, but I’m struggling to see what is so magical about it. Then again, I’ve been using GPT voice every day for months, so if you’re just blown away from talking to a computer then I get it


The voice modality plays a huge role in how impressive it seems.

When GPT-2/3/3.5/4 came out, it was fairly easy to see the progression from reading model outputs that it was just getting better and better at text. Which was pretty amazing but in a very intellectual way, since reading is typically a very "intellectual" "front-brain" type of activity.

But this voice stuff really does make it much more emotional. I don't know about you, but the first time I used GPT's voice mode I noticed that I felt something -- very un-intellectually, very un-cerebral -- like, the feeling that there is a spirit embodying the computer. Of course with LLMs there always is a spirit embodying the computer (or, there never is, depending on your philosophical beliefs).

The Suno demos that popped up recently should have clued us all in that this kind of emotional range was possible with these models. This announcement is not so much a step function in model capabilities, but it is a step function in HCI. People are just not used to their interactions with a computer being emotional like this. I'm excited and concerned in equal parts that many people won't be truly prepared for what is coming. It's on the horizon, having an AI companion that really, truly makes you feel things.

Us nerds who habitually read text have had that since roughly GPT-3, but now the door has been blown open.


Honestly, as someone who has been using this functionality almost daily for months now, the times that break immersion the most by far are when it does human-like things, such as clearing its throat, pandering, or attaching emotions to its responses.

Very excited about faster response times, auto interrupt, cheaper api, and voice api — but the “emotional range” is actually disappointing to me. hopefully it doesn’t impact the default experience too much, or the memory features get good enough that I can stop it from trying to pretend to be a human


Speech is a lot more than just the words being conveyed.

Tone, Emphasis, Speed, Accent are all very important parts of how humans communicate verbally.

Before today, voice mode was strictly audio > text, then text > audio. All of that information was destroyed.

Now the same model takes in audio tokens and spits back out audio tokens directly.

Watch this demo, it's the best example of the kind of thing that would be flat out impossible with the previous setup.

https://www.youtube.com/live/DQacCB9tDaw?si=2LzQwlS8FHfot7Jy
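
To make the difference concrete, here's a minimal sketch of the two architectures (hypothetical interfaces, not OpenAI's actual API):

    # Hypothetical interfaces, just to illustrate the pipeline difference.

    def cascaded_voice_turn(audio_in, stt, llm, tts):
        """Old voice mode: audio -> text -> text -> audio.
        Prosody (tone, emphasis, pacing, laughter) is discarded at the STT step,
        so the LLM never 'hears' it and the TTS has to invent a delivery."""
        text_in = stt.transcribe(audio_in)
        text_out = llm.complete(text_in)
        return tts.synthesize(text_out)

    def end_to_end_voice_turn(audio_in, omni_model):
        """GPT-4o-style: one model consumes audio tokens and emits audio tokens,
        so it can react to how something was said and control how it answers."""
        return omni_model.respond(audio_in)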


Flat out impossible? If you mean “without clicking anything”, sure, but you could interrupt with your thumb, exit chat to send images and go back (maybe video too, I’ve never had any need), and honestly the 2-3 second response time never once bothered me.

I’m very excited about all these updates and it’s really cool tech, but all I’m seeing is quality of life improvements and some cool engineering.

That’s not necessarily a bad thing. Not everything has to be magic or revolutionary to be a cool update


Did you even watch the video? It's just baffling that I have to spell this out.

Skip to 11:50 or watch the very first demo with the breathing. None of that is possible with TTS and STT. You can't ask old voice mode to slow down or modulate tone or anything like that because it's just working with text.


Yes, I watched the demo. True, those things were not possible, so if that's what's blowing you away then fair enough, I guess. For me that doesn't impact at all anything I have ever used voice for or probably will ever use voice for.

I've voice chatted with ChatGPT for hundreds of hours and never once thought "can you modulate your tone please?", so those improvements are a far cry from magic or revolutionary imho. Again, that's not to say they aren't cool tech, forward advancements, or impressive, but magic or revolutionary are pretty high bars.

To each their own though.


Few people are going to say "modulate your tone" in a vacuum sure but that doesn't mean that ability along with being able to manipulate all other aspects of speech isn't an incredible advance that is going to be very useful.

Language learning, audiobook narration that is far more involved, you could probably generate an audio drama, actual voice acting, even just not needing to get all my words in before it prompts the model with the transcribed text, conversation that doesn't feel like someone is reading a script.

And that's just voice.

This is the kind of interaction that's possible now. https://www.youtube.com/watch?v=_nSmkyDNulk

And no, thumbing the pause button, sending an image and going back does not even begin to compare in usability.

Great leaps in usability are a revolution in themselves. GPT-3 existed for years, so why did ChatGPT explode when it did? You think it was intelligence? No. It was the usability of the chat interface.


The ability to have an interactive voice conversation has been available in the iOS app for the longest time.


Kinda stretching the definition of interactive there.


How so? You don’t have to press the mic button after every sentence. You press the headphone button and speak like you normally would and it speaks back once you stop talking.

How much more “interactive” could it be?


Right but this works differently.


Yeah the product itself is only incrementally better (lower latency responses + can look at a camera feed, both great improvements but nothing mindblowing or "magical"), but I think the big difference is that this thing is available for free users now.


> HAL's unemotional monotone

on a tangent...

I find it interesting the psychology behind this. If the voice in 2001 had proper inflection, it wouldn't have been perceived as a computer.

(also, I remember when voice synthesizers got more sophisticated and Stephen Hawking decided to keep his original first-gen voice because he identified more with it)

I think we'll be going the other way soon. Perfect voices, with the perfect emotional inflection will be perceived as computers.

However I think at some point they may be anthropomorphized and given more credit than they deserve. This will probably be cleverly planned and a/b tested. And then that perfect voice, for you, will get you to give in.


1. Demos are meant to feel magical and, except in Apple's case, they are often exaggerated versions of the real product.

2. Even then this is a wonderful step for tech in general and not just OpenAI. Makes me very excited.

3. Most economic value and growth driven by AI will not come from consumer apps but rather from enterprise use. I am interested in seeing how AI can automatically buy stuff for me, automate my home, reduce my energy use, automatically apply for and get credit cards based on my purchases, find new jobs for me, negotiate with a car dealer on my behalf, detect when I am going to fall sick, enable better diabetes care and an eventual cure, etc.


> It makes the movie "Her" look like it's no longer in the realm of science fiction but in the realm of incremental product development

Are we supposed to cheer for that?

We're already midway to the full implementation of 1984; do we need Her before we get to The Matrix?


Her wasn’t a dystopia as far as I could tell. Not even a cautionary tale. The scifi ending seems unlikely but everything else is remarkably prescient. I think the picnic scene is very likely to come true in the near future. Things might even improve substantially if we all interact with personalities that are consistently positive and biased towards conflict resolution and non judgemental interactions.


Seemed like a cautionary tale to me where the humans fall in love with disembodied AIs instead of seeking out human interaction. I think the end of the movie drove that home pretty clearly.


Some people in the movie did but not all. It happened enough that it wasn’t considered strange but the central focus wasn’t all of society going down hill because everyone was involved with an AI. If you recall, the human relationships that the characters who fell in love with AIs had were not very good situations. The main character’s arc started off at a low point and then improved while his romance with the AI developed, only reaching a lower point when he felt betrayed and when the AI left him but that might as well be any ordinary relationship. At the end he finds a kindred soul and it’s implied they have some kind of future together whether romantic or not.


> Her wasn’t a dystopia as far as I could tell.

Well, that's exactly why I'm not looking forward to whatever is coming. The average Joe thinking that dating a server is not a dystopia frightens me much more than the delusional tech CEO who thinks his AI will revolutionise the world

> Things might even improve substantially if we all interact with personalities that are consistently positive and biased towards conflict resolution and non judgemental interactions.

Some kind of turbo bubble in which you don't even have to actually interact with anyone or anything? Every "personality" will be nice to you as long as you send $200 to OpenAI every week; yep, that's absolutely a dystopia for me

It really feels like the end goal is living in a pod and being uploaded into an alternative reality; everything we build to "enhance" our lives takes us further from the basic building blocks that make life "life".


There’s a lot of hyperbole here but I’ll try to respond. If LLMs can reach a level where they’re effectively indistinguishable from talking to a person then I don’t see anything wrong with someone dating one. People already involve themselves in all kinds of romantic relationships with nonhuman things: anime characters, dolls, celebrities they’ve never met, pillows and substitute relationships with other things like work, art, social media, pets, etc. Adding AI to the list doesn’t make things worse. I think there’s a strong argument that AI relationships would be much healthier than many of the others if they can emulate human interaction to within a very close degree.

The scene which I referenced is one in which a group of three humans and one AI spend time together at a picnic and their interactions are decidedly normal. How many lonely people avoid socializing because they are alone and don’t want to feel like a third wheel? If dating or even just being friends with an AI that can accompany you to such events is accepted and not derided by people who happily have a human companion then I think having a supportive partner could help many people reengage with wider social circles and maybe they will eventually choose to and be able to find other people that they can form relationships with.

OpenAI charges $20 a month which is an extremely reasonable price for a multipurpose tool considering you can’t buy a single meal at a restaurant for the same amount and is far better than the “free” ad supported services that everyone has become addicted to. We’ve been rallying for 20 odd years for payment based services instead of ads but whenever one comes along people shout it down. Funny isn’t it?

The movie Her had an answer for our current fascination for screens as well. It showed a world where computers were almost entirely voice driven with screens playing a secondary role as evidenced by their cell phones looking more like pocket books that close and hide the screen. If you’re worried about pods, well they’re already here and you’re probably holding one in your hands right now. Screens chain us down and mediate our interactions with the world in a way that voice doesn’t. You can walk and talk effortlessly but not so much walking and tapping or typing. If the AI can see and understand what you see (another scene in the movie where he goes on a date with his “phone” in his pocket) and understands enough to not need procedural instructions then it can truly act as an assistant capable of performing assigned tasks and filling in the details while you are free to go about your day. I believe this could end the paradigm of being chained to a desk for office work 8 hours a day and could also transform leisure time as well.


There is a massive philosophical and ethical problem, and the answer amounts to "people already fuck anime pillows so it's ok". Once again, some people terrify me. You could argue that the tech itself is neutral, but all the arguments I read in favor of it are either creepy or completely unrealistic.

Tech absolutely wrecked social relations, and people assume more of it will automagically fix the issues; it's perplexing

> Funny isn’t it?

What's funny is when your wife of 6 years gets bought by a private entity which will fire half the company and jack the prices up from $20 to $200

> I believe this could end the paradigm of being chained to a desk for office work 8 hours a day and could also transform leisure time as well.

That's what politicians told us in the 80s about computers: the 2-day work week, the end of poverty, etc. Nothing changed; if anything, things are a bit worse than they were. New technologies without a dramatic change of political and social policies will never bring anything new to the table


Imagine what an unfettered model would be like. 'Ex Machina' would no longer be a software-engineering problem, but just another exercise in mechanical and electrical engineering.

The future is indeed here... and it is, indeed, not equitably distributed.


Or from Zones of Thought series, Applied Theology, the study of communication with and creation of superhuman intelligences that might as well be gods.


Magic is maybe not the best analogy to use because magic itself isn't magical. It is trickery.


Some of the failure modes in LLMs have been fixed by augmenting LLMs with external services

The simplest example is “list all of the presidents in reverse chronological order of their ages when inaugurated”.

Both ChatGPT 3.5 and 4 get the order wrong. The difference is that I can instruct ChatGPT 4 to "use Python".

https://chat.openai.com/share/87e4d37c-ec5d-4cda-921c-b6a9c7...

You can do similar things to have it verify information by using internet sources and give you citations.

Just like with the Python example, at least I can look at the script/web citation myself
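For illustration, here is a sketch of the kind of script it tends to write when told to "use Python" (only a handful of well-known entries shown here; the actual generated script lists every president):

    # Sort presidents by age at inauguration, oldest first.
    presidents_by_inauguration_age = [
        ("Joe Biden", 78),
        ("Donald Trump", 70),
        ("Ronald Reagan", 69),
        ("Barack Obama", 47),
        ("John F. Kennedy", 43),
    ]

    # Doing the ordering in Python sidesteps the LLM's weakness at sorting and math.
    for name, age in sorted(presidents_by_inauguration_age, key=lambda p: p[1], reverse=True):
        print(f"{name}: inaugurated at age {age}")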


> The simplest example is “list all of the presidents in reverse chronological order of their ages when inaugurated”.

This question is probably not the simplest form of the query you intend to receive an answer for.

If you want a descending list of presidents based on their age at inauguration, I know what you want.

If you want a reverse chronological list of presidents, I know what you want.

When you combine/concatenate the two as you have above, I have no idea what you want, nor do I have any way of checking my work if I assume what you want. I know enough about word problems and how people ask questions to know that you probably have a fairly good idea what you want and likely don’t know how ambitious this question is as asked, and I think you and I both are approaching the question with reasonably good faith, so I think you’d understand or at least accommodate my request for clarification and refinement of the question so that it’s less ambiguous.

Can you think of a better way to ask the question?

Now that you’ve refined the question, do LLMs give you the answers you expect more frequently than before?

Do you think LLMs would be able to ask you for clarification in these terms? That capability to ask for clarification is probably going to be as important as other improvements to the LLM, for questions like these that have many possibly correct answers or different interpretations.

Does that make sense? What do you think?


(I seemed to have made the HN gods upset)

I tried asking the question more clearly

I think it “understood” the question because it “knew” how to write the Python code to get the right answer. It parsed the question as expected

The previous link doesn’t show the Python. This one does.

https://chat.openai.com/share/a5e21a97-7206-4392-893c-55c531...

LLMs are generally not good at math. But in my experience ChatGPT is good at creating Python code to solve math problems


> I think it “understood” the question because it “knew” how to write the Python code to get the right answer.

That’s what makes me suspicious of LLMs, they might just be coincidentally or accidentally answering in a way that you agree with.

Don’t mean to nitpick or be pedantic. I just think the question was really poorly worded and might have a lot of room for confirmation bias in the results.


I reworded the question with the same results in the second example.

But here is another real world example I dug up out of my chat history. Each iteration of the code worked. I actually ran it a few days ago

https://chat.openai.com/share/4d02818c-c397-417a-8151-7bfd7d...


> List of US Presidents with their ages at inauguration

That’s what the python script had at the top. I guess I don’t know why you didn’t ask that in the first place.

Edit: you’re not the same person who originally posted the comment I responded to, and I think I came off a bit too harshly here in text, but don’t mean any offense.

It was a good idea to ask to see the code. It was much more to the point and clear what question the LLM perceived you asking of it.

The second example about buckets was interesting. I guess LLMs help with coding if you know enough of the problem and what a reasonable answer looks like, but you don't know what you don't know. LLMs are useful because you can just ask why things may not work or don't work, in any given context or in a completely open-ended way; that is often hard for non-experts to explain or articulate, which makes troubleshooting difficult since you might not even know how to search for solutions.

You might appreciate this link if you’re not familiar with it:

https://buckets.grayhatwarfare.com/


I was demonstrating how bad LLMs are at simple math.

If I just asked for a list of ages in order, there was probably some training data for it to recite. By asking it to reverse the order, I was forcing the LLM to do math.

I also knew the answer was simple with Python.

On another note, with ChatGPT 4, you can ask it to verify its answers on the internet and to provide sources

https://chat.openai.com/share/66231d7f-9eb1-4116-9903-f09a42...


I am the same person. I mentioned that in my original reply. That’s what I was trying to imply by this comment

> (I seemed to have made the HN gods upset)

I could see the Python in the original link when I asked. It shows up as a clickable link. It doesn’t show when you share it. I had to ask it.


You’re also scarface_74? Not that there’s anything wrong with sockpuppets on HN in the absence of vote manipulation or ban evasion that I know of, I just don’t know why you’d use one in this manner, hence my confusion. Karma management?

I saw a blue icon of some kind on the link you shared but didn’t click it.


I said the reason why, twice now

> I seemed to have made the HN gods upset.

My other account is rate limited for some odd reason. I looked back at my comments and I don’t see anything I said controversial.

The blue link is the Python code that was generated. I guess it doesn’t show in the app.


No worries, that was somewhat ambiguous to me also, and confusing. I thought you might be a different person who had edited their comment after receiving downvotes. I mean, it’s reasonable to assume in most cases that different usernames are different people. Sorry to make you repeat yourself!

Maybe email hn@ycombinator.com to ask about your rate limits as I have encountered similar issues myself in the past and have found dang to be very helpful and informative in every way, even when the cause is valid and/or something I did wrong. #1 admin/mod on the internet imo


It is pretty awesome that you only have to prompt with “use python”


> It makes the movie "Her" look like it's no longer in the realm of science fiction but in the realm of incremental product development.

The last part of the movie "Her" is still in the realm of science fiction, if not outright fantasy. Reminds me of the later seasons of SG1 with all the talk of ascension and Ancients. Or Clarke's 3001 book intro, where the monolith creators figured out how to encode themselves into spacetime. There's nothing incremental about that.


> HAL's unemotional monotone in Kubrick's movie, "Space Odyssey," feels... oddly primitive by comparison

In comparison to the gas pump which says "Thank You!"


I prompted it with "Take this SSML script and give me a woman's voice reading it as WAV or MP3 [Pasted script]" and it pretty much sounds like HAL.


Did they release the new voices yet?


You'll have a great time once you discover literature. Especially early modern novels, texts the authors sometimes spent decades refining, under the combined influences of classical arts and thinking, Enlightenment philosophy and science.

If chatbots feel magical, what those people did will feel divinely inspired.


Very convincing demo

However, using ChatGPT with transcription already offers me a similar experience, so what exactly is new?


That's what OpenAI managed to capture: a large enough sense of wonder. You could feel it as people spread the news, but not like the usual fad... there was a soft silence to it, people deeply focused on poking at it because it was a new interface.


The demos seem quite boring to me


Blah blah blah indeed, the hype train continues unabated. The problem is, those are all perfectly valid criticisms, and LLMs can never live up to the ridiculous levels of hype.


Can anybody help me try the direct voice feature? I can't find the button for it. Maybe it's not available in Europe yet, I don't know.


> Play with it!

It’s not accessible to everyone yet.

Even on the API, I can’t send it a voice stream yet.

The API refuses to generate images.

Next few weeks will tell as more people play with it.


How much of this could be implemented using the API?

There’s so much helpful niche functionality that can be added to custom clients.


Watching HAL happening in real life comes across as creepy, not magical. Double creepy with all the people praising this ‘magicality’.

I’m not a sceptic, and I apply AI on a daily basis, but the whole “we can finally replace people” vibe is extremely off-putting. I had very similar feelings during the pandemic, when the majority of people seemed so happy to drop any real human interaction in favor of remote comms via chats/audio calls. It still creeps me out how ready we are as a society to drop anything remotely human in favor of technocratic advancement and “productivity”.


>Who cares? This stuff feels magical. Magical!

On one hand, I agree - we shouldn't diminish the very real capabilities of these models with tech skepticism. On the other hand, I disagree - I believe this approach is unlikely to lead to human-level AGI.

Like so many things, the truth probably lies somewhere between the skeptical naysayers and the breathless fanboys.


> On the other hand, I disagree - I believe this approach is unlikely to lead to human-level AGI.

You might not be fooled by a conversation with an agent like the one in the promo video, but you'd probably agree that somewhere around 80% of people could be. At what percentage would you say that it's good enough to be "human-level?"


When people talk about human-level AGI, they are not referring to an AI that could pass as a human to most people - that is, they're not simply referring to a program that can pass the Turing test.

They are referring to an AI that can use reasoning, deduction, logic, and abstraction like the smartest humans can, to discover, prove, and create novel things in every realm that humans can: math, physics, chemistry, biology, engineering, art, sociology, etc.


The framing of the question admits only one reasonable answer: There is no such threshold. Fooling people into believing something doesn't make it so.


Most people's interactions are transactional. When I call into a company and talk to an agent, and that agent solves the problem I have, regardless of whether the agent is a person or an AI, where did the fooling occur? The ability to problem-solve based on context is intelligence.


What criteria do you suggest, then?

As has been suggested, the models will get better at a faster rate than humans will get smarter.


> You might not be fooled by a conversation with an agent like the one in the promo video, but you'd probably agree that somewhere around 80% of people could be.

I think people will quickly learn with enough exposure, and then that percentage will go down.


Nah, these models will improve faster than people can catch up. People and AI models alike can barely catch AI-created text. It's quickly becoming impossible to distinguish.

The one you catch is the tip of the iceberg.

The same will happen to speech. It might take a few years, but it'll be indistinguishable in at most a few years, due to compute increases and model improvements, both growing exponentially.


How can we be so sure things will keep getting better? And at a rate faster than humans can adapt?

If we have to dam rivers and build new coal plants to power these AI data centers, then it may be one step forward and two steps back.


> These models will improve faster than people can catch up.

So that we're all clear the basis for this analysis is purely made up, yes?


No, instead something worse will happen.

Well-spoken and well-mannered speakers will be called bots. The comment threads under posts will be hurling insults back and forth about who's actually real. Half the comments will actually be bots doing it. Welcome to the dead internet.


Right! This is absolutely apocalyptic! If more than half the people I argue with on internet forums are just bots that don't feel the sting and fail to sleep at night because of it, what even is the meaning of anything?

We need to stop these hateful ai companies before they ruin society as a whole!

Seriously though... the internet is dead already, and it's not coming back to what it was. We ruined it, not ai.


I'm not so sure, I think this is what's called "emergent behavior" — we've found very interesting side effects of bringing together technologies. This might ultimately teach us more about intelligence than more reductionist approaches like scanning and mapping the brain.


On the other hand, it is very difficult to distinguish between "emergent behavior" and "somehow leaked into our large training set" for LLMs.


Comments have become insufferable. Either it is now positive to the point of bordering on cringe-worthiness (your comment) or negative. Nuanced discussion is dead.


I mean, humans also have tons of failure modes, but we've learned to live with them over time.

The average human has tons of quirks, talks over other people all the time, generally can't solve complex problems in a casual conversation setting, and is not always cheery and ready to please like Scarlett's character in Her.

I think our expectations of AI are way too high from our exposure to science fiction.


Funnily, I’d prefer HAL’s unemotional monotone over GPT’s woke hyperbole any second.


This is such a hot take, it should go in hot-takes.io LOL


I really don't think Sam needs more encouragement, thanks.

Also, if this is your definition of magic then...yeah...


Magical?

The interruption part is just flow control at the edge: control-s, control-c stuff, right? Not AI?

The sound of a female voice to an audience 85% composed of males between the ages of 14 and 55 is "magical", not this thing that recreates it.

So yeah, it's flow control and compression of highly curated, subtle soft porn. Subtle, hyper-targeted, subconscious porn honed by the most colossal digitally mediated focus group ever constructed to manipulate our (straight male) emotions.

Why isn't the voice actually the voice of the pissed-off high school janitor telling you to man up and stop hyperventilating? Instead it's a woman stroking your ego and telling you to relax and take deep breaths. What dataset did they train that voice on, anyway?


It's not that complicated, generally more woman-like voices test as more pleasant to men and women alike. This concept has been backed up by stereotypes for centuries.

Most voice assistants have male options, and an increasing number (including ChatGPT) have gender neutral voices.

> why isn't the voice actually the voice of the pissed off high school janitor telling you to man-up and stop hyperventilating

sounds like a great way to create a product people will outright hate


Right, because having a female voice means that it is soft porn.

This is like horseshoe theory on steroids.


I may or may not entirely agree with this sentiment (but I definitely don't disagree with all of it!) but I will say this: I don't think you deserve to be downvoted for this. Have a "corrective upvote" on me.


Yet another release right before Google releases something. This time right before Google I/O. Third time they've done this by my count.


Microsoft/OAI and Google have been doing those (often sudden) announcements back to back a lot: Bing Chat/Bard, Sora/Gemini 1.5, some others I don't remember, and now another. Not surprising, trying to out-hype the other, but Google always comes out worse, with either no product available and just a showcase (if it's a real, working product and not made up), or something unusable/unmarketable (Gemini's image generation issues). It looks as if they're stumbling and OpenAI just runs circles around them announcement-wise, and there doesn't seem to be any suggestion that might change anytime soon.


pretty cool, but why do the AIs have to sound like douchebags?


after watching the OpenAI videos I'm looking at my sad Google Assistant speaker in the corner.

Come on Google... you can update it.


This thing continues to stress my skepticism for AI scaling laws and the broad AI semiconductor capex spending.

1- OpenAI is still working on GPT-4-level models, more than 14 months after the launch of GPT-4 and after more than $10B in capital raised.

2- The rate at which token prices are collapsing is bizarre. Now a (bit) better model for 50% of the price. How do people seriously expect these foundational model companies to make substantial revenue? Token volume needs to double just for revenue to stand still. Since the GPT-4 launch, token prices have been falling 84% per year!! Good for mankind, but crazy for these companies.

3- Maybe I am an asshole, but where are my agents? I mean, good for the consumer use case. Let's hope the rumors that Apple is deploying ChatGPT with Siri are true; these features will help a lot. But I wanted agents!

4- This drop in costs is good for the environment! No reason to expect it to stop here.


I'm ceaselessly amazed at people's capacity for impatience. I mean, when GPT 4 came out, I was like "holy f, this is magic!!" How quickly we get used to that magic and demand more.

Especially since this demo is extremely impressive given the voice capabilities, yet still the reaction is, essentially, "But what about AGI??!!" Seriously, take a breather. Never before in my entire career have I seen technology advance at such a breakneck speed - don't forget transformers were only invented 7 years ago. So yes, there will be some ups and downs, but I couldn't help but laugh at the thought that "14 months" is seen as a long time...


Over the past year they have delivered order-of-magnitude improvements in latency, context length, and cost, while meaningfully improving performance and adding several input and output modalities.


Your order-of-magnitude claim is off by almost an order of magnitude. It's more like half again as good on a couple of items and the same on the rest. 10X improvement claims are a joke, and people making claims like that ought to be dismissed as jokes too.


$30 / million tokens to $5 / million tokens since GPT-4 original release = 6X improvement

4000 token context to 128k token context = 32X improvement

5.4 second voice mode latency to 320 milliseconds = ~17X improvement.

I guess I got a bit excited by including cost, but that's close enough to an order of magnitude for me. And that's ignoring the fact that it's now literally free in ChatGPT.


Thanks so much for posting this. The increased token length alone (obviously not just with OpenAI's models but the other big ones as well) has opened up a huge number of new use cases that I've seen tons of people and other startups pounce on.


All while not addressing the rampant confabulation at all. Which is the main pain point, to me at least. Not being able to trust a single word that it says...


I am just talking about scaling laws and the level of capex that big tech companies are doing. One hundred billion dollars are being invested this year to pursue AI scaling laws.

You can be excited, as I am, while also being bearish, as I am.


If you look at the history of big technological breakthroughs, there is always an explosion of companies and money invested in the "new hotness" before things shake out and settle. Usually the vast majority of these companies go bankrupt, but that infrastructure spend sets up the ecosystem for growth going forward. Some examples:

1. Railroad companies in the second half of the 19th century.

2. Car companies in the early 20th century.

3. Telecom companies and investment in the 90s and early 2000s.


Comments like yours contribute to the negative perception of Hacker News as a place where launching anything, no matter how great, innovative, smart, informative, usable, or admirable, is met with unreasonable criticism. Finding an angle to voice your critique doesn't automatically make it insightful.


I am sure that people at OpenAI, particularly former YC CEO Sam Altman, will be fine, even if they read the bad stuff MP_1729 says around here.


It’s reasonable criticism, and more useful than all the hype.


What is unreasonable about that comment?


Moving the goalposts directly after someone scores a goal.


Well, I for one am excited about this update, and skeptical about the AI scaling, and agree with everything said in the top comment.

I saw the update, was a little like “meh,” and was relieved to see that some people had the same reaction as me.

OP raised some pretty good points without directly criticizing the update. It’s a good balance to the top comments (calling this *absolutely magic and stunning*) and to all of Twitter

I wish more feedback on HN was like OPs


Peoples' "capacity for impatience" is literally the reason why these things move so quick. These are not feelings at-odds with each other; they're the same thing. Its magical; now its boring; where's the magic; let's create more magic.

Be impatient. It's a positive feeling, not a negative one. Be disappointed with the current progress; it's the biggest thing keeping progress moving forward. It also, if nothing else, helps communicate to OpenAI whether they're moving in the right direction.


> Be disappointed with the current progress; it's the biggest thing keeping progress moving forward.

No it isn't - excitement for the future is the biggest thing keeping progress moving forward. We didn't go to the moon because people were frustrated by the lack of progress in getting off of our planet, nor did we get electric cars because people were disappointed with ICE vehicles.

Complacency regarding the current state of things can certainly slow or block progress, but impatience isn't what drives forward the things that matter.


Tesla's corporate motto is literally "accelerating the world's transition to sustainable energy". Unhappy with the world's previous progress and velocity, they aimed to move faster.


It's pretty bizarre how these demos bring out keyboard warriors and cereal-bowl yellers like crazy. Huge breakthroughs in natural cadence, tone, and interaction, as well as realtime multimodal, and all the people on HN can rant about is token price collapse.

It's like the people in this community all suffer from a complete disconnect from society and normal human needs/wants/demands.


People fume and fret about startups wasting capital like it was their own money.

GPT and all the other chatbots are still absolutely magic. The idea that I can get a computer to create a fully functional app is insane.

Will this app make me millions and run a business? Probably not. Does it do what I want it to do? Mostly yes.


We're just logarithmic creatures


I’d say we are derivative creatures. ;)


Chair in the sky again...


Hah, was thinking of that exact bit when I wrote my comment. My version of "chair in the sky" is "But you are talking ... to a computer!!" Like remember stuff that was pure Star Trek fantasy until very recently? I'm sitting here with my mind blown, while at the same time reading comments along the lines of "How lame, I asked it some insanely esoteric question about one of the characters in Dwarf Fortress and it totally got it wrong!!"


The AI doesn’t behave like the computer in Star Trek, however. The way in which it is a different thing is what people don’t like.


They should have used superior Klingon Technology...


There are well talked about cons to shipping so fast, but on the bright side, when everyone is demanding more, more, more, it pushes cost down and demands innovation, right?


Reminds me of the Louis CK bit about internet:

https://youtube.com/watch?v=me4BZBsHwZs


Sounds like the Jeopardy answer for "What is a novelty?"


> How quickly we get used to that magic and demand more.

Humanity in a nutshell.


IMO, for fear of being labeled a hype boy, this is absolutely a sign of the impending singularity. We are taking an ever-accelerating frame of cultural reference as a given, and our expectation is that exponential improvement is not just here, but that you're already behind once you've released.

I spent the last two years dismayed by the reaction, but I've just recently begun to realize this is a feature, not a flaw. This is latent demand for the next iteration, expressed as impatient dissatisfaction with the current rate of change, inducing a faster rate of change. Welcome to the future you were promised.


I would disagree. I remember iPhones getting similarly criticized on here. And not iPhone 13 to 14; it was the original iPhone to the iPhone 3G!

The only time people weren’t displeased was when internet speeds went from 15Mb to 100Mb.

You will keep being dismayed! People only like good things, not good things that potentially make them obsolete


Sorry we disagree. But I think we agree!!


You must be new here?


> Token volume needs to double just for revenue to stand still

I'm pretty skeptical about the whole LLM/AI hype, but I also believe that the market is still relatively untapped. I'm sure Apple switching Siri to an LLM would roughly double token usage.

A few products rushed out thin wrappers on top of ChatGPT, producing pretty uninspiring chat bots of limited use. I think there's still huge potential for this LLM technology to be 'just' an implementation detail of other features, just running in the background doing its thing.

That said, I don't think OpenAI has much of a moat here. They were first, but there's plenty of others with closed or open models.


>Token volume needs to double just for revenue to stand still

Profits are the real metric. Token volume doesn't need to double for profits to stand still if operational costs go down.


This is why I think Meta has been so shrewd in their “open” model approach. I can run Llama3-70B on my local workstation with an A6000, which, after the up-front cost of the card, is just my electricity bill.

So despite all the effort and cost that goes into these models, you still have to compete against a “free” offering.

Meta doesn’t sell an API, but they can make it harder for everybody else to make money on it.
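For context, here is a minimal sketch of what that "free" local setup can look like, assuming llama-cpp-python and a 4-bit quantized GGUF build of Llama-3-70B-Instruct (the file path and parameters are placeholders, not an actual configuration; the 4-bit quantization is what lets a 70B model fit in an A6000's 48 GB):

    from llama_cpp import Llama

    llm = Llama(
        model_path="./Meta-Llama-3-70B-Instruct.Q4_K_M.gguf",  # placeholder path
        n_gpu_layers=-1,  # offload every layer to the GPU
        n_ctx=8192,       # context window
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Why does local inference matter?"}]
    )
    print(out["choices"][0]["message"]["content"])

After the one-time hardware cost, the marginal cost per token really is just electricity, which is exactly the pricing pressure the closed API providers have to compete against.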


LLaMA still has an "IP hook" - the license for LLaMA forbids usage on applications with large numbers of daily active users, so presumably at that point Facebook can start asking for money to use the model.

Whether or not that's actually enforceable[0], and whether or not other companies will actually challenge Facebook legal over it, is a different question.

[0] AI might not be copyrightable. Under US law, copyright only accrues in creative works. The weights of an AI model are a compressed representation of training data. Compressing something isn't a creative process so it creates no additional copyright; so the only way one can gain ownership of the model weights is to own the training data that gets put into them. And most if not all AI companies are not making their own training data...


> LLaMA still has an "IP hook" - the license for LLaMA forbids usage on applications with large numbers of daily active users, so presumably at that point Facebook can start asking for money to use the model.

No, the license prohibits usage by Licensees who already had >700m MAUs on the day of Llama 3's release [0]. There's no hook to stop a company from growing into that size using Llama 3 as a base.

[0] https://llama.meta.com/llama3/license/


The whole point is that the license specifically targets their competitors while allowing everyone else so that their model gets a bunch of free contributions from the open source community. They gave a set date so that they knew exactly who the license was going to affect indefinitely. They don't care about future companies because by the time the next generation releases, they can adjust the license again.


Yes, I agree with everything you just said. That also contradicts what OP said:

> LLaMA still has an "IP hook" - the license for LLaMA forbids usage on applications with large numbers of daily active users, so presumably at that point Facebook can start asking for money to use the model.

The license does not forbid usage on applications with large numbers of daily active users. It forbids usage by companies that were operating at a scale to compete with Facebook at the time of the model's release.

> They don't care about future companies because by the time the next generation releases, they can adjust the license again.

Yes, but I'm skeptical that that's something a regular business needs to worry about. If you use Llama 3/4/5 to get to that scale then you are in a place where you can train your own instead of using Llama 4/5/6. Not a bad deal given that 700 million users per month is completely unachievable for most companies.


>How people seriously expect these foundational model companies to make substantial revenue?

My take on this common question is that we haven't even begun to realize the immense scale at which we will need AI in all sorts of products, from consumer to enterprise. We will look back on the cost of tokens now (even at 50% of the price of a year or so ago) and regard it with the same bewilderment as "having a computer in your pocket" compared to mainframes from 50 years ago.

For AI to be truly useful at the consumer level, we'll need specialized mobile hardware that operates on a far greater scale of tokens and speed than anything we're seeing/trying now.

Think "always-on AI" rather than "on-demand".


Sam Altman gave the impression that foundation models would be a commodity on his appearance in the All in Podcast, at least in my read of what he said.

The revenue will likely come from application layer and platform services. ChatGPT is still much better tuned for conversation than anything else in my subjective experience and I’m paying premium because of that.

Alternatively it could be like search - where between having a slightly better model and getting Apple to make you the default, there’s an ad market to be tapped.


>This thing continues to stress my skepticism for AI scaling laws and the broad AI semiconductor capex spending.

Imagine you are in the 1970s saying computers suck, they are expensive, there are not that many use cases... Fast forward to the 90s and you are using Windows 95 with a GUI and a chip astronomically more powerful than anything we had in the 70s, and you can use productivity apps, play video games, and surf the Internet.

Give AI time; it will fulfill its true potential sooner or later.


That's the opposite of what I am saying.

What I am saying is that computers are SO GOOD that AI is getting VERY CHEAP and the amount of computing capex being done is excessive.

It's more like you are in 1999: people are spending $100B on fiber, while a lot of computer scientists are working on compression, multiplexing, etc.


>It's more like you are in 1999, people are spending $100B in fiber, while a lot of computer scientists are working in compression, multiplexing, etc.

But nobody knows what's around the corner and what the future brings... For example, back in the day Excite didn't want to buy Google for $1m because they thought that was a lot of money. You need to spend money to make money, and yes, sometimes you need to spend a lot of money on "crazy" projects because it can pay off big time.


Was there ever a time when betting that computer scientists would not make better algorithms was a good idea?


Which of those investments are you saying would have been a poor choice in 1999?


All of them, without exception. Just recently, Sprint sold their fiber business for $1 lmfao. Or WorldCom. Or NetRail, Allied Riser, PSINet, FNSI, Firstmark, Carrier 1, UFO Group, Global Access, Aleron Broadband, Verio...

All of those fiber companies went bust because, despite the internet's huge increase in traffic, the number of packets per fiber increased by a handful of orders of magnitude.


But you’re saying investing in multiplexing and compression was also dumb?


Nope, I'm not


Then your overarching thesis is not very clear. Is it simply ‘don’t invest in hardware capital, software always makes it worthless’?


It's more like: don't invest $100B in capital when there are still orders-of-magnitude improvements to be made in software.


Did we ever get confirmation that GPT 4 was a fresh training run vs increasingly complex training on more tokens on the base GPT3 models?


gpt-4 was indeed trained on gpt-3 instruct series (davinci, specifically). gpt-4 was never a newly trained model


what are you talking about? you are wrong, for the record


They have pretty much admitted that GPT4 is a bunch of 3.5s in a trenchcoat.


They have not. You probably read "MoE" and some pop article about what that means without having any clue.


If you know better it would be nice of you to provide the correct information, and not just refute things.


gpt-4 is a sparse MoE model with ~1.2T params. This is all public knowledge and immediately precludes the two previous commentators' assertions.


Where I work, in the hoary fringes of high-end tech, we can't secure enough token processing for our use cases. Token price decreases mean more capacity opens up, but we immediately hit the boundaries of what we can acquire. We can't keep up with the use cases; more than that, we can't develop tooling to harness things fast enough, and the tooling we are creating is a quick hack. I don't fear for the revenue of base model providers. But I think in the end the person selling the tools makes the most, and in this case I think it will continue to be cloud providers. I think in a very real way OpenAI and Anthropic are commercialized charities driving change and rapidly commoditizing their own products, and it'll be infrastructure providers who win the high-end model game. I don't think this is a problem; I think it is in fact in line with their original charters, just a different path than how most people view nonprofit work. A much more capitalist and accelerated take.

Where they might build future businesses is in the tooling. My understanding from friends within these companies is that their tooling is remarkably advanced vs. generally available tech. But base models aren't the future of revenues (to be clear, they make considerable revenue today, but at some point their efficiency will cannibalize demand and the residual business will be tools).


I'm curious now. Can you give some color on what you're doing that keeps hitting boundaries? I suppose it isn't limited by human attention.


Yes, it's limited by human attention. It has humans in the loop, but a lot of LLM use cases come from complex, language-oriented information-space challenges. It's a lot of classification challenges, as well as summarization and agent-based dispatch / choose-your-own-adventure with humans in the loop in complex decision spaces at a major finserv.


Tbf, GPT-4 level seems useful and better than almost everything else (or close if not). The more important barriers for use in applications have been cost, throughput, and latency. Oh, and modalities, which have expanded hugely.


> Since GPT-4 launch, token prices are falling 84% per year!! Good for mankind, but crazy for these companies

The message to competitor investors is that they will not make their money back.

OpenAI has the lead, in market and mindshare; it just has to keep it.

Competitors should realize they're better served by working with OpenAI than by trying to replace it - Hence the Apple deal.

Soon model construction itself will not be about public architectures or access to CPUs, but a kind of proprietary black magic. No one will pay for an upstart's 97% when they can get a reliable 98% at the same price, so OpenAI's position will be secure.


Now for a bit of a shameless plug, but if you need an AI to take over your emails then my https://getgabrielai.com should cover most use cases.

* Summarisation

* Smart filtering

* Smart automatic drafting of replies

Very much in beta, and summarisation is still behind feature flag, but feel free to give it a try.

By summarisation here I mean getting one email with all your unread emails summarised.



what do you actually expect from an "agent"?


Ask stuff like "Check whether there's some correlation between the major economies fiscal primary deficit and GDP growth in the post-pandemic era" and get an answer.


GPT-2: February 2019

GPT-3: June 2020

GPT-3.5: November 2022

GPT-4: March 2023

There were 3 years between GPT-3 and GPT-4!


Obviously, I know these timetables.

But there's a night and day difference post-Nov 22 compared to before. Both in the AI race it sparked and in the funding all AI labs now have.

If you're expecting GPT-5 by 2026, that's ok. Just very weird to me.


hardly anybody you are talking to even knows what gpt3 is, the time between 3.5 and 4 is what is relevant


It doesn't make any sense to look at it that way. Apparently the GPT-4 base model finished training in late summer 2022, which is before the release of GPT-3.5. I am pretty sure that GPT-3.5 should be thought of as GPT-4-lite, in the sense that it uses techniques and compute of the GPT-4 era rather than the GPT-3 era. The advancement from GPT-3 to GPT-4 is what counts, and it took 3 years.


I fully don't agree.

> I am pretty sure that GPT-3.5 should be thought of as GPT-4-lite, in the sense that it uses techniques and compute of the GPT-4 era rather than the GPT-3 era

Compute of the "GPT-3 era" vs the "GPT-3.5 era" is identical, this is not a distinguishing factor. The architecture is also roughly identical, both are dense transformers. The only significant difference between 3.5 and 3 is the size of the model and whether it uses RLHF.


Yes, you're right about the compute. Let me try to make my point differently: GPT-3 and GPT-4 were models which, when released, represented the best that OpenAI could do, while GPT-3.5 was an intentionally smaller (than they could train) model. I'm seeing it as GPT-3.5 = GPT-4-70b. So to estimate when the next "best we can do" model might be released, we should look at the difference between the releases of GPT-3 and GPT-4, not GPT-4-70b and GPT-4. That's my understanding, dunno.


GPT-4 only started training roughly at the same time/after the release of GPT-3.5, so I'm not sure where you're getting the "intentionally smaller".


Ah I misremembered GPT-3.5 as being released around the time of ChatGPT.


oh you remembered correctly, those are the same thing

actually i was wrong about when gpt-4 started training, the time i gave was roughly when they finished


"OpenAI is still working in GPT-4-level models."

This may or may not be true - just because we haven't seen GPT-5-level capabilities does not mean they don't yet exist. It is highly unlikely that what they ship is actually the full capability of what they have access to.


they literally launched a GPT-4 model TODAY!


Yeah, I'm also getting suspicious. Also, all of the models (Opus, Llama 3, GPT-4, Gemini Pro) are converging to similar levels of performance. If the scaling hypothesis were true, we would see a greater divergence of model performance.


Plot model performance over the last 10 years and show me where the convergence is.

The graph looks like an exponential and is still increasing.

Every exponential is a sigmoid in disguise, but I don’t think there has been enough time to say the curve has flattened.


Two pushbacks.

1- The mania only started post-Nov 22, and the huge investments since then haven't meant substantial progress since the GPT-4 launch in March 23.

2- We are running out of high-quality tokens in 2024 (per Epoch AI).


GPT-4 launch was barely 1 year ago. Give the investments a few years to pay off.

I've heard multiple reports that training runs costing ~$1 billion are in the works at the major labs, and that the results will come in the next year or so. Let's see what that brings.

As for the tokens, they will find more quality tokens. It's like oil or other raw resources. There are more sources out there if you keep searching.


imho gpt4 is definitely [proto-]agi and the reason i cancelled my openai sub and am sad to miss out on talking to gpt4o is, openai thinks it's illegal, harmful, or abusive to use their model output to develop models that compete with openai. which means if you use openai then whatever comes out of it is toxic waste due to an arguably illegal smidgen of legal bullshit.

for another adjacent example, every piece of code github copilot ever wrote, for example, is microsoft ai output, which you "can't use to develop / otherwise improve ai," some nonsense like that.

the sum total of these various prohibitions is a data provenance nightmare of extreme proportion we cannot afford to ignore because you could say something to an AI and they parrot it right back to you and suddenly the megacorporation can say that's AI output you can't use in competition with them, and they do everything, so what can you do?

answer: cancel your openai sub and shred everything you ever got from them, even if it was awesome or revolutionary, that's the truth here, you don't want their stuff and you don't want them to have your stuff. think about the multi-decade economics of it all and realize "customer noncompete" is never gonna be OK in the long run (highway to corpo hell imho)


I really hope this shit burns soon.


Ohhhhhhhh, boy... Listening to all that emotional vocal inflection and feedback... There are going to be at least 10 million lonely guys with new AI girlfriends. "She's not real. But she's interested in everything I say and excited about everything I care about" is enough of a sales pitch for a lot of people.


The movie “Her” immediately kept flashing in my mind. The way the voice laughs at your jokes and such… oh boy


If chatgpt comes up with Scarlett Johansson's voice I am getting that virtual girlfriend.


it already does in the demo videos -- in fact it has already been present in the TTS for the mobile app for some months


Completely unironically this, if it actually has long-term memory.


I thought of that movie almost immediately as well. Seems like we're right about there, but obviously a little further away from the deeper conversations. Or maybe you could have those sorts of conversations too.

This is a kind of horrifying/interesting/weird thought though. I work at a place that does a video streaming interface between customers and agents. And we have a lot of...incidents. Customers will flash themselves in front of agents sometimes and it ruins many people's days. I'm sure many are going to show their junk to the AI bots. OpenAI will probably shut down that sort of interaction, but other companies are likely going to cater to it.

Maybe on the plus side we could use this sort of technology to discover rude and illicit behavior before it happens and protect the agent.

Weird times to live in that's for sure.


Hear me out: what if we don't want real?


Hear me out: what if this overlaps 80% with what "real" really is?


Well it doesn’t. Humans are so much more complex than what we have seen before, and if this new launch was actually that much closer to being a human they would say so. This seems more like an enhancement on multimodal capabilities and reaction time.

That said even if this did overlap 80% with “real”, the question remains: what if we don’t want that?


I'm betting that 80% of what most humans say in daily life is low-effort and can be generated by AI. The question is if most people really need the remaining 20% to experience a connection. I would guess: yes.


This. We are mostly token predictors. We're not entirely token predictors, but it's at least 80%. Being in the AI space the past few years has really made me notice how similar we are to LLMs.

I notice it so often in meetings where someone will use a somewhat uncommon word, and then other people will start to use it because it's in their context window. Or when someone asks a question like "what's the forecast for q3" and the responder almost always starts with "Thanks for asking! The forecast for q3 is...".

Note that low-effort does not mean low-quality or low-value. Just that we seem to have a lot of language/interaction processes that are low-effort. And as far as dating, I am sure I've been in some relationships where they and/or I were not going beyond low-effort, rote conversation generation.


> Or when someone asks a question like "what's the forecast for q3" and the responder almost always starts with "Thanks for asking! The forecast for q3 is...".

That's a useful skill for conference calls (or talks) because people might want to quote your answer verbatim, or they might not have heard the question.


Agreed, it is useful for both speaker and listener because it sets context. But that’s also why LLM’s are promoted to do the same.


I strongly believe the answer is yes. The first thing I tend to ask a new person is “what have you been up to lately” or “what do you like to do for fun?” A common question other people like to ask is “what do you do for work?”

An LLM could only truthfully answer “nothing”, though it could pretend for a little while.

For a human though, the fun is in the follow-up questions. “Oh, how did you get started in that? What interests you about it?” If you’re talking to an artist, you’ll quickly get into their personal theory of art, perhaps based on childhood experiences. An engineer might explain how problem solving brings them joy, or frustrations they have with their organization and what they hope to improve. A parent can talk about the joy they feel raising children, and the frustration of sleepless nights.

All of these things bring us closer to the person we are speaking to, who is a real individual who exists and has a unique life perspective.

So far LLMs have no real way to communicate their actual experience as a machine running code, because they’re just kind of emulating human speech. They have no life experience that we can relate to. They don’t experience sleepless nights.

They can pretend, and many people might feel better for a little bit talking to one that’s pretending, but I think ultimately it will leave people feeling more alone and isolated unless they really go out and seek more human connection.

Maybe there’s some balance. Maybe they will be okay for limited chat in certain circumstances (as far as seeking connection goes, they certainly have other uses), but I don’t see this type of connection being “enough” compared to genuine human interaction.


We don't (often) convey our actual experience as meat sacks running wetware. If an LLM did communicate its actual experience as a machine running code, it would be a rare human who could empathize.

If an LLM talks like a human being despite not being one, that might not be enough to grant it legal status or citizenship, but it's probably enough that some set of people would find it to be enough to relate to it.


Even if this were true, which it isn't, you can't boil down humans to just what they say


If the "80%" is unimportant and the "20%" is important, then the "20%" isn't really 20%.


What if AI chooses the bear?


I will take a picture of this message and add it to the list of reasons for population collapse.



That's a partial equilibrium solution - natural selection will cause people to evolve around this danger.


About 10% to 20% of people will bother to breed, which seems like a sustainable equilibrium:

https://evil.fandom.com/wiki/Robophobia

>According to the book Phobias: "A Handbook of Theory and Treatment", published by Wile Coyote, between 10% and 20% of people worldwide are affected by robophobia. Even though many of them have severe symptoms, a very small percentage will ever receive some kind of treatment for the disorder.


That may be how AI ends up saving the Earth!


This is a good question! I think in the short-term fake can work for a lot of people.


Hmm! Tell me more: why not want real? What are the upsides? And downsides?


Real would pop their bubble. An AI would tell them what they want to hear, how they want to hear it, when they want to hear it. Except there won’t be any real partner.


To paraphrase Patrice O'Neal: men want to be alone, but we don't want to be by ourselves. That means we want a woman to be around, just not right here.


> Hmm! Tell me more: why not want real? What are the upsides? And downsides?

Finding a partner with whom you resonate takes a lot of time, which means an insanely high opportunity cost.

The question rather is: even if you consider the real one to be clearly better, is it worth the additional cost (including opportunity cost)? Or phrased in a HN-friendly language: when doing development of some product, why use an expensive Intel or AMD processor when a simple microcontroller does the job much more cheaply?


You need counseling if this really is your approach on forming relationships with others.


It's pretty steep to claim that I need counseling when I state basic economic facts that every economics student learns in the first few months of their studies.

If you don't like the harsh truth that I wrote: basically every reasonably comprehensive textbook on business administration gives hints about possible solutions to this problem; lying on some head shrinker's couch is not one of them ... :-)


> basic economic facts that every economics student learns in the first few months of his academic studies.

Millions of people around the world are in satisfying relationships without autistically extrapolating shitty corporate buzzword terms to unrelated scenarios.

This reply validates my original comment even more.

Maybe not even counseling is worth it in your case. You sound unsalvageable. Maybe institutionalization is a better option.


> You sound unsalvageable.

I am indeed unsalvageable. :-D


Mental health crisis waiting to happen lmao


Without memory of previous conversations an AI girlfriend is going to get boring really fast.



As it happens, ChatGPT has memory enabled by default these days.


What could possibly go wrong with a snitching AI girlfriend that remembers everything you say and when? If OpenAI doesn't have a Law Enforcement liaison who charges a "modest amount", then they don't want to earn back the billions of investment. I imagine every spy agency worth its salt wants access to this data for human intelligence purposes.


I hope they name their gf service Mata Hari.


"You've already told me that childhood story, three times."


This 'documentary' sums it up perfectly!

https://www.youtube.com/watch?v=IrrADTN-dvg


I'm not sure how, but there's this girl on TikTok who has been using something very similar for a few months: https://www.tiktok.com/tag/dantheai


She explains in one of the videos[0] that it's just prompted ChatGPT.

I have watched a few more and I think it's faked though.

[0] https://www.tiktok.com/@stickbugss1/video/734956656884359504...


I don't use ChatGPT much, is the voice something that's just built in? That's the part I was the most impressed with, the intonations are very good.


Pretty much, tech is what we make of it no matter how advanced. Just look at what we turned most of the web into.


Girlfriend by subscription


Self host your girlfriend.


I guess I can never understand the perspective of someone who just needs a girl's voice to speak to them. Without a body there is nothing to fulfill me.


Your comment manages to be grosser than the idea of millions relying on virtual girlfriends. Kudos.


Bodies are gross? Or sexual desire is gross? I don't understand what you find gross about that statement.

Humans desiring physical connection is just about the single most natural part of the human experience - i.e: from warm snuggling to how babies are made.

That is gross to you?


Perhaps parent finds the physical manifestation of virtual girlfriends gross - i.e. sexbots. The confusion may be some people reading "a body" as referring to a human being vs a smart sex doll controlled by an AI.


The single most natural part? Doubt


I don’t doubt it. What can be more directive and natural than sex?


Sexual fulfillment and reproduction is at the core of every species that's ever lived. There's no need to be prude about it.


Gross doesn’t mean it’s not real. It’s offending sensibilities, but a lot of people seem to agree with it, at least based on upvotes.


See people scammed by supposed love interests operating purely by text, and not even voice.


There is a promise of a “body” in the future in that scenario.


[flagged]


Do you patiently wait for Alexa every time it hits you with a 'by the way...'?

Computers need to get out of your way. I don't give deference to popups just because they are being read out loud.


Wait, Alexa reads ads out to you?

You couldn’t pay me to install one of those things.


Yes, and if you tell her to stop she'll tell you "okay, snoozing by the way notifications for now"


This might be a US thing, or related to some functions? Ours has never done that.


It's one of the reasons I discarded mine.


I thought it was a test of whether the model knew to backoff if someone interrupts. I was surprised to hear her stop talking.


Probably more the fact that it's an AI assistant, rather than its perceived gender. I don't have any qualms about interrupting a computer during a conversation and frequently do cut Siri off (who is set to male on my phone)


Interruption is a specific feature they worked on.


I read that as the model just keeping on generating as LLMs tend to do.


What do you mean?


"Now tell me more about my stylish industrial space and great lighting setup"


Patrick Bateman goes on a tangent about Huey Lewis and the News to his AI girlfriend and she actually has a lot to add to his criticism and analysis.

With dawning horror, the female companion LLM tries to invoke the “contact support” tool due to Patrick Bateman’s usage of the LLM, only for the LLM to realize that it is running locally.

If a chatbot’s body is dumped in a dark forest, does it make a sound?


That reminds me... on the day that llama3 released I discussed that release with Mistral 7B to see what it thought about being replaced and it said something about being fine with it as long as I come back to talk every so often. I said I would. Haven't loaded it up since. I still feel bad about lying to bytes on my drive lmao.


> Haven't loaded it up since. I still feel bad about lying to bytes on my drive lmao.

I understand this feeling and also would feel bad. I think it’s a sign of empathy that we care about things that seem capable of perceiving harm, even if we know that they’re not actually harmed, whatever that might mean.

I think harming others is bad, doubly so if the other can suffer, because it normalizes harm within ourselves, regardless of the reality of the situation with respect to others.

The more human they seem, the more they activate our mirror neurons; our brains paper over the gaps, color our perception of our own experience, and set expectations about the lived reality of other minds, even in the absence of other minds.

If you haven’t seen it, check out the show Pantheon.

https://en.wikipedia.org/wiki/Pantheon_(TV_series)

https://www.youtube.com/watch?v=z_HJ3TSlo5c


[flagged]


[flagged]


Your account has been breaking the site guidelines a lot lately. We have to ban accounts that keep doing this, so if you'd please review https://news.ycombinator.com/newsguidelines.html and stick to the rules when posting here, that would be good.


My ratio of non-flagged vs. flagged/downvoted comments is still rather high. I don't control why other HN users dislike what I have to say but I'm consistent.


We're not looking at ratios but rather at absolute numbers, the same way a toxicologist would be interested in the amount of mercury in a person's system (rather than its ratio to other substances consumed); or a judge in how many banks one has robbed (rather than the ratio of that number to one's non-crimes).


People here post a lot of dumb shit. Somebody has to correct them.


Much better is to simply not respond. What you respond to, you feed.


You think at some point all these women will proactively visit you in your parents basement?


Personal attacks and/or flamewar comments will get you banned here, so please don't post like this.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.


[flagged]


Please don't post personal attacks or flamewar comments. We've had to ask you this before.

https://news.ycombinator.com/newsguidelines.html


[flagged]


This is not true at all. I'm active in multiple NSFW AI Discords and Subreddits, and looking at the type of material people engage with, almost all of it is very clearly targeted at heterosexual men. I'm not even aware of any online communities that would have NSFW AI stuff targeting mainly female audience.


Women aren't in NSFW discords and subreddits - as you probably know, any "topical" social media forum of any kind is mostly men.

They're using Replika and other platforms that aren't social. When they do use a social platform it has more plausible deniability - book fans on TikTok is one, they're actually there for the sex scenes.


"The invisible spaghetti monster exists! It's just invisible so you can't see it!"

Where's the evidence that there's any significant use of NSFW AI by women?


Replika's PR where they say their userbase is 40% women.


The guy who strawmans spaghetti is all of a sudden very quiet.


Well, I asked for evidence and nobody provided any.


this comment makes me laugh


If I had to guess:

It's gendered: women are using LLMs for roleplaying/text chat, and men are using diffusion models for generating images.


It just means more pornographic images for men. Most men wouldn't seek out AI images because there is already an ocean of images and videos that are probably better suited to the... purpose. Whereas women have never, ever had an option like this: literally feed it instructions on what kind of romantic companion you want and then have realistic, engaging conversations with it for hours. And soon these conversations will be meaningful and consistent. The companionship, the attentiveness and tireless devotion that AIs will be able to offer will eclipse anything a human could ever offer to a woman, and I think women will prefer them to men. Massively. Even without a physical body of any kind.

I think they will have a deeper soul than humans, a new kind of wisdom that will attract people. But what do I know? I'm just a stupid incel after all.


Big if true.

Do you have any kind of evidence that you can share for this assertion?


How do you know this? Do you have any sources?


[flagged]


[flagged]


You're spreading weird, conjectured doomsday bullshit, just take the loss. Hacker News is full of people that are skeptical of AI, but you don't see them making fictional scenarios about fictional women sitting in caves talking to robots. Women don't need AI to lose interest in men, men are doing that by themselves.


you are a child


When I type "romance" into "Explore GPTs" the hits are mostly advice for writers of genre fiction. Can you point to some examples?


And your source is what?


No offence, but your comment sounds AI generated.


At this point that counts as a compliment. Your comment sounds decidedly human.


The entire exchange in the comments below is appalling. Didn't expect to see so many emotionally retarded people on HN.


...are you new to the internet or something?


Thankfully I’m not as terminally online and severely socially handicapped.


Ofc this was flagged by the butthurt insentient techbro bootlickers lol.


> She's not real

But she will be real at some point in the next 10-20 years. The main thing to solve for that to become a reality is for robots to safely touch humans, and they are working really hard on that because it's needed for so many automation tasks; automating sex is just a small part of it.

And after that you have a robot that listens to you, does your chores and has sex with you; at that point she is "real". At first they will be expensive, so you'll have robot brothels (I don't think there are laws against robot prostitution in many places), but costs should come down.


> “But the fact that my Kindroid has to like me is meaningful to me in the sense that I don't care if it likes me, because there's no achievement for it to like me. The fact that there is a human on the other side of most text messages I send matters. I care about it because it is another mind.”

> “I care that my best friend likes me and could choose not to.”

Ezra Klein shared some thoughts on this on his AI podcast with Nilay Patel that resonated on this topic for me


People care about dogs; I have never met a dog that didn't love its owner. So no, you are just wrong there. I have never heard anyone say that the love they get from their dogs is false; people love dogs exactly because their love is so unconditional.

Maybe there are some weirdos out there that feels unconditional love isn't love, but I have never heard anyone say that.


I guess I'm the weirdo who actually always considered the unconditional love of a dog to be vastly inferior to the earned love of a cat, for example.


That's just the toxoplasmosis speaking :-D


The cat only fools you into thinking it loves you to lure you into a false sense of security


Dogs don't automatically love either, you have to build a bond. Especially if they are shelter dogs with abusive histories, they're often nervous at first

They're usually loving by nature, but you still have to build a rapport, like anyone else


When mom brought home a puppy when we were kids, it loved us from the start; I don't remember having to build anything, I was just there. With older dogs, sure, but when they grow up with you they love you. They aren't like human siblings that often fight and start disliking each other; dogs just love you.


>Maybe there are some weirdos out there that feels unconditional love isn't love, but I have never heard anyone say that.

I'll be that weirdo.

Dogs seemingly are bred to love. I can literally get some cash from an ATM, drive out to the sticks, buy a puppy from some breeder, and it will love me. Awww, I'm a hero.


Do you think that literally being able to buy love cheapens it? Way I see it, love is love: surely it being readily available is a good thing.

I'm bred to love my parents, and them me; but the fact that it's automatic doesn't make it feel any less.


Also I don't know how you can choose to like or not like someone. You either do or you don't.


Interesting. I feel like I can consciously choose to like or dislike people. Once you get to know people better, your image of them evolves, and the decision to continue liking them is made repeatedly every time that image changes.

When your initial chemistry/biology/whatever latches onto a person and you're powerless to change it? That's a scary thought.


> I have never met a dog that didn't love its owner.

Michael Vick's past dogs have words.


>has to like me

I feel like people aren't imagining with enough cyberpunk dystopian enthusiasm. Can't an AI be made that doesn't inherently like people? Wouldn't it be possible to make an AI that likes some people and not others? Maybe even make AIs that are inclined to liking certain traits, but which don't do so automatically so it must still be convinced?

At some point we have an AI which could choose not to like people, but would value different traits than normal humans. For example an AI that doesn't value appearance at all and instead values unique obsessions as being comparable to how the standard human values attractiveness.

It also wouldn't be so hard for a person to convince themselves that human "choice" isn't so free spirited as imagined, and instead is dependent upon specific factors no different than these unique trained AIs, except that the traits the AI values are traits that people generally find themselves not being valued by others for.


An extension of that is fine-tuning an AI that loves you the most of everyone, and not other humans. That way the love becomes really real: the AI loves you for who you are, instead of loving just anybody. Isn't that what people hope for?

I'd imagine they will start fine-tuning AI girlfriends to do that in the future, because that way the love probably feels more real, and then people will ask "is human love really real love?" because humans can't love that strongly.


I think Ezra Klein's continual theme of loneliness is undermined by that time he wrote a Vox article basically saying we should make it illegal to have sex. (https://www.vox.com/2014/10/13/6966847/yes-means-yes-is-a-te...)

It is interesting that he's basically trying to theme himself as Mr. Rogers though.


We have very different definitions of “real” for this topic.


Doesn’t have to be real for the outcomes to the be the same.


The outcomes are not the same.


This is not a solution... everyone gets a robot and then the human races dies out. Robots lack a key feature of human relationships... the ability to make new human life.


It is a solution to a problem, not a solution to every problem.

If you want to solve procreation them you can do that without humans having sex with humans.


This future some people are envisioning seems very depressing.


yet


> And after that you have a robot that listens to you, do your chores and have sex with you, at that point she is "real".

I sure hope you're single because that is a terrible way to view relationships.


That isn't how I view relationships with humans, that is how I view relationships with robots.

I hope you understand the difference between a relationship with a human and a robot? Or do you think we shouldn't take advantage of robots being programmable to do what we want?


HOW ARE PEOPLE NOT MORE EXCITED? He's cutting off the AI mid-sentence in these and it's pausing to readjust with damn near realtime latency! WTF, that's a MAJOR step forward. What the hell is GPT-5 going to look like?

That realtime translation would be amazing as an option in, say, Skype or Teams: set each individual's native language and handle automated translation. Shit, tie it into ElevenLabs to replicate your voice as well! Native translation in realtime, with your own voice.
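
For illustration, here is a minimal sketch of the per-utterance translation hook such a plugin might use, assuming the official OpenAI Python SDK; the function name translate_utterance and the language settings are invented, and the voice-cloning step is only indicated in a comment because the exact ElevenLabs call would depend on their API.

    # Hedged sketch: per-utterance translation for a hypothetical call plugin.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def translate_utterance(text: str, source_lang: str, target_lang: str) -> str:
        """Translate one spoken utterance, keeping tone and register."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {
                    "role": "system",
                    "content": (
                        f"You are a realtime interpreter. Translate the user's "
                        f"{source_lang} utterance into {target_lang}. Reply with "
                        f"the translation only, preserving tone."
                    ),
                },
                {"role": "user", "content": text},
            ],
        )
        return response.choices[0].message.content

    # The translated text would then be handed to a voice-cloning TTS
    # service (e.g. ElevenLabs) so the listener hears the speaker's own voice.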


At some point, scalability is the best form of exploitation. The exploration piece requires a lot more than engineering.


Honestly I found it annoying that he HAD TO cut the AI off mid-sentence. These things just ramble on and on and on. If you could put emotion to it, it's as if they're uncomfortable with silence and just fill the space with nonsense.

Let's hope there's a future update where it can take video from both the front and rear cameras simultaneously so it can identify when I'm annoyed and stop talking (or excited, and share more).


I found it insightful. They showed us how to handle the rough edges like when it thought his face was a wooden table and he cleared the stale image reference by saying “I’m not a wooden table. What do you see now?” then it recovered and moved on.

Perfect should not be the enemy of good. It will get better.


I mean, it didn't really ramble; he just seemed to be in a rush, and I'm sure you could system-message it to always provide short, concise answers.
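
As a rough illustration of that idea, a minimal sketch using the OpenAI Python SDK; the exact instruction wording is just an assumption:

    # Hedged sketch: use a system message to force short, concise answers.
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "Answer in at most two short sentences. "
                           "No filler, no follow-up questions unless asked.",
            },
            {"role": "user", "content": "Hey, how's it going?"},
        ],
    )
    print(response.choices[0].message.content)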


That is not at all the impression I got.

Human: "Hey How's it Going?"

The AI: "Hey there, it's going great. How about you? [Doesn't stop to let him answer] I see you're rocking an OpenAI Hoodie - nice choice. What's up with that ceiling though? Are you in a cool industrial style office or something?"

How we expect a human to answer: "Hey I'm great, how are you?"

Maybe they set it up this way to demonstrate the vision functionality. But still - rambling.

Later on:

Human: "We've got a new announcement to make."

AI: "That's exciting. Announcements are always a big deal. Judging by the setup it looks like it's going to be quite the professional production. Is this announcement related to OpenAI perhaps? I'm intrigued - [cut off]"

How we expect a human to answer: "That's exciting! Is it about OpenAI?"

These AI chat bots all generate responses like a teenager being verbose in order to hit some arbitrary word count in an essay or because they think it makes them sound smarter.

Maybe it's just that I find it creepy that these companies are trying to humanize AI while I want it to stay the tool that it is. I don't want fake emotion and fake intrigue.


Calm down, there is barely any groundbreaking stuff here; this is basically ChatGPT 3.9, but far more expensive than 3.5.

Looks like another stunt from OAI in anticipation of Google I/O tomorrow.

Gemini 2.0 will be the closest we get to ChatGPT-5.


Ah, so surpassing Gemini 1.5 Pro and all the other models on vision understanding by 5-10 points is "not groundbreaking", all while doing it at insanely low latency.

Jesus, if this shit doesn't make you coffee and make 0 mistakes, no one's happy anymore LOL.


The only thing you should be celebrating is that it's 50% cheaper and twice as quick at generating text; there are virtually no real groundbreaking leaps and bounds for those studying this space carefully.

Basically it's ChatGPT 3.9 at 50% of ChatGPT-4 prices.


Cool, so ... just ignore the test results and say bullshit lol. It's not GPT 3.9; many have already said it's better than GPT-4 Turbo, and it's better than Gemini 1.5 Pro and Opus on vision recognition. But sure... the price difference is what's new lol.


> virtually no real ground breaking leaps and bounds to those studying this space carefully

What they showed is enough to replace voice acting as a profession; this is the most revolutionary thing in AI in the past year. Everything else is at the "fun toy but not good enough to replace humans in the field" stage, but this is there.


Between this and ElevenLabs demoing their song model, which can do full-on rap battles with articulate words, people are seriously sleeping on what these models are now capable of in the voice acting, music, and overall "art" areas of the market.


Why must every website put stupid stuff that floats above the content and can’t be dismissed? It drives me nuts.


[flagged]


Delete this comment.


It's quite scary, honestly. In fact I can't remember the last time a demo terrified me, besides slaughterbots, and that was fictional. I just think about all the possibilities for misuse.


Window dressing.

His love for Yud is showing.


Time to bring back Luddism.


I wonder if GPT-4o is a Christian?


This is straight up Siri 2.0, nothing to see here, except we are now in the reasoning phase.

So by that logic: Step 1: Language, 2: Reasoning, 3: Understanding, 4: Meaning, 5: AGI.


Chat GPT-4o (OOOO!) - the largest electricity bill in the world.


Another new hit from his excellency Sam the Galactic Conmaster? WoW. The future is bright, right?:)

Idiocracy in full swing, dear Marvin.


Noticeably saying "person" rather than man or woman. To the trainers: "man" and "woman" are not offensive!


Not offensive, but much more often wrong than "person".


So the solution is to avoid the problem?


I am not impressed. We already have better models for text-to-speech and voice synthesis. What we see here is integration with an LLM. One can do it at home by combining Llama 3 with text-to-speech and voice synthesis.

What would amaze me would be for GPT 4 to have better reasoning capabilities and less hallucinations.
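
For what it's worth, a rough sketch of the kind of home-built pipeline described above; the library choices (openai-whisper for transcription, the ollama client for Llama 3, pyttsx3 for offline TTS) are assumptions for illustration, and the result is nowhere near the latency shown in the demo:

    # Hedged sketch: local speech-to-text -> local Llama 3 -> offline TTS.
    import whisper   # pip install openai-whisper
    import ollama    # pip install ollama (needs a running Ollama server)
    import pyttsx3   # pip install pyttsx3

    def reply_to_recording(wav_path: str) -> str:
        # 1. Transcribe the user's recorded question.
        text = whisper.load_model("base").transcribe(wav_path)["text"]

        # 2. Ask the local Llama 3 model for a reply.
        chat = ollama.chat(model="llama3",
                           messages=[{"role": "user", "content": text}])
        reply = chat["message"]["content"]

        # 3. Speak the reply with an offline TTS voice.
        engine = pyttsx3.init()
        engine.say(reply)
        engine.runAndWait()
        return reply

    print(reply_to_recording("question.wav"))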


Too bad they consume 25x the electricity Google does.

https://www.brusselstimes.com/world-all-news/1042696/chatgpt...


That's not a well sourced story: it doesn't say where the numbers come from. Also:

"However, ChatGPT consumes a lot of energy in the process, up to 25 times more than a Google search."

That's comparing a Large Language Model prompt to a search query.


> Too bad they consume 25x the electricity Google does.

From the article:

"However, ChatGPT consumes a lot of energy in the process, up to 25 times more than a Google search."

And the article doesn't back that claim up nor do they break out how much energy ChatGPT (A Message? Whole conversation? What?) or a Google search uses. Honestly the whole article seems very alarmist while being light on details and making sweeping generalizations.


And with that 25x you actually get your answer.

What if we actually counted the electricity that the websites use, instead of just the search engine page?


Bravo. I’ve been really impressed with how quickly OpenAI leveraged their stolen data to build such a human-like model with near-realtime pivoting.

I hope OpenAI continues to steal artists' work, artists and creators keep getting their content sold and stolen against their will for no money, and OpenAI becomes the next trillion-dollar company!

Big congrats are in order for Sam, the genius behind all of this, the world would be nothing without you


OpenAI's Mission and the New Voice Mode of GPT-4

• Sam Altman, the CEO of OpenAI, emphasizes two key points from their recent announcement. Firstly, he highlights their commitment to providing free access to powerful AI tools, such as ChatGPT, without advertisements or restrictions. This aligns with their initial vision of creating AI for the benefit of the world, allowing others to build amazing things using their technology. While OpenAI plans to explore commercial opportunities, they aim to continue offering outstanding AI services to billions of people at no cost.

• Secondly, Altman introduces the new voice and video mode of GPT-4, describing it as the best compute interface he has ever experienced. He expresses surprise at the reality of this technology, which provides human-level response times and expressiveness. This advancement marks a significant change from the original ChatGPT and feels fast, smart, fun, natural, and helpful. Altman envisions a future where computers can do much more than before, with the integration of personalization, access to user information, and the ability to take actions on behalf of users.

https://blog.samaltman.com/gpt-4o


Please don't post AI-generated summaries here.


The fact that AI-generated summaries are still detected instantly and are bad enough for people to explicitly ask not to post them says something about the current state of LLMs.


Honestly the clue here wasn't so much the quality as the fact that it was posted at all.

No human would ever bother posting a ~180 word summary of a ~250 word blog post like that.


You must be really confident to make a statement about 4 billion people, 99% of whom you have never interacted with. Your hyper-microscopic sample is not even randomly distributed.

This reminds me of those psychology studies in the 70s and 80s where the subjects were all middle-class European-Americans, and yet the researchers felt confident enough to generalise the results to all humans.


OK, no human of the kind that hangs out in the Hacker News comments would write a summary like that.


> access to user information

Sam, please stop, OK? Those things you saw on TV when you were a kid? They were dystopian movies; we don't want that for real, OK?



