
This is probably bad news for ChatGPT 5. I don't think it's that likely this co-founder would leave for Anthropic if OpenAI were clearly in the lead. Also, from a safety perspective, you would want to be at the AI company most likely to create truly disruptive AI tech. This looks to me like a bet against OpenAI more than anything else.

OpenAI has a burn rate of about 5 billion a year and they need to raise ASAP. If the fundraising isn't going well, or if OpenAI is forced to accept money from questionable investors, that would also be a good reason to jump ship.

In situations like these it's good to remember that people are much more likely to take the ethical and principled road when they also stand to gain from that choice. People who put their ideals above pragmatic self-interest self-select out of positions of power and influence. That is likely to be the case here as well.




> This is probably bad news for ChatGPT 5. I don't think it's that likely this co-founder would leave for Anthropic if OpenAI were clearly in the lead.

Yep. The writing was already on the wall for GPT-5 when they teased a new model for months and let the media believe it was GPT-5, before finally releasing GPT-4o and admitting they hadn't even started on 5 yet (they quietly announced they were starting a new foundation model a few weeks after 4o).

Don't get me wrong, the cost savings for 4o are great, but it was pretty obvious at that point that they didn't have a clue how they were going to move past 4 in terms of capabilities. If they had a path they wouldn't have intentionally burned the hype for 5 on 4o.

This departure just further cements what I was already sure was the case—OpenAI has lost the lead and doesn't know how they're going to get it back.


and then revealed that GPT-5 will not be released at this year's Dev Day (which runs until November)


Or it could be the start of the enshittification of Anthropic, like OpenAI ruined GPT-4 with GPT-4o by overly simplifying it.

I hope not, because Claude is much better, especially at programming.


Claude 3.5 Sonnet is the first model that made me realize that the era of AI-aided programming is here. Its ability to generate and modify large amounts of correct code - across multiple files/modules - in one response beats anything I've tried before. Integrating that with specialized editors (like https://www.cursor.com) is an early vision of the future of software development.


I've really struggled every time I've pulled out any LLM for programming besides using Copilot for generating tests.

Maybe I've been using it for the wrong things—it certainly never helps unblock me when I'm stuck the way it sounds like it does for some (I suspect that's because when I get stuck it's deep in undocumented rabbit holes), but it sounds like it might be decent at large-scale rote refactoring? Aside from specialized editors, how do people use it for things like that?


At least from my experience:

You take Claude, you create a new Project, in your Project you explain the context of what you are doing and what you are programming (you have to explain it only once!).

If you have specific technical documentation (e.g. rare programming language, your own framework, etc), you can put it there in the project.

Then you create a conversation, and copy-paste the source-code for your file, and ask for your refactoring or improvement.

If you are lazy just say: "give me the full code"

and then

"continue the code" few times in a row

and you're done :)
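
If you prefer the API over the web UI, the same pattern translates roughly to the official anthropic Python package. This is only a sketch: the project description, file name, and prompt below are made-up placeholders, and it assumes ANTHROPIC_API_KEY is set in your environment.

  # Sketch only: the "Project" context becomes a system prompt you reuse.
  # Assumes `pip install anthropic` and ANTHROPIC_API_KEY in the environment.
  import anthropic

  client = anthropic.Anthropic()

  # The "explain it only once" part: keep this string around and reuse it.
  project_context = (
      "We are building a small CLI tool in Python 3.12. "
      "It parses CSV exports and loads them into SQLite. "
      "Prefer the standard library over external dependencies."
  )

  # Paste the file you want refactored into the conversation.
  source = open("csv_loader.py").read()  # made-up file name

  message = client.messages.create(
      model="claude-3-5-sonnet-20240620",
      max_tokens=4096,
      system=project_context,
      messages=[{
          "role": "user",
          "content": "Refactor this to remove duplication. "
                     "Give me the full code.\n\n" + source,
      }],
  )

  print(message.content[0].text)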


> in your Project you explain the context of what you are doing and what you are programming (you have to explain it only once!).

When you say this, you mean typing out some text somewhere? Where do you do this? In a giant comment? In which file?


In "Projects" -> "Create new project" -> "What are you trying to achieve?"


Provide context to the model. The code you're working on, what it's for, where you're stuck, what you've tried, etc. Pretend it's a colleague that should help you out and onboard it to your problem, then have a conversation with it as if you were rubber-ducking with a colleague.

Don't ask short one-off questions and expect it to work (it might, depending on what you ask, but probably not if you're deep in some proprietary code base with no traces in the LLM's pretraining).


I've definitely tried that and it doesn't work for the problems I've tried. Claude's answers for me always have all the hallmarks of an LLM response: extremely confident, filled with misunderstandings of even widely used APIs, and often requiring active correction on so many details that I'm not convinced it wouldn't have been faster to just search for a solution by hand. It feels like pair programming with a junior engineer, but without the benefit of helping train someone.

I'm trying to figure out if I'm using it wrong or using it on the wrong types of problems. How do people with 10+ years of experience use it effectively?


I'm sure I'm going to offend a bunch of people with this, but my experience has been similar to yours, and it reminds me of something "Uncle" Bob Martin once mentioned: the number of software developers is roughly doubling every two years, which means that at any given time half of the developer population has less than two years experience.

If you're an experienced dev, having a peer that enthusiastically suggests a bunch of plausible but subtly wrong things probably net-net slows you down and annoys you. If you're more junior, it's more like being shown a world of possibilities that opens your mind and seems much more useful.

Anyway, I think the reason we see so much enthusiasm for LLM coding assistants right now is the overall skew of developers to being more junior. I'm sure these tools will eventually get better, at least I hope they do because there's going to be a whole lot of enthusiastically written but questionable code out there soon that will need to be fixed and there probably won't be enough human capacity to fix it all.


Thanks for saying it explicitly. I definitely have the same sense, but was hoping someone with experience would chime in about use cases they have for it.


I'm a mathematician and the problems I work on tend to be quite novel (leetcode feel but with real-world applications). I find LLMs to be utterly useless at such tasks; "pair programming a junior, but without the benefit" is an excellent summary of my experience as well.


It's good for writing that prototype you're supposed to throw away. It's often easy to see the correct solution after seeing the wrong one.


I think the only way to answer that is if you can share an example of a conversation you had with it, where it broke down as you described.


For what I’m working on, I can also use the wrong approaches. Going through my fail often fail fast feedback loop is a lot more efficient with LLMs. Like A LOT more.

Then when I have a bunch of wrong answers, I can give those as context as well to the model and make it avoid those pitfalls. At that point my constraints for the problem are so rigorous that the LLM lands at the correct solution and frankly writes out the code 100x faster than I would. And I'm an advanced vim user who types at 155 wpm.


> And I’m an advanced vim user who types at 155 wpm.

See, it's comments like this that make me suspect that I'm working on a completely different class of problem than the people who find value in interacting with LLMs.

I'm a very fast typer, but I've never bothered to find out how fast because the speed of my typing has never been the bottleneck for my work. The bottleneck is invariably thinking through the problem that I'm facing, trying to understand API docs, and figuring out how best to organize my work to communicate to future developers what's going on.

Copilot is great at saving me large amounts of keystrokes here and there, which is nice for avoiding RSI and occasionally (with very repetitive code like unit tests) actually a legit time saver. But try as I might I can't get useful output out of the chat models that actually speeds up my workflow.


I have always thought of it as a way to figure out what doesn't work and get to a final design, not necessarily code. Personally, I find it easy to verify a solution and figure out use cases that wouldn't work out. I keep iterating until I have either figured out a mental model of the solution, or figured out the main problems with such a hypothetical solution.


Oh yes, totally agree, it's like if you have a very experienced programmer sitting next to you.

He still needs instructions on what to do next, and he lacks a bit of "initiative", but in terms of pure coding skill it's amazing (aka, we will get replaced over time, and it's already the case: I don't need the help of contractors, I prefer to ask Claude).


More like an insanely knowledgeable but very inexperienced programmer. It will get basic algorithms wrong (unless it's in the exact shape it has seen before). It's like a system that automatically copy-pastes the top answer from stackoverflow in your code. Sometimes that is what you want, but most of the time it isn't.


This sentiment is so far from the truth that I find it hilarious. How can a technically adept person be so out of touch with what these systems are already capable of?


LLMs can write a polite email but they can't write a good novel. They can create art or music (by mushing together things they have seen before) but not art that excites. It's the same with code. I use LLMs daily and I've seen videos of other people using tools like Cursor, and so far it looks like these LLMs can only help in situations where it is pretty obvious (to the programmer) what the right answer looks like.


With all of that, ChatGPT is actually one of the top e-book authors on Amazon.

But I agree that for some creative tasks, like writing or explaining a joke, or some novel algorithms, it's very bad.


The LLM-generated e-book thing is actually a serious problem. Have you read any of it? Consumers could lose trust unless it's fixed. If you buy a book and then realise nobody, not even the seller, has ever read it, because it regularly turns into incomprehensible mush, are you more or less likely to buy a book from the same source?


Hilarious (or even shocking) is the sentiment that people are actually so overhyped by these tools.


I keep hearing this comment everywhere Claude is mentioned, as if there is a coordinated PR boost on social media. My personal experience with Claude 3.5 however is, meh. I don't see much difference compared to GPT-4 and I use AI to help me code every day.


Yeah, they really like to mention it everywhere. It's good, but IMO not as good as some people make it out to be. I have used it recently for libGDX on Kotlin and there are things where it struggles, and the code it sometimes gives is not really "good" Kotlin, but it takes a good programmer to know what is good and what is not.


I think in more esoteric languages it won't work as well. For Python and C++ it is excellent; surprisingly, its Rust is also pretty damn good.

(I am not a paid shiller, just in awe of what Sonnet 3.5 + Opus can do)


Kotlin isn't exactly an esoteric language though


User error.


Please consider avoiding more ad hominem attacks or revising the ones you've already plastered onto this discussion.


How are you liking cursor? I tried it ~a year ago, and it was quite a bit worse than ferrying back and forth between ChatGPT and VSCode.

Is it better than using GitHub Copilot in VSCode?


Definitely better. I ended my Copilot subscription.


Oh interesting, will give it another go, thnx


They ruined GPT-4? How? I thought they were basically the same models, just multimodal


GPT-4o is different from GPT-4; you can "feel" it is a smaller model that really struggles to do reasoning and programming and has much weaker logic.

If you compare it to Claude Sonnet, just the context window alone considerably improves the answers.

Of course there are no objective metrics, but from a user perspective I can see the coding skills are much better at Anthropic (and it's funny because, in theory, according to benchmarks, Google Gemini is the best, but in reality it is absolutely terrible).


> GPT-4o is different from GPT-4; you can "feel" it is a smaller model that really struggles to do reasoning and programming and has much weaker logic.

FWIW according to LMSYS this is not the case. In coding, current GPT-4o (and mini, for that matter) beat GPT-4-Turbo handily, by a margin of 32 points.

By contrast Sonnet 3.5 is #1, 4 score points ahead of GPT-4o.


I'm a firm believer that the best benchmark is playing around with the model for like an hour. On the type of tasks that are relevant to you and your work, of course.


I've also found GPT-4o to be subjectively less intelligent than GPT-4. The gap especially shows up when more complex reasoning is required, eg, on macroeconomic questions or other domains in the world where the interactions are important or where subtle aspects of the question or domain are important.


Have to say I agree with this, 4o is dumber in my subjective experience.


While I agree with your logic I also focused on:

> People who put their ideals above pragmatic self-interest self-select out of positions of power and influence. That is likely to be the case here as well.

It’s also possible that this co-founder realizes he has more than enough eggs saved up in the “OpenAI” basket, and that it’s rational to de-risk by getting a lot of eggs in another basket to better guarantee his ability to provide a huge amount of wealth to his family.

Even if OpenAI is clearly in the lead to him, he’s still looking at a lot of risk with most of his wealth being tied up in non-public shares of a single company.


While true, him leaving OpenAI for (one of) their biggest competitors does seriously risk his eggs in the OpenAI basket.


There's usually enough room for 2-3 winners. iOS and Android. Intel and AMD. Firefox and Chrome.

Also, OpenAI has some of the most expensive people in the world, which is why they're burning so much money. Presumably they're so expensive because they're some of the smartest people in the world. Some are likely smarter than Schulman.


> Presumably they're so expensive because they're some of the smartest people in the world.

I don't want to dissuade you from this belief, but maybe you should pay less attention to the boastful marketing of these AI companies. :-)

Seriously: from what I know about the lives of insanely smart people, I'd guess that OpenAI (and most other companies whose marketing claims to hire insanely smart people) doesn't have any idea how to actually make use of such people. Such companies tend to hire for other specific personality traits instead.


It only risks his OpenAI eggs if the Anthropic basket does well; if Anthropic doesn't do well, then he still has his OpenAI eggs.


I find the 5 billion a year burn rate amazing, and OpenAI’s competition is stiff. I happily pay ABACUS.AI ten dollars a month for easy access to all models, with a nice web interface. I just started paying OpenAI twenty a month again, but only because I am hoping to get access to their interactive talking mode.

I was really surprised when OpenAI started providing most of their good features for free. I am not a business person, but it seems crazy to me not to try for profitability, or at least being close to profitability. I would also like to know what the competitors' burn rates are.

For API use, I think OpenAI’s big competition is Groq, serving open models like Llama 3.1.


> it seems crazy to me to not try for profitability

A business is worth the sum of future profits, discounted for time (because making money today is better than making money tomorrow). Negative profits today are fine as long as they are offset by future profits tomorrow. This should make intuitive sense.

And this is still true when the investment won't pay off for a long time. For example, governments worldwide provide free (or highly subsidized) schooling to all children. Only when the children become taxpaying adults, 20 years or so later, does the government get a return on their investment.

Most good things in life require a long time horizon. In healthy societies people plant trees that won't bear fruit or provide shade for many years.
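
A toy illustration of the discounting arithmetic in Python, with completely made-up numbers:

  # Net present value with made-up figures: a few years of losses followed
  # by profits, discounted at 10% per year.
  profits = [-5, -4, -2, 3, 8, 12]   # billions, year 0 onwards
  r = 0.10                           # discount rate

  npv = sum(p / (1 + r) ** t for t, p in enumerate(profits))
  print(round(npv, 2))               # ~4.88: positive despite the early losses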


Yes. If ChatGPT-like products are going to be widely and commonly used in the future, it's much more valuable right now to try to acquire users and make their usage sticky (through habituation, memory, context/data, integrations, etc.) than it is to monetize them fully.


I’m not super familiar with the latest AI services out there. Is abacus the cheapest way to access LLMs for personal use? Do they offer privacy and anonymity? What about their stance on censorship of answers?


I don’t use Groq, but I agree the free models are probably the biggest competitors. Especially since we can run them locally and privately.

Because I’ve seen a lot of questions about how to use these models, I recorded a quick video showing how I use them on MacOS.

https://makervoyage.com/ai


Local private models are not a threat to OpenAI.

Local is not where the money is; it's in cloud services and API usage fees.


They aren’t in terms of profitability, but they are in terms of future revenue. If most early adopters start self-hosting models, then a lot of future products will be built outside of OpenAI’s ecosystem. Then corporations will also start looking into how to self-host models, because privacy is the primary concern for AI adoption. And we already have models like Llama 3.1 405B that are close to ChatGPT.


Have you paid much attention to the local model world?

They all tout OpenAI-compatible APIs because OAI was the first mover. No real threat of incompatibility with OAI.

Plus these LLMs don’t have any kind of interface moat. It’s text in and text out.
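
To make that concrete: the same client code typically works against either backend just by swapping the base URL. A sketch, assuming the official openai Python package and a local Ollama server on its default port with llama3.1 pulled:

  # Same client, different backend: point the OpenAI SDK at Ollama's
  # OpenAI-compatible endpoint (the api_key is required but ignored locally).
  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
  # client = OpenAI()  # ...or the hosted OpenAI API; the rest is identical

  resp = client.chat.completions.create(
      model="llama3.1",
      messages=[{"role": "user",
                 "content": "Explain what an interface moat is in one sentence."}],
  )
  print(resp.choices[0].message.content)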


Just because Ollama and friends copied the API doesn't mean that they're not competitive. They've all done this just the same as others copying the S3 API - ease of integration and lower barrier to entry during a switching event, should one arise.

> Plus these LLMs don't have any kind of interface moat.

The interface really has very little influence. Nobody in the enterprise world cares about the ChatGPT interface because they're all building features into their own products. The UI for ChatGPT has been copied ad nauseam, so if anyone really wanted something to look and feel the same, it's already out there. Chat and visual modalities are already there, so I'm curious how you think ChatGPT has an "interface moat"?

> Local private models are not a threat to openai.

There are lots of threats to OpenAI. One of them is local models. Because if the OpenAI approach is to continue at their burn rate and hope that they will be the one and only, I think they're very wrong. Small, targeted models cover many more use cases than a bloated, expensive, generalized model. I would guess that, long term, OpenAI either becomes a replacement for Google search or they ultimately fail. When I look around me I don't see many great implementations of any of this - mostly because many of them look and feel like bolt-ons to a foundation model that try to do something slightly product-specific. But even in those cases, the confidence I'd put in these products today is relatively low.


My argument was: because Ollama and friends use the exact same interface as OpenAI, tools built on top of them are compatible with OpenAI’s products, and thus those tools don’t pull users away from OpenAI, so the local model world isn’t something OpenAI is worried about.

There is no interface moat. No reason for a happy OpenAI user to ever leave OpenAI, because they can enjoy all the local model tools with GPT.


Who cares about the interface? Not everyone is interested in conversational tasks. Corporations in particular need LLMs to process their data. A RESTful API is more than enough.


By interface, I meant API. (The “I” in API)

I should’ve been more clear.


I use Ollama running local models about half the time (from Common Lisp or Python apps) myself.


OpenAI features aren’t free, they take your mind-patterns in the “imitation game” as the price, and you can’t do the same to them without breaking their rules.

https://ibb.co/M1TnRgr


>it seems crazy to me to not try for profitability

I'm reminded of the Silicon Valley bit about no revenue https://youtu.be/BzAdXyPYKQo

It probably looks better to be not really trying for profitability and losing $5bn a year than trying hard and losing $4bn


I don't think a co-founder would just jump ship just because. That would be very un-co-founderish.

I would also assume that he earns enough money to be rich. You are not a co-founder of OpenAI if you are not playing with the big boys.

So he definitely wants to be in this AI future, but not with OpenAI. So I would argue it has to do with something that is important to him, so important that the others disagree with him.


> This is probably bad news for ChatGPT 5. I don't think it's that likely this co-founder would leave for Anthropic if OpenAI were clearly in the lead.

I'll play devil's advocate. People leave bad bosses all the time, even when everything else is near-perfect. Additionally, cofounders sometimes get pushed out - even Steve Jobs went through this.


If being sued by the world's richest billionaire or the whole non-profit thing didn't complicate matters, and if the board had any teeth, one could wish the board would explore a merger with Anthropic, with Altman leaving at the end of all of it, and save everyone another year's worth of drama.


Could be as simple as switching from a limited-profit/pay company to unlimited profit/pay.


this AI safety stuff is just a rabbit hole of distraction, IMO.

OpenAI will be better off without this crowd and just focus on building good products.


> this AI safety stuff is just a rabbit hole of distraction, IMO.

> OpenAI will be better off without this crowd and just focus on building good products.

Ah yes, "focus on building good products" without safety. Except a "good product" is safe.

Otherwise you're getting stuff like an infinite-range plane powered by a nuclear jet engine that has fallout for exhaust [1].

[1] IIRC, nuclear-powered cruise missiles were contemplated: their attack would have consisted of dropping bombs on their targets, then flying around in circles spreading radioactive fallout over the land.


> Except a "good product" is safe.

Depends on how you define "safe". The kind of "safe" we get from OpenAI today seems to be mostly censorship, I don't think we need more of that.


What I'm saying is that the safety risks of AI are exaggerated to the point of comedy. It is no more dangerous than any other kind of software.

There is an effort by AI-doomer groups to try and regulate/monopolize the technology, but fortunately it looks like open source has put a wrench in this.


They won't release 5 before the election.


> In situations like these it's good to remember that people are much more likely to take the ethical and principled road when they also stand to gain from that choice. People who put their ideals above pragmatic self-interest self-select out of positions of power and influence.

I don't know what world you live in, but my experience has been 100% the opposite. Most people will not do what is ethical or principled. When you try to discuss it with them, they will DARVO, and congrats, you have now been targeted for public retribution by the sociopathic child in the driver's seat.

The thing that upsets me most is the survivorship bias you express, and how everybody thinks that people are "nice and kind". They are not. The world is an awful, terrible place full of liars, cheats, and bad people that WE NEED TO STOP CELEBRATING.

One more time WE NEED TO STOP CELEBRATING BAD PEOPLE WHO DO BAD THINGS TO OTHERS.


People are not one-dimensional. People can lie and cheat on one day and act honorably the day after. A person can be kind and generous and cruel and selfish. Most people are just of average morality. Not unusually good nor unusually bad. People in positions of power get there because they seek power, so there is a selection effect there for sure. But nonetheless you'll find that very successful people are in most ways regular people with regular flaws.

(Also, I think you misread what I wrote.)


I think you may have misread the quote you’re replying to. You and the GP post appear to be in agreement. I read it as:

P(ethical_and_principled) < P(ethical_and_principled|stands_to_gain)

Or in plain language people are more likely to do the right thing when they stand to gain, rather than just because it’s the right thing.



