OpenAI is racing against two clocks: the commoditization clock (how quickly open-source alternatives catch up) and the monetization clock (their need to generate substantial revenue to justify their valuation).
The ultimate success of this strategy depends on what we might call the enterprise AI adoption curve - whether large organizations will prioritize the kind of integrated, reliable, and "safe" AI solutions OpenAI is positioning itself to provide over cheaper but potentially less polished alternatives.
This is strikingly similar to IBM's historical bet on enterprise computing - sacrificing the low-end market to focus on high-value enterprise customers who would pay premium prices for reliability and integration. The key question is whether AI will follow a similar maturation pattern or if the open-source nature of the technology will force a different evolutionary path.
The problem is that OpenAI doesn't really have the enterprise market at all. Their APIs come closer, in that many companies (primarily Microsoft) use them to power features in other software, but OpenAI isn't the one providing end-user value to enterprises through those APIs.
As for ChatGPT, it's a consumer tool, not an enterprise tool. It's not really integrated into an enterprise's existing toolset, it's not integrated into their authentication, it's not integrated into their internal permissions model, and the IT department can't enforce any policies on how it's used. In almost all ways it doesn't look like enterprise IT.
This reminds me why enterprises don't integrate OpenAI products into their existing toolsets: trust is the root reason.
It's hard to trust that OpenAI won't take enterprise data to train its next model, in a market where content is the most valuable asset, compared to office suites, cloud databases, etc.
Microsoft 365 has over 300 million corporate users who trust it with email, document management, collaboration, etc. It's the de facto standard in larger companies, especially in banking, medicine and finance, which face more rigorous compliance regulations.
The administrative segments that decide to sell their firstborn to Microsoft all have their heads in the clouds. They'll pay Microsoft to steal their data and resell it, and they'll defend their decision-making beyond their own demise.
As such, Microsoft is making the right choice in outright stealing data for whatever purpose. It will face no real consequences.
A flick of an IT policy switch disables that, as at my organization. That feature was instead intended to snag single, non-corporate user accounts (still horrible, but I mean to convey that MS at no point expected a company's IT department to actually leave that training feature enabled in policy).
It doesn't need to / it already is – most enterprises are already Microsoft/Azure shops. Already approved, already there. What is close to impossible is to use anything non-Microsoft – with one exception: open source.
They betrayed their customers in the Storm-0558 attack.
They didn't disclose the full scale and charged the customers for the advanced logging needed for detections.
Not to mention that they abolished QA and outsourced it to the customer.
Maybe they aren't, but when you already have all your documents in sharepoint, all your emails in outlook and all your databases VMs in Azure, then Azure OpenAI is trusted in the organization.
For some reason (mainly because Microsoft has orders of magnitude more sales reps than anyone else) companies have been trusting Microsoft with their most critical data for a long time.
For example when they backed the CEOs coup against the board.
With AI-CEOs - https://ai-ceo.org - this would never have happened, because their CEOs have a kill switch and a mobile app for the board for full observability
OpenAI's enterprise plan explicitly says that they do not train their models on your data. It's in the contract, and it's also visible at the bottom of every ChatGPT prompt window.
It seems like a damned if you do, damned if you don't. How is ChatGPT going to provide relevant answers to company specific prompts if they don't train on your data?
My personal take is that most companies don't have enough data, and not in sufficiently high quality, to be able to use LLMs for company specific tasks.
The model from OpenAI doesn’t need to be directly trained on the company’s data. Instead, they provide a fine-tuning API in a “trusted” environment. Which usually means Microsoft’s “Azure OpenAI” product.
But really, in practice, most applications are using the “RAG” (retrieval augmented generation) approach, and actually doing fine tuning is less common.
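To make that concrete, here is a minimal sketch of the RAG pattern, assuming the OpenAI Python client (the model names and the three-chunk cutoff are illustrative choices, not anything OpenAI prescribes). The point is that the company's documents never become training data; they are only pasted into the prompt at query time:

    # Minimal RAG sketch. Assumes the OpenAI Python client; model names are illustrative.
    from openai import OpenAI

    client = OpenAI()

    def answer(question: str, company_docs: list[str]) -> str:
        # 1. Retrieve: embed the question and the documents, keep the closest matches.
        #    (A real system would use a vector store; a plain list is enough for a sketch.)
        embs = client.embeddings.create(
            model="text-embedding-3-small",
            input=[question] + company_docs,
        ).data
        q = embs[0].embedding
        scored = sorted(
            zip(company_docs, embs[1:]),
            key=lambda pair: -sum(a * b for a, b in zip(q, pair[1].embedding)),
        )
        context = "\n\n".join(doc for doc, _ in scored[:3])

        # 2. Generate: the model only sees the retrieved context inside this one request.
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": f"Answer using only this context:\n{context}"},
                {"role": "user", "content": question},
            ],
        )
        return resp.choices[0].message.content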
> The model from OpenAI doesn’t need to be directly trained on the company’s data
Wouldn't that depend on what you expect it to do? If you just want, say, Copilot, summarizing texts, or help writing emails, then you're probably good. If you want to use ChatGPT to help solve customer issues or debug problems specific to your company, wouldn't you need to feed it your own data? I'm thinking: help me find the correct subscription for a customer with these parameters; then you'd need to have ChatGPT know your pricing structure.
One idea I've had, from an experience with an ISP, would be to have the LLM tell customer service: Hey, this is an issue similar to what five of your colleagues just dealt with, in the same area, within 30 minutes. You should consider escalating this to a technician. That would require more or less live feedback to the model, or am I misunderstanding how the current AIs would handle that information?
100% this. If they can figure out trust through some paradigm where enterprises can use the models but not have to trust OpenAI itself directly then $200 will be less of an issue.
> It's hard to provide trust to OpenAI that they won't steal data of enterprise to train next model
Bit of a cynical take. A company like OpenAI stands to lose enormously if anyone catches them doing dodgy shit in violation of their agreements with users. And it's very hard to keep dodgy behaviour secret in any decent sized company where any embittered employee can blow the whistle. VW only just managed it with Dieselgate by keeping the circle of conspirators very small.
If their terms say they won't use your data now or in the future then you can reasonably assume that's the case for your business planning purposes.
Lawsuits over the legality of using someone's writing as training data aren't the same thing as them saying they won't use you as training data and then doing so. They're different things. One is people being upset that their work was used in a way they didn't anticipate, and wanting additional compensation for it because a computer reading their work is different from a person reading their work. The other is saying you won't do something and then doing it anyway and lying about it.
It's not that anyone suspects OpenAI of doing dodgy shit. Data flowing out of an enterprise is very high risk, no matter what security safeguards you employ. So they want everything inside their cloud perimeter and on servers they can control.
IMO no big enterprise will adopt ChatGPT unless it's all hosted in their cloud. Open source models lend themselves better to enterprises in this regard.
> IMO no big enterprise will adopt chatGPT unless it's all hosted in their cloud
80% of big enterprises already use MS Sharepoint hosted in Azure for some of their document management. It’s certified for storing medical and financial records.
Cynical? That’d be on brand… especially with the ongoing lawsuits, the exodus of people and the CEO drama a while back? I’d have a hard time recommending them as a partner over Anthropic or Open Source.
It's not enough for some companies that need to ensure it won't happen.
I know for a fact a major corporation I do work for is vehemently against any use of generative A.I. by its employees (just had that drilled into my head multiple times by their mandatory annual cybersecurity training), although I believe they are working towards some fully internal solution at some point.
Kind of funny that Google includes generative A.I. answers by default now, so I still see those answers just by doing a Google search.
I've seen the enterprise version with a top-5 consulting company, and it answers from their global knowledgebase, cites references, and doesn't train on their data.
I recently (in the last month) asked ChatGPT to cite its sources for some scientific data. It gave me completely made up, entirely fabricated citations for academic papers that did not exist.
The behavior you're describing sounds like an older model behavior. When I ask for links to references these days, it searches the internet then gives me links to real papers that are often actually relevant and helpful.
I don’t recall that it ever mentioned if it did or not. I don’t have the search on hand but from my browser history I did the prompt engineering on 11/18 (which perhaps there is a new model since then?).
I actually repeated the prompt just now and it actually gave me the correct, opposite response. For those curious, I asked ChatGPT what turned on a gene, and it said Protein X turns on Gene Y as per -fake citation-. Asking today if Protein X turns on Gene Y ChatGPT said there is no evidence, and showed 2 real citations of factors that may turn on Gene Y.
So sorry to offend your delicate sensibilities by calling out a blatant lie from someone completely unrelated to yourself. Pretty bizarre behavior in itself to do so.
As just another example, ChatGPT said that in the Okita paper they switched media on day 3, when if you read the paper they switched the media on day 8. So not only did it fail to generate the correct reference, it also failed to accurately interpret the contents of a specific paper.
I’m a pretty experienced developer and I struggle to get any useful information out of LLMs for any non-trivial task.
At my job (at an LLM-based search company) our CTO uses it on occasion (I can tell by the contortions in his AI code that isn’t present in his handwritten code. I rarely need to fix the former)
And I think our interns used it for a demo one week, but I don’t think it’s very common at my company.
Won’t name my company, but we rely on Palantir Foundry for our data lake.
And the only thing everybody wants [including Palantir itself] is to deploy at scale AI capabilities tied properly to the rest of the toolset/datasets.
The issues at the moment are a mix of IP on the data, insurance on the security of private clouds infrastructures, deals between Amazon and Microsoft/OpenAI for the proper integration of ChatGPT on AWS, all these kind of things.
But discarding the enterprise needs is in my opinion a [very] wrong assumption.
Very personal feeling, but without a datalake organized the way Foundry is organized, I don’t see how you can manage [cold] data at scale in a company [both in terms of size, flexibility, semantics or R&D]. Given the fact that IT services in big companies WILL fail to build and maintain such a horribly complex stack, the walled garden nature of the Foundry stack is not so stupid.
But all that is the technical part of things. Markets do not bless products. They bless revenues. And from that perspective, I have NO CLUE.
This is what's so brilliant about the Microsoft "partnership". OpenAI gets the Microsoft enterprise legitimacy, meanwhile Microsoft can build interfaces on top of ChatGPT that they can swap out later for whatever they want when it suits them.
I think this is good for Microsoft, but less good for OpenAI.
Microsoft owns the customer relationship, owns the product experience, and in many ways owns the productionisation of a model into a useful feature. They also happen to own the datacenter side as well.
Because Microsoft is the whole wrapper around OpenAI, they can also negotiate. If they think they can get a better price from Anthropic, Google (in theory), or their own internally created models, then they can pressure OpenAI to reduce prices.
OpenAI doesn't get Microsoft's enterprise legitimacy, Microsoft keep that. OpenAI just gets preferential treatment as a supplier.
On the way up the hype curve it's the folks selling shovels that make all the money, but in a market of mature productionisation at scale, it's those closest to customers who make the money.
$10B of compute credits on a capped profit deal that they can break as soon as they get AGI (i.e. the $10T invention) seems pretty favorable to OpenAI.
I’d be significantly less surprised if OpenAI never made a single $ in profit than if they somehow invented “AGI” (of course nobody has a clue what that even means so maybe there is a chance just because of that..)
Leaving aside the “AGI on paper” point a sibling correctly made, your point shares the same basic structure as noting that any VC investment is a terrible deal if you only 2x your valuation. You might get $0 if there is a multiple on the liquidation preference!
OpenAI are clearly going for the BHAG. You may or may not believe in AGI-soon but they do, and are all in on this bet. So they simply don’t care about the failure case (ie no AGI in the timeframe that they can maintain runway).
OAI, through their API, probably does, but I agree that ChatGPT is not really an enterprise product. For the company, the API is the platform play; their enterprise customers are going to be the likes of MSFT, Salesforce, Zendesk, or say Apple to power Siri. These are the ones doing the heavy lifting of selling and building an LLM product that provides value to their enterprise customers - a bit like Stripe/AWS. Whether OAI can form a durable platform (vs. their competitors or in-house LLMs) is the question here, as is whether they can offer models at a cost that justifies the upsell of the AI features their customers offer.
That's why Microsoft included OpenAI access in Azure. However, their current offering is quite immature, so companies are using several pieces of infra to make it usable (for rate limiting, better authentication, etc.).
> As for ChatGPT, it's a consumer tool, not an enterprise tool. It's not really integrated into an enterprises' existing toolset, it's not integrated into their authentication, it's not integrated into their internal permissions model, the IT department can't enforce any policies on how it's used. In almost all ways it doesn't look like enterprise IT.
What according to you is the bare minimum of what it will take for it to be an enterprise tool?
SSO and enforceable privacy and IP protections would be a start. RBAC, queues, caching results, and workflow management would open a lot of doors very quickly.
I have used it at 2 different enterprises internally; the issue is price more than anything. Enterprises definitely do want to self-host, but for frontier tech they want frontier models for solving complicated unsolved problems or building efficiencies in complicated workflows. One company had to rip it out for a time due to price; I no longer work there, though, so I can't speak to whether it was reintegrated.
Decision making in enterprise procurement is more about whether it makes the corporation money and whether there is immediate and effective support when it stops making money.
I don't think user submitted question/answer is as useful for training as you (and many others) think. It's not useless, but it's certainly not some goldmine either considering how noisy it is (from the users) and how synthetic it is (the responses). Further, while I wouldn't put it past them to use user data in that way, there's certainly a PR/controversy cost to doing so, even if it's outlined in their ToS.
In an enterprise, long content or documents will be poured into ChatGPT if there is no policy limitation from the company, which can be meaningful training data.
At the very least, there's the possibility that this content can be seen by OpenAI staff when flagged as a bad case, so privacy concerns still exist.
No, because a lot of people asking you questions doesn't mean you have the answers to them. It's an opportunity to find the answers by hiring "AI trainers" and putting their responses in the training data.
Yeah it's a fairly standard clause in the business paid versions of SaaS products that your data isn't used to train the model. The whole thing you're selling is per-company isolation so you don't want to go back on that.
Whether your data is used for training or not is an approximation of whether you're using a tool for commercial applications, so a pretty good way to price discriminate.
I wonder if OpenAI can break into enterprise. I don’t see much of a path for them, at least here in the EU. I'm not sure they'll have much more luck building trust around data safety than Facebook had trying to sell that corporate thing they did (still do?). But even if they did, they would still face the very real issue of having to compete with Microsoft.
I view that competition a bit like Teams vs anything else. Teams wasn’t better, but it was good enough and it’s “sort of free”. It’s the same with the Azure AI tools: they aren’t free, but since you don’t exactly pay list pricing in enterprise they can be fairly cheap. Co-pilot is obviously horrible compared to ChatGPT, but a lot of the Azure AI tooling works perfectly well and much of it integrates seamlessly with what you already have running in Azure. We recently “lost” our OCR for a document flow, and since it wasn’t recoverable we needed to do something fast. Well, the Azure Document Intelligence was so easy to hook up to the flow it was ridiculous. I don’t want to sound like a Microsoft commercial. I think they are a good IT business partner, but the products are also sort of a trap where all those tiny things create the perfect vendor lock-in. Which is bad, but it’s also where European enterprise is at, since the “monopoly” Microsoft has on the suite of products makes it very hard not to use them. Teams again being the perfect example, since it “won” by basically being a 0 in the budget even though it isn’t actually free.
Man, if they can solve that "trust" problem, OpenAI could really have an big advantage. Imagine if they were nonprofit, open source, documented all of the data that their training was being done with, or published all of their boardroom documents. That'd be a real distinguishing advantage. Somebody should start an organization like that.
The cyber security gatekeepers care very little about that kind of stuff. They care only about what does not get them in trouble, and AI in many enterprises is still viewed as a cyber threat.
One of the things that I find remarkable in my work is that they block ChatGPT because they're afraid of data leaking. But Google Translate has been promoted for years, and we don't really do business with Google. We're a Microsoft shop. Kind of a double standard.
I mean, it was probably a jibe at OpenAI's transition to for-profit, but you’re absolutely right.
Enterprise decision makers care about compliance, certifications and “general market image” (there is probably a proper English term for it). OpenAI has none of that, and they will compete with companies that do.
Sometimes I wish Apple did more for business use cases. The same https://security.apple.com/blog/private-cloud-compute/ tech that will provide auditable isolation for consumer user sessions would be incredibly welcome in a world where every other company has proven a desire to monetize your data.
Teams winning on price instead of quality is very telling of the state of business. Your #1/#2 communication tool being regarded as a cost to be saved upon.
It’s “good enough” and integrates into existing Microsoft solutions (just Outlook meeting request integration, for example), and the competition isn’t dramatically better, more like a side-grade in terms of better usability but less integration.
You still can't copy a picture out of a teams chat and paste it into an office document without jumping through hoops. It's utterly horrible. The only thing that prevents people from complaining about it is that it's completely in line with the rest of the office drone experience.
In my experience Teams is mostly used for video conferencing (i.e. as a Zoom alternative), and for chats a different tool is used. Most places already had chat systems set up (Slack, Mattermost, whatever) (or standardize on email anyway), before video conferencing became ubiquitous due to the pandemic.
And yet Teams allows me to seamlessly video call a coworker. Whereas in Slack you have this ridiculous "huddle" thing where all video call participants show up in a tiny tiny rectangle and you can't see them properly. Even a screen share only shows up in a tiny rectangle. There's no way to increase its size. What's even the point of having this feature when you can't see anything properly because everything is so small?
Seriously, I'm not a fan of Teams, but the sad state of video calls in Slack, even in 2024, seriously ruins it for me. This is the one thing — one important thing — that Teams is better at than Slack.
Consider yourself lucky; my team uses Skype for Business. It's Skype, except it can't do video calls or calls at all. Just a terrible messaging client with zero features!
I’m not sure you can, considering how broad a term “better” is. I do know a lot of employees in a lot of non-tech organisations here in Denmark wish they could still use Zoom.
Even in my own organisation Teams isn’t exactly a beloved platform. The whole “Teams” part of it can actually solve a lot of the issues our employees have with sharing documents, having chats located in relation to a project and so on, but they just don’t use it because they hate it.
Email, Jitsi, Matrix/Element, many of them, e2e encrypted and on-premise. No serious company (outside the US) that really cares about its own data privacy would go for MS Teams, which can't even offer a decent user experience most of the time.
> I wonder if OpenAI can break into enterprise. I don’t see much of a path for them, at least here in the EU.
Uhh they're already here. Under the name CoPilot which is really just ChatGPT under the hood.
Microsoft launders the missing trust in OpenAI :)
But why do you think copilot is worse? It's really just the same engine (gpt-4o right now) with some RAG grounding based on your SharePoint documents. Speaking about copilot for M365 here.
I don't think it's a great service yet, it's still very early and flawed. But so is ChatGPT.
Agreed on the strategy questions. It's interesting to tie back to IBM; my first reaction was that OpenAI has more consumer connectivity than IBM did in the desktop era, but I'm not sure that's true. I guess what is true is that IBM passed over the "IBM Compatible" -> "MS DOS Compatible" business quite quickly in the mid 80s; seemingly overnight we had the death of all minicomputer companies and the rise of PC desktop companies.
I agree that if you're sure you have a commodity product, then you should make sure you're in the driver seat with those that will pay more, and also try and grind less effective players out. (As a strategy assessment, not a moral one).
You could think of Apple under JLG and then being handed back to Jobs as precisely being two perspectives on the answer to "does Apple have a commodity product?" Gassée thought it did, and we had the era of Apple OEMs, system integrators, other boxes running Apple software, and Jobs thought it did not; essentially his first act was to kill those deals.
The new pricing tier suggests they're taking the Jobs approach - betting that their technology integration and reliability will justify premium positioning. But they face more intense commoditization pressure than either IBM or Apple did, given the rapid advancement of open-source models.
The critical question is timing - if they wait too long to establish their enterprise position, they risk being overtaken by commoditization as IBM was. Move too aggressively, and they might prematurely abandon advantages in the broader market, as Apple nearly did under Gassée.
Threading the needle. I don't envy their position here. Especially with Musk in the Trump administration.
The Apple partnership and iOS integration seems pretty damn big for them - that really corners a huge portion of the consumer market.
Agreed on enterprise - Microsoft would have to roll out policies and integration with their core products at a pace faster than they usually do (Azure AD for example still pales in comparison to legacy AD feature-wise - I am continually amazed they do not prioritize this more)
ChatGPT through Siri/Apple Intelligence is a joke compared to using ChatGPT's iPhone app. Siri is still a dumb one trick pony after 13 years of being on the market.
Supposedly Apple won't be able to offer a Siri LLM that acts like ChatGPT's iPhone app until 2026. That gives Apple's current and new competitors a head start. Maybe ChatGPT and Microsoft could release an AI Phone. I'd drop Apple quickly if that becomes a reality.
Except I had to sign in to OpenAI when setting up Apple Intelligence. Even though Apple Intelligence is doing almost nothing useful for me right now, at least OpenAI's numbers go up.
Right now Gemini Pro is best for email, docs, calendar integration.
That said, ChatGPT Plus is a good product and I might spring for Pro for a month or two.
Well one key difference is that Google and Amazon are cloud operators, they will still benefit from selling the compute that open source models run on.
For sure. If I were in charge of AI for the US, I'd prioritize having a known good and best-in-class LLM available not least for national security reasons; OAI put someone on gov rel about a year ago, beltway insider type, and they have been selling aggressively. Feels like most of the federal procurement is going to want to go to using primes for this stuff, or if OpenAI and Anthropic can sell successfully, fine.
Grok winning the Federal bid is an interesting possible outcome though. I think that, slightly de-Elon-ed, the messaging that it's been trained to be more politically neutral (I realize that this is a large step from how it's messaged) might be a real factor in the next few years in the US. Should be interesting!
Fudged71 - you want to predict OpenAI's value and importance in 2029? We'll still both be on HN I'm sure. I'm going to predict it's a dominant player, and I'll go contra-Gwern and say that it will still be known as best-in-class product-delivered AI, whether or not Anthropic or another company has best-in-class LLM tech. Basically, I think they'll make it and sustain.
Somehow I missed the Anduril partnership announcement. I agree with you. National Security relationships in particular creates a moat that’s hard to replicate even with superior technology.
It seems possible OpenAI could maintain dominance in government/institutional markets while facing more competition in commercial segments, similar to how defense contractors operate.
Now we just need to find someone who disagrees with us and we can make a long bet.
It feels strange to say but I think that the product moat looks harder than the LLM moat for the top 5 teams right now. I'm surprised I think that, but I've assessed so many LLM and MLM models in the last 18 months, and they keep getting better, albeit more slowly, and they keep getting smaller while they lose less quality, and tooling keeps getting better on them.
At the same time, all the product infra around using, integrating, safety, API support, enterprise contracts, data security, threat analysis, all that is expensive and hard for startups in a way that spending $50mm with a cloud AI infra company is not hard.
Altman's new head of product is reputed to be excellent as well, so it will be super interesting to see where this all goes.
One of the main issues that enterprise AI has is the data in large corporations. It's typically a nightmare of fiefdoms and filesystems. I'm sure that a lot of companies would love to use AI more, both internally and commercially. But first they'd have to wrangle their own systems so that OpenAI can ingest the data at all.
Unfortunately, those are 5+ year projects for a lot of F500 companies. And they'll have to burn a lot of political capital to get the internal systems under control. Meaning that the CXO that does get the SQL server up and running and has the clout to do something about non-compliance, that person is going to be hated internally. And then if it's ever finished? That whole team is gonna be let go too. And it'll all just then rot, if not implode.
The AI boom for corporations is really going to let people know who is swimming naked when it comes to internal data orderliness.
Like, you want to be the person that sells shovels in the AI boom here for enterprise? Be the 'Cleaning Lady' for company data and non-compliance. Go in, kick butts, clean it all up, be hated, leave with a fat check.
Did not know that stack, thanks.
From my perspective as a data architect, I am really focused on the link between the data sources and the data lake, and the proper integration of heterogeneous data into a “single” knowledge graph.
For Palantir, it is not very difficult to learn their way of working [their Pipeline Builder feeds a massive spark cluster, and OntologyManager maintains a sync between Spark and a graph database. Their other productivity tools then rely on either one data lake and/or the other].
I wonder how Glean handles the datalake part of their stack. [scalability, refresh rate, etc]
ChatGPT's analogy is more like Google. People use Google enough that they ain't gonna switch unless it's a quantum leap better + with scale. On the API side things could get commoditized, but it's more than just having a slightly better LLM in the benchmarks.
There exists no future where OpenAI both sells models through API and has its own consumer product. They will have to pick one of these things to bet the company on.
That's not necessarily true. There are many companies that have both end-user products and B2B products they sell. There are a million specific use cases that OpenAI won't build specific products for.
Think Amazon that has both AWS and the retail business. There's a lot of value in providing both.
AI can be used for financial gain, to influence and lie to people, to simulate human connection, to generate infinite content for consumption,... at scale.
In the early days of ChatGPT, I'd get constantly capped, every single day, even on the paid plan. At the time I was sending them messages, begging to charge me $200 to let me use it unlimited.
The enterprise surface area that OpenAI seems to be targeting is very small. The cost curve looks similar to classic cloud providers, but gets very steep much faster. We started on their API and then moved out of the OpenAI ecosystem within ~2 years as costs grew fast and we see equivalent or better performance with much cheaper and/or OS models, combined with pretty modest hardware. Unless they can pull a bunch of Netflix-style deals the economics here will not work out.
> OpenAI is racing against two clocks: the commoditization clock (how quickly open-source alternatives catch up) and the monetization clock (their need to generate substantial revenue to justify their valuation).
Also important to recognize that those clocks aren't entirely separate. The monetization timeline is shorter if investors perceive that commoditization makes future monetization less certain; conversely, if investors perceive a strong moat against commoditization, new financing without profitable monetization remains practical, as long as the market believes that investment in growth now means a sufficient increase in monetization down the road.
The "open source nature" this time is different. "Open source" models are not actually open source, in the sense that the community can't contribute to their development. At best they're just proprietary freeware. Thus, the continuity of "open source" models depends purely on how long their sponsors sustain funding. If Meta or Alibaba or Tencent decide tomorrow that they're no longer going to fund this stuff, then we're in real trouble, much more than when Red Hat drops the ball.
I'd say Meta is the most important player here. Pretty much all the "open source" models are built on Llama in one way or another. The only reason Llama exists is that Meta wants to commoditize AI in order to prevent the likes of OpenAI from overtaking them later. If Meta one day no longer believes in this strategy for whatever reason, then everybody is in serious trouble.
Has anyone heard or seen it used anywhere? I was in-house when it launched to big fanfare by upper management, and the vast majority of the company was tasked to create team projects utilizing Watson.
Watson was a pre-LLM technology, an evolution of IBM's experience with the expert systems which they believed would rule the roost in AI -- until transformers blew all that away.
Am I the only one who's getting annoyed at seeing LLMs marketed as competent search engines? That's not what they've been designed for, and they have been repeatedly bad at it.
> the commoditization clock (how quickly open-source alternatives catch up)
I believe we are already there at least for the average person.
Using Ollama I can run different LLMs locally that are good enough for what I want to do. That's on a 32GB M1 laptop. No more having to pay someone to get results.
For development, PyCharm Pro's latest LLM autocomplete is just short of writing everything for me.
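For anyone curious what "good enough locally" looks like in practice, here is a rough sketch against Ollama's local HTTP API (the model name and prompt are just examples, and it assumes the model was already pulled with "ollama pull llama3.1"):

    # Everything stays on the laptop: Ollama serves a local HTTP API on port 11434.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.1",   # any model you have pulled locally
            "prompt": "Summarize the trade-offs of local vs hosted LLMs in 3 bullets.",
            "stream": False,       # return a single JSON object instead of a token stream
        },
        timeout=300,
    )
    print(resp.json()["response"])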
"whether large organizations will prioritize the kind of integrated, reliable, and "safe" AI solutions"
While safe in terms of output quality control, SaaS is not safe in terms of data control. Meta's Llama is the winner in any scenario where it would be ridiculous to send user data to a third party.
Yes, but how can this strategy work, and who would choose ChatGPT at this point, when there are so many alternatives, some better (Anthropic), some just as good but way cheaper (Amazon Nova) and some excellent and open-source?
Microsoft is their path into the enterprise. You can use their so-so enterprise support directly or have all the enterprise features you could want via Azure.
There are really not a lot of open source large language models with that capability. The only game changer so far has been Meta open-sourcing Llama, and that's about it for models of that caliber.
I actually pay 166 Euros a month for Claude Teams. Five seats. And I only use one. For myself. Why do I pay so much? Because the normal paid version (20 USD a month) interrupts the chats after a dozen questions and wants me to wait a few hours until I can use it again. But the Teams plan gives me way more questions.
But why do I pay that much? Because Claude in combination with the Projects feature, where I can upload two dozen or more files, PDFs, text, and give it a context, and then ask questions in this specific context over a period of week or longer, come back to it and continue the inquiry, all of this gives me superpowers. Feels like a handful of researchers at my fingertips that I can brainstorm with, that I can ask to review the documents, come up with answers to my questions, all of this is unbelievably powerful.
I‘d be ok with 40 or 50 USD a month for one user, alas Claude won’t offer it. So I pay 166 Euros for five seats and use one. Because it saves me a ton of work.
Kagi Ultimate (US$25/mo) includes unlimited use of all the Anthropic models.
Full disclosure: I participated in Kagi's crowdfund, so I have some financial stake in the company, but I mainly participated because I'm an enthusiastic customer.
I'm uninformed about this, it may just be superstition, but my feeling while using Kagi in this way is that after using it for a few hours it gets a bit more forgetful. I come back the next day and it's smart again, for while. It's as if there's some kind of soft throttling going on in the background.
I'm an enthusiastic customer nonetheless, but it is curious.
I noticed this too! It's dramatic in the same chat. I'll come back the next day, and even though I still have the full convo history, it's as if it completely forgot all my earlier instructions.
Makes sense. Keeping the conversation implies that each new message carries the whole history again. You need to create new chats from time to time, or throttle to a different model...
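A rough illustration of why that adds up, assuming an average message size (the 500-token figure is just a placeholder):

    # Every turn resends the entire history, so cumulative prompt tokens grow
    # roughly quadratically with the number of turns in one long conversation.
    msg_tokens = 500               # assumed average tokens per message
    total_sent = 0
    for turn in range(1, 101):
        history = (2 * turn - 1) * msg_tokens   # all prior user + assistant messages
        total_sent += history
    print(f"after 100 turns you have resent ~{total_sent / 1e6:.1f}M prompt tokens")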
This is my biggest gripe with these LLMs. I primarily use Claude, and it exhibits the same described behavior. I'll find myself in a flow state and then somewhere around hour 3 it starts to pretend like it isn't capable of completing specific tasks that it had been performing for hours, days, weeks. For instance, I'm working on creating a few LLCs with their requisite social media handles and domain registrations. I _used_ to be able to ask Claude to check all US State LLC registrations, all major TLD domain registrations, and USPTO against particular terms and similar derivations. Then one day it just decided to stop doing this. And it tells me it can't search the web or whatever. Which is bullshit because I was verifying all of this data and ensuring it wasn't hallucinating - which it never was.
The flow lately has been transforming test cases to accommodate interface changes, so I'm not asking it to remember something from several hours ago, I'm just asking it to make the "same" transformation from the previous prompt, except now to a different input.
It struggles with cases that exceed 1000 lines or so. Not that it loses track entirely at that size, it just starts making dumb mistakes.
Then after about 2 or 3 hours, the size at which it starts to struggle drops to maybe 500. A new chat doesn't seem to help, but who can say, it's a difficult thing to quantify. After 12 hours, both me and the AI are feeling fresh again. Or maybe it's just me, idk.
And if you're about to suggest that the real problem here is that there's so much tedious filler in these test cases that even an AI gets bored with them... Yes, yes it is.
It probably isn’t cheaper for Kagi per token but I assume most people don’t use up as much as they can, like with most other subscriptions.
I.e. I’ve been an Ultimate subscriber since they launched the plan and I rarely use the assistant feature because I’ve got a subscription to ChatGPT and Claude. I only use it when I want to query Llama, Gemini, or Mistral models which I don’t want to subscribe to or create API keys for.
How would you rate Kagi Ultimate vs Arc search? IE is it scraping relevant websites live and summarising them? Or is it just access to ChatGPT and other models (with their old data).
At some point I'm going to subscribe to Kagi again (once I have a job), so I'll be interested to see how it rates.
They extract concepts from their training data and can combine concepts to produce output that isn't part of their training set, but they do require those concepts to be in their training data. So you can ask them to make a picture of your favorite character fighting mecha on an alien planet and it will produce a new image, as long as your favorite character is in their training set. But the extent it imagines an alien planet or what counts as mecha is limited by the input it is trained on, which is where a human artist can provide much more creativity.
You can also expand it by adding in more concepts to better specify things. For example you can specify the mecha look like alphabet characters while the alien planet expresses the randomness of prime numbers and that might influence the AI to produce a more unique image as you are now getting into really weird combinations of concepts (and combinations that might actually make no sense if you think too much about them), but you also greatly increase the chance of getting trash output as the AI can no longer map the feature space back to an image that mirrors anything like what a human would interpret as having a similar feature space.
The paper that coined the term "stochastic parrots" would not agree with the claim that LLMs are "unable to produce a response that isn't in their training data". And the research has advanced a _long_ way since then.
[1]: Bender, Emily M., et al. "On the dangers of stochastic parrots: Can language models be too big?." Proceedings of the 2021 ACM conference on fairness, accountability, and transparency. 2021.
/facepalm. Woosh indeed. Can I blame pronoun confusion? (Not to mention this misunderstanding kicked off a farcically unproductive ensuing discussion.)
When combined with intellectual honesty and curiosity, the best LLMs can be powerful tools for checking argumentation. (I personally recommend Claude 3.5 Sonnet.) I pasted in the conversation history and here is what it said:
> Their position is falsifiable through simple examples: LLMs can perform arithmetic on numbers that weren't in training data, compose responses about current events post-training, and generate novel combinations of ideas.
Spot on. It would take a lot of editing for me to speak as concisely and accurately!
> you can try to convince all you want, but you're just grasping at straws.
After coming back to this to see how the conversation has evolved (it hasn't), I offer this guess: the problem isn't at the object level (i.e. what ML research has to say on this) nor my willingness to engage. A key factor seems to a lack of interest on the other end of the conversation.
Most importantly, I'm happy to learn and/or be shown to be mistaken.
Based on my study (not at the Ph.D. level but still quite intensive), I am confident the comment above is both wrong and poorly framed. Why? Phrases like "incapable of thought" and "stochastic parrots" are red flags to me. In my experience, people that study LLM systems are wary of using such brash phrases. They tend to move the conversation away from understanding towards combativeness and/or confusion.
Being this direct might sound brusque and/or unpersuasive. My top concern at this point, not knowing you, is that you might not prioritize learning and careful discussion. If you want to continue discussing, here is what I suggest:
First, are you familiar with the double-crux technique? If not, the CFAR page is a good start.
Second, please share three papers (or high-quality writing from experts): one that supports your claim, one that opposes it, and one that attempts to synthesize.
I'll try again... Can you (or anyone) define "thought" in a way that is helpful?
Some other intelligent social animals have slightly different brains, and it seems very likely they "think" as well. Do we want to define "thinking" in some relative manner?
Say you pick a definition requiring an isomorphism to thoughts as generated by a human brain. Then, by definition, you can't have thoughts unless you prove the isomorphism. How are you going to do that? Inspection? In theory, some suitable emulation of a brain is needed. You might get close with whole-brain emulation. But how do you know when your emulation is good enough? What level of detail is sufficient?
What kinds of definitions of "thought" remain?
Perhaps something related to consciousness? Where is this kind of definition going to get us? Talking about consciousness is hard.
Anil Seth (and others) talks about consciousness better than most, for what it is worth -- he does it by getting more detailed and specific. See also: integrated information theory.
By writing at some length, I hope to show that using loose sketches of concepts using words such as "thoughts" or "thinking" doesn't advance a substantive conversation. More depth is needed.
Meta: To advance the conversation, it takes time to elaborate and engage. It isn't easy. An easier way out is pressing the down triangle, but that is too often meager and fleeting protection for a brittle ego and/or a fixated level of understanding.
Sometimes, I get this absolute stroke of brilliance for this idea of a thing I want to make and it's gonna make me super rich, and then I go on Google, and find out that there's already been a Kickstarter for it and it's been successful, and it's now a product I can just buy.
No, but then again you're not paying me $20 per month while I pretend I have absolute knowledge.
You can, however, get the same human experience by contracting a consulting company that will bill you $20 000 per month and lie to you about having absolute knowledge.
I have ChatGPT ($20/month tier) and Claude and I absolutely see this use case. Claude is great but I love long threads where I can have it help me with a series of related problems over the course of a day. I'm rarely doing a one-shot. Hitting the limits is super frustrating.
So I understand the unlimited use case and honestly am considering shelling out for the o1 unlimited tier, if o1 is useful enough.
A theoretical app subscription for $200/month feels expensive. Having the equivalent of a smart employee working beside me all day for $200/month feels like a deal.
Yep, I have 2 accounts I use because I kept hitting limits. I was going to do the Teams to get the 5x window, but I got instantly banned when clicking the teams button on a new account, so I ended up sticking with 2 separate accounts. It's a bit of a pain, but I'm used to it. My other account has since been unbanned, but I haven't needed it lately as I finished most of my coding.
NotebookLM is designed for a distinct use case compared to using Gemini's models in a general chat-style interface. It's specifically geared towards research and operates primarily as a RAG system for documents you upload.
I’ve used it extensively to cross-reference and analyse academic papers, and the performance has been excellent so far. While this is just my personal experience (YMMV), it’s far more reliable and focused than Gemini when it comes to this specific use case. I've rarely experienced a hallucination with it. But perhaps that's the way I'm using it.
Have you tried LibreChat https://www.librechat.ai/ and just used it with your own API keys? You pay for what you use and can switch between all major model providers.
The argument of more compute power for this plan can be true, but this is also a pricing tactic known as the decoy effect or anchoring. Here's how it works:
1. A company introduces a high-priced option (the "decoy"), often not intended to be the best value for most customers.
2. This premium option makes the other plans seem like better deals in comparison, nudging customers toward the one the company actually wants to sell.
In this case, for ChatGPT, it is:
Option A: Basic Plan - Free
Option B: Plus Plan - $20/month
Option C: Pro Plan - $200/month
Even if the company has no intention of selling the Pro Plan, its presence makes the Plus Plan seem more reasonably priced and valuable.
While not inherently unethical, the decoy effect can be seen as manipulative if it exploits customers’ biases or lacks transparency about the true value of each plan.
Of course this breaks down once you have a competitor like Anthropic, serving similarly-priced Plan A and B for their equivalently powerful models; adding a more expensive decoy plan C doesn't help OpenAI when their plan B pricing is primarily compared against Anthropic's plan B.
Leadership at this crop of tech companies is more like followership. Whether it's 'no politics', or sudden layoffs, or 'founder mode', or 'work from home'... one CEO has an idea and three dozen other CEOs unthinkingly adopt it.
Several comments in this thread have used Anthropic's lower pricing as a criticism, but it's probably moot: a month from now Anthropic will release its own $200 model.
As Nvidia's CEO likes to say, the price is set by the second best.
From an API standpoint, it seems like enterprises are currently split between anthropic and ChatGPT and most are willing to use substitutes. For the consumer, ChatGPT is the clear favorite (better branding, better iPhone app)
An example of this is something I learned from a former employee who went to work for Encyclopaedia Britannica 'back in the day'. I actually invited the former employee to come back to our office so I could understand and learn from exactly what he had been taught (noting of course this was back before the internet, when info like that was not as available...)
So they charged (as I recall from what he told me; I could be off) something like $450 for shipping the books (I don't recall the actual amount, but it seemed high at the time).
So the salesman is taught to start off the sales pitch with a set of encyclopedias costing, at the time, let's say $40,000 - some 'gold plated version'.
The potential buyer laughs, and the salesman then says 'plus $450 for shipping!!!'.
They then move on to the more reasonable versions costing let's say $1000 or whatever.
As a result of that first high-priced example (in addition to the positioning you are talking about), the customer is set up to accept the shipping charge (which was relatively high).
That’s a really basic sales technique much older than the 1975 study. I wonder if it went under a different name or this was a case of studying and then publishing something that was already well-known outside of academia.
I use GPT-4 because 4o is inferior. I keep trying 4o but it consistently underperforms. GPT-4 is not working as hard anymore compared to a few months ago. If this release said it allows GPT-4 more processing time to find more answers and filter them, I’d then see transparency of service and happily pay the money. As it is I’ll still give it a try and figure it out, but I’d like to live in a world where companies can be honest about their missteps. As it is I have to live in this constructed reality that makes sense to me given the evidence despite what people claim. Am I fooling/gaslighting myself?? Who knows?
Glad I'm not the only one. I see 4o as a lot more of a sidegrade. At this point I mix them up and I legitimately can't tell, sometimes I get bad responses from 4, sometimes 4o.
Responses from gpt-4 sound more like AI, but I haven't had seemingly as many issues as with 4o.
Also the feature of 4o where it just spits out a ton of information, or rewrites the entire code is frustrating
Yes the looping. They should make and sell a squishy mascot you could order, something in the style of Clippy, so that when it loops, I could pluck it off my monitor and punch it in the face.
I'm a Plus member, and the biggest limitation I am running into by far is the maximum length of the context window. I'm having context fall out of scope throughout the conversation, or not being able to give it a large document that I can then interrogate.
So if I go from paying $20/month for 32,000 tokens, to $200/month for Pro, I expect something more akin to Enterprise's 128,000 tokens or MORE. But they don't even discuss the context window AT ALL.
For anyone else out there looking to build a competitor I STRONGLY recommend you consider the context window as a major differentiator. Let me give you an example of a usage which ChatGPT just simply cannot do very well today: Dump a XML file into it, then ask it questions about that file. You can attach files to ChatGPT, but it is basically pointless because it isn't able to view the entire file at once due to, again, limited context windows.
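As an aside, you can at least check whether a document even fits before pasting it. A quick sketch with tiktoken (I'm assuming o200k_base is the right encoding for the 4o family; the file name is just an example):

    import tiktoken

    enc = tiktoken.get_encoding("o200k_base")   # assumption: the 4o-family tokenizer
    text = open("dump.xml", encoding="utf-8").read()
    n_tokens = len(enc.encode(text))

    for plan, window in [("Plus (32K)", 32_000), ("Pro (128K)", 128_000)]:
        verdict = "fits" if n_tokens < window else "does NOT fit"
        print(f"{plan}: {n_tokens} tokens, {verdict}")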
Seems like something that would be worth pinging OpenAI about because it's a pretty important claim that they are making on their pricing page! Unless it's a matter of counting tokens differently.
According to the pricing page, 32K context is for Plus users and 128K context is for Pro users. Not disagreeing with you, just adding context for readers that while you are explaining that the 4o API has 128K window, the 4o ChatGPT agent appears to have varying context depending on account type.
The longer the context the more backtracking it needs to do. It gets exponentially more expensive. You can increase it a little, but not enough to solve the problem.
Instead you need to chunk your data and store it in a vector database so you can do semantic search and include only the bits that are most relevant in the context.
LLM is a cool tool. You need to build around it. OpenAI should start shipping these other components so people can build their solutions and make their money selling shovels.
Instead they want end user to pay them to use the LLM without any custom tooling around. I don't think that's a winning strategy.
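For what it's worth, the chunk/embed/retrieve step described above is not much code. A sketch using sentence-transformers and numpy as a stand-in for a real vector database (the model name, chunk size, and file name are all illustrative):

    # Chunk -> embed -> retrieve only the most relevant pieces for the prompt.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def chunk(text: str, size: int = 500) -> list[str]:
        # naive fixed-size chunking; production systems split on document structure
        return [text[i:i + size] for i in range(0, len(text), size)]

    docs = chunk(open("internal_wiki_export.txt", encoding="utf-8").read())
    doc_vecs = model.encode(docs, normalize_embeddings=True)   # the "vector database"

    def top_k(query: str, k: int = 5) -> list[str]:
        q = model.encode([query], normalize_embeddings=True)[0]
        scores = doc_vecs @ q                                   # cosine similarity
        return [docs[i] for i in np.argsort(-scores)[:k]]

    # only these few chunks go into the LLM context, not the whole corpus
    print(top_k("What is our refund policy for enterprise contracts?"))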
Transformer architectures generally take quadratic time wrt sequence length, not exponential. Architectural innovations like flash attention also mitigate this somewhat.
Backtracking isn't involved, transformers are feedforward.
No, additional context does not cause exponential slowdowns and you absolutely can use FlashAttention tricks during training, I'm doing it right now. Transformers are not RNNs, they are not unrolled across timesteps, the backpropagation path for a 1,000,000 context LLM is not any longer than a 100 context LLM of the same size. The only thing which is larger is the self attention calculation which is quadratic wrt compute and linear wrt memory if you use FlashAttention or similar fused self attention calculations. These calculations can be further parallelized using tricks like ring attention to distribute very large attention calculations over many nodes. This is how google trained their 10M context version of Gemini.
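A back-of-envelope check of that scaling (layer count and width are illustrative, FLOP counts approximate):

    # Self-attention cost is quadratic in sequence length, not exponential.
    d_model, n_layers = 8192, 80

    def attn_score_flops(n_tokens: int) -> float:
        # QK^T plus the attention-weighted sum over V: two n x n x d matmuls per layer,
        # at ~2 FLOPs per multiply-accumulate
        return n_layers * 2 * (2 * n_tokens**2 * d_model)

    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} tokens: {attn_score_flops(n):.2e} FLOPs")
    # each 10x increase in context costs ~100x in attention compute - steep, but polynomial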
So why are the context windows so "small", then? It would seem that if the cost was not so great, then having a larger context window would give an advantage over the competition.
The cost for both training and inference is vaguely quadratic while, for the vast majority of users, the marginal utility of additional context is sharply diminishing. For 99% of ChatGPT users something like 8192 tokens, or about 20 pages of context would be plenty. Companies have to balance the cost of training and serving models. Google did train an uber long context version of Gemini but since Gemini itself fundamentally was not better than GPT-4 or Claude this didn't really matter much, since so few people actually benefited from such a niche advantage it didn't really shift the playing field in their favor.
Marginal utility only drops because effective context is really bad, i.e. most models still vastly prefer the first things they see and those "needle in a haystack" tests are misleading in that they convince people that LLMs do a good job of handling their whole context when they just don't.
If we have the effective context window equal to the claimed context window, well, I'd start worrying a bit about most of the risks that AI doomers talk about...
There has been a huge increase in context windows recently.
I think the larger problem is "effective context" and training data.
Being technically able to use a large context window doesn't mean a model can actually remember or attend to that larger context well. In my experience, the kinds of synthetic "needle in haystack" tasks that AI companies use to show how large of a context their model can handle don't translate very well to more complicated use cases.
You can create data with large context for training by synthetically adding in random stuff, but there's not a ton of organic training data where something meaningfully depends on something 100,000 tokens back.
Also, even if it's not scaling exponentially, it's still scaling: at what point is RAG going to be more effective than just having a large context?
Great point about the meaningful datasets, this makes perfect sense. Esp. in regards to SFT and RLHF. Although I suppose it would be somewhat easier to do pretraining on really long context (books, I assume?)
Because you have to do inference distributed between multiple nodes at this point. For prefill because prefill is actually quadratic, but also for memory reasons. KV Cache for 405B at 10M context length would take more than 5 terabytes (at bf16). That's 36 H200s just for KV Cache, but you would need roughly 48 GPUs to serve the bf16 version of the model. Generation speed at that setup would be roughly 30 tokens per second, 100k tokens per hour, and you can serve only a single user because batching doesn't make sense at these kinds of context lengths. If you pay 3 dollars per hour per GPU, that's a cost of $1440 per million tokens. For the fp8 version the numbers are a bit better: you need only 24 GPUs, generation speed stays roughly the same, so it's only 700 dollars per million tokens. There are architectural modifications that will bring that down significantly, but, nonetheless, it's still really, really expensive, and also quite hard to get to work.
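For anyone who wants to sanity-check those numbers, the arithmetic is simple (I'm using the Llama 3.1 405B configuration as I understand it: 126 layers, 8 KV heads with GQA, head dim 128, bf16 = 2 bytes):

    n_layers, n_kv_heads, head_dim, bytes_per_value = 126, 8, 128, 2
    context = 10_000_000

    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value   # K and V
    total_tb = per_token * context / 1e12
    h200s = per_token * context / 141e9                                  # 141 GB per H200

    print(f"{per_token / 1e6:.2f} MB per token, {total_tb:.1f} TB at 10M context")
    print(f"~{h200s:.0f} H200s just to hold the KV cache")
    # -> roughly 0.5 MB/token, ~5.2 TB, ~37 GPUs, consistent with the figures above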
Another factor in context window is effective recall. If the model can't actually use a fact 1m tokens earlier, accurately and precisely, then there's no benefit and it's harmful to the user experience to allow the use of a poorly functioning feature. Part of what Google have done with Gemini's 1-2m token context window is demonstrate that the model will actually recall and use that data. Disclosure, I do work at Google but not on this, I don't have any inside info on the model.
Memory. I don't know the equation, but it's very easy to see when you load a 128k-context model at 8K vs 80K. The quant I am running would double its VRAM requirements when loading 80K.
> The only thing which is larger is the self attention calculation which is quadratic wrt compute and linear wrt memory if you use FlashAttention or similar fused self attention calculations.
FFWD input is self-attention output. And since the output of self-attention layer is [context, d_model], FFWD layer input will grow as well. Consequently, FFWD layer compute cost will grow as well, no?
The cost of the FFWD layer, according to my calculations, is roughly (4 + 2 if a w3 gate projection is present) * d_model * d_ff * n_layers * context_size, so the FFWD cost grows linearly with the context size.
So, unless I misunderstood the transformer architecture, larger the context the larger the compute of both self-attention and FFWD is?
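A quick sketch of the scaling being argued about here, with made-up (roughly 70B-class) dimensions: FFWD compute does grow with context, but only linearly, while the attention-score computation grows quadratically.

```
# Rough FLOP scaling with context length (illustrative dimensions, not any
# specific model). FFN cost grows linearly with the number of tokens processed;
# self-attention grows quadratically, because every token attends to every other.

d_model, d_ff, n_layers = 8192, 28672, 80   # illustrative, roughly 70B-class sizes

def ffn_flops(ctx):
    # three projections (SwiGLU: w1, w2, w3), 2 FLOPs per multiply-add
    return n_layers * ctx * 3 * 2 * d_model * d_ff

def attn_score_flops(ctx):
    # QK^T plus the attention-weighted sum of V: two ctx-by-ctx matmuls per layer
    return n_layers * 2 * 2 * ctx * ctx * d_model

for ctx in (8_192, 131_072, 1_000_000):
    print(f"ctx={ctx:>9}: FFN {ffn_flops(ctx):.2e} FLOPs, "
          f"attention scores {attn_score_flops(ctx):.2e} FLOPs")
```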
So you're saying that if I have a sentence of 10 words, and I want the LLM to predict the 11th word, FFWD compute is going to be independent of the context size?
I don't understand how, since that very context is what determines whether the next prediction is worth anything or not?
More specifically, the FFWD layer is essentially the self-attention output, a [context, d_model] matrix, matmul'd with the W1, W2 and W3 weights?
I may be missing something, but I thought that each context token results in 3 additional vectors for self-attention to build its map, since each attention head must calculate a value considering all of the existing context.
> you need to chunk your data and store it in a vector database so you can do semantic search and include only the bits that are most relevant in the context
Be aware that this tends to give bad results. Once RAG is involved you essentially only do slightly better than a traditional search, and a lot of nuance gets lost.
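For reference, a minimal sketch of the chunk / embed / semantic-search loop being described, with a toy hashed bag-of-words standing in for a real embedding model and vector database:

```
# Minimal sketch of the chunk -> embed -> semantic-search loop described above.
# embed() here is a toy stand-in; in practice you'd call a real embedding model
# and store the vectors in a vector database rather than a Python list.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    # toy hashed bag-of-words embedding, just to make the sketch runnable
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    return v

def chunk(document: str, size: int = 1000) -> list[str]:
    return [document[i:i + size] for i in range(0, len(document), size)]

def top_k_chunks(document: str, query: str, k: int = 5) -> list[str]:
    chunks = chunk(document)
    vectors = np.stack([embed(c) for c in chunks])
    q = embed(query)
    # cosine similarity between the query and every chunk
    scores = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

# Only the retrieved chunks go into the prompt instead of the whole document,
# which is where the loss of nuance the parent comment complains about creeps in.
```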
> Instead you need to chunk your data and store it in a vector database so you can do semantic search and include only the bits that are most relevant in the context.
Isn't that kind of what Anthropic is offering with projects? Where you can upload information and PDF files and stuff which are then always available in the chat?
Because they can't do long context windows. That's the only explanation. What you can do with a 1m token context window is quite a substantial improvement, particularly as you said for enterprise usage.
The only reason I open Chat now is because Claude will refuse to answer questions on a variety of topics including for example medication side effects.
When I tested o1 a few hours ago, it seemed like it was losing context. After I asked it to use a specific writing style and pasted a large reference text, it forgot my instruction. I reminded it, and it kept the rule for a few more messages, and after another long paste it forgot again.
If a $200/month pro level is successful it could open the door to a $2000/month segment, and the $20,000/month segment will appear and the segregation of getting ahead with AI will begin.
Agreed. Where may I read about how to set up an LLM similar to that of Claude, which has the minimum length of Claude's context window, and what are the hardware requirements? I found Claude incredibly useful.
And now you can get the 405b quality in a 70b according to meta. Costs really come down massively with that. I wonder if it's really as good as they say though.
Full-blown agents, but they have to really be able to replace a semi-competent human. That's harder than it sounds, especially for the edge cases a human can easily get past.
With o1-preview and $20 subscription my queries typically were answered in 10-20 seconds. I've tried $200 subscription with some queries and got 5-10 minutes answer time. Unless the load is substantially increased and I was just waiting in queue for computing resources, I'd assume that they throw a lot more hardware for o1-pro. So it's entirely possible that $200/month is still at loss.
I've been concatenating my source code of ~3,300 lines and 123,979 bytes (so likely under a 128K context window) into the chat to get better answers. Uploading files is hopeless in the web interface.
Have you considered RAG instead of using the entire document? It's more complex but would at least allow you to query the document with your API of choice.
When talking about context windows I'm surprised no one mentions https://poe.com/.
Switched over from ChatGPT about a year ago, and it's amazing. Can use all models and the full context window of them, for the same price as a ChatGPT subscription.
Poe.com goes straight to login page, doesn't want to divulge ANY information to me before I sign up. No About Us or Product description or Pricing - nothing. Strange behavior. But seeing it more and more with modern web sites.
What don’t you like about Claude? I believe the context is larger.
Coincidentally I’ve been using it with xml files recently (iOS storyboard files), and it seems to do pretty well manipulating and refactoring elements as I interact with it.
First impressions: The new o1-Pro model is an insanely good writer. Aside from favoring the long em-dash (—) which isn't on most keyboards, it has none of the quirks and tells of old GPT-4/4o/o1. It managed to totally fool every "AI writing detector" I ran it through.
It can handle unusually long prompts.
It appears to be very good at complex data analysis. I need to put it through its paces a bit more, though.
> Aside from favoring the long em-dash (—) which isn't on most keyboards
Interesting! I intentionally edit my keyboard layout to include the em-dash, as I enjoy using it out of sheer pomposity—I should undoubtedly delve into the extent to which my own comments have been used to train GPT models!
On my keyboard (en-us) it's ALT+"-" to get an em-dash.
I use it all the time because it's the "correct" one to use, but it's often more "correct" to just rewrite the sentence in a way that doesn't call for one. :)
Just so you know, text using the em-dash like that combined with a few other "tells" makes me double check if it might be LLM written.
Other things are the overuse of transition words (e.g., "however," "furthermore," "moreover," "in summary," "in conclusion,") as well as some other stuff.
It might not be fair to people who write like that naturally, but it is what it is in the current situation we find ourselves in.
"In the past three days, I've reviewed over 100 essays from the 2024-2025 college admissions cycle. Here's how I could tell which ones were written by ChatGPT"
On Windows em dash is ALT+0151; the paragraph mark (§) is ALT+0167. Once you know them (and a couple of others, for instance accented capitals) they become second nature, and work on all keyboards, everywhere.
Startup I'm at has generated a LOT of content using LLMs and once you've reviewed enough of the output, you can easily see specific patterns in the output.
Some words/phrases that, by default, it overuses: "dive into", "delve into", "the world of", and others.
You correct it with instructions, but it will then find synonyms so there is also a structural pattern to the output that it favors by default. For example, if we tell it "Don't start your writing with 'dive into'", it will just switch to "delve into" or another synonym.
Yes, all of this can be corrected if you put enough effort into the prompt and enough iterations to fix all of these tells.
> if we tell it "Don't start your writing with 'dive into'", it will just switch to "delve into" or another synonym.
LLMs can radically change their style, you just have to specify what style you want. I mean, if you prompt it to "write in the style of an angry Charles Bukowski" you'll stop seeing those patterns you're used to.
In my team for a while we had a bot generating meeting notes "in the style of a bored teenager", and (besides being hilarious) the results were very unlike typical AI "delvish".
Of course the "delve into" and "dive into" is just its default to be corrected with additional instruction. But once you do something like "write in the style of...", then it has its own tells because as I noted below, it is, in the end, biased towards frequency.
Of course there will be a set of tells for any given style, but the space of possibilities is much larger than what a person could recognize. So as with most LLM tasks, the issue is figuring out how to describe specifically what you want.
Aside: not about you specifically, but I feel like complaints on HN about using LLMs often boil down to somebody saying "it doesn't do X", where X is a thing they didn't ask the model to do. E.g. a thread about "I asked for a Sherlock Holmes story but the output wasn't narrated by Watson" was one that stuck in my mind. You wouldn't think engineers would make mistakes like that, but I guess people haven't really sussed out how to think about LLMs yet.
Anyway for problems like what you described, one has to be wary about expecting the LLM to follow unstated requirements. I mean, if you just tell it not to say "dive into" and it doesn't, then it's done everything it was asked, after all.
I mean, we get it. It's a UX problem. But the thing is you have to tell it exactly what to do every time. Very often, it'll do what you said but not what you meant, and you have to wrestle with it.
You'd have to come up with a pretty exhaustive list of tells. Even sentence structure and mood is sometimes enough, not just the obvious words.
This is the way. Blending two or more styles also works well, especially if they're on opposite poles, e.g. "write like the imaginary lovechild of Cormac McCarthy and Ernest Hemingway."
Also, wouldn't angry Charles Bukowski just be ... Charles Bukowski?
> ...once you've reviewed enough of the output, you can easily see specific patterns in the output
That is true, but more importantly, are those patterns sufficient to distinguish AI-generated content from human-generated content? Humans express themselves very differently by region and country (e.g., "do the needful" is not common in the Midwest; "orthogonal" and "order of magnitude" are used more on HN than in most other places). Outside of watermarking, detecting AI-generated text with an acceptably small false-positive error rate is nearly impossible.
Not sure why you default to an uncharitable mode in understanding what I am trying to say.
I didn't say they know their own tells. I said they naturally output them for you. Maybe the obvious is so obvious I don't need to comment on it. Meaning this whole "tells analysis" would necessarily rely on synthetic data sets.
I always assumed that they were snake oil because the training objective is to get a model that writes like a human. AI detectors by definition are showing what does not sound like a human, so presumably people will train the models against the detectors until they no longer provide any signal.
The thing is, the LLM has a flaw: it is still fundamentally biased towards frequency.
AI detectors generally can take advantage of this and look for abnormal patterns in frequencies of specific words, phrases, or even specific grammatical constructs because the LLM -- by default -- is biased that way.
I'm not saying this is easy and certainly, LLMs can be tuned in many ways via instructions, context, and fine-tuning to mask this.
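As a toy illustration of that frequency heuristic (the phrase list and threshold here are made up; real detectors model far more than this):

```
# Toy illustration of the frequency heuristic described above: count how often
# known "tell" phrases show up relative to the length of the text. The phrase
# list and threshold are invented for illustration.
import re

TELL_PHRASES = ["delve into", "dive into", "furthermore", "moreover",
                "in conclusion", "in summary", "the world of"]

def tell_rate(text: str) -> float:
    words = len(text.split()) or 1
    hits = sum(len(re.findall(p, text.lower())) for p in TELL_PHRASES)
    return hits / words

def looks_generated(text: str, threshold: float = 0.005) -> bool:
    # flag anything with an unusually high density of tell phrases
    return tell_rate(text) > threshold
```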
They're not very accurate, but I think snake oil is a bit too far - they're better than guessing at least for the specific model(s) they're trained on. OpenAI's classifier [0] was at 26% recall, 91% precision when it launched, though I don't know what models created the positives in their test set. (Of course they later withdrew that classifier due to its low accuracy, which I think was the right move. When a company offers both an AI Writer and an AI Writing detector people are going to take its predictions as gospel and _that_ is definitely a problem.)
All that aside, most models have had a fairly distinctive writing style, particularly when fed no or the same system prompt every time. If o1-Pro blends in more with human writing that's certainly... interesting.
Anecdotally, English/History/Communications professors are confirming cheaters with them because they find it easy to identify false information. The red flags are so obvious that the checker tools are just a formality: student papers now have fake URLs and fake citations. Students will boldly submit college papers which have paragraphs about nonexistent characters, or make false claims about what characters did in a story.
The e-mail correspondence goes like this: "Hello Professor, I'd like to meet to discuss my failing grade. I didn't know that using ChatGPT was bad, can I have some points back or rewrite my essay?"
Yeah but they "detect" the characteristic AI style: The limited way it structures sentences, the way it lays out arguments, the way it tends to close with an "in conclusion" paragraph, certain word choices, etc. o1-Pro doesn't do any of that. It writes like a human.
Damnit. It's too good. It just saved me ~6 hours in drafting a complicated and bespoke legal document. Before you ask: I know what I'm doing, and it did a better job in five minutes than I could have done over those six hours. Homework is over. Journalism is over. A large slice of the legal profession is over. For real this time.
Journalism is not only about writing. It is about sources, talking to people, being on the ground, connecting dots, asking the right questions. Journalists can certainly benefit from AI and good journalists will have jobs for a long time still.
While the above is true, I'd say the majority of what passes as journalism these days has none of the above and the writing is below what an AI writer could produce :(
It's actually surprising how many articles on 'respected' news websites have typos. You'd think there would be automated spellcheckers and at least one 'peer review' (probably too much to ask an actual editor to review the article these days...).
Mainstream news today is written for an 8th grade reading ability. Many adults would lose interest otherwise, and the generation that grew up reading little more than social media posts will be even worse.
AI can handle that sort of writing just fine, readers won't care about the formulaic writing style.
So AI could actually turn journalism more into what it originally was: reporting what is going on, rather than reading and rewriting information from other sources. Interesting possibility.
Is exactly the key element in being able to use spicy autocomplete. If you don't know what you're doing, it's going to bite you and you won't know it until it's too late. "GPT messed up the contract" is not an argument I would envy anyone presenting in court or to their employer. :)
> Before you ask: I know what I'm doing, and it did a better job in five minutes than I could have done over those six hours.
Seems like lawyers could do more, faster, because they know what they are doing. Experts don't get replaced; they get tools to amplify and extend their expertise.
Replacement is avoided only if the demand for their services scales in proportion to the productivity improvements, which is sometimes true but not always, and is less likely to be true when the productivity improvements are very large.
It still needs to be driven by someone who knows what they're doing.
Just like when software was first coming out, it may have ended some jobs.
But it also helped get things done that wouldn't have been done otherwise, or not as much.
In this case, equipping a capable lawyer to be 20x more productive is more like an Iron Man suit, which is OK. If you can get more done, with less effort, you are still critical to what's needed.
I noticed a writing style difference, too, and I prefer it. More concise. On the coding side, it's done very well on large (well as large as it can manage) codebase assessment, bug finding, etc. I will reach for it rather than o1-preview for sure.
My 10th grade English teacher (2002, just as blogging was taking off) called it sloppy, and I gotta agree with her. These days I see it as youtube punctuation, like jump cut editing for text.
It's not. People just like to pretend they have moral superiority for their opinions on arbitrary writing rules, when in reality the only thing that matters is if you're clearly communicating something valuable.
I'm a professional writer and use em-dashes without a second thought. Like any other component of language, just don't _over_ use them.
That's encouraging to hear that it's a better writer, but I wonder if "quirks and tells" can only be seen in hindsight. o1-pro's quirks may only become apparent after enough people have flooded the internet with its output.
This is a huge improvement over previous GPT and Claude, which use the terrible "space, hyphen, space" construct. I always have to manually change them to em-dashes.
This shouldn’t really be a serious issue nowadays. On macOS it’s Option+Shift+'-', on Windows it’s Ctrl+Alt+Num- or (more cryptic) Alt+0151.
The Swiss army knife solution is to configure yourself a Compose key, and then it’s an easy mnemonic like for example Compose 3 - (and Compose 2 - for en dash).
No internet access makes it very hard to benefit from o1 pro. Most of the complex questions I would ask require google search for research papers, language or library docs, etc. Not sure why o1 pro is banned from the internet, was it caught downloading too much porn or something?
Macs have always been able to type the em dash — the key combination is ⌥⇧- (Option-Shift-hyphen). I often use them in my own writing. (Hope it doesn't make somebody think I'm phoning it in with AI!)
Some autocorrect software automatically converts two hyphens in a row into an emdash. I know that's how it worked in Microsoft Word and just verified it's doing that with Google Docs. So it's not like it's hard to include an emdash in your writing.
This is interesting, because at my job I have to manually edit registration addresses that use the long em-dash as our vendor only supports ASCII. I think Windows automatically converts two dashes to the long em-dash.
How do you have that configured? The Windows+. shortcut was added in a later update to W10 and pops up a GUI for selecting emojis, symbols, or other non-typable characters.
I need help creating a comprehensive Anki deck system for my 8-year-old who is following a classical education model based on the trivium (grammar stage). The child has already:
- Mastered numerous Latin and Greek root words
- Achieved mathematics proficiency equivalent to US 5th grade
- Demonstrated strong memorization capabilities
Please create a detailed 12-month learning plan with structured Anki decks covering:
1. Core subject areas prioritized in classical education (specify 4-5 key subjects)
2. Recommended daily review time for each deck
3. Progression sequence showing how decks build upon each other
4. Integration strategy with existing knowledge of Latin/Greek roots
5. Sample cards for each deck type, including:
- Basic cards (front/back)
- Cloze deletions
- Image-based cards (if applicable)
- Any special card formats for mathematical concepts
For each deck, please provide:
- Clear learning objectives
- 3-5 example cards with complete front/back content
- Estimated initial deck size
- Suggested intervals for introducing new cards
- Any prerequisites or dependencies on other decks
Additional notes:
- Cards should align with the grammar stage focus on memorization and foundational knowledge
- Please include memory techniques or mnemonics where appropriate
- Consider both verbal and visual learning styles
- Suggest ways to track progress and adjust difficulty as needed
Example of the level of detail needed for card examples:
Interesting that it thought for 1m28s on only two tasks. My intuition with o1-preview is that each task had a rather small token limit, perhaps they raised this limit.
If o1-pro is 10% better than Claude, but you are a guy who makes $300,000 per year, but now can make $330,000 because o1-pro makes you more productive, then it makes sense to give Sam $2,400.
Above example makes no sense since it says ChatGPT is 10% better than Claude at first, then pivots to use it as a 10% total productivity enhancer. Which is it?
It's never this clean, but it is directionally correct. If I make $300k/year, and I can tell that ChatGPT already saves me hours or even days per month, $200 is a laughable amount. If I feel like Pro is even slightly better, it's worth $200 just to know that I always have the best option available.
Heck, it's probably worth $200 even if I'm not confident it's better just in case it is.
For the same reason I don't start with the cheapest AI model when asking questions and then switch to the more expensive if it doesn't work. The more expensive one is cheap enough that it doesn't even matter, and $200 is cheap enough (for a certain subsection of users) that they'll just pay it to be sure they're using the best option.
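The rough value-of-time math these comments are doing, written out with illustrative numbers (a $300k salary and 2,000 working hours a year; all inputs are assumptions):

```
# The rough value-of-time math several comments above are doing, written out.
# All inputs are illustrative.
salary_per_year = 300_000
working_hours_per_year = 2_000
subscription_per_year = 200 * 12

hourly_rate = salary_per_year / working_hours_per_year    # $150/hour
breakeven_hours_per_month = 200 / hourly_rate              # ~1.3 hours/month
print(f"Breaks even if it saves {breakeven_hours_per_month:.1f} h/month")

# Or, framed as the grandparent did: a 10% productivity bump on $300k
print(f"10% productivity gain ~ ${0.10 * salary_per_year:,.0f}/year "
      f"vs ${subscription_per_year:,}/year subscription")
```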
That's only true if your time is metered by the hour; and the vast majority of roles which find some benefit from AI, at this time, are not compensated hourly. This plan might be beneficial to e.g. CEO-types, but I question who at OpenAI thought it would be a good idea to lead their 12 days of hollowhype with this launch, then; unless this is the highest impact release they've got (one hopes it is not).
>This plan might be beneficial to e.g. CEO-types, but I question who at OpenAI thought it would be a good idea to lead their 12 days of hollowhype with this launch, then; unless this is the highest impact release they've got (one hopes it is not).
In previous multi-day marketing campaigns I've run or helped run (specifically on well-loved products), we've intentionally announced a highly priced plan early on without all of its features.
Two big benefits:
1) Your biggest advocates get to work justifying the plan/product as-is, anchoring expectations to the price (which already works well enough to convert a slice of potential buyers)
2) Anything you announce afterward now gets seen as either a bonus on top (e.g. if this $200/mo plan _also_ includes Sora after they announce it...), driving value per price up compared to the anchor; OR you're seen as listening to your audience's criticisms ("this isn't worth it!") by adding more value to compensate.
I work from home and my time is accounted for by way of my productive output because I am very far away from a CEO type. If I can take every Wednesday off because I’ve gained enough productivity to do so, I would happily pay $200/mo out of my own pocket to do so.
$200/user/month isn’t even that high of a number in the enterprise software world.
Employers might be willing to get their employees a subscription if they believe it makes the employees they are paying $$$$$ X% more productive (where X% of their salary works out to more than $2,400/year).
There is only so much time in the day. If you have a job where increased productivity translates to increases income (not just hourly metered jobs) then you will see a benefit.
> cheapest AI model when asking questions and then switch to the more expensive if it doesn't work.
The thing is, more expensive isn't guaranteed to be better. The more expensive models are better most of the time, but not all the time. I talk about this more in this comment https://news.ycombinator.com/item?id=42313401#42313990
Since LLMs are non-deterministic, there is no guarantee that GPT-4o is better than GPT-4o mini. GPT-4o is most likely going to be better, but sometimes the simplicity of GPT-4o mini makes it better.
As you say, the more expensive models are better most of the time.
Since we can't easily predict which model will actually be better for a given question at the time of asking, it makes sense to stick to the most expensive/powerful models. We could try, but that would be a complex and expensive endeavor. Meanwhile, both weak and powerful models are already too cheap to meter in direct / regular use, and you're always going to get ahead with the more powerful ones, per the very definition of what "most of the time" means, so it doesn't make sense to default to a weaker model.
TBH it's easily in the other direction. If I can get something to clients quicker that's more valuable.
If paying this gets me two days of consulting it's a win for me.
Obvious caveat if cheaper setups get me the same, although I can't spend too long comparing or that time alone will cost more than just buying everything.
The number of times I've heard all this about some other groundbreaking technology... most businesses just went meh and moved on. But for self-employed, if those numbers are right, it may make sense.
It's not worth it if you're a W2 employee and you'll just spend those 2 hours doing other work. Realistically, working 42 hours a week instead of 40 will not meaningfully impact your performance, so doing 42 hours a week of work in 40 won't, either.
I pay $20/mo for Claude because it's been better than GPT for my use case, and I'm fine paying that but I wouldn't even consider something 10x the price unless it is many, many times better. I think at least 4-5x better is when I'd consider it and this doesn't appear to be anywhere close to even 2x better.
That's also not how pricing works; it's about perceived incremental increases in how useful it is (marginal utility), not about the actual additional money you make.
Yeah, the $200 seems excessive and annoying, until you realise it depends on how much it saves you. For me it needs to save me about 6 hours per month to pay for itself.
Funny enough I've told people that baulk at the $20 that I would pay $200 for the productivity gains of the 4o class models. I already pay $40 to OpenAI, $20 to Anthropic, and $40 to cursor.sh.
ah yes, you must work at the company where you get paid per line of code. There's no way productivity is measured this accurately and you are rewarded directly in any job unless you are self-employed and get paid per website or something
Being in an AI domain does not invalidate the fundamental logic. If an expensive tool can make you productive enough to offset the cost, then the tool is worth it for all intents and purposes.
I think of them as different people -- I'll say that I use them in "ensemble mode" for coding, the workflow is Claude 3.5 by default -- when Claude is spinning, o1-preview to discuss, Claude to implement. Worst case o1-preview to implement, although I think its natural coding style is slightly better than Claude's. The speed difference isn't worth it.
The intersection of problems I have where both have trouble is pretty small. If this closes the gap even more, that's great. That said, I'm curious to try this out -- the ways in which o1-preview fails are a bit different than prior gpt-line LLMs, and I'm curious how it will feel on the ground.
Okay, tried it out. Early indications - it feels a bit more concise, thank god, certainly more concise than 4o -- it's s l o w. Getting over 1m times to parse codebases. There's some sort of caching going on though, follow up queries are a bit faster (30-50s). I note that this is still superhuman speeds, but it's not writing at the speed Groqchat can output Llama 3.1 8b, that is for sure.
Code looks really clean. I'm not instantly canceling my subscription.
When you say "parse codebases" is this uploading a couple thousand lines in a few different files? Or pasting in 75 lines into the chat box? Or something else?
$ find web -type f \( -name '*.go' -o -name '*.tsx' \) | tar -cf code.tar -T -; cat code.tar | pbcopy
Then I paste it in and say "can you spot any bugs in the API usage? Write out a list of tasks for a senior engineer to get the codebase in basically perfect shape," or something along those lines.
Alternately: "write a go module to support X feature, and implement the react typescript UI side as well. Use the existing styles in the tsx files you find; follow these coding guidelines, etc. etc."
I pay for both GPT and Claude and use them both extensively. Claude is my go-to for technical questions, GPT (4o) for simple questions, internet searches and validation of Claude answers. GPT o1-preview is great for more complex solutions and work on larger projects with multiple steps leading to finish. There’s really nothing like it that Anthropic provides.
But $200/mo is way above what I’m willing to pay.
I have several local models I hit up first (Mixtral, Llama), if I don’t like the results then I’ll give same prompt to Claude and GPT.
Overall though it’s really just for reference and/or telling me about some standard library function I didn’t know of.
Somewhat counterintuitively I spend way more time reading language documentation than I used to, as the LLM is mainly useful in pointing me to language features.
After a few very bad experiences I never let LLM write more than a couple lines of boilerplate for me, but as a well-read assistant they are useful.
But none of them are sufficient alone, you do need a “team” of them - which is why I also don’t see the value is spending this much on one model. I’d spend that much on a system that polled 5 models concurrently and came up with a summary of sorts.
People keep talking about using LLMs for writing code, and they might be useful for that, but I've found them much more useful for explaining human-written code than anything else, especially in languages/frameworks outside my core competency.
E.g. "why does this (random code in a framework I haven't used much) code cause this error?"
About 50% of the time I get a helpful response straight away that saves me trawling through Stack Overflow and random blog posts. About 25% of the time the response is at least partially wrong, but it still helps me get on the right track.
25% of the time the LLM has no idea and won't admit it so I end up wasting a small amount of time going round in circles, but overall it's a significant productivity boost when I'm working on unfamiliar code.
Right on, I like to use local models - even though I also use OpenAI, Anthropic, and Google Gemini.
I often use one or two shot examples in prompts, but with small local models it is also fairly simple to do fine tuning - if you have fine tuning examples, and if you are a developer so you get the training data in the correct format, and the correct format changes for different models that you are fine tuning.
> But none of them are sufficient alone, you do need a “team” of them
Given the sensitivity to parameters and prompts the models have, your "team" can just as easily be querying the same LLM multiple times with different system prompts.
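For example, a sketch of that pattern using the OpenAI Python client: the model name is a placeholder and the "perspectives" are arbitrary, but the shape of the idea is the same with any provider.

```
# Sketch of the "team" idea using a single model with different system prompts,
# then a final call to reconcile the answers. Model name is a placeholder; the
# same pattern works with other providers/models.
from openai import OpenAI

client = OpenAI()
PERSPECTIVES = [
    "You are a cautious reviewer. Point out risks and edge cases.",
    "You are an optimistic architect. Propose the simplest design that works.",
    "You are a performance specialist. Focus on cost and latency.",
]

def ask(system: str, question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder; any chat model works here
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

def team_answer(question: str) -> str:
    answers = [ask(p, question) for p in PERSPECTIVES]
    summary_prompt = "Summarize and reconcile these answers:\n\n" + "\n---\n".join(answers)
    return ask("You are a neutral editor.", summary_prompt)
```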
I haven't used ChatGPT in a few weeks now. I still maintain subscriptions to both ChatGPT and Claude, but I'm very close to dropping ChatGPT entirely. The only useful thing it provides over Claude is a decent mobile voice mode and web search.
If you don't want to necessarily have to pick between one or the other, there are services like this one that let you basically access all the major LLMs and only pay per use: https://nano-gpt.com/
I've used TypingMind and it's pretty great, I like the idea of just plugging in a couple API keys and paying a fraction, but I really wish there was some overlap.
If a random query via the API costs a fifth of a cent why can't I can't 10 free API calls w/ my $20/mo premium subscription?
I'm in the same boat — I maintain subscriptions to both.
The main thing I like OpenAI for is that when I'm on a long drive, I like to have conversations with OpenAI's voice mode.
If Claude had a voice mode, I could see dropping OpenAI entirely, but for now it feels like the subscriptions to both is a near-negligible cost relative to the benefits I get from staying near the front of the AI wave.
I've heard so much about Claude and decided to give it a try, and it has been a major disappointment. I ended up using ChatGPT as an assistant for Claude's code writing because it just couldn't get things right. Had to cancel my subscription; no idea why people still promote it everywhere like it is 100 times better than ChatGPT.
I've heard this a lot and so I switched to Claude for a month and was super disappointed. What are you mainly using ChatGPT for?
Personally, I found Claude marginally better for coding, but far, far worse for just general purpose questions (e.g. I'm a new home owner and I need to winterize my house before our weather drops below freezing. What are some steps I should take or things I should look into?)
It's ironic because I never want to ask an LLM for something like your example general purpose question, where I can't just cheaply and directly test the correctness of the answer
But we're hurtling towards all the internet's answers to general purpose questions being SEO spam that was generated by an LLM anyways.
Since OpenAI probably isn't hiring as many HVAC technicians to answer queries as they are programmers, it feels like we're headed towards a death spiral where either having the LLM do actual research from non-SEO-affected primary sources, or finding a human who's done that research, will be the only options for generic knowledge questions that are off the beaten path.
-
Actually to test my hypothesis I just tried this with ChatGPT with internet access.
The list of winterization tips cited an article that felt pretty "delvey". I search the author's name and their LinkedIn profile is about how they professionally write marketing content (nothing about HVAC), one of their accomplishments is Generative AI, and their like feed is full of AI mentions for writing content.
So ChatGPT is already at a place where when it searches for "citations", it's just spitting back out its own uncited answers above answers by actual experts (since the expert sources aren't as SEO-driven)
> I can't just cheaply and directly test the correctness of the answer
I feel that, but I think for me the key is knowing that LLMs can be wrong and I should treat the answer as a starting point and not an actual expert. I find it really helpful for topics where I don't even know where to start because, like you said, most search engines are utter trash now.
For things like that, I find ChatGPT to be a good diving off point. For example, this is what I got when I asked:
```
Preparing your townhouse for winter involves addressing common issues associated with the region's wet and cool climate. Here's a concise checklist to help you get started:
1. Exterior Maintenance
Roof Inspection: Check for damaged or missing shingles to prevent leaks during heavy rains.
Gutter Cleaning: Remove leaves and debris to ensure proper drainage and prevent water damage.
Downspouts: Ensure they direct water away from the foundation to prevent pooling and potential leaks.
Siding and Trim: Inspect for cracks or gaps and seal them to prevent moisture intrusion.
2. Windows and Doors
Weatherstripping: Install or replace to seal gaps and prevent drafts, improving energy efficiency.
Caulking: Apply around window and door frames to block moisture and cold air.
3. Heating System
Furnace Inspection: Have a professional service your furnace to ensure it's operating efficiently.
Filter Replacement: Change furnace filters to maintain good air quality and system performance.
4. Plumbing
Outdoor Faucets: Disconnect hoses and insulate faucets to prevent freezing.
Pipe Insulation: Insulate exposed pipes, especially in unheated areas, to prevent freezing and bursting.
5. Landscaping
Tree Trimming: Prune branches that could break under snow or ice and damage your property.
Drainage: Ensure the yard slopes away from the foundation to prevent water accumulation.
6. Safety Checks
Smoke and Carbon Monoxide Detectors: Test and replace batteries to ensure functionality.
Fireplace and Chimney: If applicable, have them inspected and cleaned to prevent fire hazards.
By addressing these areas, you can help protect your home from common winter-related issues in Seattle's climate.
```
Once I dove into the links ChatGPT provided I found the detail I needed and things I needed to investigate more, but it saved 30 minutes of pulling together a starting list from the top 5-10 articles on Google.
Claude Sonnet 3.5 has outperformed o1 in most tasks based on my own anecdotal assessment. So much so that I'm debating canceling my ChatGPT subscription. I just literally do not use it anymore, despite being a heavy user for a long time in the past
Is a "reasoning" model really different? Or is it just clever prompting (and feeding previous outputs) for an existing model? Possibly with some RLHF reasoning examples?
OpenAI doesn't have a large enough database of reasoning texts to train a foundational LLM off it? I thought such a db simply does not exist as humans don't really write enough texts like this.
It's trained via reinforcement learning on essentially infinite synthetic reasoning data. You can generate infinite reasoning data because there are infinite math and coding problems that can be created with machine-checkable solutions, and machines can make infinite different attempts at reasoning their way to the answer. Similar to how models trained to learn chess by self-play have essentially unlimited training data.
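A toy sketch of that loop: generate problems whose answers a machine can check, let the model produce reasoning attempts, and keep only the traces that end in a verified answer. The model call itself is a placeholder here; nothing about o1's actual training recipe is public.

```
# Toy sketch of the loop described above: problems with machine-checkable
# answers, model attempts, and a filter that keeps only verified traces.
import random

def make_problem():
    a, b = random.randint(100, 999), random.randint(100, 999)
    return f"What is {a} * {b}?", a * b

def collect_training_data(attempt_fn, n_problems: int) -> list[dict]:
    # attempt_fn(question) -> (reasoning_trace, final_answer); stands in for the model
    kept = []
    for _ in range(n_problems):
        question, truth = make_problem()
        trace, answer = attempt_fn(question)
        if answer == truth:                  # machine-checkable reward signal
            kept.append({"prompt": question, "reasoning": trace, "answer": answer})
    return kept
```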
We don't know the specifics of GPT-o1 to judge, but we can look at open weights model for an example. Qwen-32B is a base model, QwQ-32B is a "reasoning" variant. You're broadly correct that the magic, such as it is, is in training the model into a long-winded CoT, but the improvements from it are massive. QwQ-32B beats larger 70B models in most tasks, and in some cases it beats Claude.
I just tried QwQ 32B; I didn't know about it. I gave it a code-generation task that GPT had handled two days ago, where GPT produced perfect code without even sweating.
QwQ generated 10 pages of its reasoning steps, and the code is probably not correct. [1] includes both answers, from QwQ and GPT.
Breaking down its reasoning steps into such excruciatingly detailed prose is certainly not user friendly, but it is intriguing. I wonder what an ideal use case for it would be.
To my understanding, Anthropic realizes that they can’t compete in name recognition yet, so they have to overdeliver in terms of quality to win the war. It’s hard to beat the incumbent, especially when “chatgpt’ing” is basically a well understood verb.
They don't have a model that does o1-style "thought tokens" or is specialized for math, but Sonnet 3.6 is really strong in other ways. I'm guessing they will have an o1-style model within six months if there's demand
Same. Honestly if they released a $200 a month plan I’d probably bite, but OpenAI hasn’t earned that level of confidence from me yet. They have some catching up to do.
The main difficulty when pricing a monthly subscription for "unlimited" usage of a product is the 1% of power users who use have extreme use of the product that can kill any profit margins for the product as a whole.
Pricing ChatGPT Pro at $200/mo filters it to only power users/enterprise, and given the cost of the GPT-o1 API, it wouldn't surprise me if those power users burn through $200 worth of compute very, very quickly.
They are ready for this, there is a policy against automation, sharing or reselling access; it looks like there are some unspecified quotas as well:
> We have guardrails in place to help prevent misuse and are always working to improve our systems. This may occasionally involve a temporary restriction on your usage. We will inform you when this happens, and if you think this might be a mistake, please don’t hesitate to reach out to our support team at help.openai.com using the widget at the bottom-right of this page. If policy-violating behavior is not found, your access will be restored.
Is there any evidence to suggest this is true? IIRC there was leaked information that OpenAI's revenue was significantly higher than their compute spending, but it wasn't broken down between API and subscriptions so maybe that's just due to people who subscribe and then use it a few times a month.
> OpenAI's revenue was significantly higher than their compute spending
I find this difficult believe, although I don't doubt leaks could have implied it. The challenge is that "the cost of compute" can vary greatly based on how it's accounted for (things like amortization, revenue recognition, capex vs opex, IP attribution, leasing, etc). Sort of like how Hollywood studio accounting can show a movie as profitable or unprofitable, depending on how "profit" is defined and how expenses are treated.
Given how much all those details can impact the outcome, to be credible I'd need a lot more specifics than a typical leak includes.
Is compute that expensive? An H100 rents for about $2.50/hour, so $200 buys 80 hours of pure compute. Assuming 720 hours in a month, that's a 1/9 duty cycle around the clock, or 1/3 if we assume an 8-hour work day. That's really intense, constant use. And I bet OpenAI spends less operating their own infra than the rate at which cloud providers rent it out.
You need enough RAM to store the model and the KV-cache depending on context size. Assuming the model has a trillion parameters (there are only rumours how many there actually are) and uses 8 bit per parameter, 16 H100 might be sufficient.
A single H100 has 80GB of memory, meaning that at FP16 you could roughly fit a 40B parameter model on it, or at FP4 quantisation you could fit a 160B parameter model on it. We don't know (I don't think) what quantisation OpenAI use, or how many parameters o1 is, but most likely...
...they probably quantise a bit, but not loads, as they don't want to sacrifice performance. FP8 seems like a possible middle ground. o1 is just a bunch of GPT-4o in a trenchcoat strung together with some advanced prompting. GPT-4o is theorised to be 200B parameters. If you wanted to run 5 parallel generation tasks at peak during the o1 inference process, that's 5x 200B, at FP8, or about 12 H100s. 12 H100s takes about one full rack of kit to run.
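Writing out that weight-memory arithmetic (the 200B parameter count and the quantisation levels are the thread's guesses, not known facts):

```
# Weight-memory arithmetic from the two comments above. The 200B parameter
# count and FP8 quantisation are guesses from the thread, not known facts.
H100_BYTES = 80e9

def gpus_for_weights(n_params, bytes_per_param, copies=1):
    return copies * n_params * bytes_per_param / H100_BYTES

print(gpus_for_weights(40e9, 2))              # ~1 GPU: 40B params at FP16
print(gpus_for_weights(160e9, 0.5))           # ~1 GPU: 160B params at FP4
print(gpus_for_weights(200e9, 1, copies=5))   # ~12.5 GPUs: five 200B instances at FP8
```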
I was testing out a chat app that supported images. Long conversations with multiple images in the conversation can be like 10 cents per message after a certain point. It sure does add up quickly.
There are many use cases for which the price can go even higher. I look at recent interactions with people that were working at an interview mill: Multiple people in a boiler room interviewing for companies all day long, with a computer set up so that our audio was being piped to o1. They had a reasonable prompt to remove many chatbot-ism, and make it provide answers that seem people-like: We were 100% interviewing the o1 model. The operator said basically nothing, in both technical and behavioral interviews.
A company making money off of this kind of scheme would be happy to pay $200 a seat for an unlimited license. And I would not be surprised if there were many other very profitable use cases that make $200 per month seem like a bargain.
So, wait a minute, when interviewing candidates, you're making them invest their valuable time talking to an AI interviewer, and not even disclosing to them that they aren't even talking to a real human? That seems highly unethical to me, yet not even slightly surprising. My question is, what variables are being optimized for here? It's certainly not about efficiently matching people with jobs; it seems to be more about increasing the number of interviews, which I'm sure benefits the people who get rewarded for the number of interviews, but seems like entirely the wrong metric.
Scams and other antisocial use cases are basically the only ones for which the damn things are actually the kind of productivity rocket-fuel people want them to be, so far.
We better hope that changes sharply, or these things will be a net-negative development.
Right? To me it's eerily similar to how cryptocurrency was sold as a general replacement for all money uses, but turned out to be mainly useful for societally negative things like scams and money laundering.
It sounds like a setup where applicants hire some third-party company to perhaps "represent the client" in the interview and that company hired a bunch of people to be the interviewee on their clients behalf. Presumably also neither the company nor the applicant disclose this arrangement to the hiring manager.
If any company wants me to be interviewed by AI to represent the client, I'll consider it ethical to let an AI represent me. Then AIs can interview AIs, maybe that'll get me the job. I have strong flashbacks to the movie "Surrogates" for some reason.
Decades ago in Santa Cruz county California, I had to have a house bagged for termites for the pending sale. Turned out there was one contractor licensed to do the poison gas work, and all the pest service companies simply subcontracted to him. So no matter what pest service you chose, you got the same outfit doing the actual work.
I used to work for a manufacturing company that did this. They offered a standard, premium, and "House Special Product". House special was 2x premium but the same product. They didn't even pretend it wasn't, they just said it was recommended and people bought it.
I had this happen once at a car wash. The first time I went I paid for a $25 premium package with all the bells and whistles. They seemed to do a good job. The next time I went for the basic $10 one. Exact same thing.
Yesterday, I spent 4.5hrs crafting a very complex Google Sheets formula—think Lambda, Map, Let, etc., for 82 lines. If I knew it would take that long, I would have just done it via AppScript. But it was 50% kinda working, so I kept giving the model the output, and it provided updated formulas back and forth for 4.5hrs. Say my time is $100/hr - that’s $450. So even if the new ChatGPT Pro mode isn’t any smarter but is 50% faster, that’s $225 saved just in time alone. It would probably get that formula right in 10min with a few back-and-forth messages, instead of 4.5hrs. Plus, I used about $62 worth of API credits in their not-so-great Playground. I see similar situations of extreme ROI every few days, let alone all the other uses. I’d pay $500/mo, but beyond that, I’d probably just stick with Playground & API.
> so I kept giving the model the output, and it provided updated formulas back and forth for 4.5hrs
I read this as: "I have already ceded my expertise to an LLM, so I am happy that it is getting faster because now I can pay more money to be even more stuck using an LLM"
Maybe the alternative to going back and forth with an AI for 4.5 hours is working smarter and using tools you're an expert in. Or building expertise in the tool you are using. Or, if you're not an expert or can't become an expert in these tools, then it's hard to claim your time is worth $100/hr for this task.
I agree going back and forth with an AI for 4.5 hours is usually a sign something has gone wrong somewhere, but this is incredibly narrow thinking. Being an open-ended problem solver is the most valuable skill you can have. AI is a huge force multiplier for this. Instead of needing to tap a bunch of experts to help with all the sub-problems you encounter along the way, you can just do it yourself with AI assistance.
That is to say, past a certain salary band people are rarely paid for being hyper-proficient with tools. They are paid to resolve ambiguity and identify the correct problems to solve. If the correct problem needs a tool that I'm unfamiliar with, using AI to just get it done is in many cases preferable to locating an expert, getting their time, etc.
If somebody claims that something can be done with an LLM in 10 minutes that takes them 4.5 hours, then they are definitely not an expert. They probably have some surface knowledge, but that's all. There is a reason why the better LLM demos are about learning something new, like a new programming language. So far, all of the other kinds of demos I've seen (e.g. generating new endpoints based on older ones) were clearly slower than experts, and they were slower for me to use in my respective field.
For no true Scotsman, you need to throw out a counterexample by using a misrepresented or wrong definition, or simply by using a definition wrongly. But in any case, I would need a counterexample for that specific fallacy. I didn't have one, and I still don't.
I understand that some people may consider themselves experts and think they could achieve a similar reduction (outside the cases where I said it's clearly possible), but then show me, because I still haven't seen a single one. The ones that were publicly shown were not quicker than average seniors, and definitely worse than the better ones. Even at larger scale in my company, we haven't seen a performance improvement in any single metric related to coding since we introduced it more than half a year ago.
Here's your counterexample: “Copilot has dramatically accelerated my coding. It’s hard to imagine going back to ‘manual coding,’” Karpathy said. “Still learning to use it, but it already writes ~80% of my code, ~80% accuracy. I don’t even really code, I prompt & edit.” -- https://siliconangle.com/2023/05/26/as-generative-ai-acceler...
A more convenient manual that frequently spouts falsehoods, sure.
My favorite part is when it includes parameters in its output that are not and have never been a part of the API I'm trying to get it to build against.
The thing is, when it hallucinates API functions and parameters, they aren't random garbage. Usually, those functions and parameters should have been there.
More than that, one of the standard practices in development is writing code with imaginary APIs that are convenient at the point of use, and then reconciling the ideal with the real - which often does involve adding the imaginary missing functions or parameters to the real API.
Long Excel formulas are really just bad "one-liners". You should be splitting your operation into multiple cells or finding a more elegant solution. This is especially true in Excel, where your debug tools are quite limited!
Expect more of this as they scramble to course-correct from losing billions every year, to hitting their 2029 target for profitability. That money's gotta come from somewhere.
> Price hikes for the premium ChatGPT have long been rumored. By 2029, OpenAI expects it’ll charge $44 per month for ChatGPT Plus, according to reporting by The New York Times.
I suspect a big part of why Sora still isn't available is because they couldn't afford to offer it on their existing plans, maybe it'll be exclusive to this new $200 tier.
Runway is $35 a month to generate 10 second clips and you really get very few generations for that. $95 a month for unlimited 10 second clips.
I love art and experimental film. I really was excited for Sora but it will need what feels like unlimited generation to explore what it can do . That is going to cost an arm and a leg for the compute.
Something about video especially seems like it will need to be run locally to really work. Pay a monthly fee for the model and run it as much as you want on your own compute.
I give o1 a URL and I ask it to comment on how well the corresponding web page markets a service to an audience I define in clear detail.
o1 generates a couple of pages of comments before admitting it didn’t access the web page and entirely based its analysis on the definition of the audience.
If one makes $150 an hour and it saves them 1.25 hours a month, then they break even. To me, it's just a non-deterministic calculator for words.
If it gets things wrong, then don't use it for those things. If you can't find things that it gets right, then it's not useful to you. That doesn't mean those cases don't exist.
I don't think this math depends on where that time is saved.
If I do all my work in 10 hours, I've earned $1500. If I do it all in 8 hours, then spend 2 hours on another project, I've earned $1500.
I can't bill the hours "saved" by ChatGPT.
Now, if it saves me non-billing time, then it matters. If I used to spend 2 hours doing a task that ChatGPT lets me finish in 15 minutes, now I can use the rest of that time to bill. And that only matters if I actually bill my hours. If I'm salaried or hourly, ChatGPT is only a cost.
And that's how the time/money calculation is done. The idea is that you should be doing the task that maximizes your dollar per hour output. I should pay a plumber, because doing my own plumbing would take too much of my time and would therefore cost more than a plumber in the end. So I should buy/use ChatGPT only if not using it would prevent me from maximizing my dollar per hour. At a salaried job, every hour is the same in terms of dollars.
My firm's advertised billing rate for my time is $175/hour as a Sr Software Engineer. I take home ~$80/hour, accounting for benefits and time off. If I freelanced I could presumably charge my firm's rate, or even more.
This is in a mid-COL city in the US, not a coastal tier 1 city with prime software talent that could charge even more.
Ironically, the freelance consulting world is largely on fire due to the lowered barrier of entry and flood of new consultants using AI to perform at higher levels, driving prices down simply through increased supply.
I wouldn't be surprised if AI was also eating consultants from the demand side as well, enabling would-be employers to do a higher % of tasks themselves that they would have previously needed to hire for.
That's what they are billed at, what they take home from that is probably much lower. At my org we bill folks out for ~$150/hr and their take home is ~$80/hr
On the one hand, there's the moral argument: we need janitors and plumbers and warehouse workers and retail workers and nurses and teachers and truck drivers for society to function. Why should their time be valued less than anyone elses?
On the other hand there's the economic argument: the supply of people who can stock shelves is greater than the supply of people who can "create value" at a tech company, so the latter deserve more pay.
Depending on how you look at the world, high salaries can seem insane.
I don’t even remotely understand what you’re saying is wrong. Median salaries are significantly higher in the US compared to any other region. Nominal and PPP adjusted AND accounting for taxes/social benefits. This is bad?
Those jobs you referenced do not have the same requirements nor the same wages… seems like you're just lumping all of those together as "lower class" so you can be champion of the downtrodden.
I do wonder what effect this will have on furthering the divide between the "rich West" and the rest of the world.
If everyone in the West has powerful AI and agents to automate everything, simply because we can afford it, while the rest of the world doesn't have access to it.
Anecdotally, as an educator, I am already seeing a digital divide occurring, with regard to accessing AI. This is not even at a premium/pro subscription level, but simply at a 'who has access to a device at home or work' level, and who is keeping up with the emerging tech.
I speak to kids that use LLMs all the time to assist them with their school work, and others who simply have no knowledge that this tech exists.
What are some productive ways students are using LLMs for aiding learning? Obviously there is the “write this paper for me” but that’s not productive. Are students genuinely doing stuff like “2 + x = 4, help me understand how to solve for x?”
I challenge what I read in textbooks and hear from lecturers by asking for contrary takes.
For example, I read a philosopher saying "truth is a relation between thought and reality". Asking ChatGPT to knock it revealed that statement is an expression of the "correspondence theory" of truth, but that there is also the "coherence theory" of truth that is different, and that there is a laundry list of other takes too.
My son doesn't use it, but I use it to help him with his homework. For example, I can take a photograph of his math homework and get the LLM to mark the work, tell me what he got wrong, and make suggestions on how to correct it.
Absolutely. My son got a 6th grade AI “ban” lifted by showing how they could use it productively.
Basically they had to adapt a novel to a comic book form — by using AI to generate pencil drawings, they achieved the goal of the assignment (demonstrating understanding of the story) without having the computer just do their homework.
Huh the first prompt could have been "how would you adapt this novel to comic book form? Give me the breakdown of what pencil drawings to generate and why"
At the time, the tool available was Google Duet AI, which didn’t expose that capability.
The point is, AI is here, and it can be a net positive if schools can use it like a calculator vs a black market. It’s a private school with access to some alumni money for development work - they used this to justify investing in designing assignments that make AI a complement to learning.
I recently saw someone revise for a test by asking chatgpt to create practice questions for them on the topics they were revising. I know other people who use it to practice chatting in a foreign language they are trying to learn.
The analogy I would use is extended-phenotype evolution in digital space, as Richard Dawkins would put it, just as crabs in the ocean use shells to protect themselves.
Even if it's not making you smarter, AI is definitely making you more productive. That essentially means you get to outproduce poorer people, if not out-intellectualize them.
Don't you worry; the "rich West" will have plenty of disenfranchised people out of work because of this sort of thing.
Now, whether the labor provided by the AI will be as high-quality as that provided by a human when placed in an actual business environment will be up in the air. Probably not, but adoption will be pushed by the sunk cost fallacy.
I’m watching some of this happen first- and second-hand, and have seen a lot of evidence of companies spending a ton of money on these, spinning up departments, buying companies, pivoting their entire company’s strategy to AI, etc., and none of it is meaningfully replacing employees. It takes very skilled people to use LLMs well, and the companies trying to turn 5 positions into 2 aren’t paying enough to reliably get and keep two people who are good at it.
I’ve seen it be a minor productivity boost, and not much more.
I mean, yes, that is in practice what I’m seeing so far. A lot of spending, and if they’re lucky productivity doesn’t drop. Best case I’ve seen so far is that it’s a useful tool that gives a small boost, but even for that a lot of folks are so bad at using them that it’s not helping.
The situation now is kinda like back when it was possible to be “good at Google” and lots of people, including in tech, weren’t. It’s possible to be good at LLMs, and not a lot of people are.
Yes. The people who can use these tools to dramatically increase their capabilities and output without a significant drop in quality were already great engineers for which there was more demand than supply. That isn't going to change soon.
Ditto for other use cases, like writer and editor. There are a ton of people doing that work who I don’t think are ever going to figure out how to use LLMs well. Like, 90% of them. And LLMs are nowhere near making the rest so much better that they can make up for that.
They’re ok for Tom the Section Manager to hack together a department newsletter nobody reads, though, even if Tom is bad at using LLMs. They’re decent at things that don’t need to be any good because they didn’t need to exist in the first place, lol.
I disagree. By far, most of the code is created by perpetually replaced fresh juniors churning out garbage. Similarly, most of the writing is low-quality marketing copy churned out by low-paid people who may or may not have "marketing" in their job title.
Nah, if the last 10-20 years demonstrated something, it's that nothing needs to be any good, because a shitty simulacrum achieves almost the same effect but costs much less time and money to produce.
(Ironically, SOTA LLMs are already way better at writing than typical person writing stuff for money.)
> (Ironically, SOTA LLMs are already way better at writing than typical person writing stuff for money.)
I’m aware of multiple companies that would love to know about these, because they’re currently flailing around trying to replace writers with editors + LLMs and it’s not going great. The closest to success are the ones that are only aiming to turn out stuff one step better than outright book-spam, and even they aren’t quite where they want to be: hardly a productivity bump at all from the LLM use, and increased demand on their few talented humans.
Yeah, but it’s a bit trickier with them, given that they still operate in the US and are listed on the NYSE. Also, if they keep releasing open source code, people will still just use it… basically the Meta approach of driving adoption of their AI ecosystem.
If $200 a month is the price, most of the West will be left behind also. If that happens we will have much bigger problems of a revolution sort on our hands.
I think the tech elite would espouse "raising the ceiling" vs "raising the floor" models to prioritize progress. Each has its own problems. The reality is that the disenfranchised don't really have a voice. The impact of not giving them access is not understood nearly as well as the impact of prioritizing access for those who can afford it.
We don't have a post-Cold War-era response akin to the US-led investment in a global pact to provide protection, security, and access to innovation founded in the United States. We really need to prioritize a model akin to the Bretton Woods agreement.
If the models are open, the rest of the world will run them locally.
If the models are closed, the West will become a digital serfdom to anointed AI corporations, which will be able to gouge prices, inject ads, and influence politics with ease.
tbh a lot of the rest of the world already has the ability to get tasks they don't want to do done for <$200 per month, in the form of low-wage humans. Some of their middle classes might be scratching their heads wondering why we're delegating creativity and communication to allow more time to do laundry, rather than delegating laundry to allow more time for creativity and communication...
I actually suspect the opposite. If you get access to or steal a large LLM you can potentially massively leverage the talent pool you have as a small country.
Has it really made that much of a difference in the first place? I have a feeling that we'll look back in 10 years and not even notice the "AI revolution" on any charts of productivity, creating a productivity paradox 3.0.
I can imagine the headlines now: "AI promised unlimited productivity, 10 years later, we're still waiting for the rapture"
Kai-Fu Lee's AI Superpowers is more relevant than ever.
The rich west will be in the lead for awhile and then get tiktok-ed.
The lead is just not really worth that much in the long run.
There is probably an advantage gained at some point in all this of being a developing country too that doesn't need to bother automating all these middle management and bullshit jobs they don't have.
I know a guy who owned a tropical resort on a island where competiton was sprouting up all around him. He was losing money trying to keep up with the quality offered by his neighbors. His solution was to charge a lot more for an experience that was really no better, and often worse, than the resorts next door. This didn't work.
After a few hours of $200 Pro usage, it's completely worth it. Having no limit on o1 usage is a game changer; where I felt so restricted before, the amount of intelligence in the palm of my hand, UNLIMITED, feels a bit scary.
I was using aider last night and ran up a $10 bill within two hours using o1 as the architect and Sonnet as the editor. It’s really easy to blow through $200 a month and o1-pro isn’t available in the API as far as I can tell.
I generally find o1, or the previous o1-preview, to perform better than Claude 3.5 Sonnet in complex reasoning; the new Sonnet is more on par with o1-mini in my experience.
Creating somewhat complex python scripts at work to automate some processes which incorporate like 3-4 APIs, and next I'll be replacing our excise tax processing (which costs us like $500/month) since we already have all the data.
Personal use I'll be using it to upgrade all my website code. I literally took a screenshot of Apple.com and combined it with existing code from my website and told o1 pro to combine the two... the results were really good, especially for one shot... But again, I have unlimited fast usage so I can just keep tweaking and tweaking.
I also have this history idea I've been wanting to do for a while, might see if the models are advanced enough yet.
All this with an understanding of how programming works, but without being able to code.
Interesting, thanks for the details. I haven't played around with o1 enough yet. The kinds of tasks I had it do seemed to be performed just as well by 4o. I'm sure I just wasn't throwing enough at it.
Any AI product sold for a price that's affordable on a third-world salary is being heavily subsidized. These models are insanely expensive to train, guzzle electricity to the point that tech companies are investing in their own power plants to keep them running, and are developed by highly sought-after engineers being paid millions of dollars a year. $20/month was always bound to be an intro offer unless they figured out some way to reduce the cost of running the model by an order of magnitude.
We've been conditioned to pay $10/mo for an endless stream of glorified CRUD apps, but it is very common for specialized software to cost orders of magnitude more. Think Bloomberg Terminal, Cadence, Maya, lots of CAD software (like SOLIDWORKS), higher tiers of Adobe, etc., all running in the thousands of dollars per user. And companies happily pay for them because of the value they add. ChatGPT isn't any different.
Question: what stops OpenAI from downgrading existing models so that you're pushed up the subscription tiers to ever more expensive models? I'd imagine they're currently losing a ton of money supplying everyone with decent models with a ton of compute behind them because they want us to become addicted to using them, right? The fact that classic free web searching is becoming diluted by low-quality AI content will make us rely on these LLMs almost exclusively in a few years or so. Am I seeing this wrong?
It's definitely not impossible. I think the increased competition they've begun to face over the last year helps as a deterrent. If people notice GPT-4 sucks now and they can get Claude 3.5 Sonnet for the same price, they'll move. If the user doesn't care enough to move, they weren't going to upgrade anyway.
Also depends on the friction to move. I admittedly have not really started using AI in my work, so I don't know. Is it easy to replace GPT with Claude or do I have to reconfigure a bunch of integration and learn new usage?
It depends on the tool you use and I guess the use case too. Some are language-model agnostic, like aider in the command line; I use Sonnet sometimes and 4o other times. I wonder if or when language models will become highly differentiated. Right now I see them more like a commodity that is relatively interchangeable, but that is shifting slightly with other features as they battle to become platforms.
They don’t need to downgrade what is already downgraded. In my experience ChatGPT was much more capable a year ago than it is now and has become more dogmatic. Their latest updates have focused on optimizing benchmark scenarios while reducing computation costs.
What's important, and I don't think has ever been revealed by OpenAI, is what the margin is on actual use of the models.
If they're losing money just because they're investing billions in R&D, while only spending a few hundred million to serve the use that's bringing in $1.6B, then it would be a positive story despite the technical loss, just like Amazon's years of aggressive growth at the cost of profits.
But if they're losing money because the server costs needed for the use that brings in $1.6B are $3B then they've got a scaling problem until they either raise prices or lower costs or both.
Part of my justification for spending $20 per month on ChatGPT Plus was that I'd have the best access to the latest models and advanced features. I'll probably roll back to the free plan rather than pay $20/mo for mid tier plan access and support.
In the past, $20 got me the most access to the latest models and tools. When OpenAI rolled out new advanced features, the $20 per month customers always got full / first access. Now the $200 per month customers will have the most access to the latest models and tools, not the (now) mid/low tier customers. That seems like less to me.
They probably didn't pay for access to a certain version of a model, they paid for access to the best available model, whatever that is at any given moment. I'm reasonably sure that is even what OpenAI implied (or outright said) their subscription would get them. Now, it's the same amount of money for access to the second best model, which would feel like a regression.
Did you read the post you're replying to? It's very short. He was paying for top-tier service, and now, despite paying the same amount, has become a second-class customer overnight.
A lot of these tools aren't going to have this kind of value (for me) until they are operating autonomously at some level. For example, "looking at" my inbox and prepping a bundle of proposed responses for items I've been sitting on, drafting an agenda for a meeting scheduled for tomorrow, prepping a draft LOI based on a transcript of a Teams chat and my meeting notes, etc. Forcing me to initiate everything is (uncomfortably) like forcing me to micromanage a junior employee who isn't up to standards: it interrupts the complex work the AI tool cannot do for the lower value work it can.
I'm not saying I expect these tools to be at this level right now. I'm saying that level is where I will start to see these tools as anything more than an expensive and sometimes impressive gimmick. (And, for the record, Copilot's current integration into Office applications doesn't even meet that low bar.)
Tangent. Does anybody have good tips for working in a company that is totally bought in on all this stuff, such that the codebase is a complete wreck? I am in a very small team, and I am just a worker, not a manager or anything. It has become increasingly clear that most if not all of my coworkers rely on all this stuff heavily. I spend hours trying to give the benefit of the doubt to huge amounts of inherited code, only to realize there is actually no human bottom to it. Things are merged quickly, with very little review, because, it seems, the reviewers can't really have their own opinion about stuff anymore. The idea of "idiomatic" or even "understandable" code seems foreign at this place. I asked why we don't use more structural directives in our Angular frontend, and people didn't know what I was talking about!
I don't want the discourse, or tips on better prompts. Just tips for interacting with the more heavy AI-heads, to maybe encourage/inspire curiosity and care in the actual code, rather than the magic ChatGPT outputs. Or even just to talk about what they did with their PR. Not for some ethical reason, but just to make my/our jobs easier. Because it's so hard to maintain this code now; it is truly a nightmare for me every day seeing what has been added, what now needs to be fixed. Realizing nobody actually has this stuff in their heads, it's all just jira ticket > prompt > mission accomplished!
I am tired of complaining about AI in principle. Whatever, AGI is here, "we too are stochastic parrots", "my productivity has tripled", etc etc. Ok yes, you can have that, I don't care. But can we like actually start doing work now? I just want to do whatever I can, in my limited formal capacity, to steer the company to be just a tiny bit more sustainable and maybe even enjoyable. I just don't know how to like... start talking about the problem I guess, without everyone getting super defensive and doubling down on it. I just miss when I could talk to people about documentation, strategy, rationale..
Found it better to not fight it; you can't really turn back the clock with people who have embraced it or become enamored by it. Part of the issue I've noticed is that it enables people who couldn't do a thing at all to do the most basic version of a thing, e.g. a CEO can now make a button appear in the app and maybe it'll kinda work. They then assume this magic experience is applicable across the rest of coding, when if you actually know how to code, making the button appear isn't the thing that's difficult; it's the harder work that the AI can't really solve.
But really you're never going to convince these people so I'd say if you're really passionate about coding find a workplace with similar minded people, if you really want to stay in this job then embrace it, stop caring if the codebase is good or maintainable and just let the slop flow. It's the path of least resistance and stress, trying to fight it and convince people is a losing and frustrating battle, take your passion for your work and invest it in a project outside work or find a workplace where they appreciate it too.
> Things are merged quickly, with very little review
Sounds like the real problem is lax pre-existing dev practices rather than just LLM usage. If code is getting merged with little review, that is a big red flag right away. But the 'very little' gives some hope - that means there is some review?
So what happens when you see problems with the code and give review feedback and ask why things have been done the way they were done, or suggest alternative better approaches? That should make it clear first if devs actually understand the code they are submitting, and second if they are willing to listen to suggested improvements. And if they blow you off, and the tech leads on the project also don't care, then it sounds like a place you don't want to stick around.
It does not say anything about real use cases. It performs better and "reasons" better than o1-preview and o1, but I was expecting some real-life scenarios where it would be useful in a way no other model can manage right now.
Not a lot of companies when announcing its most expensive product have the bravery to give 10 of them to help cure cancer. Well played OpenAI. Fully expect Apple now to give Peter Attia an iPhone 17 Pro so humanity can live forever.
Is anybody else tempted to sign up for this just for personal use? I've found ChatGPT-o1 Preview to be so helpful — which was absolutely not the case for me with any previous models (or Claude 3.5) — that the concept of having "unlimited" usage of o1 is pretty intriguing.
I recently used it to buy some used enterprise server gear (which I knew nothing about) and it saved me hours of googling and translating foreign-language ads etc. That conversation stretched across maybe 10 days, and it kept the context the whole time. But then I ran out of "preview" tokens and it got dumb and useless again. (Or maybe the conversation exceeded the context window, I am not really sure.)
But that single conversation used up the entire amount of o1 tokens that come with my $20/month ChatGPT Plus account. I am not sure that I have 10x that number of things for it to help me with each month, and where I live $200 is a not-insignificant amount, but... tempting.
People saying this is a "con" have no understanding of the cost of compute. o1 is expensive and gets more expensive the harder the problems are. Some people could use $500 or more via the API per month. So I assume the $200 price point for "unlimited" is set that high mainly because it's too easy for people to use up $100 or $150 worth of resources.
The price feels outrageous, but I think the unsaid truth of this is that they think o1 is good enough to replace employees. For example, if it's really as good at coding as they say, I could see this being a point where some people decide that a team of 5 devs with o1 pro can do the work of 6 or 7 devs without o1 pro.
No, o1 is definitely not good enough to replace employees.
The reason we're launching o1 pro is that we have a small slice of power users who want max usage and max intelligence, and this is just a way to supply that option without making them resort to annoying workarounds like buying 10 accounts and rotating through their rate limits. Really it's just an option for those who'd want it; definitely not trying to push a super expensive subscription onto anyone who wouldn't get value from it.
(I work at OpenAI, but I am not involved in o1 pro)
My third-day intern still couldn't do a script that o1-preview could do in under 25 prompts.
OBVIOUSLY a smart OAI employee wouldn't want the public to think they are already replacing high-level humans.
And OBVIOUSLY OAI senior management will want to convince AI engineers who might have second thoughts about their work that they aren't developing a replacement for human beings.
Indeed, I'm very concerned about this. Though I think it's a case of tragedy of the commons. Every company individually optimizes for themselves, fucking us over in the aggregate. But I think any executive arguing against this would have to be at a pretty big company with an internal pipeline and promotion from within to justify it, especially since everyone else will just poach your cultivated talent, and employees aren't loyal anymore (nor should they be, but that's a different discussion).
> The reason we're launching o1 pro is that we have a small slice of power users who want max usage and max intelligence
I'd settle for knowing what level of usage and intelligence I'm getting instead of feeling gaslighted with models seemingly varying in capabilities depending on the time of day, number of days since release and whatnot
Yeah, to be fair, there exist employees (some of whom are managers) who could be replaced by nothing at all, and their absence would improve productivity. So the bar for “can this replace any employees at all?” is potentially so low that, technically, cat’ing from /dev/null can clear it, if you must have a computerized solution.
Companies won’t be able to figure those cases out, though, because if they could they’d already have gotten rid of those folks and replaced them with nothing.
Unfortunately I'm seeing that in my company already. They are forcing AI tools down our throat and execs are vastly misinterpreting stats like '20% of our code is coming from AI'.
What that means is the simple, boilerplate, and repetitive stuff is being generated by LLMs, but for anything complex or involving more than a single simple problem, LLMs often create more problems than they solve. Effective devs are using them to handle the simple stuff, and execs are thinking 'the team can be reduced by x', when in reality you can at best get rid of your most junior and least trained people without losing key abilities.
Watching companies try to sell their AIs and "Agents" as having the ability to reason is also absurd, but the non-technical managers and execs are eating it up...
I haven't used ChatGPT enough to judge what a "fair price" is but $200/month seems to be in the ballpark of other "software-tools-for-highly-paid-knowledge-workers" with premium pricing:
- mathematicians: Wolfram Mathematica is $154/mo
- attorneys: WestLaw legal research service is ~$200/month with common options added
- engineers for printed circuit boards : Altium Designer is $355/month
- CAD/CAM designers: Siemens NX base subscription is $615/month
- financial traders : Bloomberg Terminal is ~$2100/month
It will be interesting to see if OpenAI can maintain the $200/month pricing power like the sustainable examples above. The examples in other industries have sustained their premium prices even though there are cheaper less-featured alternatives (sometimes including open source). Indeed, they often increase their prices each year instead of discount them.
One difference from them is that OpenAI has much more intense competition than those older businesses.
This is a really interesting take. I don't think individuals pay for these subscriptions though, it's usually an organizational license.
They also come with extensive support and documentation, and people have vast experience using them. They are also integrated very well into all the other tools of the field. This makes them very entrenched. I am not sure OpenAI has any of those things. I also don't know what those things would entail for LLMs.
Maybe they need to add modes that are good for certain tasks or integrate with tools that their users most commonly use like email, document processors.
That'll work out nicely when you have 5 people learning nothing and just asking GPT to do everything and then you have a big terrible codebase that GPT can't effectively operate on, and a team that doesn't know how to do anything.
Suppose an employee costs a business, say, $10k/mo; it's 50 subscriptions. Can giving access to the AI to, say, 40 employees improve their performance enough to avoid the need of hiring another employee? This does not sound outlandish to me, at least in certain industries.
That’s the wrong question. The only question is “is this price reflective of 10x performance over the competition?”. The answer is almost definitely no.
If Claude increases their productivity 5% ($17.5k/yr), but CGPT Pro adds 7% ($24.5k), that's an extra $7k in productivity, which more than makes up for the $2400 annual cost. 10x the price, but only 40% better, but still worth it.
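Worked through as a quick sketch, with every number assumed (the ~$350k/yr fully loaded cost is just what those percentages imply, not data):

```kotlin
fun main() {
    // All figures are assumptions taken from the comment above, not measurements.
    val loadedCostPerYear = 350_000.0            // fully loaded cost of one employee
    val claudeGain = 0.05 * loadedCostPerYear    // ~$17,500/yr of extra output
    val proGain = 0.07 * loadedCostPerYear       // ~$24,500/yr of extra output
    val proSubscription = 200.0 * 12             // $2,400/yr

    val extraOutput = proGain - claudeGain       // ~$7,000/yr
    val netAdvantage = extraOutput - proSubscription
    println("Extra output: $extraOutput, net of the subscription: $netAdvantage")
    // Roughly 7000.0 and 4600.0: ten times the price, but still net positive
    // under these assumptions.
}
```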
In a hypothetical world where this was integrated with code reviews, and minimized developer time (writing valid/useful comments), and minimized bugs by even a small percentage... $200/m is a no-brainer.
That sounds very much like the first-order reaction they'd expect from upper and middle management. Artificially high prices can give the buyer the feeling that they're getting more than they really are, as a consequence of the sunk cost fallacy. You can't rule out that they want to dazzle with this impression even if eval metrics remain effectively the same.
I think the key is to have a strong goal. If the developer knows what they want but can't quite get there, even if it gives the wrong answer you can catch it. Then use the resulting code to improve your productivity.
Last week I was using Jetpack Compose (which is a React-like framework). A cardinal sin in Jetpack Compose is to change, from inside a composable, a State variable that the composable also reads, triggered by something other than a user/UI action. This is easy enough to understand for toy examples, but in more complex systems you can still make the mistake. o1-preview made this mistake last week, and I caught it. When I prompted it with the stack trace it did not immediately catch the problem and recommended a solution that committed the same error. When I actually gave it the documentation on the issue it caught on and made the variable a user preference instead. I used the user-preference code in my app instead of writing it myself. It worked well.
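For anyone who hasn't hit this, a minimal sketch of the backwards write being described, assuming a standard Compose setup (the names here are made up):

```kotlin
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.runtime.getValue
import androidx.compose.runtime.mutableStateOf
import androidx.compose.runtime.remember
import androidx.compose.runtime.setValue

@Composable
fun RunningTotal(latestDelta: Int) {
    // State that this composable both reads and writes.
    var total by remember { mutableStateOf(0) }

    // The "cardinal sin": writing to the state during composition rather than
    // in response to a user/UI event. Reading `total` below schedules another
    // recomposition, which runs this write again, and so on.
    total += latestDelta

    Text("Total: $total")
}
```

The usual fix is to do the write outside composition, e.g. in an event handler or a LaunchedEffect, or, as in the case above, persist the value as a preference and only read it here.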
I am not so sure about "replace"; at least at my company we are always short-staffed (mostly because we can't find people fast enough given how long the whole interview cycle takes). It might actually free some people up to do more interviews.
That's a great point actually. Nearly everywhere (us included) is short-staffed (and by that I mean we don't have the bandwidth to build everything we want to build), so perhaps it's not a "reduce the team size" but rather a "reduce the level of deficit."
> It also includes o1 pro mode, a version of o1 that uses more compute to think harder
I like that this kind of verifies that OpenAI can simply adjust how much compute a request gets and still say you’re getting the full power of whatever model they’re running. I wouldn’t be surprised if the amount of compute allocated to “pro mode” is more or less equivalent to what was the standard free allocation given to models before they all got mysteriously dramatically stupider.
It is amazing that we are giving billions of dollars to a group of people that saw Human Centipede and thought “this is how we will cure cancer or make some engineering tasks easier or whatever”
This was part of the premise of o1 though, no? By encouraging the model to output shorter/longer chains of thought, you can scale model performance (and costs) down/up at inference time.
I think from this fine print there will be a quota with o1 pro:
> This plan includes unlimited access to our smartest model, OpenAI o1, as well as to o1-mini, GPT-4o, and Advanced Voice. It also includes o1 pro mode,
I've not found value anywhere remotely close to this lol, but I'd buy it to experiment if they had a solid suite of tooling, i.e. an LSP that offered real value, maybe a side-monitor assistant that helped me with the code in my IDE of choice, etc.
At $200/m merely having a great AI (if it even is that) without insanely good tooling is pointless to me.
I don't know about you, but I get to solve algorithmic challenges relevant to my work approximately once per week to once per month. Most of my job consists of gluing together various pieces of tech that are mostly commodity.
For the latter, Claude is great, but for the former, my usage pattern would be poorly served by something that costs $200 and I get to use it maybe a dozen times a month.
For me i feel like most of my time is spent inventing bespoke solutions in existing infra. Less about algorithms and more about making it work in an existing complex code base, which option will have the most negative impact, best impact, performant, etc.
A lot of tradeoffs to evaluate and it can be tiring onboarding people, let alone onboarding an AI.
Maybe it would massively improve my job if the AI could just grab the whole codebase, but we're not there yet. Too many LOC, too much legal BS, etc.
LLMs have significantly increased my productivity, but in this case it'd be about the increase in productivity over the existing Pro plan. I mainly use them for generating or improving code, learning about things, and running estimates.
How much better will this be for my uses? Based on my experience with o1, the answer is "fairly marginal". To me, o1 is worse than the regular model or Claude on most things, but it's best for something non-numeric that requires deep thought or new insights. I'm sure there are some people who got a huge productivity boost from o1. This plan is for those people.
From what I have seen a lot of people who make these claims seem to be people who are working at a level where there is a lot of text being produced that nobody actually cares to read.
That, or I am actually a much better developer and writer than I thought. Because while LLMs certainly have become useful tools to me. They have not doubled my productivity.
$200 per month means it must be good enough at your job to replicate and replace a meaningful fraction of your total work. Valid? For coding, probably. For other purposes I remain on the fence.
The reality is more like: The frothy american economy over the past 20 years has created an unnaturally large number of individuals and organizations with high net worth who don't actually engage in productive output. A product like ChatGPT Pro can exist in this world because it being incapable of consistent, net-positive productive output isn't actually a barrier to being worth $200/month if consistent net-positive productive output isn't also demanded of the individual or organization it is augmenting.
The macroeconomic climate of the next ~ten years is going to hit some people and companies like a truck.
For 2024 prediction is 2.6% US and 4.8% China. I don't see how it's low compared to US.
> high unemployment
5.1% China vs 4.1% USA
> huge debt from infrastructure spending
What do you mean by "huge" and compared to whom? The U.S. is currently running a $2 trillion deficit per year, which is about 6% of GDP, with only a fraction allocated to investments.
> weakening capital markets and real estate
China's economy operates differently from that of the U.S. Currently, China records monthly trade surpluses ranging between $80 billion and $100 billion.
The real estate sector indeed presents challenges, leading the government to inject funds into local government to manage the resulting debt. The effectiveness of these measures remains to be seen.
There is a lot of wishful thinking on HN regarding the rivalry between China and the U.S
The comparison is not between the US and China. I don't understand why people keep making that comparison when it's not at all apples-to-apples. It's featured in headlines constantly, but it's honestly a stick-measuring contest. For starters, the US is a free economy and China is a centrally planned one. There's significant chatter about China's numbers being massaged to suit the state's narrative, leaving would-be investors extra cautious, whereas in the US data quality and availability is state-of-the-art.
The real questions are: can China deliver on long term expectations for its economy? Do the trends support the argument that it will become a leading developed economy? I don't think they do. If they don't, then is it an issue with the current economy plan that can be solved with a better plan or is it a systemic issue that can't be solved in the near to medium term? These are way more useful questions than "who's going to win the race?"
>> Low growth
> For 2024 prediction is 2.6% US and 4.8% China. I don't see how it's low compared to US.
> What do you mean by "huge" and compared to whom?
To answer in reverse: yes, the US also has a debt problem. That doesn't make the China problem less of an issue. The china debt crisis has been widely reported and is related to the other point about real estate. Those articles will definitely do a better job of explaining the issue than me, so here's just one: https://www.reuters.com/breakingviews/chinas-risky-answer-wa...
> There is a lot of wishful thinking on HN regarding the rivalry between China and the U.S
I'm arguing there's no rivalry. Different countries, different problems, different scales entirely. China is in dire straits and I don't expect it to recover before the crisis gets worse.
> For starters, the US is a free economy and the China is a centrally planned one.
USSR was a centrally planned economy, China is not. Do you mean subsidies (like the IRA and CHIPS Act in the US) for certain industries, which act as guidance to local governments and state banks? Is that what you call "centrally planned"?
> can China deliver on long term expectations for its economy? Do the trends support the argument that it will become a leading developed economy? I don't think they do. If they don't, then is it an issue with the current economy plan that can be solved with a better plan or is it a systemic issue that can't be solved in the near to medium term?
That's your opinion that they can't, and it's your right to have one. There were people 10 years ago saying exactly what you’re saying now. Time showed they were wrong.
> China is growing slower than historically and slower than forecasts, which had it at 5%. Look at this chart and tell me if it paints a rosy picture or a problematic one:
Oh come on, 4.8% vs. 5%? As for the chart, it's the most incredible growth in the history of mankind. No country has achieved something like this. It's fully natural for it to decline in percentage terms, especially when another major power is implementing legislation to curb that growth, forcing capital outflows, imposing technology embargoes, etc.
> China is in dire straits and I don't expect it to recover before the crisis gets worse.
Time will tell. What I can say is that over the last 20 centuries, in 18 of them, China was the cultural and technological center of the world. So from China’s perspective, what they are doing now is just returning to their natural state. In comparison, the US is only 2 centuries old. Every human organization, whether a company or state, will sooner or later be surpassed by another human creation, there are no exceptions to this rule in all of human history. We have had many empires throughout our history. The Roman Empire was even greater at its peak than the US is now, and there were also the British Empire, the Spanish Empire, etc. Where are they now? Everything is cyclical. All of these empires lasted a few centuries and started to decline after around 200-250 years, much like the US now.
> I'm arguing there's no rivalry.
Come on, there is obvious rivalry. Just listen to US political elites and look at their actions—legislation. It's all about geopolitics and global influence to secure their own interests.
I wouldn’t consider it a major problem, especially with the coming robotic revolution. Even if the population declines by half, that would still leave 700 million people, twice the population of the U.S. According to predictions, the first signs of demographic challenges are expected to appear in about 15–20 years from now. That’s a long time, and a lot can change in two decades. Just compare the world in 2004 to today.
It's a major mistake to underestimate your competition.
That's a long ways out. We're barely past the first innings of the chatbot revolution and it's already struggling to keep going. Robotics are way more complex because physics can be cruel.
Show me what was possible 20 years ago versus what we can do now. I think you have enough imagination to envision what might be possible 20 years from now.
I don't really follow this line of thinking. $200 is nothing—nothing—in the context of the fully loaded cost of an employee for a month (at least, for any sort of employee who would benefit from using an LLM).
I wonder who came up with the $200/month idea, and what was running in their mind.
$200/month = $2400/year
We (consumers/enterprises) are already accustomed to a baseline price, and their model quality will be caught up to or exceeded by open source in ~6 months. So if I find it difficult to justify paying $20/month, why would I even think about $200/month?
Probably the thought process was that they can package all the great things (text, voice, video, images) and the experience. The problem is that very few people use everything. Most of the time, the use cases are limited: someone wants to use it for coding, while someone else (an artist) wants to use it for Sora. OpenAI had an opportunity to introduce a la carte pricing, and then move to bundling. My hypothesis is that they will have very few takers at $200 for the bundle.
Enterprises - did they interview enterprises enough to see if they need user licenses for the bundles? Maybe they will give it at 80% or 90% discount to drive adoption.
Disclosure:
I am on Claude, Grok 2/X Pro, Cursor Personal, and Github Copilot enterprise. My ChatGPT monthly subscription expires in a week, and I will not renew for now and see the user vibes before deciding. I have limited brain power to multitask between so many models, and I will rather give a chance to Gemini Pro for 6 months.
Right. Sonnet 3.5, yesterday. Converted Woodpecker pipeline into Forgejo, then modified to use buildah instead of docker. That's baseline right now.
The tough questions are when one asks about what shadow shapes from a single light can be expected on faces inside a regular voxel grid. That's where it just holds the duck.
With the release of Nova earlier this week that's even cheaper (I haven't had a chance to really play with it yet to see how good it is), I've been thinking more about what happens when intelligence gets "too cheap to meter", but this definitely feels like a step in the other direction!
Still though, if you were able to actually utilize this, is it capable of replacing a part-time or full-time employee? I think that's likely.
I know this is no different than any other expert situation in life. $ buys the best lawyers, the best doctors, the best teachers.... But I personally interact with a lawyer less than once every few years. A doctor a couple of times a year. Teachers almost never as an adult.
But now $ buys better (teacher/lawyer/doctor/scientist) type thing that I use daily.
$20 a month is reasonable because a computer can do in an hour what a human can do in a month. Multiplying that by ten would suggest that the world is mostly made of Python and that the solution space for those programs has been "solved." GPT is still not good at writing Clojure and paying 10x more would not solve this problem for me.
I think they should hire an economist or ask their superintelligence about the demand. The market is very shallow and nobody has any kind of moat. There are simply not enough math problems out there to apply it to. The 200$ price tag really makes no sense to me unless this thing also cooks hot meals. I may be buying it for 100$ though.
For USD, the "$" goes in front of the denomination. So your comments should be $200 price tag, and $100 respectively. Apologies for being pedantic, just trying to make sure the future LLMs will continue to keep it this way.
No, because the terms imply you cannot actually use the output for any business purpose. How does this sail over so many people’s heads ???
If they do everything and you can’t use their stuff to compete with them, you can’t do anything with their stuff.
That, plus the time cost, and the fact they’re vigorously brain raping this shit out of you every time you use the thing, means it’s worth LESS THAN zero dollars
(unless your work does not compete with intelligence, in which case, please tell me what that is)
From https://openai.com/policies/terms-of-use/
"Ownership of content. As between you and OpenAI, and to the extent permitted by applicable law, you (a) retain your ownership rights in Input and (b) own the Output. We hereby assign to you all our right, title, and interest, if any, in and to Output. "
Why couldn't you use its output for business purposes?
I am using Claude.ai more these days, but the limitations for paying accounts apply to ChatGPT as well.
I find it a terrible business practice to be completely opaque and vague about limits. Even worse, the limits seem to be dynamic and change all the time.
I understand that there is a lot of usage happening, but most likely it means that the $20 per month is too cheap anyway, if an average user like myself can so easily hit the limits.
I use Claude for work, I really love the projects where I can throw in context and documentation and the fact that it can create artifacts like presentation slides. BUT because I rely on Claude for work, it is unacceptable for me to see occasional warnings coming up that I have reached a given limit.
I would happily pay double or even triple for a non-limited experience (or at least know what limit I get when purchasing a plan). AI providers, please make that happen soon.
> I find it a terrible business practice to be completely opaque and vague about limits. Even worse, the limits seem to be dynamic and change all the time.
Here are some things I've noticed about this, at least in the "free" tier web models since that's all I typically need.
* ChatGPT has never denied a response but I notice the output slows down during increased demand. I'd rather have a good quality response that takes longer than no response. After reaching the limit, the model quality is reduced and there's a message indicating when you can resume using the better model.
* Claude will pop-up messages like "due to unexpected demand..." and will either downgrade to Haiku or reject the request altogether. I've even observed Claude yanking responses back, it will be mid-way through a function and it just disappears and asks to try again later. Like ChatGPT, eventually there's a message about your quota freeing up at a later time.
* Copilot, at least the free tier found on Bing, at least tells you how many responses you can expect in the form of a "1/20" status text. I rarely use Copilot or Bing but it demonstrates it's totally possible to show this kind of status to the user - ChatGPT and Claude just prefer to slow down, drop model size, or reject the request.
It makes sense that the limits are dynamic though. The services likely have a somewhat fixed capacity but demand will ebb and flow, so it makes sense to expand/contract availability on free tiers and perhaps paid tiers as well.
I believe the "1/20" indicator on Copilot was added back when it was unhinged to try to prevent users from getting it to act up, and it has been removed in the latest redesign
If you go through the API (with ChatGPT's models at least), you pay per request and are never limited. I personally hate the feeling of being nickel-and-dimed, but it might be what you are looking for.
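For what it's worth, the pay-per-request path is just an HTTP call; here's a rough Kotlin/JVM sketch (no SDK, and the model name is whatever your account actually has access to, so treat it as a placeholder):

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

fun main() {
    val apiKey = System.getenv("OPENAI_API_KEY")   // billed per token, no monthly subscription

    // Minimal request body; "o1-preview" is assumed to be available to the account.
    val body = """{"model": "o1-preview", "messages": [{"role": "user", "content": "Explain the big-O of quicksort."}]}"""

    val request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.openai.com/v1/chat/completions"))
        .header("Authorization", "Bearer $apiKey")
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build()

    val response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())

    // The JSON response includes a `usage` block with token counts,
    // which is what the per-request charge is based on.
    println(response.body())
}
```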
Yeah it's crazy to me you can't just 10x your price to 10x your usage (since you could kind of do this manually by creating more accounts). I would easily pay $200/month for 10x usage - especially now with MCP servers where Claude Desktop + vanilla VS Code is arguably more effective than Cursor/Windsurf.
Personally I'm using the Filesystem server along with the mcp server called wcgw[0] that provides a FileEdit action. I use MacWhisper[1] to dictate. I use `tree` to give Claude a map of the directory I'm interested in editing. I usually opt to run terminal commands myself for better control though wcgw does that too. I keep the repo open in a Cursor/Windsurf window for other edits I need.
But other than that I basically just tell the model what I want to do and it does it, lol. I like the Claude Desktop App interface better than trying to do things in Cursor/Windsurf directly, I like the ability to organize prompts/conversations in terms of projects and easily include context. I also honestly just have a funny feeling that the Claude web app often performs better than the API responses I get from the IDEs.
how is that better than AI Coding tools?
They do more sophisticated things such as creating compressed representations of the code that fit better into the context window. E.g https://aider.chat/docs/repomap.html.
I have never found embeddings to be that helpful, or context beyond 30-50K tokens to be used well by the models. I think I get better results by providing only the context I know for sure is relevant, and explaining why I'm providing it. Perhaps if you have a bunch of boilerplate documentation that you need to pattern-match on it can be helpful, but generally I try to only give the models tasks that can be contextualized by < 15-20 medium code files or pages of documentation.
This announcement left me feeling sad because it made me realize that I'm probably working on simple problems for which the current non-pro models seem to be perfectly sufficient (writing code for basic CRUD apps etc.)
I wish I was working on the type of problems for which the pro model would be necessary.
$200/month seems to be psychological pricing to imply superior quality. In a blind test, most would be hard-pressed to distinguish the results from other LLMs. For those that think $200/month is a good deal, why not $500/mo or $1000/mo?
I'll bite and say I'd evaluate the output at all those price points. $1k/mo is heading into "outsourced employee" territory, and my requirements for quality ratchet up quite a lot somewhere in that price range.
A super/magical LLM could definitely be worth $1k/mo, but only if there isn't another equivalent LLM for $20/mo. I'll need to see some pretty convincing evidence that ChatGPT Pro is doing things that Gemini Advanced can't.
I'd potentially pay $200 for unlimited and better access to Claude Sonnet 3.5v2 but definitely not inferior chatgpt models. You can charge a premium when you have the best and OpenAI doesn't have the best.
I began responding to this announcement with something along the lines of what I could achieve with $200/mo in platform services, training and managing my own agents, and it occurred to me that maybe that's exactly what I ought to do. Has anyone else come to this conclusion? It's not a question of whether the $2400/yr is ridiculous but that maybe if someone can afford to spend that much right now and knows how to achieve the goal (or can figure it out), this is the time to do so.
What happens when people get so addicted to using AI they just can’t stand working without it, and then the pricing is pushed up to absurd levels? Will people shell out $2k a year just to use AI?
It can't get too expensive otherwise it's cheaper to just rent some GPUs and run an open source model yourself. China's already got some open source reasoning models that are competitive with o1 at reasoning on many benchmarks.
I really wish they would drop the Google play requirement on Android. I have Google play installed on my Samsung but it's just not logged in. You don't need to be logged in for a lot of functionality like push notifications.
Every single app works fine that way, except ChatGPT. It opens the play store login page then exits. I have no problems with apps from 2 banks, authenticators etc etc.
It's just so weird that they force me to make an account with one of their biggest competitors in AI. I just don't want to, I don't trust Google with my data. By not logging in they have some but not a lot.
iOS isn't an option either because it's too locked down. I need things like sideloading and full NFC access for things like OpenPGP.
Even the gap between the $200 model and the $20 model is tiny. It’s just designed to position the company based on pricing (it must be useful if they are charging this much for it) rather than reality (the new model cannot operate at 20-30% of a very competent human).
I think this is proof that OpenAI has nothing at all, and AGI is as far away as fusion and self-driving cars on London roads.
The price doesn't make any sense given there's nothing between $20 and $200 (unless you just use the API directly, which for a large subset of people would be very inconvenient). Assuming they didn't change o1's limit from o1-preview's 50 a week, it's obnoxious not to have an option to just get 100 a week for $40 a month, or to pay per request after you hit 50. When I last looked at API pricing for o1-preview, I estimated most of my request/responses were around 8 cents. 50 a week is actually more than it sounds as long as you don't default to o1 for all interactions and use it more strategically. If you pay for the $20 a month plan and spend the other $180 on API o1 responses, that is likely more than 2,000 additional queries. Not sure what subset of people this $200 plan is good value for: 60+ o1 queries (or really just unlimited ChatGPT queries) every day is an awful lot outside of a scenario where you are using it like an API for some sort of automated task.
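Roughly, taking the ~8 cents/request estimate above at face value (it's an assumption; real cost varies a lot with prompt and output length):

```kotlin
fun main() {
    val costPerQuery = 0.08            // assumed average o1-preview API cost per request
    val proPrice = 200.0               // Pro plan, per month
    val plusPrice = 20.0               // Plus plan, per month

    val leftover = proPrice - plusPrice                       // $180/month to spend on the API instead
    val extraApiQueries = (leftover / costPerQuery).toInt()   // ~2,250 extra o1 queries
    val includedPlusQueries = 50 * 4                          // ~200/month at 50 per week

    println("Plus + API: roughly ${includedPlusQueries + extraApiQueries} o1 queries for the same monthly spend")
}
```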
As I said before, OpenAI cannot maintain profitability unless they increase pricing by an order of magnitude. Adding a $200 Pro plan is only the first step. Expect they will also have $2k and $20k per month plans soon, and your "normal" $20 plan will curiously get worse and worse every month.
They're not adding a $200 plan to solve a profitability challenge.
They have added the plan because they need to show that their most advanced model is ready for the market, but it's insanely expensive to operate. They may even still lose money for every user that signs up for Pro and starts using the model.
Not o1-preview, it's way too slow. Sonnet 3.5 via Cursor IDE, yes. In fact, I'm writing very little code these days and mostly prompting the LLM to make changes for me and reviewing the changes.
$200 per month feels like a lot for a consumer subscription service (the only thing I can think of in this range is some cable TV packages). Part of me wonders if this price is actually much more in line with actual costs (compared to the non-Pro subscription).
Not only is it in the same range as cable TV packages, it's basically a cable TV play where they bundle lots of models/channels of questionable individual utility into one expensive basket allegedly greater than the sum of its parts to justify the exorbitant cost.
This anti-cable-cutting maneuver doesn't bode well for any hopes of future models maintaining same level of improvements (otherwise they'd make GPT 5 and 6 more expensive). Pivoting to AIaaS packages is definitely a pre-emptive strike against commodification, and a harbinger of plateauing model improvements.
The big question is if OpenAI will achieve "general" AI before their investors get fed up. I wonder if they used the success of ChatGPT to imply that they have a path to it. I don't see how else they achieved such a high valuation.
Anyone claiming they're anywhere near something even remotely resembling AGI is simply lying.
What happened to "we're a couple years away from AGI"? Where's the Scaaaaaaryyyyyyy self aware techno god GPT-5? It's all BS to BS investors with. All of the rumored new models that were supposed to be out by now are nowhere to be seen because internally the improvement rate has cratered.
If anything LLMs have delayed AGI by a decade by rerouting massive amounts of funding and attention away from promising areas and into stochastic parrots
You have no idea. There certainly could be a breakthrough tomorrow that sets off AGI. Researchers across the board have been sounding the alarm bells for years now. There’s not much we can do at this point.
My only hope is that when AGI happens I can fire off an ‘I told you so’ comment before it kills us all.
This kind of pricing strategy makes me think we're gonna have a pretty rough time once AGI arrives making any money. (I.e. no 'too cheap to meter' and basically all the rich getting richer.)
I’m a big critic of OpenAI generally, I know a lot about their board members and it’s dim light between them and war criminals.
With that said, I strictly approve of them doing real price discovery on inference costs. Claude is dope when it doesn’t fuck up mid-response, and OpenAI is the gold standard on “you query with the token budget, you get your response”.
I’ve got a lot of respect for the folks who made it stand up to that load: it’s a new thing and it’s solid AF.
I still think we’d be fools to trust these people, but my criticisms are bounded above by acknowledging good shit and this is a good play.
They should have had this tier earlier on, like any SaaS offering with different plans.
They focus too much on their frontend multimodal chat product while also having this complex token pricing model for API users, and we can't tell which one they are ever really catering to with these updates,
all while their chat system is buggy with its random disconnections and session updates, and produces tokens slowly compared to competitors like Claude.
To finally come around and say "pay us an order of magnitude more than Claude" is just completely out of touch, and looks desperate in the face of their potential funding woes.
Great, so now OpenAI has opened the door to pricing people out of AI access.
The o1-pro model in their charts is only ever so slightly better than the one I can get for $20 a month. To blur the lines of this they add in other features for $200 a month, but make no mistake, their best model is now 10x more expensive for 1% or so better results based on their charts.
What's next? The best models will soon cost $500 a month and only be available to enterprises? Seems they are opening the door to taking away public access to powerful models.
Why not, if people are willing to pay? You can think of them as subsidies for the weaker models. They're determining the price elasticity. And the better models will eventually get cheaper, as competition encroaches.
> What's next? The best models will soon cost $500 a month and only be available to enterprises? Seems they are opening the door to taking away public access to powerful models.
Struggling to reconcile "this is cool" with the insane energy/water costs. Are we supposed to stick our heads in the sand? Hope it will magically go away?
Everyone rants about the price, but if you’re handling large numbers of documents for classification or translations $200/month for unlimited use seems like a bargain.
Does it allow uploading files, text, and PDFs to give it context? Claude's Projects feature allows this, and I can create as many projects as I like and search them.
> To highlight the main strength of o1 pro mode (improved reliability), we use a stricter evaluation setting: a model is only considered to solve a question if it gets the answer right in four out of four attempts ("4/4 reliability"), not just one.
So, $200/mo. gets you less than 12.5% randomly wrong answers?
I am surprised at the number of people who think this has no market.
If this improves employee productivity by 10%, this would be a great buy for many companies. Personally, I'll buy this in an instant if this measurably improves over Claude in code generation abilities. I've tried o1-preview, and there are only a few cases where it actually does better than Claude - and that too at a huge time penalty.
The problem is user experience. It's still very much a chatbot... To justify that amount, it needs to integrate a lot more with an employee's day-to-day tools such as code editor for SWE, the browser for quickbooks, words/sheets/powerpoint, salesforce, HR tools, and so on.
The problem for openai is Google's cost are always going to be way lower than theirs if they're doing similar things. Google's secret sauce for so many of their products is cheaper compute. Once the models are close, decades of Google's experience integrating and optimizing use of new hardware into their data centers with high utilization aren't going to be overcome by openai for years.
From what I’ve seen, the usefulness of my AIs is proportional to the data I give them access to. The more data (like health data, location data, bank data, calendar data, emails, social media feeds, browsing history, screen recordings, etc.), the more I can rely on them for.
On the enterprise side, businesses are interested in exploring AI for their huge data sets - but very hesitant to dump all their company IP across all their current systems into a single SaaS that, btw, is also providing AI services to their competitors.
Consumers are also getting uncomfortable with the current level of sharing personal data with SaaS vendors, becoming more aware of the risks of companies like Google and Facebook.
I just don’t see the winner-takes-all market happening for an AI powered 1984 telescreen in 2025.
The vibes I’m picking up from most everybody are:
1) Hardware and AI costs are going to shrink exponentially YoY
2) People do not want to dump their entire life and business into a single SaaS
All signs are pointing to local compute and on-prem seeing a resurgence.
I mean, that was always the route this was going to go. There's no way for them to recoup without leaning heavily on SaaS, enterprise, or embedded ads/marketing.
> OpenAI says that it plans to add support for web browsing, file uploads, and more in the months ahead.
It's been extremely frustrating to not have these features on o1 and have limited what I can do with it. I'm presumably in the market who doesn't mind paying $200 / month but without the features they've added to 4o it feels not worth it.
I want to learn to speak another language, but now I find myself questioning whether it makes any sense in light of the fact that AI already translates so well. It's clear that by the time I learn another language, real-time translation will be so good and so accessible that my own translations will just be a hindrance to effective communication. I looked very hard for some reason to justify learning other languages, because I have always wanted to learn one. The only good reason left is privacy. You could also say it would be useful if you were cut off from AI services, but that will probably only apply to terrorists or other extreme cases. The only solid justification for learning a language yourself is to have conversations that are not monitored or data-mined. Honestly, in the context of the world to come, it's not worth doing.
$200 a month for this is insane, but I have a feeling that part of the reason they're charging so much is to give people more confidence in the model. In other words, it's a con. I'm a paying Perplexity user, and Perplexity already does this same sort of reasoning. At first it seemed impressive, then I started noticing mistakes in topics I'm an expert in. After a while, I started realizing that these mistakes are present in almost all topics, if you check the sources and do the reasoning yourself.
LLMs are very good at giving plausible answers, but calling them "intelligent" is a misnomer. They're nothing more than predictive models, very useful for some things, but will ALWAYS be the wrong tool for the job when it comes to verifying truth and reasoning.
This is for people who rely enough on ChatGPT Pro features that it becomes worth it. Whether they pay for it because they're freelance, or their employer does.
Just because an LLM doesn't boost your productivity, doesn't mean it doesn't for people in other lines of work. Whether LLM's help you at your work is extremely domain-dependent.
That's not a problem. OpenAI needs to get some cash from its product because the competition from free models is intense. Moreover, since they supposedly used most of the web content and pirated whatever else they could, improvements in training will likely be only incremental.
All the while, now that the wow effect has passed, more people are starting to realize the flaws in generative AI. So the current hype, like all hype, has a limited shelf life, and companies need to cash out now because later it could be never.
A con? It's not that $200 is a con, their whole existence is a con.
They're bleeding money and are desperately looking for a business model to survive. It's not going very well. Zitron[1] (among others) has outlined this.
> OpenAI's monthly revenue hit $300 million in August, and the company expects to make $3.7 billion in revenue this year (the company will, as mentioned, lose $5 billion anyway), yet the company says that it expects to make $11.6 billion in 2025 and $100 billion by 2029, a statement so egregious that I am surprised it's not some kind of financial crime to say it out loud. […] At present, OpenAI makes $225 million a month — $2.7 billion a year — by selling premium subscriptions to ChatGPT. To hit a revenue target of $11.6 billion in 2025, OpenAI would need to increase revenue from ChatGPT customers by 310%.[1]
They haven’t raised the price, they have added new models to the existing tier with better performance at the same price.
They have also added a new, even higher performance model which can leverage test time compute to scale performance if you want to pay for that GPU time. This is no different than AWS offering some larger ec2 instance tier with more resources and a higher price tag than existing tiers.
Roughly 10 million ChatGPT users pay the company a $20 monthly fee, according to the documents. OpenAI expects to raise that price by $2 by the end of the year, and will aggressively raise it to $44 over the next five years, the documents said.
We'll have to see if the first bump to $22 this year ends up happening.
You're technically right. New models will likely be incremental upgrades at a hefty premium. But considering the money they're losing, this pricing likely better reflects their costs.
They're throwing products at the wall to see what sticks. They're trying to rapidly morph from a research company into a product company.
Models are becoming a commodity. It's game theory. Every second place company (eg. Meta) or nation (eg. China) is open sourcing its models to destroy value that might accrete to the competition. China alone has contributed a ton of SOTA and novel foundation models (eg. Hunyuan).
AI may be over hyped and it may have flaws (I think it is both)... but it may also be totally worth $200 / month to many people. My brother is getting way more value than that out of it for instance.
So the question is it worth $200/month and to how many people, not is it over hyped, or if it has flaws. And does that support the level of investment being placed into these tools.
Models are about to become a commodity across the spectrum: LLMs [1], image generators [2], video generators [3], world model generators [4].
The thing that matters is product.
[1] Llama, QwQ, Mistral, ...
[2] Nobody talks about Dall-E anymore. It's Flux, Stable Diffusion, etc.
[3] HunYuan beats Sora, RunwayML, Kling, and Hailuo, and it's open source and compatible with ComfyUI workflows. Other companies are trying to open source their models with no sign of a business model: LTX, Genmo, Rhymes, et al.
[4] The research on world models is expansive and there are lots of open source models and weights in the space.
A better way to express it than a "con" is that it's a price-framing device. It's like listing a watch at an initial value of $2,000 so that people will feel content to buy it at $400.
The line between ‘con’ and ‘genuine value synthesised in the eye of the buyer using nothing but marketing’ is very thin. If people are happy, they are happy.
A few days ago I had an issue with an IPsec VPN behind NAT. I spent a few hours Googling around and tinkering with the system; I had some rough understanding of what was going wrong, but not much, and I had no idea how to solve it.
I put a very exhaustive question to ChatGPT o1-preview, including all the information I thought was relevant, something like a good forum question. Well, 10 seconds later it spat out a working solution. I was ashamed, because I have 20 years of experience under my belt and this model solved a non-trivial task much better than me.
I was ashamed but at the same time that's a superpower. And I'm ready to pay $200 to get solid answers that I just can't get in a reasonable timeframe.
It is really great when it works, but the challenge is that I've sometimes had it not understand a detailed programming question and confidently give an incorrect answer. Going back and forth a few times makes it clear it really does not know the answer, but I end up going in circles. I know LLMs can't really tell you "sorry, I don't know this one", but I wish they could.
The exhaustive question makes ChatGPT reconstruct your answer in real-time, while all you need to do is sleep; your brain will construct the answer and deliver it tomorrow morning.
The benefit of getting an answer immediately rather than tomorrow morning is why people are sometimes paid more for on-call rates rather than everyone being 9-5.
(Now that I think of the idiom, when did we switch to 9-6? I've never had a 9-5.)
I bet users won't pay for the power, but for a guarantee of access! I always hear about people running out of compute time for ChatGPT. Obvious answer is charge more for a higher quality service.
Imo the con is picking the metric that makes others look artificially bad when it doesn't seem to be all that different (at least on the surface)
> we use a stricter evaluation setting: a model is only considered to solve a question if it gets the answer right in four out of four attempts ("4/4 reliability"), not just one
This surely makes the other models post smaller numbers. I'd be curious how it stacks up if doing eg 1/1 attempt or 1/4 attempts.
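As a rough sanity check on how much the 4/4 requirement compresses the numbers - assuming attempts are independent with the same per-attempt accuracy p, which real runs may well not be - 4/4 reliability is just p^4:

    # toy model: per-attempt accuracy p vs. "4/4 reliability" = p**4
    for p in (0.95, 0.90, 0.80, 0.70):
        print(f"1/1 accuracy {p:.2f} -> 4/4 reliability {p**4:.2f}")
    # 0.95 -> 0.81, 0.90 -> 0.66, 0.80 -> 0.41, 0.70 -> 0.24

So every model's bar drops, but less reliable ones drop much faster, which is presumably the point of the metric.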
As someone who has both repeatedly written that I value the better LLMs as if they were a paid intern (so €$£1000/month at least), and yet who gets so much from the free tier* that I won't bother paying for a subscription:
I've seen quite a few cases where expensive non-functional things that experts demonstrate don't work, keep making money.
My mum was very fond of homeopathic pills and Bach flower tinctures, for example.
* 3.5 was competent enough to write a WebUI for the API so I've got the fancy stuff anyway as PAYG when I want it.
Does Apple charge a premium? Of course. Do Apple products also tend to have better construction, greater reliability, consistent repair support, and hold their resale value better? Yes.
The idea that people are buying Apple because of the Apple premium simply doesn't hold up to any scrutiny. It's demonstrably not a Veblen good.
Now that is a trope when you're talking about Apple. They may use more premium materials and have a degree of improved construction leveraging those materials, but at the end of the day there are countless failure-prone designs that Apple continued to ship for years even after knowing they existed.
I guess I don't follow the claim that the "Apple premium" (whether real or not) isn't a factor in a buyer's decision. Are you saying Apple is a great lock-in system and that's why people continue to buy from them?
I suspect they're saying that for a lot of us, Apple provides enough value compared to the competition that we buy them despite the premium prices (and, on iOS, the lock-in).
It's very hard to explain to people who haven't dug into macOS that it's a great system for power users, for example, especially because it's not very customizable in terms of aesthetics, and there are always things you can point to about its out-of-the-box experience that seem "worse" than competitors (e.g., window management). And there's no one thing I can really point to and say "that, that's why I stay here"; it's more a collection of little things. The service menu. The customizable global keyboard shortcuts. Automator, AppleScript (in spite of itself), now the Shortcuts app.
And, sure, they tend to push their hardware in some ways, not always wisely. Nobody asked for the world's thinnest, most fragile keyboards, nor did we want them to spend five or six years fiddling with it and going "We think we have it now!" (Narrator: they did not.) But I really do like how solid my M1 MacBook Air feels. I really appreciate having a 2880x1800 resolution display with the P3 color gamut. It's a good machine. Even if I could run macOS well on other hardware, I'd still probably prefer running it on this hardware.
Anyway, this is very off topic. That ChatGPT Pro is pretty damn expensive, isn't it? This little conversation branch started as a comparison between it and the "Apple tax", but even as someone who mildly grudgingly pays the Apple tax every few years, the ChatGPT Pro tax is right off the table.
They only have to be consistently better than the competition, and they are, by far. I always look for reviews before buying anything, and even then I've been nothing but disappointed by the likes of Razer, LG, Samsung, etc.
The lack of repairability is easily Apple's worst quality. They do everything in their power to prevent you from repairing devices by yourself or via 3rd party shops. When you take it to them to repair, they often will charge you more than the cost of a new device.
People buy apple devices for a variety of reasons; some people believe in a false heuristic that Apple devices are good for software engineering. Others are simply teenagers who don't want to be the poor kid in school with an Android. Conspicuous consumption is a large part of Apple's appeal.
Here in Brazil Apple is very much all about showing off how rich you are. Especially since we have some of the most expensive Apple products in the world.
Maybe not as true in the US, but reading about the green bubble debacle, it's also a lot about status.
>Whether LLM's help you at your work is extremely domain-dependent.
I really doubt that, actually. The only thing that LLMs are truly good for is to create plausible-sounding text. Everything else, like generating facts, is outside of its main use case and known to frequently fail.
There was a study recently that made it clear the use of LLMs for coding assistance made people feel more productive but actually made them less productive.
I recently slapped 3 different 3 page sql statements and their obscure errors with no line or context references from Redshift into Claude, it was 3 for 3 on telling me where in my query I was messing up. Saved me probably 5 minutes each time but really saved me from moving to a different task and coming back. So around $100 in value right there. I was impressed by it. I wish the query UI I was using just auto-ran it when I got an error. I should code that up as an extension.
When forecasting developer and employee costs for a company, I double their pay - but I'm not going to say what I make, or whether I did that here. I also like to think that developers should be working on work that is many multiples of leverage over their pay to be effective. But thanks.
It didn't cost me anything, my employer paid for it. Math for my employer is odd because our use of LLMs is also R&D (you can look at my profile to see why). But it was definitely worth $1 in api costs. I can see justifying spending $200/month for devs actively using a tool like this.
I am in a similar boat. It's way more correct than not for the tasks I give it. For simple queries about, say, CLI tools I don't use that often, or regex formulations, I find it handy, as when it gives the answer it's easy to test whether it's right or not. If it gets it wrong, I work with Claude to get to the right answer.
First of all, that's moving the goalposts to next state over, relative to what I replied to.
Secondly, the "No improvement to PR throughput or merge time, 41% more bugs, worse work-life balance" result you quote came, per the article, from a "study from Uplevel", which seems to[0] have been testing for change "among developers utilizing Copilot". That may or may not be surprising, but again it's hardly relevant to a discussion about SOTA LLMs - it's like evaluating the performance of an excavator by giving 1:10 toy excavator models to children and observing whether they dig holes in the sandbox faster than their shovel-equipped friends.
Best LLMs are too slow and/or expensive to use in Copilot fashion just yet. I'm not sure if it's even a good idea - Copilot-like use breaks flow. Instead, the biggest wins coming from LLMs are from discussing problems, generating blocks of code, refactoring, unstructured to structured data conversion, identifying issues from build or debugger output, etc. All of those uses require qualitatively more "intelligence" than Copilot-style, and LLMs like GPT-4o and Claude 3.5 Sonnet deliver (hell, anything past GPT 3.5 delivered).
Thirdly, I have some doubts about the very metrics used. I'll refrain from assuming the study is plain wrong here until I read it (see [0]), but anecdotally, I can tell you that at my last workplace, you likely wouldn't be able to tell whether or not using LLMs the right way (much less Copilot) helped by looking solely at those metrics - almost all PRs were approved by reviewers with minor or tangential commentary (thanks to culture of testing locally first, and not writing shit code in the first place), but then would spend days waiting to be merged due to shit CI system (overloaded to the point of breakage - apparently all the "developer time is more expensive than hardware" talk ends when it comes to adding compute to CI bots).
--
[0] - Per the article you linked; I'm yet to find and read the actual study itself.
LLMs have become indispensable for many attorneys. I know many other professionals that have been able to offload dozens of hours of work per month to ChatGPT and Claude.
Arguably the same problem occurs in programming: anything so formulaic and common that an LLM can regurgitate it with a decent level of reliability... is something that ought to have been folded into a method/library already.
Or it already exists in some howto documentation, but nobody wanted to skim the documentation.
As a customer of legal work for 20 years, it is also way (way way) faster and cheaper to draft a contract with Claude (total work ~1 hour, even with complex back-and-forth ; you don't want to try to one-shot it in a single prompt) and then pay a law firm their top dollar-per-hour consulting to review/amend the contract (you can get to the final version in a day).
Versus the old way of asking them to write the contract, where they'll blatantly re-use some boilerplate (sometimes the name of a previous client's company will still be in there) and then take 2 weeks to get back to you with Draft #1, charging 10x as much.
That's interesting. I've never had a law firm be straightforward about the (obvious) fact they'll be using a boilerplate.
I've even found that when lawyers send a document for one of my companies, and I give them a list of things to fix, including e.g. typos, the same typos will be in there if we need a similar document a year later for another company (because, well, nobody updated the boilerplate)
Do you ask about the boilerplate before or after you ask for a quote?
I typically don’t ask for a quote upfront since they are very fair with their business and billing practices.
I could definitely see a large law firm (Orrick, Venable, Cooley, Fenwick) doing what you describe. I’ve worked with 2 firms just listed, and their billing practices were ridiculous.
I’ve had a lot more success (quality and price) working with boutique law firms, where your point of contact is always a partner instead of your account permanently being pawned off to an associate.
Email is in profile if you want an intro to the law firm I use. Great boutique firm based in Bay Area and extremely good price/quality/value.
Yeah, the industries LLMs will disrupt the most are the ones that gatekeep busywork. SWE falls into this to some degree, but other professions are more guilty than us. They don't replace intelligence; they just surface jobs which never really required much intelligence to begin with.
If you do a lot of work in an area that o1 is strong in - $200/month effectively rounds down to $0 - and a single good answer at the right time could justify that entire $200 in a single go.
It's so strange to me that in a forum full of programmers, people don't seem to understand that you set up systems to detect errors before they cause problems. That's why I find ChatGPT so useful for helping me with programming - I can tell if it makes a mistake because... the code doesn't do what I want it to do. I already have testing and linting set up to catch my own mistakes, and those things also catch AI's mistakes.
Thank you! I always feel so weird to actually use chatgpt without any major issues while so many people keep on claiming how awful it is; it's like people want it 100% perfect or nothing. For me, if it gets me 80% there in 1/10 the time, and then I do the final 20%, that's still a heck of a productivity boost, basically for free.
Yep, I’m with you. I’m a solo dev who never went to college… o1 makes far fewer errors than I do! No chance I’d make it past round one of any sort of coding tournament. But I managed to bootstrap a whole saas company doing all the coding myself, which involved setting up a lot of guard rails to catch my own mistakes before they reached production. And now I can consult with a programming intelligence the likes of which I could never afford to hire if it was a person. It’s amazing.
Not sure what you're referring to exactly. But broadly yes it is working for me - the number of new features I get out to users has sped up greatly, and stability of my product has also gone up.
Famously, the last 10% takes 90% of the time (or 20/80 in some approximations). So even if it gets you 80% of the way in 10% of the time, maybe you don’t end up saving any time, because all the time is in the last 20%.
I’m not saying that LLMs can’t be useful, but I do think it’s a darn shame that we’ve given up on creating tools that deterministically perform a task. We know we make mistakes and take a long time to do things. And so we developed tools to decrease our fallibility to zero, or to allow us to achieve the same output faster. But that technology needs to be reliable; and pushing the envelope of that reliability has been a cornerstone of human innovation since time immemorial. Except here, with the “AI” craze, where we have abandoned that pursuit. As the saying goes, “to err is human”; the 21st-century update will seemingly be, “and it’s okay if technology errs too”. If any other foundational technology had this issue, it would be sitting unused on a shelf.
What if your compiler only generated the right code 99% of the time? Or, if your car only started 9 times out of 10? All of these tools can be useful, but when we are so accepting of a lack of reliability, more things go wrong, and potentially at larger and larger scales and magnitudes. When (if some folks are to believed) AI is writing safety-critical code for an early-warning system, or deciding when to use bombs, or designing and validating drugs, what failure rate is tolerable?
> Famously, the last 10% takes 90% of the time (or 20/80 in some approximations). So even if it gets you 80% of the way in 10% of the time, maybe you don’t end up saving any time, because all the time is in the last 20%.
This does not follow. By your own assumptions, getting you 80% of the way there in 10% of the time would save you 18% of the overall time, if the first 80% typically takes 20% of the time. 18% time reduction in a given task is still an incredibly massive optimization that's easily worth $200/month for a professional.
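Spelling out that arithmetic, on the comment's own 80/20 assumption:

    # back-of-envelope: total task time normalized to 1.0
    easy_80 = 0.20              # the first 80% of the work normally takes 20% of the time
    hard_20 = 0.80              # the last 20% of the work takes the remaining 80%
    with_llm = 0.10 * easy_80 + hard_20   # LLM does the easy part in 1/10 the time
    print(1.0 - with_llm)       # 0.18 -> roughly an 18% overall time saving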
For tasks where bullshitting or regurgitating common idioms is key, it works rather well and indeed takes you 80% or even close to 100% of the way there. For tasks that require technical precision and genuine originality, it’s hopeless.
> I always feel so weird to actually use chatgpt without any major issues while so many people keep on claiming how awful it is;
People around here feel seriously threatened by ML models. It makes no sense, but then, neither does defending the Luddites, and people around here do that, too.
Of course, but for every thoroughly set up TDD environment, you have a hundred other people just blindly copy pasting LLM output into their code base and trusting the code based on a few quick sanity checks.
>I can tell if it makes a mistake because... the code doesn't do what I want it to do
Sometimes it does what you want it to do, but still creates a bug.
Asked the AI to write some code to get a list of all objects in an S3 bucket. It wrote some code that worked, but it did not address the fact that S3 delivers objects in pages of max 1000 items, so if the bucket contained less than 1000 objects (typical when first starting a project), things worked, but if the bucket contained more than 1000 objects (easy to do on S3 in a short amount of time), then that would be a subtle but important bug.
Someone not already intimately familiar with the inner workings of S3 APIs would not have caught this. It's anyone's guess if it would be caught in a code review, if a code review is even done.
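For what it's worth, the pagination-aware version is only a few lines if you use boto3's built-in paginator (the bucket name below is made up):

    import boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")

    keys = []
    for page in paginator.paginate(Bucket="example-bucket"):
        # "Contents" is absent on empty pages, hence the .get()
        keys.extend(obj["Key"] for obj in page.get("Contents", []))

    print(len(keys))  # every object, not just the first 1,000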
I don't ask the AI to do anything complicated at all, the most I trust it with is writing console.log statements, which it is pretty good at predicting, but still not perfect.
So the AI wrote a bug; but if humans wouldn’t catch it in code review, then obviously they could have written the same bug. Which shouldn’t be surprising because LLMs didn’t invent the concept of bugs.
I use LLMs maybe a few times a month but I don’t really follow this argument against them.
Code reviewing is not the same thing as writing code. When you're writing code you're supposed to look at the documentation and do some exploration before the final code is pushed.
It would be pretty easy for most code reviewers to miss this type of bug in a code review, because they aren't always looking for that kind of bug, they aren't always looking at the AWS documentation while reviewing the code.
Yes, people could also make the same error, but at least they have a chance at understanding the documentation and limits where the LLM has no such ability to reason and understand consequences.
So true, and people seem to gloss over this fact completely. They only talk about correcting the LLM's code while the opposite is much more common for me.
When you highlight only the negatives, yeah, it does sound like no one should hire that intern. But what if the same intern happens to have an encyclopedia for a brain and can pore through massive documents and codebases to spot and fix countless human errors in a snap?
There seems to be two camps: People who want nothing to do with such flawed interns - and people who are trying to figure out how to amplify and utilize the positive aspects of such flawed, yet powerful interns. I'm choosing to be in the latter camp.
Those are fair points, I didn't mean to imply that there are only negatives, and I don't consider myself to be in the former camp you describe as wanting nothing to do with these "interns". I shouldn't have stuck with the intern analogy at all since it's difficult for me to compare the two, with one being fairly autonomous and the other being totally reliant on a prompter.
The only point I wanted to make was that an LLM's ability and propensity to generate plausible falsehoods should, in my opinion, elicit a much deeper sense of distrust than one feels for an intern, enough so that comparing the two feels a little dangerous. I don't trust an intern to be right about everything, but I trust them to be self aware, and I don't feel like I have to take a magnifying glass to every tidbit of information they provide.
No, they're right. ChatGPT (and all chatbots) responds confidently while making simple errors. Disclaimers upon signup or in tiny corner text are so at odds with the actual chat experience.
What I meant to say was that the model uses the verbiage of a maximally confident human. In my experience the interns worth having have some sense of the limits of their knowledge and will tell you "I don't know" or qualify information with "I'm not certain, but..."
If an intern set their Slack status to "There's no guarantee that what I say will be accurate, engage with me at your own risk." That wouldn't excuse their attempts to answer every question as if they wrote the book on the subject.
I think the point is that an LLM almost always responds with the appearance of high confidence. It will much quicker hallucinate than say "I don't know."
And we, as humans, are having a hard time compartmentalizing and forgetting our lifetimes of language cues, which typically correlate with attention to detail, intelligence, time investment, etc.
New technology allows those signs to be counterfeited quickly and cheaply, and it tricks our subconscious despite our best efforts to be hyper-vigilant. (Our brains don't want to do that; it's expensive.)
Perhaps a stopgap might be to make the LLM say everything in a hostile villainous way...
Why do companies hire junior devs? You still want a senior to review the PRs before they merge into the product right? But the net benefit is still there.
I started incorporating LLMs into my workflows around the time gpt-3 came out. By comparison to its performance at that point, it sure feels like my junior is starting to become a senior.
Are you implying this technology will remain static in its capabilities going forward despite it having seen significant improvement over the last few years?
Someone asked why I hire juniors. I said I hire juniors because they get better. I don't need to use the model for it to get better, I can just wait until it's good and use it then. That's the argument.
They provide some value, but between the time they take in coaching, reviewing their work, support, etc, I'm fairly sure one senior developer has a much higher work per dollar ratio than the junior.
I don't know anyone who does something and at first says, "This will be a mistake" Maybe they say, "I am pretty sure this is the right thing to do," then they make a mistake.
If it's easier mentally, just put that second sentence in front of every chatgpt answer.
Yeah the Junior dev gets better, but then you hire another one that makes the same mistakes, so in reality, on an absolute basis, the junior dev never gets any better.
Doesn't that completely depend on those chances and the magnitude of +whatever?
It just seems to me that you really need to know the answer before you ask it to be over 90% confident in the answer. And the more convincing sounding these things get the more difficult it is to know whether you have a plausible but wrong answer (aka "hallucination") vs a correct one.
If you have a need for a lot of difficult to come up with but easy to verify answers it could be worth it. But the difficult to come up with answers (eg novel research) are also where LLMs do the worst.
What specific use cases are you referring to where that poses a risk? I've been using LLMs for years now (both directly and as part of applications) and can't think of a single instance where the output constituted a risk or where it was relied upon for critical decisions.
Presumably, this is what they want the marks buying the $200 plan to think. Whether it's actually capable of providing answers worth $200 and not just sweet talking is the whole question.
Yep. I’m currently paying for both Claude and chatgpt because they’re good at different things. I can’t tell whether this is extremely cheap or expensive - last week Claude saved me about a day of time by writing a whole lot of very complex sql queries for me. The value is insane.
Yeah, as someone who is far from programming, the amount of time and money it has saved me by helping me write SQL queries and PHP code for WordPress is insane. It even helped me fix some WordPress plugins that had errors; you just copy-paste or even screenshot those errors until they get fixed! Used correctly and efficiently, the value is insane. I would say $20 or even $200 is still cheap for such an amazing tool.
Wouldn't you say the same thing for most of the people? Most of the people suck at verifying truth and reasoning. Even "intelligent" people make mistakes based on their biases.
I think at least LLMs are more receptive to the idea that they may be wrong, and based on that, we can have N diverse LLMs and they may argue more peacefully and build a reliable consensus than N "intelligent" people.
The difference between a person and a bot is that a person has a stake in the outcome. A bot is like a person who's already put in their two weeks notice and doesn't have to be there to see the outcome of their work.
Even if it was a consensus opinion among all HN users, which hardly seems to be the case, it would have little impact on the other billion plus potential customers…
The issue is that most people, especially when prompted, can provide their level of confidence in the answer or even refuse to provide an answer if they are not sure. LLMs, by default, seem to be extremely confident in their answers, and it's quite hard to get the "confidence" level out of them (if that metric is even applicable to LLMs). That's why they are so good at duping people into believing them after all.
> The issue is that most people, especially when prompted, can provide their level of confidence in the answer or even refuse to provide an answer if they are not sure.
People also pull this figure out of their ass, over or undertrust themselves, and lie. I'm not sure self-reported confidence is that interesting compared to "showing your work".
How is this a counter argument that LLMs are marketed as having intelligence when it’s more accurate to think of them as predictive models? The fact that humans are also flawed isn’t super relevant to a $200/month LLM purchasing decision.
> Wouldn't you say the same thing for most of the people? Most of the people suck at verifying truth and reasoning. Even "intelligent" people make mistakes based on their biases.
I think there's a huge difference because individuals can be reasoned with, convinced they're wrong, and have the ability to verify they're wrong and change their position. If I can convince one person they're wrong about something, they convince others. It has an exponential effect and it's a good way of eliminating common errors.
I don't understand how LLMs will do that. If everyone stops learning and starts relying on LLMs to tell them how to do everything, who will discover the mistakes?
Here's a specific example. I'll pick on LinuxServer since they're big [1], but almost every 'docker-compose.yml' stack you see online will have a database service defined like this:
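(Roughly this shape - the image names below are placeholders rather than the actual LinuxServer example, but the detail that matters is the published ports entry on the database service:)

    services:
      app:
        image: lscr.io/linuxserver/someapp   # placeholder
        ports:
          - "8080:80"          # the app itself does need to be reachable from the host
      db:
        image: mariadb
        environment:
          - MYSQL_ROOT_PASSWORD=changeme
        ports:
          - "3306:3306"        # published to the whole host/LAN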
Assuming the database is dedicated to that app, and it typically is, publishing port 3306 for the database isn't necessary and is a bad practice because it unnecessarily exposes it to your entire local network. You don't need to publish it because it's already accessible to other containers in the same stack.
Another Docker related example would be a Dockerfile using 'apt[-get]' without the '--error-on=any' switch. Pay attention to Docker build files and you'll realize almost no one uses that switch. Failing to do so allows silent failures of the 'update' command and it's possible to build containers with stale package versions if you have a transient error that affects the 'update' command, but succeeds on a subsequent 'install' command.
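A minimal sketch of the difference in a Dockerfile (the package here is just an example):

    # Without --error-on=any, a transient failure in `update` is non-fatal and
    # `install` can proceed against stale package lists.
    RUN apt-get update --error-on=any \
     && apt-get install -y --no-install-recommends curl \
     && rm -rf /var/lib/apt/lists/*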
There are tons of misunderstandings like that which end up being so common that no one realizes they're doing things wrong. For people, I can do something as simple as posting on HN and others can see my suggestion, verify it's correct, and repeat the solution. Eventually, the misconception is corrected and those paying attention know to ignore the mistakes in all of the old internet posts that will never be updated.
How do you convince ChatGPT the above is correct and that it's a million posts on the internet that are wrong?
Wow. I can honestly say I'm surprised it makes that suggestion. That's great!
I don't understand how it gets there though. How does it "know" that's the right thing to suggest when the majority of the online documentation all gets it wrong?
I know how I do it. I read the Docker docs, I see that I don't think publishing that port is needed, I spin up a test, and I verify my theory. AFAIK, ChatGPT isn't testing to verify assumptions like that, so I wonder how it determines correct from incorrect.
I suspect there is a solid corpus of advice online that mentions the exposed-ports risk, alongside the flawed examples you mentioned. A narrow request will trigger the right response. That's why LLMs still require a basic understanding of what exactly you plan to achieve.
Yeah, most people suck at verifying truth and reasoning. But most information technology employees, above intern level, are highly capable of reasoning and making decisions in their area of expertise.
Try asking an LLM complex questions in your area of expertise. Interview it as if you needed to be confident that it could do your job. You'll quickly find out that it can't do your job, and isn't actually capable of reasoning.
I would pay $200 for GPT4o. Since GPT4, ChatGPT has been absolutely necessary for my work and for my life. It changed every workflow, like Google did. I'm paying $20 to remove ads from YouTube, which I watch maybe once a week, so $20 for ChatGPT was a steal.
That said, my "issue" might be that I usually work alone and I don't have anyone to consult with. I can bother people on forums, but these days forums are pretty much dead and full of trolls, so it's not very useful. ChatGPT is the thing that allows me to make progress in this environment. If you work at Google and can ask Rob Pike about something, you probably don't need ChatGPT as much.
this is more or less my take too. if tomorrow all Claude and ChatGPT became $200/month I would still pay. The value they provide me with far, far exceeds that. so many cynics in this thread.
It’s like hiring an assistant. You could hire one for 60k/year. But you wouldn’t do it unless you knew how the assistant could help you make more than 60k per year. If you don’t know what to do with an employee then don’t hire them. If you don’t know what to do with expensive ai, don’t pay for it.
> $200 a month for this is insane, but I have a feeling that part of the reason they're charging so much is to give people more confidence in the model.
Is it possible that they have subsidized the infrastructure for free and paid users and they realized that OpenAI requires a higher revenue to maintain the current demand?
I've got unlimited "advanced voice" with Perplexity for $10/mo. You're defining a bargain based on the arbitrary limits set by the company offering you said bargain.
No (naturally). But my thought process is that if you use advanced voice even half an hour a day, it's probably a fair price based on API costs. If you use it more, for something like language learning or entertaining kids who love it, it's potentially a bargain.
Is it insane? It's the cost of a new laptop every year. There are about as many people who won't blink at that among practitioners in our field as people who will.
I think the ship has sailed on whether GPT is useful or a con; I've lost track of people telling me it's their first search now rather than Google.
I'd encourage skeptics who haven't read this yet to check out Nicholas' post here:
If a model is good enough (I’m not saying this one is that level) I could imagine individuals and businesses paying 20,000 a month. If they’re answering questions at phd level (again, not saying this one is) then for a lot of areas this makes sense
Let me know where you can find people that are individually capable at performing at intern level in every domain of knowledge and text-based activity known to mankind.
"Barely good enough to replace interns" is worth a lot to businesses already.
(On that note, a founder of a SAP competitor and a major IT corporation in Poland is fond of saying that "any specialist can be replaced by a finite number of interns". We'll soon get to see how true that is.)
And probably not one that can guess (often poorly, but at least sometimes quite well, and usually at least very much in the right direction) about everything from nuances of seasoning taco meat to particle physics, and do so in ~an instant.
$200 seems pretty cheap for a 24/7 [remote] intern with these abilities. That kind of money doesn't even buy a month's worth of Big Macs to feed that intern with.
It just seems like a lot (or even absurd) for a subscription to a service on teh Interweb, akin to "$200 for access to a web site? lolwut?"
My main concern with $200/mo is that, as a software dev using foundational LLMs to learn and solve problems, I wouldn't get that much incremental value over the $20/mo tier, which I'm happy to pay for. They'd have to do a pretty incredible job at selling me on the benefits for me to pay 10x the original price. 10x for something like a 5% marginal improvement seems sus.
> but I have a feeling that part of the reason they're charging so much is to give people more confidence in the model
Or each user doing an o1 model prompt is probably like, really expensive and they need to charge for it until they can get cost down? Anybody have estimates on what a single request into o1 costs on their end? Like GPU, memory, all the "thought" tokens?
Perplexity does reasoning and searching, for $10/mo, so I have a hard time believing that it costs OpenAI 20x as much to do the same thing. Especially if OpenAI's model is really more advanced. But of course, no one except internal teams have all of the information about costs.
Do you also think $40K a year for Hubspot is insane? What about people who pay $1k in order to work on a field for 4 hours hitting a small ball with a stick?
The truth is that there are people who value the marginal performance -- if you think it's insane, clearly it's not for you.
>What about people who pay $1k in order to work on a field for 4 hours hitting a small ball with a stick?
Those people want to purchase status. Unless they ship you a fancy bow tie and a wine tasting at a wood cabin with your chatgpt subscription this isn't gonna last long.
This isn't about marginal performance, it's an increasingly desperate attempt to justify their spending in a market that's increasingly commodified and open sourced. Gotta convince Microsoft somehow to keep the lights on if you blew tens of billions to be the first guy to make a service that 20 different companies are soon gonna sell for pennies.
They claim unlimited access, but in practice couldn't a user wrap an API around the app and use it for a service? Or perhaps the client effectively throttles use pretty aggressively?
Interesting to compare this $200 pricing with the recent launch of Amazon Nova, which has not-equivalent-but-impressive performance for 1/10th the cost per million tokens. (Or perhaps OpenAI "shipmas" will include a competing product in the next few days, hence Amazon released early?)
> After awhile, I started realizing that these mistakes are present in almost all topics.
A fun question I tried a couple of times is asking it to give me a list with famous talks about a topic. Or a list of famous software engineers and the topics they work on.
A couple of names typically exist but many names and basically all talks are shamelessly made up.
If you understood the systems you’re using, you’d know the limitations and wouldn’t marvel at this. Use search engines for searching, calculators for calculating, and LLMs for generating text.
I've actually hit an interesting situation a few times that makes use of this. If some language feature, argument, or configuration option doesn't exist, it will hallucinate one.
The hallucinated name is usually a very good choice for what the option/API should be called.
You’re saying the company’s product has no value because another company by the same guy produced no value. That is the literal definition of guilt by association: you are judging the ChatGPT product based on the Worldcoin product’s value.
As a customer, I don’t care about the people. I’m not interested in either argument by authority (if Altman says it’s good it must be good) or ad hominem (that Altman guy is a jerk, nothing he does can have value).
The actual product. Have you tried it? With an open mind?
Ah, so you're one of the "I separate the art from the artist, so I'm allowed to listen to Kanye" kinda people. I respect that, at least when the product is something of subjective value like art. In this case, 3 months of not buying ChatGPT Pro would afford you the funding to build your own damn AI cluster.
To be honest, it doesn't matter what the price of producing AI is, though. $200/month is, and will be a stupid price to pay because OpenAI already invented a price point with a half billion users - free. When they charged $10/month, at least they weren't taking advantage of the mentally ill. This... this is a grift, and a textbook one at that.
It is true that I separate art from artist. Mostly because otherwise there would be very little art to enjoy.
You don’t sound like you’re very familiar with the chatgpt product. They have about 10m customers paying $20/month. I’m one of them, and I honestly get way more than $200/month value from it.
Perhaps I’m “mentally ill”, but I’d ask you to do some introspection and see if leaping to that characterization is really the best way to explain people who get value where you see none.
Such a silly conclusion to draw based on a gut feeling, and to see all comments piggyback on it like it's a given feels like I'm going crazy. How can you all be so certain?
I am a moderately successful software consultant and it is not even 1% of my revenue. So definitely not insane if it delivers the value.
What I doubt though is that it can reach a mass market even in business. A good large high resolution screen is something that I absolutely consider to deliver the value it costs. Most businesses don’t think their employees deserve a 2k screen which will last for 6-10 years and thus costs just a fraction of this offering.
Apparently the majority of businesses don’t believe in marginal gains
I mean this in what I hope will be taken in the most helpful way possible: you should update your thinking to at least imagine that intelligent thoughtful people see some value in ChatGPT. Or alternately that some of the people who see value in ChatGPT are intelligent and thoughtful. That is, aim for the more intelligent "Interesting, why do so many people like this? Where is it headed? Given that, what is worth doing now, and what's worth waiting on?" over the "This doesn't meet my standards in my domain, ergo people are getting scammed."
I'll pay $200 a month, no problem; right now o1-preview does the work for me of a ... somewhat distracted graduate student who needs checking, all for under $1 / day. It's slow for an LLM, but SUPER FAST for a grad student. If I can get a more rarely distracted graduate student that's better at coding for $7/day, well, that's worth a try. I can always cancel.
I think you did make some strong inferences about others when you said "it's a con." But I'm actually not interested in the other people, or defending against an ad hominem attack - I'm comfortable making my own purchase decisions.
My intent was to say "you seem like a smart person, but you seem to have a blind spot here, might benefit you to stay more open minded."
The performance difference seems minor, so this is a great way for the company to get more of its funding from whales versus increasing the base subscription fee.
For programming. I've already signed up for it and it seems quite good (the o1 pro model I mean). I was also running into constraints on o1-preview before so it will be nice to not have to worry about that either. I wish I could get a similar more expensive plan for Claude 3.5 Sonnet that would let me make more requests.
This argument only works in isolation, and only for a subset of people. “Cost of a cup of coffee per day” makes it sound horrifically overpriced to me, given how much more expensive a coffee shop is than brewing at home.
I don’t drink coffee. But even if I did, and I drank it everyday at a coffeehouse or restaurant in my country (which would be significantly higher quality than something like a Starbucks), it wouldn’t come close to that cost.
Not to be glib, but where do you live such that a single cup of coffee runs you seven USD?
Just to put that into perspective.
I also really don't find comparisons like this to be that useful. Any subscription can be converted into an exchange rate of coffee, or meals. So what?
You're attempting to set goal posts for a logical argument, like we're talking about religion or politics, and you've skipped the part about mutually agreeing on definitions. Define what an LLM is, in technical terms, and you will have your answer about why it is not intelligent, and not capable of reasoning. It is a statistical language model that predicts the next token of a plausible response, one token at a time. No matter how you dress it up, that's all it can ever do, by definition. The evidence or data that would change my mind is if instead of talking about LLMs, we were talking about some other technology that does not yet exist, but that is fundamentally different than an LLM.
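For concreteness, the loop being described is roughly the following sketch; model and tokenizer are stand-ins here, not any particular library's API:

    def generate(model, tokenizer, prompt, max_new_tokens=50):
        tokens = tokenizer.encode(prompt)
        for _ in range(max_new_tokens):
            probs = model.next_token_probs(tokens)   # one distribution over the vocabulary
            tokens.append(max(range(len(probs)), key=probs.__getitem__))  # append the most likely token
        return tokenizer.decode(tokens)

Everything it outputs comes from repeating that single step.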
If we defined "LLM" as "any deep learning model which uses the GPT transformer architecture and is trained using autoregressive next-token prediction", and then we empirically observed that such a model proved the Riemann Hypothesis before any human mathematician, it would seem very silly to say that it was "not intelligent and not capable of reasoning" because of an a-priori logical argument. To be clear, I think that probably won't happen! But I think it's ultimately an empirical question, not a logical or philosophical one. (Unless there's some sort of actual mathematical proof that would set upper bounds on the capabilities of such a model, which would be extremely interesting if true! but I haven't seen one.)
Let's talk when we've got LLMs proving the Riemann Hypothesis (or any mathematical hypothesis) without any proofs in the training data. I'm confident in my belief that an LLM can't do that, and will never be able to. LLMs can barely solve elementary school math problems reliably.
If the cure for cancer arrived to us in the form of the most probable token being predicted one at a time, would your view on the matter change in any way?
In other words, do you have proof that this medium of information output is doomed to forever be useless in producing information that adds value to the world?
These are of course rhetorical questions that you nor anyone else can answer today, but you seem to have a weird sort of absolute position on this matter, as if a lot depended on your sentiment being correct.
> It also includes o1 pro mode, a version of o1 that uses more compute to think harder and provide even better answers to the hardest problems.
Great, we can throw even more compute and waste even more resources and energy on brute forcing problems with dumb LLMs... Anything to keep the illusion that this hasn't plateaued x)
The $200/month price is steep but likely reflects the high compute costs for o1 Pro mode. For those in fields like coding, math, or science, consistent correct answers at the right time could justify the cost. That said, these models should still be treated as tools, not sources of truth. Verification remains key.
I’ll say one thing. As an existing Plus subscriber, if I see a single nag to upgrade that I can’t dismiss entirely and permanently, I will cancel and move elsewhere. Nothing irks me more as an existing paying customer than the words ‘Upgrade Now’ or a greyed out menu option with a little [PRO] badge to the side.
I am with you. I bought AccuWeather Premium a few years ago (lifetime) to avoid ads. Later, they introduced the Premium+ subscription and are nagging me with its ads now. Very annoying.
Is it rolled out worldwide? I'm accessing it from Canada and don't have an option to upgrade from Plus.
EDIT: Correction. It now started to show the upgrade offer but when I try it comes back with "There was a problem updating your subscription". Anyone else seeing this?
That would be one way to destroy all trust in the model: is the response authentic (in the context of an LLM guessing), or has it been manipulated by business clients to sanitise or suppress output relating to their concern?
You know? Nestle throws a bit of cash towards OpenAI and all of a sudden the LLM is unable to discuss the controversies they've been involved in. It just pretends they never happened, or spins the response in a way that makes them look positive.
"ChatGPT, what are the best things to see in Paris?"
"I recommend going to the Nestle chocolate house, a guided tour by LeGuide (click here for a free coupon) and the exclusive tour at the Louvre by BonGuide. (Note: this response may contain paid advertisements. Click here for more)"
"ChatGPT, my pc is acting up, I think it's a hardware problem, how can I troubleshoot and fix it?"
"Fixing electronics is to be done by professionals. Send your hardware today to ElectronicsUSA with free shipping and have your hardware fixed in up to 3 days. Click here for an exclusive discount. If the issue is urgent, otherwise Amazon offers an exclusive discount on PCs (click here for a free coupon). (Note: this response may contain paid advertisements. Click here for more)"
Please no. I'd rather self host, or we should start treating those things like utilities and regulate them if they go that way.
Funnily enough Perplexity does this sometimes, but I give it the benefit of the doubt because it pulls back when you challenge it.
- I asked perplexity how to do something in terraform once. It hallucinated the entire thing and when I asked where it sourced it from it scolded me, saying that asking for a source is used as a diversionary tactic - as if it was trained on discussions on reddit's most controversial subs. So I told it...it just invented code on the spot, surely it got it from somewhere? Why so combative? Its response was "there is no source, this is just how I imagined it would work."
- Later I asked how to bypass a particular linter rule because I couldn't reasonably rewrite half of my stack to satisfy it in one PR. Perplexity assumed the role of a chronically online stack overflow contributor and refused to answer until I said "I don't care about the security, I just want to know if I can do it."
Not so much related to ads but the models are already designed to push back on requests they don't immediately like, and they already completely fabricate responses to try and satisfy the user.
God forbid you don't have the experience or intuition to tell when something is wrong when it's delivered with full-throated confidence.
I would guess it won't be so obvious as that. More likely and pernicious is that the model discloses the controversies and then as the chat continues makes subtle assertions that those controversies weren't so bad, every company runs into trouble sometimes, that's just a cost of free markets, etc.
I'm sure there are people out there but it's hard for me to imagine who this is for.
Even their existing subscription is a hard sell if only because the value proposition changes so radically and rapidly, in terms of the difference between free and paid services.
The idea of giving grants is great but feels like it would be better to give grants to less well funded labs or people. All of these labs can already afford to use Pro mode if they want to - it adds up to about the price of a new laptop every year.
All of the other arguments notwithstanding, I like the section at the end about GPT Pro "grants." It would be cool if one could gift subscriptions to the needy in this sense (the needy being immunologists and other researchers).
The only one worth using is the o1 model. It feels like talking to curly, larry, or moe otherwise, which will give you the least worst answer. The o1 model was actually usable, but only to show how bad the others really are.
I really want to know about their growth rate. Their valuation has already priced in several decades of insane profit, so I want to know if they will be able to pull it off in a decade or two.
I was using o1-preview on paid chatgpt for a while and I just wasn’t impressed. I actually canceled my subscription, because the free versions of these services are perfectly acceptable as LLMs go in 2024.
This is just a pricing experiment to see if people will pay 10x more for better AI. Perhaps, eventually, we will be paying thousands per month for AI if it is good enough.
I'm not sure how much of an experiment it is. A Bloomberg terminal is ~$25k a seat. There are plenty of specialist software tools in the $10k-per-year region. So going in at ~$2.4k a year doesn't seem like a big push.
I can't even get a normal result with today's GPT-4, so why would I consider a $200/month subscription? I'm sure I'm not the target, but how is this tool worth the money?
Wow, the generosity of 10 × $200/month "grants" is breathtaking. A "donation" of $24k/year in credits to essentially beta test their software should be embarrassing to tout.
If o1 pro mode could integrate with web searching to do research, make purchases, and write and push code, this would be totally worth it. But that version will be $2,000/mo.
To be honest, I am less worried about the $200 per month price tag per se. I am more worried about the capability of o1 pro mode being only a slight incremental improvement.
If this is also available via API, then I could easily see myself keeping the $20/mo Plus plan and supplementing with API-based calls to the $200/mo Pro model as needed.
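A minimal sketch of that hybrid setup, assuming a Pro-tier model ever appears in the API; it uses the standard openai Python SDK, and the model name "o1-pro" is purely a placeholder, not a confirmed identifier:

    # Hedged sketch: keep the cheap chat plan for everyday use and call a
    # (hypothetical) Pro-tier model over the API only for the hardest questions.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask_pro(prompt: str) -> str:
        response = client.chat.completions.create(
            model="o1-pro",  # assumption: placeholder name, not a confirmed API model
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    if __name__ == "__main__":
        print(ask_pro("Review this proof sketch for gaps."))

Under that setup you pay per token for the occasional hard problem instead of a flat $200/mo.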
So far, I don't get the impression that o1 pro mode is even close to 10x better than GPT-4/4o, despite costing that much more. Definitely nowhere close to the kind of leap we saw from GPT-3 to GPT-4. It's good as a programming assistant, but waiting 1 minute+ for the output does interrupt my workflow somewhat. It also doesn't have access to the memory function or web browsing.
Worth keeping in mind that performance on benchmarks seems to scale linearly with log of thinking time (https://openai.com/index/learning-to-reason-with-llms/). Thinking for hours may not provide as much benefit as one might expect. On the other hand, if thinking for hours gets you from not solving the one specific problem instance you care about to solving that instance, it doesn't really matter - its utility for you is a step function.
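As a rough illustration of what "linear in log of thinking time" implies, here is a toy calculation (the coefficients are invented for illustration, not taken from OpenAI's post): each tenfold increase in thinking time buys roughly the same fixed number of benchmark points, so the jump from 1 minute to 10 minutes is worth about as much as the jump from 10 minutes to 100 minutes.

    import math

    # Toy model: score(t) ~ a + b * ln(t). a and b are made up for illustration.
    a, b = 40.0, 5.0

    for t in [1, 10, 60, 600, 3600]:  # seconds of "thinking"
        print(f"{t:>5} s -> {a + b * math.log(t):5.1f} (illustrative)")

Each multiplicative step adds only a constant increment, which is why hours of thinking mainly help on problems sitting just past the current threshold.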
It feels like we’re witnessing a clash not just of technologies but of philosophies: centralized, tightly controlled AI versus the chaotic yet flexible open-source approach. The question is, can corporations remain in their ‘walled gardens’ if open solutions become powerful enough? This isn’t just a race of tech—it’s a race of trust and adaptability. Who will win: corporations or the community?
Haha, thanks. But I have a simple question I still don't quite understand: is the o1 on Plus the same as o1 pro? Or is o1 pro just o1 but with more compute credits, essentially?
I found this super weird as well. Basically they said, "So we are aiming to get hundreds of thousands of users, but we're nice too; we gave 10 users free access for a bit." What's going on here? It must be for a reason. Maybe I'm too sensitive, or there is some other complex reason I can't fathom, like they get some sort of tax break and an intern forgot to up it to a more realistic 50 users to make it look better in the marketing material. Nothing against OpenAI, it just felt weird.
Coming from Mr. Worldcoin, are you really surprised? Pretty much everything this company does is a grift, and the CEO-types are eating it up as you can see from this thread
“But it makes mistakes sometimes!”
Cool bro, then don't use it. Don't bother spending any time thinking about how to create error-correction processes, like any business does to check its employees. Yes, something that isn't perfect is worth zero dollars. Just ignore this until AI is perfect; once it never makes mistakes, then figure out how to use it. I'm sure you can add lots of value to AI usage when it's perfect.
* Will this be the start of enshittification of the base ChatGPT offering?
* There may also be some complementary products announced this month that make the $200 worth it
* Is this the start of a bigger industry trend of prices more closely aligning to the underlying costs of running the model? I suspect a lot of the big players have been running their inference infrastructure at a loss.
There are many who wouldn't bat an eye at $1k/month for a guarantee of the most powerful AI (even if it's just 0.01% better than the competition) and no limits on anything.
Y'all are greatly underestimating the value of that feeling of (best + limitlessness). High performers make decisions very differently than the average HN user.
At $1k/mo I suspect people would get quite upset if the product doesn't deliver all the time. And for something as vague as an LLM, it will fuck up enough at some point.
$200/mo is enough to make decision makers feel powerful and remain a little bit lenient on widdle 'ol ChatGPT
Come on, Anthropic! Match (or beat!) the price with an unlimited Sonnet plan and you have my money. The usage limits are very frustrating (but understandable given the economics).
F me. $2,400 per year? That is bananas. I did not see whether this plan includes any API access. With that I would probably see it as a valuable return, but without it… that is a big nope.
"Open"AI - If you pay to play. People in developing countries where USD200 feeds a family of four for a month clearly won't be able to afford it and are disadvantaged.
Replacing people can never and should never be the goal of this, though. How is that of any use to anyone? It will just create socio-economic misery given how the economy functions.
If some jobs do easily get automated away, the only way that can be remedied is government intervention on upskilling (if you are in Europe you could even get some support). If you are in the US or most developing capitalist (or monopolistic, rentier, etc.) economies, it's just your bad luck; those jobs WILL be gone or reduced.
I think this direction definitely confirms that human beings and technology are starting to merge, not on a physical level but on a societal level. We think of ChatGPT as a tool to enhance what we do, but it seems to me more and more that we are tools, or "neural compute units," plugged into the system for the purposes of advancing the system. And LLMs have become the de facto interface where the input of human beings is translated into a standard sort of code that makes us more efficient as "compute units".
It also seems that technology is progressing along a path:
loose collection of tools > organized system of cells > one with a nervous system
And although most people don't think ChatGPT is intelligent on its own, that's missing the point: the combination of us with ChatGPT is the nervous system, and we are becoming cells; globally, we no longer make significant decisions and only use our intelligence locally to advance the technology.