OpenAI to discontinue support for the Codex API
175 points by thejosh on March 21, 2023 | 131 comments
Just received this email, can't find a blog post yet?

On March 23rd, we will discontinue support for the Codex API. All customers will have to transition to a different model. Codex was initially introduced as a free limited beta in 2021, and has maintained that status to date. Given the advancements of our newest GPT-3.5 models for coding tasks, we will no longer be supporting Codex and encourage all customers to transition to GPT-3.5-Turbo.

About GPT-3.5-Turbo

GPT-3.5-Turbo is the most cost effective and performant model in the GPT-3.5 family. It can do coding tasks while also being complemented with flexible natural language capabilities.

You can learn more through:

- GPT-3.5 model overview
- Chat completions guide

Models affected

The following models will be discontinued:

- code-cushman:001
- code-cushman:002
- code-davinci:001
- code-davinci:002

We understand this transition may be temporarily inconvenient, but we are confident it will allow us to increase our investment in our latest and most capable models.

—The OpenAI team




In my testing, Codex is still better at some code tasks than GPT 3.5/4. I am sad to see it go.

It looks like Codex will still be available via Azure: https://learn.microsoft.com/en-us/azure/cognitive-services/o...

I imagine MSFT would never give their customers 3 days notice of imminent service shutdown.


The problem I saw with Codex is that it routinely bugs out and starts repeating something infinitely, and the model parameters can't stop that from happening.


I see something similar with ChatGPT too... it seems to forget previous constraints and jumps back to the generic answer. In my case, I was trying to generate an ffmpeg command line that, during transcode, would produce stereo audio even if the source file had no audio (so basically output a blank audio stream). It kept using 0:a in the complex filter, and I would ask what would happen if there were no audio stream in the input file. It would correct itself, but something else would break in the fix. I would point that out and it would fix it, but then reintroduce 0:a.
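
For reference, the usual trick here is to add a silent lavfi source as a second input rather than touching 0:a at all. A sketch, assuming the input is known to have no audio stream (file names are hypothetical):

```
# anullsrc supplies a silent stereo track; -shortest trims it to the video length
ffmpeg -i video_only.mp4 \
  -f lavfi -i anullsrc=channel_layout=stereo:sample_rate=44100 \
  -map 0:v -map 1:a -c:v copy -c:a aac -shortest output.mp4
```

If the input may or may not have audio, a single command can't cleanly express the fallback; probing first with ffprobe in a wrapper script is the reliable route.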


Maybe the output was long enough that your earlier prompts fell out of the context window?


I'm late, but the model can only remember about 4,000 tokens, which according to OpenAI is roughly 3,000 words.
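
You can check the budget yourself with OpenAI's tiktoken library; a quick sketch:

```python
# Count tokens the way the API does, using OpenAI's tiktoken library.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
text = "Everything the model can 'remember' must fit in this budget."
print(len(enc.encode(text)))  # English prose runs roughly 0.75 words per token
```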


This feels like an aggressive push to force Codex users to adapt to GPT-3.5/4. I'd imagine they will be integrating some of Codex's features into GPT-4 soon.


It's ridiculous. Now we all have to implement "chat" even if all we want is code completion.


No. I have an app that uses both. You do not have to implement chat.

All you have to do is 1) call the chat completions API, 2) structure your prompt as JSON messages with system/user (and optionally assistant) roles, and 3) read the completion from a different JSON node. You don't have to round-trip or maintain context or anything.
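
A minimal sketch of that flow with the openai Python library as it existed in early 2023 (the pre-1.0 ChatCompletion interface); the prompts are placeholders:

```python
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a code completion engine. Output code only."},
        {"role": "user", "content": "def fibonacci(n: int) -> int:"},
    ],
    temperature=0,
)

# The "different JSON node": choices[0].message.content here,
# versus choices[0].text in the old Completion API.
print(response["choices"][0]["message"]["content"])
```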


I'm seeing that the chat API always wants to wrap its response in "```python ```" when it generates code. Is there a prompt trick to stop this, or should I just be stripping it manually?
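
For what it's worth, a small sketch of the manual-stripping route (the regex is my own, not anything OpenAI documents):

```python
# Strip the Markdown code fence the chat models like to add;
# falls back to the raw reply if no fence is present.
import re

def extract_code(reply: str) -> str:
    match = re.search(r"```(?:[a-zA-Z]+)?\n(.*?)```", reply, re.DOTALL)
    return match.group(1) if match else reply

print(extract_code("```python\nprint('hi')\n```"))  # -> print('hi')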


[flagged]


Consumers of the API


Why would consumers need to implement it?

Isn’t it just like, send prompt, get response? Why is it necessary to implement chat if you don’t care about chat? Is it used for scheduling or something?


It's all to do with how the API is designed. Now the prompts need to be sent in the form of a chat, and we need to worry about text being output from the model, not just code. Basically, users need to comb through their codebase, change their API calls, and make sure nothing breaks. While GPT-3.5-Turbo is quite good, it was quite a headache to replace davinci calls with it, especially where I had closely integrated it with a chain of prompts. I can only imagine it being more of a headache for Codex users.


You already had to deal with text being output from the Codex models. I regularly ran into text from Python mailing lists when using the Codex API. Sometimes it would write code, lose coherency, and type out pages and pages of Python mailing list comments and signatures.

Any human text I get from the Chat completion API with the task of completing code is at least relevant to the code being written.


I tried a prompt like this to get it to spit code out:

“Pretend you are a SQL expert. You convert requests written in English to Postgres SQL. Your output is always SQL only. If you need to respond with something other than SQL code, add it to comments within the SQL code you generate and prepend it with ‘Note: ’. Your output will automatically be executed on a read-only Postgres database called ‘my_db’ and the result of that will be given to you. If you are still working on getting an answer, add ‘Working’ at the end of your message, or ‘Done’ if you got the result you needed to answer the request.

Request: How many API calls were made by clients last month?”

You can keep that at the top of every request, while running the SQL code iteratively until it says Done.
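
A hedged sketch of that loop; SYSTEM_PROMPT is the prompt quoted above, and run_readonly_query is a hypothetical stand-in for your own database layer:

```python
import openai

SYSTEM_PROMPT = "Pretend you are a SQL expert. ..."  # full text as quoted above

def run_readonly_query(sql: str) -> str:
    # Hypothetical stub: execute against a read-only Postgres connection
    # (e.g. with psycopg2) and return the rows rendered as text.
    raise NotImplementedError

def answer_request(request: str, max_rounds: int = 5) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Request: {request}"},
    ]
    reply = ""
    for _ in range(max_rounds):
        reply = openai.ChatCompletion.create(
            model="gpt-3.5-turbo", messages=messages, temperature=0
        )["choices"][0]["message"]["content"]
        if reply.rstrip().endswith("Done"):
            break
        # Feed the query result back so the model can keep working.
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": f"Result: {run_readonly_query(reply)}"})
    return reply
```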


While I agree that some changes have to be made in your code, preventing the model from outputting text is relatively easy: you just need a strong system message that sets the model up to give you the output you want.


Until someone accidentally jailbreaks it by putting weird comments in their code. Somewhat /s but within the realm of possibility.


Azure is good for its years-in-advance warnings of shutdowns


This isn't totally out of the blue - they previously e-mailed API users claiming that the GPT-3.5-Turbo model is just as good or better, as well as faster and cheaper per token than their other models, and that it would only require a small change to prompts (the claims of being better, of course, aren't true in all cases).

I suspect the reason for this (especially given the extremely short timeline) is they're desperately trying to free up space in their clusters for GPT-4 nodes - they've clearly been struggling under the demand for GPT-4 (with ChatGPT Plus subscribers initially getting 100 GPT-4 messages every few hours, then 50, then 25 with a warning that the limit will be reduced further this week).

I assume the limiting factor here is how many A100 80GB instances Microsoft is able to get and rack from Nvidia.


If it's a capacity issue, they could just say so.

Also, cushman is allegedly a 12B model. It can be served on much less demanding hardware.

Honestly, if resources are REALLY the issue, they could just severely reduce the nodes allocated to those models and employ punishing rate limits, instead of announcing such a short-notice retirement.


If you're trying to convince people to replace humans with your tech, then I could see a business not wanting to appear like it's struggling with demand (although, like you, I don't think that's the right choice).

Like the sibling comment said, it could also be limited by developer capacity - they reportedly have around 350 employees (excluding contractors helping with training), and at that size you wouldn't want to be supporting a large set of different clusters.


Riffing on that a bit, I just don't get the sense OpenAI wants to be in the business of hosting mission critical APIs for Fortune 500 companies. To your point, running AWS level services is an entirely different business that doesn't make sense for OpenAI's relatively small team. If you need rock solid stability and scale, OpenAI and Microsoft would probably rather have you just use "Azure OpenAI" services, which is still hosting Codex: https://learn.microsoft.com/en-us/azure/cognitive-services/o...

Reading between the lines, I think OpenAI is signaling that if you want a more mature and stable product, go with Azure. If you want the latest cutting edge, and don't mind a little instability, keep using the OpenAI APIs.


That's a good take, I hadn't thought about it like that


I don’t think they owe you an explanation.

And resources aren't just A100s. They have engineers maintaining those older models, and support staff answering questions, and so on.

OpenAI has found themselves at the center of a gold rush. I don’t blame them for not wanting to operate silver mines, and I don’t expect them to ask my permission or justify themselves to me.

(I will be impacted by the codex shutdown. It’s fine. If I wanted to be in a perfectly stable space I’d go support cobol banking applications)


Interesting… so finally a tech advantage to being in Australia. I still get the 100 messages/4h during the day here.

Late at night (it’s about 1am now) it’s hard to get logged in due to congestion.


That's not cool giving your paying customers only a 3-day notice before making a breaking change / shutting down a service.


> Codex was initially introduced as a free limited beta in 2021, and has maintained that status to date.


Free or not, 3 days of warning for any service termination sucks.


I bet we know what took them down the other day now…


So? They're still paying customers if they're paying for something else but also using Codex.


So? They're still getting what they pay for.


I'm struggling to find a comparison, but I think the reason they can shut it off for users who pay the company and expect support is that there's a clear and easy migration path, not that it was free and in beta. I don't mean legally acceptable because it's spelled out in the ToS, but acceptable to the community.


It will still break all the apps out there that rely on a workflow with this model. Some will need to regenerate all their data.


Yeah, that's why it seems bad now. But hopefully they thought it through: most users wanted to upgrade right away anyway, and the data regeneration will be quick.


AWS is an example. There are plenty of services with a free tier, and if you depend on one, you'd hope Amazon doesn't just delete it out from under you. Unlike with a pricing change, you can't just pay more to keep production from going down.


Seriously? You're being disingenuous.

Codex being in beta meant it was improving, not that it'd vanish overnight.

It's not about "getting what they pay for"; it's about showing some respect to users who trusted the service.


Beta doesn't mean improving, it means not ready for production and with limited or no guarantees.


“Beta” also very much means “use at your own risk”.


So? It's still not cool.


Codex (code-davinci-002) is free (limited beta).


Given the fairly poor level of service paying customers get, it's not too surprising.


It seems on brand for OpenAI.


I wonder if this is to stop JetBrains from competing with whatever solution Microsoft will (presumably) bundle in Visual Studio.


Doubtful. OpenAI, and Microsoft by extension, would gladly sell access to their APIs.

Copilot, for example, is available for JetBrains IDEs.


But do they want JetBrains to create its own Copilot, though? It's surely possible with OpenAI's API - that's what GitHub is doing anyway: building a layer on top of what are essentially Codex models.


I just got the email as well. Pretty disappointed that they're doing such a short wind-down time, we were actively using the codex models for a number of experimental platforms in our research group.


ChatGPT is extremely similar and doesn't flip out and start repeating like code-davinci-002 sometimes does (and model parameters like frequency penalty don't stop it).


Codex should still be available via Azure, just not via OpenAI.


It's 10 cents per 1K tokens - a far cry from the 2 cents per 1K tokens OpenAI was charging for equivalently sized models.


Codex (code-davinci-002) was free. This is going to be a huge deal for research groups.


OpenAI was likely running it at a significant loss; that's just not sustainable when they're already losing money on every single API call.


Legit question: how do you know they're losing money?


As hhh already said, it was free for a long time. In addition, it costs a fortune to run their models - they've mentioned as much on Twitter a few times now.

Their compute is provided by Microsoft and they've got investors, but even with the new premium plan it's extremely unlikely they're making anything close to enough to cover their costs.

Eventually OpenAI will likely be absorbed by Microsoft, which is betting heavily on their tech and has made waves with significant investments followed by a significant ownership stake.

This article covers things fairly well: https://archive.is/qMUPH

For some more (pre GPT-4) cost estimates this article is on the mark: https://indianexpress.com/article/technology/tech-news-techn...

At $0.0003 per word (it will be more with GPT-4, as it's a model that requires more resources to run), their costs were over $100K per day back in December - that works out to over 300 million words a day. Given how much it's taken off since then, we're likely well over $1 million per day now. The subscription service obviously isn't going to make that much for a long, long time, if ever.

In 2021 they lost over $500 million as well, so Microsoft's $10bn investment will cover them for a few years, but it will take some doing to recoup.


It was free.


Interesting. That probably means a GitHub Copilot upgrade to a GPT-4 model is incoming.


Sure hope so, especially with Copilot Labs brushes.

Imagine highlighting some code and dumping the contents of a Jira ticket into a text prompt... those days are coming.


I’m currently slapping together something like this, though the highlighting of code is automated – in a sense – when an error is piped into the program and the error stack is used as a prompt to find a solution to the error.

The goal is to splice a solution into a file on a new branch, write a test to ensure it resolves the issue (if necessary), then create a pull request describing the change set and how the test addresses the issue.

I’ve so far worked through it manually and it works okay, but for fairly limited scopes. Like, why didn’t this environment variable load properly? Why couldn’t we parse this input? Why did this time out? No big-picture stuff.

I have a feeling GitHub will be adding this directly to repositories so initial solutions and ideas (perhaps even fixes) can be generated by an AI trained on your repo and its issues/PRs/dependencies. I’m not doing this with aspirations of building a successful tool so much as learning to think with AI as a tool for improving systems.

It’s a very tractable thing at the moment and I think future iterations of GPT will make it truly useful.


I agree, it seems inevitable that GitHub will include this sort of auto-resolve functionality. I've had good luck with GPT outputting unified diff and LSP formats. As the neighbor comment mentions, the context window will be a limitation, especially with large codebases. For that, I'm thinking it will be necessary either to fine-tune, or to summarize the rest of the codebase across multiple requests and store the results of these intermediate summaries (a rough sketch below). I can already construct this sort of thing by hand on larger codebases, so surely it can be automated to a large degree.
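
A rough sketch of that summarize-then-query approach, assuming the 2023-era openai library; repo_files, read_source, and the summary cache are hypothetical stand-ins, not anything GitHub or OpenAI has published:

```python
import openai

def repo_files() -> list[str]:
    # Hypothetical stand-in: enumerate the source files you want summarized.
    return ["src/app.py", "src/db.py"]

def read_source(path: str) -> str:
    with open(path) as f:
        return f.read()

def summarize_file(path: str, source: str) -> str:
    return openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Summarize this source file: public API, key types, side effects. Be terse."},
            {"role": "user", "content": f"{path}\n\n{source}"},
        ],
        temperature=0,
    )["choices"][0]["message"]["content"]

# Map step: one request per file, results stored for reuse.
summaries = {path: summarize_file(path, read_source(path)) for path in repo_files()}

# Reduce step: the stored summaries become cheap context for the real request,
# e.g. "here is an error stack and a codebase overview; propose a unified diff".
overview = "\n\n".join(f"{p}:\n{s}" for p, s in summaries.items())
```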


I think the problem will be context size and token limits. Right now, GitHub Copilot uses the current file “and related files” as context. But if you have several thousand files, even GPT-4 won’t be able to handle that. Either you need to cleverly know exactly which files to feed in as context, or you need a very, very high token limit.


Does GitHub Copilot use a Codex model?


Yes


Actually, there is no way to be sure^. If you think about the costs + scale, it's likely to be cushman (code-cushman-001).

^ Unless you are from OpenAI, in which case I have more questions for you :)


They use models similar to the Codex models; they mentioned it a couple of years ago.


cushman and code-davinci are similar for sure (same architecture). Perhaps that's what they meant.


Was the Codex API used in GitHub Copilot? If so, what are the implications for Copilot? Our team depends significantly on Copilot for our work in Neovim.


Not the API itself, but the LLM behind it was a Codex model. They are essentially saying Codex-era models are obsolete, so an upgrade announcement is probably only a few weeks off.


They could also just migrate to GPT-3.5-Turbo, as it's more cost-effective.


What are the drawbacks of GPT-4 relative to Codex?


It costs money, mainly.


Any other drawbacks of Codex vs. GPT-4?


One thing that hasn't gotten much focus with 3.5 and 4 is fine tuning. It would be interesting to see what the road map looks like for this. Even with more tokens, fine tuning still seems necessary.


This is where we are at right now. Even 128k tokens would not cut it for what we want to do. Currently experimenting with fine-tunes on the base models and seeing some promising progress.

I suspect that the chat models will never be opened for tuning due to all of the moderation and security concerns.

My strategy will be to go into the DIY dark forest if we can't get curie or davinci to start behaving as expected.


What stack are you using? If it's C#, I'm pretty close to having what I think is a much nicer fine-tuning approach than the raw APIs. I'm hoping to push this out tonight if I get a few spare hours. (OpenAILib on NuGet.)


We are using C#. We do already have an in-house interface for all of this.

Took us about a week to get through the prompt/completion training mess to a stable pattern. Whitespace and newlines are a hell of a thing with these LLMs.


Agree. That is the problem I aim to improve a bit.


But this will leave OpenAI with no direct code completion model.

The ChatGPT model isn't a direct replacement for Codex. Also, I believe code-davinci-002 was only recently trained/released.

So moving forward, few will be able to compete with Copilot in the code completion business by just piggybacking on OpenAI's API.

Someone mentioned GPT-4, which is currently slow and more expensive than cushman. If GitHub is going to swap in GPT-4, they'd probably need to 10x what they charge their customers (currently 10 dollars).

Some tasks they could do with GPT-4, but not the completion business.


I always felt like the Codex API was redundant; we can surely use the GPT-3.5 API to do the same job (a sketch below). Just provide this system instruction to GPT-3.5-Turbo: "You are Codex OpenAI model, which returns the next line or block of code as per user-provided code in chat. Obey indentations and don't provide text instructions, as your chat output will be used directly in the code file without any cleaning or refactoring. Note: If completion is done on the same line, don't repeat earlier code."
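
A sketch of that in practice; the wrapper function and stop sequence are my own guesses, not an OpenAI recipe:

```python
import openai

CODEX_STYLE_SYSTEM = (
    "You are Codex OpenAI model, which returns the next line or block of code "
    "as per user-provided code in chat. Obey indentations and don't provide "
    "text instructions, as your chat output will be used directly in the code "
    "file without any cleaning or refactoring."
)

def complete_code(prefix: str) -> str:
    # Treat the chat endpoint as a completion endpoint: code prefix in, code out.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": CODEX_STYLE_SYSTEM},
            {"role": "user", "content": prefix},
        ],
        temperature=0,
        stop=["\n\n\n"],  # crude guard against runaway output
    )
    return response["choices"][0]["message"]["content"]
```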


3.5-turbo is not the same thing at all; it's in a chat format, so it's not very suitable for things like code completion in editors (Copilot is powered by a version of Codex).


If Microsoft is replacing Copilot with GPT-4, this is basically how they would make it behave:

They prepend each request with a system message that reads

> You are a coding assistant. Anything fed to you should have the current line, or code block, or file completed, based off of your best interpretation of the context.

> Should you encounter a comment in the code, and it begins with the string "prompt: ", interpret the remaining text of the comment as a prompt from the user regarding specific output that should be generated, rather than implicit code completion

In a case like this, you could just not use "prompt" behavior and it should work basically the same.


This gets into an inception-style loop.

A completion ML being used as a chat bot being used as a completion tool.


What does this mean for text-davinci-001, text-davinci-002, text-davinci-003? Are they safe?


They're paid, so presumably safer, at least.


Also had this thought. We are working on base davinci via API right now and wondering how far this goes. Is OpenAI going to corral everyone into the neutered chatbot API, or are tunable models staying on the roadmap?


Can you elaborate on why you see it as neutered?


Primarily lack of fine tuning options. Even 32k tokens is hilariously inadequate for something that could realistically cover our business domain.

If things don't clear up soon, I might go a DIY path with a more open model. At least on this path, I don't have a dark cloud of deprecation over my head.


What is your business domain?


We develop, configure and manage software products for US financial institutions.

The amount of information describing just one of our customers' needs would easily overrun any practical context window. Also, the cost of running maxed-out prompts 24/7 is definitely untenable at our scale per current OpenAI pricing models.

It's not like we need a ridiculous amount of context, but 32k tokens is definitely not enough. We are currently experimenting with fine-tunes of davinci & curie. Already seeing good progress with training runs as small as 200k tokens. We are going to take this up 2 orders of magnitude over the next few weeks.
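
For anyone heading down the same road, a minimal sketch of the prompt/completion JSONL that the base-model fine-tuning endpoint expected at the time; the separator and stop-sequence conventions follow OpenAI's fine-tuning guide of that era, and the records themselves are invented:

```python
import json

SEPARATOR = "\n\n###\n\n"  # fixed marker telling the model the prompt has ended
STOP = " END"              # fixed marker for the end of each completion

records = [
    {
        "prompt": f"Summarize the wire transfer limit policy for Customer A.{SEPARATOR}",
        # Completions conventionally start with a single space; whitespace and
        # newline handling here is a common source of pain.
        "completion": f" Daily limit $50k, two approvers required over $10k.{STOP}",
    },
]

with open("train.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Then, with the 2023-era CLI:  openai api fine_tunes.create -t train.jsonl -m davinci
```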


Isn’t the embeddings API meant to solve the issue of providing the model with access to a knowledge base?


I still have zero clue how the embed API is actually meant to be used. The documentation leaves a lot to be desired.

The lack of definition around use cases is making me think I got distracted by shiny bullshit again. The Python notebooks with the Olympics use case were difficult to follow without some abstract description/diagram/overview of how it all fits together.
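
The abstract version, as I understand it: embed your knowledge base once, embed each query, and paste the nearest chunks into the prompt. A hedged sketch using the era's text-embedding-ada-002 model (the documents are invented):

```python
import numpy as np
import openai

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

docs = ["Refund policy: ...", "API rate limits: ...", "Onboarding steps: ..."]
doc_vecs = [embed(d) for d in docs]  # do this once, store the vectors

def top_k(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    # ada-002 vectors are unit length, so a dot product is cosine similarity.
    scores = [float(q @ v) for v in doc_vecs]
    best = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
    return [docs[i] for i in best]

# The retrieved chunks then go into a completion/chat prompt as context.
```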


In my opinion you should migrate to GPT-3.5-Turbo, which is superior for many tasks and cheaper than davinci-003.


It's significantly worse than davinci-003 at maintaining a character or suppressing its PC behaviour. I would migrate to Claude or LLaMA if chat were the only option.


The obvious play here is to impede competition with Copilot.


This timely announcement saved me a day. I was going to prototype something with Codex this week.


What are the alternatives? Are there any open source/free models you can run yourself?


Was the Codex API used in GitHub Copilot? If so, what are the implications for Copilot? Our team depends significantly on Copilot for our work in Neovim.


Microsoft owns github. Microsoft owns half of OpenAI. You’ll be okay.


It seems like the Codex API is used in GitHub Copilot (https://openai.com/blog/openai-codex). Going by their page, we are seriously worried about losing the tab completion features of the Codex API that our team has come to rely on in the excellent GitHub Copilot for Neovim:

>> Codex is the model that powers GitHub Copilot, which we built and launched in partnership with GitHub a month ago. Proficient in more than a dozen programming languages, Codex can now interpret simple commands in natural language and execute them on the user’s behalf—making it possible to build a natural language interface to existing applications. We are now inviting businesses and developers to build on top of OpenAI Codex through our API.


They are shutting down the free API, not copilot.


> we are seriously worried about losing the tab completion features of the Codex API

Not sure how to say this nicely, but if you rely on an AI so much for programming, maybe you are doing something horribly wrong.

Unless you're writing, say, Zig, an LSP (e.g. native LSP with Mason, or coc.nvim) will give you as good an autocomplete as you can get.


> Unless you're writing, say, Zig, an LSP (e.g. native LSP with Mason, or coc.nvim) will give you as good an autocomplete as you can get.

That's not true: Copilot can complete a whole function based on a comment by understanding the intent behind it; no LSP server can do that. On the other hand, Copilot can't do the kind of code analysis an LSP server can, so you use them together - they don't really compete with each other.


Yes, I get that, but "seriously worried" about not having that? Are they hiring monkeys?


If Copilot is making people 30% more productive then obviously they would be seriously worried that it might go away.

Reports vary massively on how useful Copilot is: some people say it makes them 3x as productive, whereas others say it just gets in the way. I think 30% is a conservative estimate if you are working with the kind of code Copilot is good for (front-end JS/TS and backend Python).


For languages like Bash, Haskell, and Erlang, I would say 30% to 3x, and it doesn't yet get in the way. It's useful and acts as a muse when you're stuck in a cul-de-sac of scarce human creativity.


Imagine your washer and dryer both break. Sure, you can wash everything by hand and hang-dry it, but I'm assuming most people are going to scramble to get them fixed instead of throwing up their hands and going back "to the old way".


Yes, hoping that whatever (GPT-4?) replaces the "obsolete" Codex models to ease their compute needs is half as good as the current Codex.


Ughh and they don't offer logprobs for the chat interface...


PSA: we will continue to support Codex access via our Researcher Access Program. If you are not already part of it, we encourage you to apply in order to maintain access to Codex. Looking forward to seeing what you are working on!

https://openai.com/form/researcher-access-program


I've become really used to Copilot. Hope this doesn't impact that. Also goes to show you the risks of building off other people's models.


With how fast everything is moving, I wouldn't be surprised to see "run a Codex-like LLM via torrent" in reaction to this.


That and comparable performance running locally. On r/MachineLearning there was just a post about running 7B Alpaca on Android mobile.


Good to see OpenAI is in fact still a software company.


Huh, how does GPT have the same features? Do you have to write natural language prompts to get code completion now?


GPT is just a very advanced text completion engine. If you put the start of some code in, or a function name, it will try to complete it.
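
For example, with the pre-1.0 Completion interface (the model choice and prompt are illustrative):

```python
# Plain completion, no chat wrapper: give the model a code prefix and it continues it.
import openai

response = openai.Completion.create(
    model="text-davinci-003",        # or, until the shutdown, code-davinci-002
    prompt="def is_palindrome(s: str) -> bool:\n    ",
    max_tokens=64,
    temperature=0,
    stop=["\ndef "],                  # stop before it starts a new function
)
print(response["choices"][0]["text"])
```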


Nooooooooo code-davinci-002 was the best one! https://twitter.com/repligate/status/1618703361230139393


The drawbridges are being pulled up.

Deep learning and large language models grew through free and open source software tools, open-protocol networks and open-access research. The next revolution in computing is a reversion to the closed world of computing’s early days, when the number of machines could be counted on one hand.

A black box language model is the ultimate “binary blob.”


Realistically, it seems like the drawbridges will fall down even faster.

Open source models like Chinchilla are... not that far behind GPT-4.

It's questionable how much of a moat OpenAI will have in 5 years... Bard, Chinchilla, are all competitive.


> Open source models like Chinchilla

How can I download it? Honest question - a cursory search turned up nothing


Who is going to spend $1B+ on training and release an open model? That’s where we’re headed, with estimates of $250M for training GPT-4’s successor.


Chinchilla is not open source; you mean LLaMA.


It was not though. ChatGPT is more consistent and doesn't bug out. And now GPT-4 is better at code than code-davinci-002.


It's not the same thing; a chat format isn't very suitable for autocompletion, for example.


Yes it is; you just instruct it that that's how it's supposed to behave.


That might sort of work but it won’t be as good as how it already works


I cannot imagine building anything but toys on OpenAI's stack. Not your model, not your product.


Wasn't this a free beta offering?


Now there’s only a few more models left that support fine tuning? None of the new ones do.


Suspicious that they're discontinuing this right after the outages; it feels like it was a resource hog.


The outage was most likely due to an issue where you could inherit other users' sessions and see their chats and whatnot. While that is unconfirmed, we know the outage was different from the regular at-capacity outages because they intentionally killed their API (status code 404 on all API endpoints) and you couldn't even bypass the capacity message by being a Plus subscriber.


I did actually get to see someone else's topics in the chat history sidebar yesterday, so I think you're right


What about the edit models? I use clippy-ai like every day :(. Sad to see these changes.

Also, what happens to Copilot?


I was told by OpenAI not to use the edit models - that the edit API is (effectively) deprecated. We were hitting a very low rate limit, so we switched to using a completion model for the same thing (it's just as good).


Wait, I also realize we won't have logprobs or FIM (fill-in-the-middle)?



