Not sure if people picked up on it, but this is being powered by the unreleased ...

lordofgibbons · 2025-02-03T02:30:58 1738549858

> Which might explain why it leaps ahead in benchmarks considerably and aligns with the claims o3 is too expensive to release publicly

It's the only tool/system (I won't call it an LLM) in their released benchmarks that has access to tools and the web. So, I'd wager the performance gains are strictly due to that.

If an LLM (o3) is too expensive to be released to the public, why would you use it in a tool that has to make hundreds of inference calls to it to answer a single question? You'd use a much cheaper model. Most likely o3-mini or o1-mini combined with o4-mini for some tasks.

og_kalu · 2025-02-03T13:09:17 1738588157

>why would you use it in a tool that has to make hundreds of inference calls to it to answer a single question? You'd use a much cheaper model.

The same reason a lot of people switched to GPT-4 when it came out even though it was much more expensive than 3 - doesn't matter how cheap it is if it isn't good enough/much worse.

xbmcuser · 2025-02-03T00:31:33 1738542693

It was expensive as they wanted to charge more for it but deepseek has forced their hand

willy_k · 2025-02-03T01:29:53 1738546193

They’ve only released o3-mini, which is a powerful model but not the full o3 that is being claimed as too expensive to release. That being said, DeepSeek for sure forced their hand to release o3-mini to the public.

shawabawa3 · 2025-02-03T05:44:08 1738561448

o3 mini was previewed in December. Deepseek maybe made them release it a few weeks early but it was already on its way

sdesol · 2025-02-03T08:10:00 1738570200

I guess the question is, did DeepSeek force them to rethink pricing? It's crazy how much cheaper it (v3 and R1) is, but considering they (Deepseek) can't keep up with demand, the price is kind of moot right now. I really do hope they get the hardware to support the API again. The v3 and R1 models that are hosted by others are still cheap compared to the incumbents, but nothing can compete with DeepSeek on price and performance.

kandesbunzler · 2025-02-03T08:10:04 1738570204

no they didn't, this was literally all announced in December with a release date for January

Sparkyte · 2025-02-03T00:45:27 1738543527

Rightfully so, some models are getting super efficient.

bbor · 2025-02-03T00:43:31 1738543411

Interesting, thanks for highlighting! Did not pick up on that. Re:"leading", tho:

Effectiveness in this task environment is well beyond the specific model involved, no? Plus they'd be fools (IMHO) to only use one size of model for each step in a research task -- sure, o3 might be an advantage when synthesizing a final answer or choosing between conflicting sources, but there are many, many steps required to get to that point.

xendipity · 2025-02-03T04:47:23 1738558043

I don't believe we have any indication that the big offerings (claude.ai, Gemini, operator, tasks, canvas, chatgpt) use multiple models in one call (other than for different modalities like having Gemini create an image). It seems to actually be very difficult technically and I'm curious as to why.

I wonder how much of an impact our being still so early in the productization phase of this all is. Like it takes a ton of work and training and coordination to get multiple models synced up into an offering and I think the companies are still optimizing for getting new ideas out there rather truly optimizing them.

someothherguyy · 2025-02-03T07:59:57 1738569597

...or its all a farce, for now.

mistercheph · 2025-02-03T01:34:41 1738546481

I'm sure o3 will be a generation ahead of whatever deepseek, google and meta are doing today when it launches in 10 months, super impressive stuff.

petesergeant · 2025-02-03T02:53:59 1738551239

I’m not sure if you’re implying this subtly in your comment or not, as it’s early here, but it does of course need to be a generation ahead of what 10 months of their competitors moving forward have done too. Nobody is standing still

bruce511 · 2025-02-03T04:34:48 1738557288

I read a fair amount of sarcasm in the parent comment ;)

bitshiftfaced · 2025-02-03T04:05:27 1738555527

> but this is being powered by the unreleased o3 model

What makes you believe that?

_bin_ · 2025-02-03T04:15:09 1738556109

they explicitly stated it in the launch

bitshiftfaced · 2025-02-03T04:34:45 1738557285

The linked article says,

> Powered by a version of the upcoming OpenAI o3 model that’s optimized for web browsing and data analysis, it leverages reasoning to search, interpret, and analyze massive amounts of text, images, and PDFs on the internet, pivoting as needed in reaction to information it encounters.

If that's what you're referring to, then it doesn't seem that "explicit" to me. For example, how do we know that it doesn't use less thinking than o3-mini? Google's version of deep research uses their "not cutting edge version" 1.5 model, after all. Are you referring to something else?

golol · 2025-02-03T08:34:49 1738571689

o3-mini is not really "a version of the o3 model", it is a different model (less parameters). So their language strongly suggests, imo, that Deep Research is powered by a model with the same number of parameters as o3.

ai-christianson · 2025-02-03T00:49:32 1738543772

Has anyone here tried it out yet?

nycdatasci · 2025-02-03T04:08:39 1738555719

Pro user. No access like everyone else.

OpenAI is very much in an existential crisis and their poor execution is not helping their cause. Operator or “deep research” should be able to assume the role of a Pro user, run a quick test, and reliably report on whether this is working before the press release right?

maroonblazer · 2025-02-03T01:54:39 1738547679

Per the below, seems it's not available to many yet.

https://news.ycombinator.com/item?id=42913575