
>As you dig down you would get lighter and lighter on your feet.

Fun fact: Gravity doesn't decrease the entire way down! Only when you get to the core does it decrease monotonically: https://physics.stackexchange.com/questions/18446/how-does-g...
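Quick sketch of why, via the shell theorem (uniform density is an idealization; the real effect comes from Earth's dense core):

    g(r) = \frac{G\,M(<r)}{r^{2}}, \qquad M(<r) = 4\pi \int_0^{r} \rho(s)\, s^{2}\, ds

For uniform density this gives g(r) = (4/3)\pi G \rho r, which would fall linearly toward the center. With the real profile, the mantle is light enough that M(<r)/r^2 keeps growing as you descend, so g actually peaks around the core-mantle boundary before dropping to zero at the center.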


If you can't see it, you're likely on the free tier and using the latest mini model.


Not true. I've been a paid user forever, and on the Android app they have definitely obscured the model selector. It's readily visible to me on desktop / desktop browser, but on the Android app the only place I can find it is by tapping an existing response from ChatGPT, which then gives me the option to regenerate the message with a different model.

And while I'm griping about their Android app, it's also very annoying that they got rid of the ability to do multiple, subsequent speech-to-text recordings within a single drafted message. You have to one-shot anything you want to say, which would be fine if their STT didn't sometimes fail after you've talked for two minutes. Awful UX. Most annoying is that it wasn't like that originally. They changed it to this antagonistic one-shot approach several months ago, then quickly switched back. But they did it again a month or so ago and have stuck with it. I just use the Android app less now.


Sounds like there are a lot of frustrations here but as a fellow android user just wanted to point out that you can tap the word ChatGPT in your chat (top left) and it opens the model selector.

Although if they replace it all with GPT-5, then my comment will be irrelevant by tomorrow.


On desktop at least the model selector only shows GPT-5 for me now, with Pro and Thinking under "Other Models" but no other options.


When you start a new conversation it says "ChatGPT" at the top. Tap that to select a model.

For the multiple messages, I just use my keyboard's transcription instead of openai's.


>The logic above can support exactly the opposite conclusion: LLM can do dynamic typed language better since it does not need to solve type errors and save several context tokens.

If the goal is just to output code that does not show any linter errors, then yes, choose a dynamically typed language.

But for code that works at runtime? Types are a huge helper for humans and LLMs alike.


As long as at least half of the lines are well-written tests, this is more achievable than you'd think.


“In some regions”

Guatemala is the only place outside the US where I was not quoted 3-5x Uber prices by very pushy taxi drivers.

Taxi drivers are scam artists and thieves. There’s no reputational damage either, as you will never see them again.

Uber solves the reputation problem: every driver is rated, and poorly rated and badly behaved drivers do not get to work for them.


This would change if there weren't a culture of giving 5 stars to every driver. It started because Uber unfairly punished good drivers over honest 4/5 reviews, and now every Uber driver who uses their phone while driving or has an interior smelling of cigarette smoke still gets 5 stars out of obligation.


> I've tried:

> - Cursor (can't remember which model, the default)

> - Google's Jules

> - OpenAI Codex with o4

Cursor's "default model" rarely works for me. You have to choose one of the models yourself. Sonnet 4, Gemini 2.5 Pro, and for tricky problems, o3.

There is no public release of o4; you used o4-mini, a model with poorer performance than any of the frontier models (Sonnet 4, Gemini Pro 2.5, o3).

Jules and Codex, if they're like Claude Code, do not work well with "Build me a Facebook clone"-type instructions. You have to break everything down and make your own tech stack decisions, even if you use these tools to do so. Yes, they are not perfect: they introduce regressions, forget to run linters, or skip checking their work with the compiler. But they work extremely well if you learn to use them, just like any other tool. They are not yet magic that works without you putting in any effort to learn them.


What is your preferred static text embedding model?

For someone looking to build a large embedding search, fast static embeddings seem like a good deal, but almost too good to be true. What quality tradeoff are you seeing with these models versus embedding models with attention mechanisms?


It depends a bit on the task and language, but my go-to is usually minishlab/potion-base-8M for every task except retrieval (classification, clustering, etc). For retrieval minishlab/potion-retrieval-32M works best. If performance is critical minishlab/potion-base-32M is best, although it's a bit bigger (~100mb).

There's definitely a quality trade-off. We have extensive benchmarks here: https://github.com/MinishLab/model2vec/blob/main/results/REA.... potion-base-32M reaches ~92% of the performance of MiniLM while being much faster (about 70x faster on CPU). It depends a bit on your constraints: if you have limited hardware and very high throughput, these models still let you make decent-quality embeddings, but of course an attention-based model will be better, just more expensive.
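If it helps, using one of these is only a couple of lines with the model2vec package (a minimal sketch; the example sentences are made up, see the repo README for the current API):

    from model2vec import StaticModel

    # Load a static embedding model from the Hugging Face hub
    model = StaticModel.from_pretrained("minishlab/potion-base-8M")

    # encode() is a static lookup + pooling, no attention, so it's fast even on CPU
    embeddings = model.encode(["static embeddings are fast", "attention models are slower"])
    print(embeddings.shape)  # (2, embedding_dim)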


Thanks man this is incredible work, really appreciate the details you went into.

I've been chewing on whether there was a miracle that could make embeddings 10x faster for my search app that uses minilmv3, and it sounds like there is :) I never would have dreamed. I'll definitely be trying potion-base in my library for Flutter x ONNX.

EDIT: I was thanking you for thorough benchmarking, then it dawned on me you were on the team that built the model - fantastic work, I can't wait to try this. And you already have ONNX!

EDIT2: Craziest demo I've seen in a while. I'm seeing 23x faster, after 10 minutes of work.


Thanks so much for the kind words, that's awesome to hear! If you have any ideas or requests, don't hesitate to reach out!


Onnxruntime supports CoreML, though if my experience with converting an embedding model to CoreML using Apple's CoreML conversion tool is similar to the ORT maintainers', I can see why it would be unmaintained.
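For reference, enabling it is just a providers flag at session creation (a minimal sketch, assuming a local model.onnx; ops the CoreML EP can't handle get partitioned back to the CPU provider):

    import onnxruntime as ort

    # Try the CoreML execution provider first, fall back to plain CPU
    sess = ort.InferenceSession(
        "model.onnx",  # hypothetical model path
        providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
    )
    print(sess.get_providers())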

It took multiple tries to get the model to convert to the mlpackage format at all, and then a lot of experimenting to get it to run on the ANE instead of the GPU, only to discover that constant reshaping was killing any performance benefit (either you fix the multiplication size or don't bother). Even at a fixed size, and using the attention mask, its operations were slower than saturating the GPU with large batches.
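For context on the fixed-size point, this is roughly what pinning the shapes looks like with coremltools (a hedged sketch using a toy stand-in module; the real encoder, input names, and sequence length are assumptions):

    import coremltools as ct
    import numpy as np
    import torch
    import torch.nn as nn

    # Toy stand-in for the real encoder; only the conversion call matters here
    class TinyEncoder(nn.Module):
        def forward(self, input_ids):
            return input_ids.float().mean(dim=-1, keepdim=True)

    traced = torch.jit.trace(TinyEncoder(), torch.zeros(1, 128, dtype=torch.int32))

    mlmodel = ct.convert(
        traced,
        convert_to="mlprogram",  # produces an .mlpackage
        # Fix the sequence length up front; flexible shapes tend to fall off the ANE
        inputs=[ct.TensorType(name="input_ids", shape=(1, 128), dtype=np.int32)],
        compute_units=ct.ComputeUnit.CPU_AND_NE,  # request the Neural Engine
    )
    mlmodel.save("encoder.mlpackage")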

I discovered that targeting the newer iOS 18 standard would cause the model conversion to break, and filed an issue on their GitHub, including an example repository for easy reproduction. I got a response quickly, but almost a year later the bug is still unfixed.

Even when George Hotz attempted to hack the ANE to use it without Apple's really bad, unmaintained CoreML library, he gave up because it was impossible without breaking some pretty core OS features (certificate signing, IIRC).

Apple is just not serious about making the ANE usable through CoreML. Even Apple's internal MLX team can't crack that nut.


ONNX is horrible for anything that has variable input shapes, which is why nobody uses it for LLMs. It is fundamentally poorly designed for anything that doesn't take a fixed-size image.
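To be fair, ONNX can express variable dimensions at export time; the trouble is how well runtimes and hardware backends actually handle them (a minimal sketch with torch.onnx.export; the model and names are illustrative):

    import torch
    import torch.nn as nn

    model = nn.Linear(16, 4)
    example = torch.randn(1, 16)

    torch.onnx.export(
        model,
        (example,),
        "model.onnx",
        input_names=["x"],
        output_names=["y"],
        # Mark the batch dimension as dynamic; omit this and shape (1, 16) is baked in
        dynamic_axes={"x": {0: "batch"}, "y": {0: "batch"}},
    )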


ANE itself is also limited to fixed computation "shapes", so I'm not sure how much that would matter in practice.


I've never heard of this, and I'm pretty sure my coworkers haven't either. Thanks for mentioning it!

https://chatgpt.com/share/68104c37-b578-8003-8c4e-b0a4688206...

