
> This is not a problem in my unreliable calculator use-cases; are you disputing that or dropping the analogy?

If you use an unreliable calculator to sum a list of numbers, you then need to use a reliable method to sum the numbers to validate that the unreliable calculator's sum is correct or incorrect.



Yes, so in my first example in the GP, this happens first. Humans do the work. The calculator double checks and gives me a list of all errors plus 5% of the non-errors, and I only need to double check that list.

In my third example, the calculator does the hard work of dividing, and humans can validate by the simpler task of multiplication, only having to do extra work 5% of the time.
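To make the third example concrete, here is a minimal sketch of validating a claimed division result using only multiplication and addition. The function name `check_division` and the restriction to positive integer divisors are assumptions made for illustration; they are not from the thread.

```python
def check_division(dividend: int, divisor: int, quotient: int, remainder: int = 0) -> bool:
    """Validate a claimed division result without dividing.

    Hypothetical helper: it only multiplies and adds, which is the
    cheaper operation the comment proposes for the human check.
    """
    if divisor <= 0:
        raise ValueError("this sketch only handles positive divisors")
    # A result is valid when the remainder is in range and the
    # multiplication reproduces the original dividend.
    return 0 <= remainder < divisor and quotient * divisor + remainder == dividend


if __name__ == "__main__":
    print(check_division(34, 17, 2))  # a correct answer from the unreliable tool
    print(check_division(34, 17, 3))  # a wrong answer, caught by multiplying back
```

The check catches a wrong quotient regardless of why the tool failed, which is the property the example relies on.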

(In my second, the unreliability is a trade-off against speed, and we need the speed more.)

In all cases, we benefit from the unreliable tool despite not knowing when it is unreliable.


In your first example, you appear to assume that for calculations where "each mistake could cost $millions or lives", engineers who calculated by hand typically didn't double-check by redoing the calculation, so a second check with a 95% accuracy tool is better than nothing. This assumption is false. I suggest you watch the 2016 film Hidden Figures to understand the level of safety at NASA when calculations were done by hand. You are suggesting lowering safety standards, not increasing them.

Your third example is unclear. No calculators can perform factoring of large numbers, because that is the expected ability of future quantum computers that can break RSA encryption. It is also unclear why multiplication and division have different difficulties, when dividing by n is equal to multiplying by 1/n.


> It is also unclear why multiplication and division have different difficulties, when dividing by n is equal to multiplying by 1/n.

Well sure, but once you multiply by 1/n you leave N (or Z) and enter Q, and I suspect that's what makes it more difficult: Q is a much more complex structure because its elements are formally equivalence classes. In fact it's easy to divide an integer x by an integer y, it's just x/y ... the problem is that we usually want the fraction in lowest terms, though.


>you appear to assume that for calculations where "each mistake could cost $millions or lives", engineers who calculated by hand typically didn't double-check by redoing the calculation

Not at all! For any n extra checks, having an n+1 phase that takes a 20th of the effort is beneficial. I did include triple-checks to gesture at this.

>It is also unclear why multiplication and division have different difficulties, when dividing by n is equal to multiplying by 1/n.

This actually fascinates me. Computers and humans both take longer to divide than to multiply (in computers, by roughly an order of magnitude!). I'm not really sure why this is in a fundamental information theory kind of way, but it being true in humans is sufficient to make my point.

To address your specific criticism: you haven't factored out the division there, you've just changed the numerator to 1. I'd much rather do 34/17 in my head than 34 * (1/17).


I'd like to second the point made to you in this thread that went without reply: https://news.ycombinator.com/item?id=43702895

It's true that we use tools with uncertainty all the time, in many domains. But crucially that uncertainty is carefully modeled and accounted for.

For example, robots use sensors to make sense of the world around them. These sensors are not 100% accurate, and therefore if the robots rely on these sensors to be correct, they will fail.

So roboticists characterize and calibrate sensors. They attempt to understand how and why they fail, and under what conditions. Then they attempt to cover blind spots by using orthogonal sensing methods. Then they fuse these disparate data into a single belief of the robot's state, which includes an estimate of its posterior uncertainty. Accounting for uncertainty in this way is what keeps planes in the sky, boats afloat, and driverless cars on course.
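As a rough illustration of what fusing measurements into a single belief with uncertainty can look like, here is a minimal sketch of inverse-variance weighting for two independent, unbiased Gaussian measurements of the same quantity. This is one textbook technique chosen for illustration, not necessarily what any particular robot uses; the function name `fuse` and the example numbers are assumptions.

```python
def fuse(mean_a: float, var_a: float, mean_b: float, var_b: float) -> tuple[float, float]:
    """Fuse two independent Gaussian measurements of the same quantity.

    Assumes both sensors are unbiased and that their variances were
    characterized beforehand (the calibration step described above).
    Returns the fused mean and the fused (posterior) variance.
    """
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")
    # Each measurement is weighted by its precision (1 / variance).
    fused_var = 1.0 / (1.0 / var_a + 1.0 / var_b)
    fused_mean = fused_var * (mean_a / var_a + mean_b / var_b)
    return fused_mean, fused_var


if __name__ == "__main__":
    # Two hypothetical range sensors reading 10.0 and 12.0 with equal variance 4.0.
    mean, var = fuse(10.0, 4.0, 12.0, 4.0)
    print(mean, var)  # fused variance is smaller than either input variance
```

The fused variance is always smaller than the smaller input variance, but only because the failure characteristics of each sensor were modeled first.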

With LLMs, it seems like we are happy to just throw out all this uncertainty modeling and leave it up to chance. To draw an analogy to robotics, what we should be doing is taking the output from many LLMs, characterizing how wrong they are, and fusing them into a final result, which is provided to the user with a level of confidence attached. Now that is something I can use in an engineering pipeline. That is something that can be used as a foundation for something bigger.
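A very crude sketch of the "fuse many outputs and attach a confidence" idea is a majority vote over answers collected from several models. The function name `fuse_answers` and the sample answers are hypothetical, and the agreement fraction it returns is not a calibrated probability: turning it into one would require the characterization step described above.

```python
from collections import Counter


def fuse_answers(answers: list[str]) -> tuple[str, float]:
    """Return the most common answer and the fraction of models that gave it.

    Assumption: the answers have already been normalized to comparable
    strings (for example "yes" / "no"). The fraction is only a raw
    agreement score, not a calibrated confidence.
    """
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    best, votes = counts.most_common(1)[0]
    return best, votes / len(answers)


if __name__ == "__main__":
    # Hypothetical outputs from three different models for one question.
    answer, agreement = fuse_answers(["yes", "yes", "no"])
    print(answer, round(agreement, 2))
```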


>went without reply

Yeah, I was getting a little self-conscious about replying to everyone and repeating myself a lot. It felt like too much noise.

But my first objection here is to repeat myself: none of my examples are sensitive to this problem. I don't need to understand what conditions cause the calculator/IDE/medical test/LLM to fail in order to benefit from a 95% success rate.

If I write a piece of code, I try to understand what it does and how it impacts the rest of the app with high confidence. I'm still going to run the unit test suite even if it has low coverage, and even if I have no idea what the tests actually measure. My confidence in my changes will go up if the tests pass.

This is one use of LLMs for me. I can refactor a piece of code and then send ChatGPT the before and after and ask "Do these do the same thing". I'm already highly confident that they do, but a yes from the AI means I can be more confident. If I get a no, I can read its explanation and agree or disagree. I'm sure it can get this wrong (though it hasn't after n~=100), but that's no reason to abandon this near-instantaneous, mostly accurate double-check. Nor would I give up on unit testing because somebody wrote a test of implementation details that failed after a trivial refactor.
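The "a yes means I can be more confident" step can be written as a Bayes update. This is a sketch under strong assumptions that are not from the thread: the checker is right 95% of the time whether or not the code is equivalent, its errors are independent of my own, and my prior confidence is 0.9. The function name `posterior_equivalent` is hypothetical.

```python
def posterior_equivalent(prior: float, accuracy: float, said_yes: bool) -> float:
    """Probability the refactor is equivalent after hearing the checker's verdict.

    Assumes the checker answers correctly with probability `accuracy`
    in both the equivalent and the non-equivalent case.
    """
    if not (0.0 < prior < 1.0 and 0.0 < accuracy < 1.0):
        raise ValueError("prior and accuracy must be strictly between 0 and 1")
    # Likelihood of the observed verdict under each hypothesis.
    like_if_equivalent = accuracy if said_yes else 1.0 - accuracy
    like_if_different = 1.0 - accuracy if said_yes else accuracy
    numerator = prior * like_if_equivalent
    return numerator / (numerator + (1.0 - prior) * like_if_different)


if __name__ == "__main__":
    print(posterior_equivalent(0.9, 0.95, said_yes=True))   # confidence rises
    print(posterior_equivalent(0.9, 0.95, said_yes=False))  # confidence drops, worth a re-read
```

The update only holds if the assumed accuracy is close to the real one for this kind of question, which is the point the reply below disputes.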

I agree totally that having a good model of LLM uncertainty would make them orders of magnitude better (as would, obviously, removing the uncertainty altogether). And I wouldn't put them in a pipeline or behind a support desk. But I can and do use them for great benefit every day, and I have no idea why I should prefer to throw away the useful thing I have because it's imperfect.


> none of my examples are sensitive to this problem.

That's not true. You absolutely have to understand those conditions, because when you try to use those things outside of their operating ranges, they fail at a higher rate than the nominal one.

> I'm still going to run the unit test suite even if it has low coverage, and even if I have no idea what the tests actually measure. My confidence in my changes will go up if the tests pass.

Right, your confidence goes up because you know that if the test passes, that means the test passed. But if the test suite can probabilistically pass even though some or all of the tests actually fail, then you will have to fall back to the notions of systematic risk management in my last post.

> I can refactor a piece of code and then send ChatGPT the before and after and ask "Do these do the same thing". I'm already highly confident that they do, but a yes from the AI means I can be more confident. If I get a no, I can read its explanation and agree or disagree. I'm sure it can get this wrong (though it hasn't after n~=100)

This n is very, very small for you to be confident the behavior is as consistent as you expect. In fact, it gets this wrong all the time. I use AI in a class environment, so I see n=100 in a single day. When you get to n~1k+ you see all of these problems where it says things are one way but really things are another.
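To put a number on how little n~=100 clean runs pins down, here is a sketch of the standard zero-failure upper bound (the exact form behind the "rule of three"). It assumes independent trials with a constant failure probability, which is itself questionable for LLM queries of varying difficulty; the function name is hypothetical.

```python
def zero_failure_upper_bound(n_trials: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the failure rate after n trials with no failures.

    Solves (1 - p) ** n = 1 - confidence for p. Assumes independent,
    identically distributed trials.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


if __name__ == "__main__":
    for n in (100, 1000):
        # Roughly 3 / n at the 95% level.
        print(n, zero_failure_upper_bound(n))
```

With 100 failure-free checks, a true failure rate of nearly 3% is still consistent with the data at the 95% level.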

> mostly accurate double-check

And that's the problem right there. You can say "mostly accurate" but you really have no basis to assert this, past your own experience. And even if it's true, we still need to understand how wrong it can be, because mostly accurate with a wild variance is still highly problematic.

> But I can and do use them for great benefit every day, and I have no idea why I should prefer to throw away the useful thing I have because it's imperfect.

Sure, they can be beneficial. And yes, we shouldn't throw them out. But that wasn't my original point, I wasn't suggesting that. What I had said was that they cannot be relied on, and you seem to agree with me in that.



