*So for safety critical systems, one should not look or check if code has been A...

supriyo-biswas · 2025-07-01T05:10:16 1751346616

You do understand that LLM output is non-deterministic and tends to have a higher error rate than compilers, which do not exhibit this “feature”.

I see in one of your other posts that you were loudly grumbling about being downvoted. You may want to reconsider whether taking a combative, bad-faith approach when replying to other people is really worth it.

CamperBob2 · 2025-07-01T15:00:21 1751382021

> I see in one of your other posts that you were loudly grumbling about being downvoted. You may want to reconsider whether taking a combative, bad-faith approach when replying to other people is really worth it.

(Shrug) Tool use is important. People who are better than you at using tools will outcompete you. That's not an opinion, or "combative," whatever that means; it's just the way it works.

It's no skin off my nose either way, but HN is not a place where I like to see ignorant, ill-informed opinions paraded with pride.

rvz · 2025-07-01T15:37:21 1751384241

> If you don't review the code your C compiler generates now, why not?

That isn't a reason NOT to review AI-generated code. Even comparing the two, a C compiler is far more deterministic in the code it generates than an LLM, which is non-deterministic and unpredictable by design.

> Compiler bugs still happen, you know.

The whole point is verification, which is extremely important in compiler design. There is even a class of formally verified compilers that are proven not to introduce compiler bugs. There is no equivalent for LLMs.

In any case, you still NEED to check whether the code's functionality matches the business requirements, AI-generated or not, and especially in safety-critical systems. Otherwise, any mismatch is a logic bug in your implementation.

CamperBob2 · 2025-07-01T17:56:39 1751392599

If you can look at what's happening today and imagine that code will still be generated the same way 10-15 years from now, then your imagination beats mine.

99.9999% of code is not written with compilers that are "formally verified" as immune to code-generation bugs. It's not likely that any code that you and I run every day is.

rvz · 2025-07-01T20:42:42 1751402562

> 99.9999% of code is not written with compilers that are "formally verified" as immune to code-generation bugs.

Again, that isn't a reason to skip checking or testing your code because "an AI generated it," or to assume that an AI will catch every bug.

In fact, it means you NEED to do more reviewing, checking and testing than ever before.

> It's not likely that any code that you and I run every day is.

So millions of phones, cars, control systems, medical devices and planes in use today aren't running formally verified code every day?

Are you sure?

CamperBob2 · 2025-07-02T00:18:13 1751415493

Yes, I'm very sure. 99.9999% of the code you are running is not formally proven to be correct, and was not generated by a compiler whose output was formally proven to be correct.

Just curious, how much time have you spent in (a) industry, (b) a CS classroom, or (c) both?

rvz · 2025-07-03T03:26:04 1751513164

> Yes, I'm very sure.

You do understand that you are proving my point? It is still not a reason to *NOT* test or check your implementation at all, or to rely only on an LLM to check it for you.

What it really means is that software testing is even more important.

As for running formally verified code every day: seL4 runs on the iPhone's security chip (the Secure Enclave) in the hands of billions of users. It is a formally verified microkernel used every day for cryptographic operations ranging from payments to disk encryption.

This kernel is also used in medical devices, cars, and defense equipment relied on by hundreds of millions of users.

> Just curious, how much time have you spent in (a) industry, (b) a CS classroom, or (c) both?

Many decades, enough to know that no process for developing safety-critical software would allow AI-generated code that isn't checked by a human, or that is checked only by other LLMs as a substitute for writing tests.