
So many reports like this, it's not a question of working out the kinks. Are we getting close to our very own Stop the Slop campaign?


Yeah, after working daily with AI for a decade in a domain where it _does_ work predictably and reliably (image analysis), I continue to be amazed at how many of us trust LLM-based text output as being useful. If any human source got their facts wrong this often, we'd surely dismiss them as a counterproductive imbecile.

Or elect them President.


I am beginning to wonder why I use it, but the idea of it is so tempting. Try to Google something and get stuck because it's difficult to find, or ask an LLM and get an instant response. It's not hard to guess which one is more inviting, but it ends up being a huge time sink anyway.


HAL 9000 in 2028!


Regulation with active enforcement is the only civil way.

The whole point of regulation is for when the profit motive forces companies towards destructive ends for the majority of society. The companies are legally obligated to seek profit above all else, absent regulation.


> Regulation with active enforcement is the only civil way.

What regulation? What enforcement?

These terms are useless without details. Are we going to fine LLM providers every time their output is wrong? That’s the kind of proposition that sounds good as a passing angry comment but obviously has zero chance of becoming a real regulation.

Any country that instituted a regulation like that would see all of the LLM advancements and research instantly leave and move to other countries. People who use LLMs would sign up for VPNs and carry on with their lives.


Regulations exist to override profit motive when corporations are unable to police themselves.

Enforcement ensures accountability.

Fines don't do much in a fiat money-printing environment.

Enforcement is accountability, the kind that stakeholders pay attention to.

Something appropriate would be: if AI is used in a safety-critical or life-sustaining environment and harm or loss is caused, those who chose to use it are guilty until they prove their innocence. I think that would be sufficient, not just civilly but also criminally, and that person and decision must be documented ahead of time.

> Any country that instituted a regulation like that would see all of the LLM advancements and research instantly leave and move to other countries.

This is a fallacy. It's a spectrum: research would still occur, but it would be tempered by law and accountability, instead of the wild west where it's much more profitable to destroy everything through chaos. Chaos is quite profitable until it spreads systemically and ends everything.

AI integration at a point where it can impact the operation of nuclear power plants through interference (perceptual or otherwise) is just asking for a short path to extinction.

It's quite reasonable that the needs of national security trump private businesses making profit in a destructive way.


> Something appropriate would be: if AI is used in a safety-critical or life-sustaining environment and harm or loss is caused, those who chose to use it are guilty until they prove their innocence. I think that would be sufficient, not just civilly but also criminally

Would this guilty-until-proven-innocent rule apply also to non-ML code and manual decisions? If not, I feel it's kind of arbitrarily deterring certain approaches potentially at the cost of safety ("sure this CNN blows traditional methods out of the water in terms of accuracy, but the legal risk isn't worth it").

In most cases I think it'd make more sense to have fines and incentives for above-average and below-average incident rates (and liability for negligence in the worst cases), then let methods win/fail on their own merit.


> Would this guilty-until-proven-innocent rule apply also to non-ML code and manual decisions?

I would say yes, because the person deciding must be the one making the entire decision, but there are many examples where someone might be paid to just rubber-stamp decisions already made, letting the person who decided to implement the solution off scot-free.

The mere presence of AI (anything based on underlying work of perceptrons) being used, accompanied by a loss, should prompt a thorough review, which corporations are currently incapable of performing for themselves due to a lack of consequences/accountability. Lack of disclosure, and the limits of current standing, are further issues that really require this approach.

The problem of fines is that they don't provide the needed incentives to large entities as a result of money-printing through debt-issuance, or indirectly through government contracts. It's also far easier for these entities, as market leaders, to employ corruption to work around the fine later. We've seen this a number of times in various markets/sectors, like JPM and the 10+ year silver price-fixing scandal.

Merit of subjective rates isn't something that can be enforced, because it is so easily manipulated. Gross negligence already exists and occurs frighteningly often, but it never makes it to court, because proof often requires showing standing to get discovery, which isn't generally granted absent a smoking gun or the whim of a judge.

Bad things certainly happen where no one is at fault, but most business structures today are given far too much leeway and have promoted the 3 Ds. It's all about: deny, defend, depose.


> > Would this guilty-until-proven-innocent rule apply also to non-ML code and manual decisions?

> I would say yes [...]

So if you're a doctor making manual decisions about how to treat a patient, and some harm/loss occurs, you'd be criminally guilty-until-proven-innocent? I feel it should require evidence of negligence (or malice), and be done under standard innocent-until-proven-guilty rules.

> The mere presence of AI (anything based on underlying work of perceptrons) [...]

Why single out based on underlying technology? If for instance we're choosing a tumor detector, I'd claim what's relevant is "Method A has been tested to achieve 95% AUROC, method B has been tested to achieve 90% AUROC" - there shouldn't be an extra burden in the way of choosing method A.

And it may well be that the perceptron-based method is the one with lower AUROC - just that it should then be discouraged because it's worse than the other methods, not because a special case puts it at a unique legal disadvantage even when safer.
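For concreteness, here's a minimal sketch of the kind of AUROC comparison I mean (Python, assuming scikit-learn is available; the labels and score arrays are synthetic stand-ins, not real tumor-detector output):

    # Minimal sketch, assuming scikit-learn; the labels and scores are
    # synthetic stand-ins chosen to land roughly in the ~95% vs ~90%
    # AUROC regime mentioned above.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=1000)                # ground-truth labels
    method_a = y_true + rng.normal(0.0, 0.43, size=1000)  # stronger separation (~0.95 AUROC)
    method_b = y_true + rng.normal(0.0, 0.55, size=1000)  # weaker separation (~0.90 AUROC)

    print("Method A AUROC:", round(roc_auc_score(y_true, method_a), 3))
    print("Method B AUROC:", round(roc_auc_score(y_true, method_b), 3))

The point is that this is a single measurable number per method; the comparison doesn't care whether the thing producing the scores is a CNN or a traditional algorithm.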

> The problem of fines is that they don't provide the needed incentives to large entities as a result of money-printing through debt-issuance, or indirectly through government contracts.

Large enough fines/rewards should provide large enough incentive (and there would still be liability for criminal negligence where there is sufficient evidence of criminal negligence). Those government contracts can also be conditioned on meeting certain safety standards.

> Merit of subjective rates isn't something that can be enforced

We can/do measure things like incident rates, and have government agencies that perform/require safety testing and can block products from market. Not always perfect, but seems better to me than the company just picking a scape-goat.


> So if you're a doctor making manual decisions about how to treat a patient, and some harm/loss occurs, you'd be criminally guilty-until-proven-innocent?

Yes; that proof is called a professional license. Without one, you are presumed guilty even if nothing goes wrong.

If we have licenses for AI and then require proof that the AI isn't tampered with for requests then that should be enough, don't you think? But currently it's the wild west.


> Yes; that proof is called a professional license. Without one, you are presumed guilty even if nothing goes wrong.

A professional license is evidence against the offense of practicing without a license, and the burden of proof in such a case still rests on the prosecution to prove beyond reasonable doubt that you did practice without a license - you aren't presumed guilty.

Separately, what trod1234 was suggesting was being guilty-until-proven-innocent when harm occurs (with no indication that it'd only apply to licensed professions). I believe that's unjust, and that the suggestion stemmed mostly from animosity towards AI (maybe similar to "nurses administering vaccines should be liable for every side-effect") without consideration of impact.

> If we have licenses for AI and then require proof that the AI isn't tampered with for requests then that should be enough, don't you think?

Mandatory safety testing for safety-critical applications makes sense (and already occurs). It shouldn't be some rule specific to AI - I want to know that it performs adequately regardless of whether it's AI or a traditional algorithm or slime molds.


A very simple example would be a mandatory mechanism for correcting mistakes in prebaked LLM outputs, and an ability to opt out of things like Gemini AI Overview on pages about you. Regulation isn't all or nothing; viewing it like that is reductive.


> Are we getting close to our very own Stop the Slop campaign?

I don't think so. We read about the handful of failures while there are billions of successful queries every day; in fact, I think AI Overviews is sticky and here to stay.


Are we sure these billions of queries are “successful” for the actual user journey? Maybe this is particular to my circle, but as the only “tech guy” most of my friends and family know, I am regularly asked if I know how to turn off Google AI Overviews, because many people find them to be garbage.


Why on earth are you accepting his premise that there are billions of successful requests? I just asked chatgpt about query success rate and it replied (part):

"...Semantic Errors / Hallucinations On factual queries—especially legal ones—models hallucininate roughly 58–88% of the time

A journalism‑focused study found LLM-based search tools (e.g., ChatGPT Search, Perplexity, Grok) were incorrect in 60%+ of news‑related queries

Specialized legal AI tools (e.g., Lexis+, Westlaw) still showed error rates between 17% and 34%, despite being domain‑tuned "



