I don't know what you mean by "pure-AI systems." I work in this field and have many times implemented a review step in the loop, or a route for human review. It's an old technique, predating computers.
A "pure-AI system" is a fully-autonomous ML expert system. For example, a spam classifier. In these systems, humans are never brought into the loop at decision-making time — instead, the model makes a decision, acts, and then humans have to deal with the consequences of "dumb" actions (e.g. by looking through their spam folders for false positives) — acting later to reverse the model's action, rather than the model pausing to allow the human to subsume it. This later reversal ("mark as not spam") may train the model; but the model still did a dumb thing at the time, that may have had lasting consequences ("sorry, I didn't get your message, it went to spam") that could have been avoided if the model itself could "choose to not act", emitting a "NULL" result that would crash any workflow-engine it's embedded within unless it gets subsumed by a non-NULL decision further up the chain.
Yes, I'm certain that training ML models to separately classify low-confidence outputs, and getting a human in the loop to handle those cases, is a well-known technique in ML-participant business workflow engine design. But I'm not talking about ML-participant business workflow engine design; I'm talking about the lower level of raw ML-model architecture. I'm talking about adversarial systems-component design here: trying to create ML model architectures which assume the business-workflow-engine designer is an idiot or malfeasant, and which force the designer to do the right thing whether they like it or not. (Because, well, look at most existing workflow systems. Is this design technique really as "well-known" as you say? It's certainly not universal, let alone considered part of the Engineering "duty and responsibility" of Systems Engineers: the things they, as Engineers, have to check for in order to sign off on the system; the things they'd be considered malfeasant Engineers for forgetting.)
What I'm saying is that it would be sensible to have models for which it is impossible to ask for a purely enumerative classification with no option for "I don't know" or "this seems like an exceptional category that I recognize, but one I haven't been trained well enough on to know what answer I should give." Models that automatically train "I don't know" states into themselves; or rather, where every high-confidence output state of the system "evolves out of" a base "I don't know" state, such that not just weird input, but also weird combinations of normal input that were unseen in the training data, result in "I don't know." (This is unlike current ML linear approximators, where you'll never see a model that is high-confidence about all the individual elements of something but low-confidence about the combination of those elements. Your spam-filtering engine should be confused the first time it sees GTUBE and the hacked-in algorithmic part of it says "1.0 confidence, that's spam." It should be confused by its own confidence in the face of no individual elements firing. You should have to train it that that's an allowed thing to happen, because in almost all other situations where that happened, it'd be a bug!)
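As a rough illustration of "the label has to evolve out of a base I-don't-know state" (not any real architecture), here's a sketch where a wrapper only commits to a label when both the per-class confidence is high and the combination of inputs resembles something actually seen in training. The nearest-neighbor novelty check, the thresholds, and the class name are all assumptions made for the sketch:

```python
# Hypothetical sketch of a classifier whose default output is "I don't know":
# it only commits to a label when (a) the per-class confidence is high AND
# (b) the *combination* of features resembles something seen in training.
import numpy as np


class AbstainingClassifier:
    def __init__(self, base_model, train_features: np.ndarray,
                 conf_threshold: float = 0.9, novelty_threshold: float = 3.0):
        self.base_model = base_model          # any model exposing predict_proba()
        self.train_features = train_features  # feature vectors seen in training
        self.conf_threshold = conf_threshold
        self.novelty_threshold = novelty_threshold

    def _novelty(self, x: np.ndarray) -> float:
        # Distance to the nearest training example: a crude stand-in for
        # "is this combination of inputs something I've actually seen?"
        return float(np.min(np.linalg.norm(self.train_features - x, axis=1)))

    def predict(self, x: np.ndarray):
        probs = self.base_model.predict_proba(x.reshape(1, -1))[0]
        best = int(np.argmax(probs))
        # The default is abstention; a label has to earn its way out of it.
        if probs[best] < self.conf_threshold:
            return None   # not confident enough about any label
        if self._novelty(x) > self.novelty_threshold:
            return None   # confident, but about an unseen combination -- be suspicious
        return best
```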
Ideally, while I'm dreaming, the model itself would also have a sort of online pseudo-training where it is fed back the business-workflow-process result of its outputs: not to learn from them, but to act as a self-check on the higher-level workflow process (like line-of-duty humans do!), where the model would "get upset" and refuse to operate further if the higher-level process is treating the model's "I don't know" signals no differently than its high-confidence signals (i.e. if it's bucketing "I don't know" as if it meant the same thing as some specific category, 100% of the time). Essentially, the component-as-employee would "file a grievance" with the system. The idea is that a systems designer literally could not build a workflow with such models as components while omitting an "exceptional situation handling" decision-maker component (whether that be a human, or another AI with different knowledge); just as the systems designer of a factory that employs real humans couldn't tell those humans to "shut up and do their jobs," with no ability to report exceptional cases to a supervisor, without that becoming a grievance.
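A toy sketch of that grievance feedback, assuming the workflow reports back its final disposition for each model output. The class name, the "100% of abstentions got the same treatment" test, and the sample-size cutoff are all invented for illustration:

```python
# Hypothetical sketch of the "grievance" idea: the model is fed back the
# workflow's final disposition of each of its outputs. If every one of its
# "I don't know" signals is bucketed into the same outcome as if it were a
# normal decision, it stops cooperating.
from collections import Counter


class GrievanceMonitor:
    def __init__(self, min_unknowns: int = 50):
        self.min_unknowns = min_unknowns
        self.unknown_dispositions = Counter()
        self.on_strike = False

    def report_outcome(self, model_output, final_disposition: str) -> None:
        """Called by the workflow with what it ultimately did with each output."""
        if model_output is None:  # the model had abstained on this one
            self.unknown_dispositions[final_disposition] += 1
        total = sum(self.unknown_dispositions.values())
        if total >= self.min_unknowns:
            most_common = self.unknown_dispositions.most_common(1)[0][1]
            # If 100% of abstentions get the exact same treatment, the workflow
            # is ignoring the distinction -- file the grievance.
            if most_common == total:
                self.on_strike = True

    def check(self) -> None:
        """The model calls this before operating; a strike halts the workflow."""
        if self.on_strike:
            raise RuntimeError(
                "Workflow treats 'I don't know' like a fixed label; refusing to operate."
            )
```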
When designing a system with humans as components, you're forced to take into account that the humans won't do their jobs unless they can bubble up issues. Ideally, IMHO, ML models for use in business-process workflow automation would have the same property. You shouldn't be able to tell the model to "shut up and decide."
(And while a systems designer could be bullheaded and just switch to a simpler ML architecture that never "refuses to decide", if we had these hypothetical "moody" ML models, we could always then do what we do for civil engineering: building codes, government inspectors, etc. It's hard/impractical to check a whole business rules engine for exhaustive human-in-the-loop conditions; but it's easy/practical enough to just check that all the ML models in the system have architectures that force human-in-the-loop conditions.)
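For illustration, that "inspector" check could be as simple as verifying that every ML component in the system declares an abstention-capable interface. The `can_abstain` attribute below is an invented convention, not an existing standard:

```python
# Hypothetical sketch of the "building inspector" check: rather than auditing
# every path through the business rules engine, audit only that each ML
# component in the system can refuse to decide.
def inspect_system(components) -> list:
    """Return the names of components that cannot abstain from deciding."""
    violations = []
    for component in components:
        if not getattr(component, "can_abstain", False):
            violations.append(type(component).__name__)
    return violations
```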
> humans have to deal with the consequences of "dumb" actions (e.g. by looking through their spam folders for false positives)
Email programs generally have a mechanism for reviewing email and changing the classification. I think your "pure-AI" phrase describes a system that doesn't have any mechanism for reviewing and adjusting the machine's classification. The fact that a spam message winds up in your inbox sometimes is probably that low-confidence human-in-the-loop process we've been talking about. I'm sure that the system errs on the side of classifying spam as ham, because the reverse is much worse. Why have two different interfaces for reading emails, one for reading known-ham and one for reviewing suspected-spam, when you can combine the two seamlessly?
Perhaps you've mistaken bad user-interface decisions for bad machine-learning-system decisions. I'd like to see some kind of likelihood-of-spam indicator (which the ML system undoubtedly reports) rather than a binary spam-or-not, but the interface designer chose to apply an arbitrary threshold. I think in this case you should blame the user interface designer for thinking that people are stupid and can't handle non-binary classifications. We're all hip to "they" these days.
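For instance (purely illustrative cutoffs, not anything a real mail client documents), the interface could band the score the ML system already produces instead of collapsing it to a binary:

```python
# Hypothetical sketch: surface the classifier's spam probability as a band
# in the UI rather than a hard spam-or-not flag. Cutoffs are arbitrary.
def spam_band(spam_probability: float) -> str:
    if spam_probability >= 0.95:
        return "almost certainly spam"
    if spam_probability >= 0.60:
        return "probably spam -- worth a glance"
    if spam_probability >= 0.20:
        return "possibly spam"
    return "probably fine"
```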
https://en.m.wikipedia.org/wiki/Dead_letter_mail