The whole problem is that they are not neutral. They token-complete based on the corpus that was fed into them, the dimensions extracted from that corpus, and the curve-fitting applied to those dimensions. Being "completely transparent" means exposing _all_ of that, but that's far too much for anyone to reasonably understand without becoming an expert in that particular model.
And then we're right back to "trusting expert human beings" again.
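As a toy illustration of what "token-completing based on a corpus" means (the corpora below are invented for the example), two bigram models trained on different text greedily complete the same prompt in different directions, so neither completion is neutral:

```python
# Toy sketch: two bigram models complete the same prompt
# differently, depending only on what text they were trained on.
from collections import Counter, defaultdict

def train_bigram(corpus: str) -> dict:
    """Count word-to-next-word transitions in a corpus."""
    words = corpus.split()
    counts = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    return counts

def complete(model: dict, word: str, steps: int = 3) -> str:
    """Greedily pick the most frequent next word, `steps` times."""
    out = [word]
    for _ in range(steps):
        if word not in model:
            break
        word = model[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

corpus_a = "the market is efficient because the market is efficient and fair"
corpus_b = "the market is rigged because the market is rigged and broken"

print(complete(train_bigram(corpus_a), "the"))  # the market is efficient
print(complete(train_bigram(corpus_b), "the"))  # the market is rigged
```

A real model's corpus is billions of times larger, which is exactly why "just expose all of it" doesn't scale to a human reader.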
Nothing is truly neutral. Humans each have a different corpus, too. We roughly know what data has gone in, what the RL process looks like, and how the models handle a given ethical situation.
With good prompting, the SOTA models already act in ways I think most reasonable people would agree with, and that's without anyone trying to build them specifically for that use case.
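A minimal sketch of what that kind of prompting can look like, assuming the OpenAI Python SDK (the model name and system prompt are illustrative, not a specific recommendation):

```python
# Minimal sketch: steering model behavior with a system prompt.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative; any current SOTA chat model
    messages=[
        {
            "role": "system",
            "content": (
                "When a question involves a contested ethical judgment, "
                "lay out the major positions fairly, state the trade-offs, "
                "and flag your own uncertainty rather than picking a side."
            ),
        },
        {
            "role": "user",
            "content": "Is it ever acceptable to lie to protect someone?",
        },
    ],
)
print(response.choices[0].message.content)
```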
And then we're right back to "trusting expert human beings" again.