
I think "Explainable AI" is a related research direction, but perhaps not popular for language models.



I think part of the issue is what level of explanation counts as satisfactory. We can explain how every linear transformation computes its output, but the whole is in many ways more than the sum of its parts.
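To make that concrete, here's a toy sketch (NumPy, with made-up layer sizes and random weights, nothing from any real model): each layer's arithmetic is perfectly transparent, yet an end-to-end attribution already depends on which ReLU units happen to be active for a given input.

  import numpy as np

  rng = np.random.default_rng(0)
  W1 = rng.normal(size=(4, 8))   # first linear map
  W2 = rng.normal(size=(8, 2))   # second linear map
  x = rng.normal(size=4)

  pre = W1.T @ x                 # pre-activations: transparent weighted sums
  h = np.maximum(pre, 0.0)       # ReLU
  y = W2.T @ h

  # Per-layer explanation is trivial: e.g. the contribution of x[0]
  # to the first pre-activation is a single product.
  print("x[0] -> pre[0]:", W1[0, 0] * x[0])

  # But the end-to-end linearization depends on which units are active
  # for this particular x, so the composite is more than "each layer,
  # summed up".
  active = pre > 0
  print("effective input->output weights:\n", (W1 * active) @ W2)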

Then there are efforts that look like this one: https://news.ycombinator.com/item?id=34821414. They probe Transformers for specific capabilities to figure out which cell fires under some specific stimulus. But think a little more about what people actually want from explainability and you quickly find that something like this is insufficient.
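Roughly, the probing approach looks like this sketch (the activations here are random placeholders standing in for recorded Transformer hidden states, and the property is just an example): train a simple classifier on activations, check whether it can read the property off them, and look at which units carry the weight. That tells you where something is encoded, not why the model uses it, which is part of why it feels insufficient.

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(0)
  # Stand-ins for hidden activations from one layer (1000 inputs, width 64)
  # and a binary property label per input (e.g. "prompt is a question").
  acts = rng.normal(size=(1000, 64))
  labels = rng.integers(0, 2, size=1000)

  # Linear probe: if a simple classifier can recover the property from the
  # activations, the layer plausibly encodes it.
  probe = LogisticRegression(max_iter=1000).fit(acts[:800], labels[:800])
  print("probe accuracy:", probe.score(acts[800:], labels[800:]))

  # The largest-magnitude probe weights point at the units most associated
  # with the property, i.e. "which cell fires under this stimulus".
  print("top units:", np.argsort(np.abs(probe.coef_[0]))[-5:])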

We may be looking at a tradeoff where explainability (for some definition of it) has to be exchanged for performance (on some set of tasks). You can build more interpretable models today, but you usually pay for it in benchmark performance.
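You can get a feel for that tradeoff with stock scikit-learn pieces (the dataset and model choices are arbitrary, purely for illustration): a shallow decision tree that you can print and read in full will usually trail a plain neural net on the same task.

  from sklearn.datasets import load_digits
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier, export_text
  from sklearn.neural_network import MLPClassifier

  X, y = load_digits(return_X_y=True)
  X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

  # Interpretable model: every decision it makes can be printed and read.
  tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
  print(export_text(tree))
  print("tree accuracy:", tree.score(X_te, y_te))

  # Less interpretable model: typically much stronger on this task, but
  # there is no comparable printout to read.
  mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000,
                      random_state=0).fit(X_tr, y_tr)
  print("mlp accuracy:", mlp.score(X_te, y_te))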


It's impossible to explain the inner workings of GPT-3 without having access to the model and its weights. Does anyone know whether any methods exist for this?


Since it's impossible to run inference on the model without having access to the model and its weights, interpretable AI generally does assume that you have access to all of that. Otherwise, why would you want to try to explain the inner workings of something that you don't have and can't use?


I asked ChatGPT for some in-depth source code that realistically mimics ChatGPT. ChatGPT replied with various answers in Python. I'm not sure any of them are correct, though.



