Setting temperature to 0 does not make it completely deterministic, from their documentation:
> OpenAI models are non-deterministic, meaning that identical inputs can yield different outputs. Setting temperature to 0 will make the outputs mostly deterministic, but a small amount of variability may remain.
My understanding of LLMs is sub-par at best; could someone explain where the randomness comes from when the model temperature is 0?
I guess I was imagining that if temperature was 0, and the model was not being continuously trained, the weights wouldn’t change, and the output would be deterministic.
Is this a feature of LLMs more generally or has OpenAI more specifically introduced some other degree of randomness in their models?
It's not the LLM, but the hardware. GPU operations generally involve concurrency that makes them non-deterministic, unless you give up some speed to make them deterministic.
Specifically, as I understand it, the accumulated rounding error depends on the order in which floating-point operations complete and intermediate aggregates are combined. You can fix the aggregation order with wait conditions so it stays the same even when the completion order varies, but that trades efficient use of the available compute cores for determinism.
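To illustrate why order matters at all: floating-point addition is not associative, so the same numbers summed with different groupings can produce different results. This is a minimal sketch in plain Python (not GPU code), just demonstrating the underlying arithmetic property that makes reduction order visible in the output:

```python
# Floating-point addition is not associative: regrouping the same
# operands can change the result. A GPU reduction whose accumulation
# order varies between runs can therefore return slightly different
# sums for identical inputs.

# Classic rounding example: grouping changes the last bits.
a = (0.1 + 0.2) + 0.3   # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)   # 0.6
print(a == b)           # False

# A more dramatic case: a small addend is absorbed by a large one.
c = (1e16 + 1.0) - 1e16  # the 1.0 is lost when added to 1e16 -> 0.0
d = (1e16 - 1e16) + 1.0  # cancellation happens first -> 1.0
print(c == d)            # False
```

In a parallel reduction the grouping is decided by which threads finish first, so tiny differences like these can accumulate across billions of operations and occasionally flip the argmax over logits, changing the sampled token even at temperature 0.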