I don’t know if this is how it still works, but early attempts were modeled as classification problems over hundreds of hand-picked completions. The model can’t predict something really bad if it isn’t in the prediction list. That limits the surface of bad outcomes to tone mismatches, like suggesting “sounds great” in reply to someone grieving a loss.
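
To make that concrete, here is a rough sketch of the idea, not how any real product does it. The candidate list, the keyword sets, and the names `CANDIDATES`, `KEYWORDS`, `score`, and `suggest_reply` are all made up for illustration; a real system would use a trained classifier instead of keyword overlap.

```python
import re

# Hypothetical hand-picked reply list; a real system would have hundreds.
CANDIDATES = [
    "Sounds great!",
    "Thanks!",
    "I'm sorry to hear that.",
    "See you then.",
]

# Hypothetical keyword sets standing in for a trained classifier's weights.
KEYWORDS = {
    "Sounds great!": {"plan", "lunch", "meeting", "tomorrow"},
    "Thanks!": {"congrats", "attached", "sent"},
    "I'm sorry to hear that.": {"sorry", "sick", "passed", "funeral"},
    "See you then.": {"tomorrow", "at", "noon"},
}


def score(message: str, candidate: str) -> int:
    """Count how many of the candidate's keywords appear in the message."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    return len(words & KEYWORDS[candidate])


def suggest_reply(message: str) -> str:
    """Pick the highest-scoring candidate; the output is always from the fixed list."""
    return max(CANDIDATES, key=lambda c: score(message, c))


if __name__ == "__main__":
    print(suggest_reply("Grandma's memorial lunch is tomorrow"))
```

With these made-up keywords, the memorial message gets "Sounds great!" because "lunch" and "tomorrow" match and nothing flags the context. That is the tone-mismatch failure, and it is about as bad as it can get, since nothing outside `CANDIDATES` can ever be returned.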