For some reason (no idea why), this problem went away when I set n_samples to 1 and scale to 10.0 or less. I don't know why these parameters would affect memory usage, but the image quality seems fine, afaict.
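n_samples is the batch size, i.e. how many images are generated at once. Total memory use scales roughly like "Model Mem Size + n_samples * Batch Mem Size", where Batch Mem Size is the memory needed per sample in the batch. That per-sample memory is smaller than the model itself but not trivial, so dropping n_samples to 1 can be enough to get under the limit.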
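To make that formula concrete, here is a rough Python sketch. The function names and the GB figures are made up for illustration; they are not measurements from any real model, so plug in whatever your own setup reports.

```python
def estimated_memory_gb(model_gb: float, per_sample_gb: float, n_samples: int) -> float:
    """Rough estimate: Model Mem Size + n_samples * Batch Mem Size."""
    return model_gb + n_samples * per_sample_gb


def max_n_samples(budget_gb: float, model_gb: float, per_sample_gb: float) -> int:
    """Largest n_samples whose estimate fits in budget_gb (0 if none fits)."""
    if per_sample_gb <= 0:
        raise ValueError("per_sample_gb must be positive")
    free_gb = budget_gb - model_gb  # what is left once the model is loaded
    if free_gb < per_sample_gb:
        return 0
    return int(free_gb // per_sample_gb)


if __name__ == "__main__":
    # Hypothetical numbers: 4 GB model, 1.5 GB per sample, 8 GB card.
    for n in (1, 2, 3):
        print(n, estimated_memory_gb(4.0, 1.5, n))
    print("max n_samples:", max_n_samples(8.0, 4.0, 1.5))
```

With those made-up numbers, n_samples of 3 would already overshoot an 8 GB budget, which is the same kind of effect as the out-of-memory error going away at n_samples 1.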