You also have fine-tuned models for specific tasks that may see very similar inputs producing a variety of outputs. Think of an LLM trained to pull out specific types of information, no matter where it's stored within the file. E.g. "find the date of the shipment for product# 5432" and then you pass in 10k JSON documents with a similar shape.
Yeah, but I was under the impression that for the same prompt, implementations already share the KV cache. This area is so new that these obvious ideas might not be implemented as widely as I thought.
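To make the idea concrete, here's a toy sketch of prefix-based KV-cache sharing. This is not a real inference engine (engines like vLLM do this over token blocks as "automatic prefix caching"); the `PrefixKVCache` class, its methods, and the simulated KV states are all hypothetical stand-ins just to show why a shared prompt prefix only needs prefilling once across many documents:

```python
# Toy sketch of prefix-based KV-cache sharing (hypothetical names;
# the "KV state" here is a fake list, not real attention tensors).
class PrefixKVCache:
    def __init__(self):
        self.cache = {}          # prefix tokens (tuple) -> simulated KV state
        self.compute_calls = 0   # count of expensive prefill passes

    def _prefill(self, tokens):
        # Stand-in for running the model's prefill pass over `tokens`.
        self.compute_calls += 1
        return [f"kv({t})" for t in tokens]

    def get_kv(self, prompt_tokens, doc_tokens):
        # Reuse the cached KV state for the shared prompt prefix,
        # then prefill only the per-document suffix.
        key = tuple(prompt_tokens)
        if key not in self.cache:
            self.cache[key] = self._prefill(prompt_tokens)
        return self.cache[key] + self._prefill(doc_tokens)

cache = PrefixKVCache()
prompt = ["find", "the", "shipment", "date", ":"]
for doc in (["doc1"], ["doc2"], ["doc3"]):
    kv = cache.get_kv(prompt, doc)

# Shared prompt prefilled once, plus one suffix pass per doc -> 4 passes
print(cache.compute_calls)  # → 4
```

In a real engine the cache key is the token prefix and the cached value is the attention key/value tensors, but the accounting is the same: the shared-prompt prefill cost is paid once, and each of the 10k documents only pays for its own suffix.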