Curious to learn more about your use case. If fine-tuning is ineffective only for your most complex queries (which are presumably also the less frequent ones, since you mentioned you have few examples), couldn't you use fine-tuning to handle the simpler queries (presumably the lion's share) and free up man-hours to focus on the complex ones? Is there any benefit to AI being able to answer 90% of queries vs. 0%?