If we ask an LLM to grade something, we must write a prompt with clear grading instructions. Otherwise, we will have no idea what a score of 0.5 means or whether it is assigned consistently.
(A rule of thumb: would different people, not knowing the context of the task, give the same grade?)
The most robust approach is to ask the model to rank items within a single task. That is, "given these blog post titles, rank them according to (criteria)" rather than asking about each title separately.
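The ranking approach above can be sketched in code. This is a minimal, hedged example: the function names and the reply format ("numbers only, best first, comma-separated") are illustrative assumptions, not a specific library's API. The point is that all candidates go into one prompt, so the model judges them relative to each other rather than against an undefined absolute scale.

```python
def build_ranking_prompt(titles, criteria):
    """Builds one prompt asking the model to rank all titles together.

    Illustrative sketch: the exact wording and output format are
    assumptions, not a standard.
    """
    numbered = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(titles))
    return (
        f"Rank the following blog post titles from best to worst "
        f"according to this criterion: {criteria}.\n\n"
        f"{numbered}\n\n"
        "Answer with the item numbers only, best first, comma-separated."
    )


def parse_ranking(reply, n):
    """Parses a reply like '3, 1, 2' into a zero-based ranking.

    Raises if the model did not rank every item exactly once, which
    is itself a useful consistency check on the judge.
    """
    ranks = [int(tok) - 1 for tok in reply.replace(",", " ").split()]
    if sorted(ranks) != list(range(n)):
        raise ValueError("model must rank every item exactly once")
    return ranks


titles = ["Ten tips for X", "Why Y fails", "A practical guide to Z"]
prompt = build_ranking_prompt(titles, "clarity for a general audience")
print(parse_ranking("3, 1, 2", len(titles)))  # → [2, 0, 1]
```

One nice property of this design: the parser rejects partial or duplicated rankings, so inconsistent judge outputs surface as errors instead of silently skewing results.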