The side-by-side comparisons are not a good signal: the models differ along multiple dimensions, but the user has no way to indicate which dimension they are scoring.

For example, the recent side-by-side comparisons pitted a more accurate model that communicates poorly against a less accurate model that communicates slightly better.