Besides complexity, a price you pay with multi-armed bandits is that you learn less about the non-optimal options: as your confidence grows that an option is not the best, you run fewer samples through it, so your estimate of how that option actually performs stays imprecise. It turns out the people running these experiments are often not satisfied to learn "A is better than B." They want to know "A is 7% better than B," but a MAB system will only run enough samples through B to support the first statement; the confidence interval around B's performance stays too wide to pin down the size of the gap.
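To make this concrete, here's a rough simulation sketch. The 5%/4% conversion rates, the trial count, and the choice of Thompson sampling as the bandit strategy are all assumptions for illustration, not a claim about any particular system. It shows how a bandit starves the losing arm of samples, leaving a wide confidence interval on that arm's rate:

```python
# Illustrative simulation (assumed rates, not real data): Thompson sampling
# on two Bernoulli arms. The losing arm gets far fewer samples, so its
# confidence interval stays wide relative to the true gap between arms.
import math
import random

random.seed(0)

TRUE_RATES = {"A": 0.05, "B": 0.04}  # assumed ground-truth conversion rates
N_TRIALS = 20_000

# Per-arm success/failure counts; Beta(1, 1) prior means we add 1 to each.
stats = {arm: {"successes": 0, "failures": 0} for arm in TRUE_RATES}

for _ in range(N_TRIALS):
    # Thompson sampling: draw from each arm's Beta posterior, play the max.
    arm = max(
        TRUE_RATES,
        key=lambda a: random.betavariate(
            stats[a]["successes"] + 1, stats[a]["failures"] + 1
        ),
    )
    if random.random() < TRUE_RATES[arm]:
        stats[arm]["successes"] += 1
    else:
        stats[arm]["failures"] += 1

for arm, s in stats.items():
    n = s["successes"] + s["failures"]
    p = s["successes"] / n
    # 95% confidence interval half-width, normal approximation.
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    print(f"{arm}: n={n:>6}  rate={p:.4f}  +/- {half_width:.4f}")
```

Run it and you'll typically see the bandit route the vast majority of trials to A, leaving B with an interval that can be wider than the entire true difference between the arms. That's exactly the regime where "A is better" is well supported but "A is X% better" is not.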