Having multiple arms is not the only thing that sets bandit algorithms apart. They also try to keep harm to a minimum by updating their conclusions in a Bayesian fashion throughout the experiment, not only at the end.
They can also accommodate new arms being added during the experiment as more is learned. For example, the arms might be:
* standard treatment + placebo
* standard treatment + some drug
* standard treatment + double amount of drug
and other combinations, like using two drugs in one arm.
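
To make the "update throughout the experiment" idea concrete, here is a minimal sketch using Thompson sampling with a Beta-Bernoulli model. This is only one possible bandit strategy, picked for illustration. It assumes each patient's outcome is a simple success or failure; the class name `ThompsonBandit`, the helper `simulate`, and all response rates are hypothetical and not taken from any real trial.

```python
import random


class ThompsonBandit:
    """Thompson sampling over arms with binary (success/failure) outcomes."""

    def __init__(self, arm_names, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior per arm, stored as [alpha, beta].
        self.posterior = {name: [1, 1] for name in arm_names}

    def add_arm(self, name):
        # A new arm can join mid-experiment; it starts from the same flat prior.
        self.posterior[name] = [1, 1]

    def choose(self):
        # Draw one plausible success rate per arm and pick the largest draw.
        draws = {
            name: self.rng.betavariate(a, b)
            for name, (a, b) in self.posterior.items()
        }
        return max(draws, key=draws.get)

    def update(self, name, success):
        # Bayesian update after every single patient, not only at the end.
        self.posterior[name][0 if success else 1] += 1

    def pulls(self, name):
        a, b = self.posterior[name]
        return a + b - 2  # subtract the two pseudo-counts of the prior


def simulate(n_patients=1000, add_at=300, seed=0):
    # Hypothetical response rates, invented purely for illustration.
    true_rates = {
        "standard + placebo": 0.30,
        "standard + drug": 0.45,
        "standard + double dose": 0.55,
    }
    late_arm, late_rate = "standard + two drugs", 0.50

    bandit = ThompsonBandit(true_rates, seed=seed)
    outcome_rng = random.Random(seed + 1)
    for patient in range(n_patients):
        if patient == add_at:
            # The combination arm is introduced partway through the trial.
            true_rates[late_arm] = late_rate
            bandit.add_arm(late_arm)
        arm = bandit.choose()
        bandit.update(arm, outcome_rng.random() < true_rates[arm])
    return bandit


if __name__ == "__main__":
    result = simulate()
    for name in result.posterior:
        print(f"{name}: {result.pulls(name)} patients")
```

Because each arm is chosen with roughly the probability that it is currently the best one, arms that look worse tend to receive fewer patients as evidence accumulates, which is the harm-limiting behaviour described above. A newly added arm starts from the flat prior, so it gets explored for a while before the allocation settles.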