in a world where you have many options and have to figure out which is best by repeated experimentation, but where experimentation itself has some cost, you have a multi-armed bandit problem. (the name comes from slot machines, colloquially "one-armed bandits" -- imagine a room full of them, and you want to find the one with the highest payout by repeatedly playing them, while losing as little money as possible along the way.)
for example, if you have a few candidate medications, you might start by assigning them all equally at random, and then as data comes in, use a bandit algorithm to gradually shift more and more new patients onto the ones that prove most effective -- in a way that optimally trades off estimating the effects accurately against the cost of continuing to test the less effective drugs.
interestingly, the first formulation of the problem is due to William R. Thompson, a researcher in Yale's pathology department, in the 1930s; Thompson sampling is named after him. so these are techniques that were originally designed with medical trials in mind.
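for the curious, here's a minimal sketch of Thompson sampling for a bandit with yes/no outcomes: each arm keeps a Beta posterior over its success rate, you draw one sample per arm from those posteriors and treat the next patient with the arm whose sample is highest, so the drugs that are looking better get picked more often while remaining uncertainty still drives some exploration. (the drug names and "true" success probabilities below are invented purely for illustration -- in a real trial they'd be unknown.)

```python
import random

# made-up ground truth, only used to simulate outcomes
true_success_prob = {"drug_A": 0.60, "drug_B": 0.72, "drug_C": 0.55}

# observed counts so far; a Beta(successes + 1, failures + 1) posterior per drug
successes = {d: 0 for d in true_success_prob}
failures = {d: 0 for d in true_success_prob}

for patient in range(1000):
    # sample a plausible success rate for each drug from its posterior,
    # and give this patient the drug whose sampled rate is highest
    sampled = {d: random.betavariate(successes[d] + 1, failures[d] + 1)
               for d in true_success_prob}
    chosen = max(sampled, key=sampled.get)

    # observe the (simulated) outcome and update that drug's counts
    if random.random() < true_success_prob[chosen]:
        successes[chosen] += 1
    else:
        failures[chosen] += 1

# by the end, most patients will have been assigned to the best drug
print({d: (successes[d], failures[d]) for d in true_success_prob})
```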
I think that designers of medical trials probably do have a good grasp of this stuff (some statistical estimators that originated in the medical world have even been successfully imported into reinforcement learning/MAB research), so presumably they would already be using a bandit-like technique wherever they felt it made sense.