"Multi armed bandit methods work best with immediate success-fail metrics. This one has time delays."
Well, sure, but everything works best with immediate success-fail metrics. One of the most basic results from learning theory is that the longer the latency between stimulus and response, the slower the learning has to be. I'm not sure how multi-armed bandits are special in this regard in any particular dimension. All learning techniques are going to be susceptible to the problem you outline in your second paragraph.
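To make the latency point concrete, here's a toy simulation. Everything in it is made up for illustration: two Bernoulli arms with success rates 0.55 and 0.45, Thompson sampling, and each reward only revealed `delay` steps after the pull. It's a sketch, not anyone's proposed trial design. The longer the delay, the more pulls the bandit wastes before its posterior catches up:

```python
import random

def run_bandit(delay, horizon=5000, p=(0.55, 0.45), seed=0):
    """Thompson sampling on a 2-armed Bernoulli bandit where each
    reward only becomes observable roughly `delay` steps after the pull."""
    rng = random.Random(seed)
    # Beta(alpha, beta) posterior over each arm's success rate
    alpha, beta = [1, 1], [1, 1]
    pending = []  # (reveal_time, arm, reward), kept in arrival order
    best_pulls = 0
    for t in range(horizon):
        # fold in any outcomes whose delay has elapsed
        while pending and pending[0][0] <= t:
            _, arm, r = pending.pop(0)
            alpha[arm] += r
            beta[arm] += 1 - r
        # pull the arm with the larger draw from its posterior
        draws = [rng.betavariate(alpha[i], beta[i]) for i in (0, 1)]
        arm = 0 if draws[0] >= draws[1] else 1
        best_pulls += (arm == 0)  # arm 0 is the truly better arm
        reward = 1 if rng.random() < p[arm] else 0
        pending.append((t + delay, arm, reward))
    return best_pulls / horizon

for d in (1, 100, 1000):
    print(f"delay={d:4d}: fraction of pulls on the better arm = {run_bandit(d):.3f}")
```

Note this hurts every learner, not just bandits: the delayed feedback means any method is acting on stale posteriors, which is exactly the point above.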
This is one of those "there is no perfect solution" situations. It's really easy to say that out loud. It's quite difficult to internalize it.
(Also, just as a note to your other post, bear in mind that our hard-core "social distancing" efforts in the US are just about to reach approx. 1 incubation period. It is only just this week that we're going to start seeing the results of that, and it'll phase in as slowly as our efforts 1-2 weeks ago did. My state just went to full lockdown today, though we've been on a looser lockdown for a week before that.)
Everything works better with immediate success/fail metrics. However, the simplest approach is the easiest to analyze, and the easiest to analyze after the fact in as many ways as you want. The more complex the decision making, the less willing we should be to put it under the control of a computer program. (Unless that program has been well-studied for our exact problem, so that we trust it more.)
Which medicine looks effective? Which medicine gets people out of the hospital faster? What underlying conditions interacted badly with given medicines? These questions do not have to be asked up front. But they can be answered afterwards. And knowing the answers matters.
Here is an example. Suppose that we find one medication that gets people out of bed faster but kills some patients. In areas with overwhelmed hospitals, cycling people through the beds faster may save lives on net. If your hospital is not overwhelmed, you wouldn't want to give that medicine. Now I'm not saying that any of these medicines will come to a conclusion like that. But they could. And if one did, I definitely want human judgement to be applied about when to use it.
I don't think anyone is proposing actually removing all humans from the loop, so I think that's an argument against a strawman.
Even if they were proposing it, there's no realistic chance of it happening.
I don't want people blindly copying "standard" scientific procedures either, where we run high-statistical-power, double-blind studies for months, then carefully peer-review them and come up with some result somewhere in 2022.
So, hopefully there will be blinded researchers who analyse the data.
They'll probably use sequential stopping rules on the incoming data as it accumulates (a sketch of what I mean is below).
If one of the treatments works much much better, then they'll almost certainly recommend that (but doctors will probably figure this out first, anyway).
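For what it's worth, here's a minimal sketch of the kind of stopping rule I mean: Wald's sequential probability ratio test on a stream of binary outcomes. The success rates and error levels are made up for illustration:

```python
import math, random

def sprt(stream, p0=0.3, p1=0.5, alpha=0.05, beta=0.05):
    """Wald's SPRT: watch a stream of 0/1 outcomes and stop as soon as
    the log-likelihood ratio for H1 (rate p1) vs H0 (rate p0) crosses
    a decision boundary, rather than waiting for a fixed sample size."""
    upper = math.log((1 - beta) / alpha)   # cross this: accept H1
    lower = math.log(beta / (1 - alpha))   # cross this: accept H0
    llr, n = 0.0, 0
    for outcome in stream:
        n += 1
        llr += math.log(p1 / p0) if outcome else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "treatment looks better", n
        if llr <= lower:
            return "no evidence of benefit", n
    return "undecided", n

# e.g. a treatment whose true success rate is 0.5 against a 0.3 baseline
rng = random.Random(1)
print(sprt(1 if rng.random() < 0.5 else 0 for _ in range(10_000)))
```

The point is just that "stop early when the evidence is overwhelming" is itself a well-studied procedure, not an abandonment of rigor.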