I totally agree that the confirmed case counts are way too low, because even most people who were symptomatic weren't able to get tested (e.g. me), let alone random people who were asymptomatic.
But the 21% study is seriously flawed because it didn't do a random sampling of the population. We need that at a minimum to know with any certainty what the actual exposure rate is. The figures that are coming back from studies using random samples in other places have been much lower.
It did a random sampling. We should do followups to screen off possible biases people have proposed, but stopping random people in the grocery store is by any reasonable standard randomization.
No. It’s a random sampling of people who are out during lockdown. It can’t be extrapolated to the whole population when large parts of it are not leaving their homes.
You do both. This is why study design matters. And it's one of the reasons all of the early antibody studies have issues (the other being test accuracy).
There is no such list. No US state has a master list of all residents. The DMV has a fairly high percentage but even that tends to miss children, older people, undocumented immigrants, etc.
I would be willing to bet that if you combined all the different lists that New York State and its various agencies have (DMV, DOE, Department of Taxation and Finance, NYC ID, jury duty, voter registration, social services, etc.), you would easily get >99% coverage of all people who've resided here for at least one year.
This would be a much better list to sample randomly from than "go to a grocery store and test everyone who walks in".
I should point out that on /r/nyc, some local redditors saw the testing going on all week in the same location and posted about it, informing others. I suspect this led people who wanted a free test to actively seek it out, especially because it's so hard to get tested otherwise. I'm pretty sure I had it over a month ago and I still haven't gotten tested, so if I'd seen those posts in time I'd have headed over there myself. The point is, the sample is biased even further: word spread, and some number of the people tested there were self-selecting rather than being stopped at random.
Now you're talking about a huge legal issue just to get access to the data, followed by a huge record linkage issue to remove duplicates. So with the time pressure involved, your proposal is so completely impractical as to be ridiculous.
These are all state agencies. They're already sharing data with each other anyway (e.g. the jury selection tool is getting feeds from many of these other sources).
What huge legal issues? This is all the government. Of course it has lists of all of its citizens, and can and does use said lists.
You say "almost like", but scientific studies rarely sample the population this way because researchers generally don't have access to a list of all residents.
The state is running these studies. I guarantee you New York State has many good lists of people living in the state. Start with the jury duty list, for example. It pulls data from the DMV, voter registration files, state tax filers, non-driver's IDs such as NYC ID, and more. That covers all the adults. You can get a good list of adults residing in the state to pull your random sample from, and to include the children go get data from the school system and/or just test whatever children live with any given adult that you pick randomly.
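To make the idea concrete, here's a minimal sketch of what "merge the agency lists, deduplicate, and sample uniformly" looks like. The identifiers and lists are entirely made up for illustration; real record linkage across agencies is much messier than matching on a single shared ID.

```python
import random

random.seed(0)

# Illustrative only: pretend each agency list is a set of records keyed by
# some identifier assumed to be shared across agencies (hypothetical).
dmv        = {"p1", "p2", "p3", "p5"}
voter_reg  = {"p2", "p3", "p6"}
tax_filers = {"p1", "p4", "p6", "p7"}

# The union removes cross-agency duplicates, so each resident appears once
# in the sampling frame.
frame = sorted(dmv | voter_reg | tax_filers)

# Draw a uniform random sample from the combined frame.
sample = random.sample(frame, k=3)
print(len(frame))  # 7 distinct residents in the frame
```

Every resident on any of the lists then has an equal chance of selection, which is exactly what stopping people at a grocery store doesn't give you.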
Well, the type of information you're trying to gather is rather unusual. Usually we just wait for a virus to run its course, then test lots of people to see the resulting case counts. But we can't do that here. Normally you'd use a control group and a test group, but that doesn't work for estimating underlying infection rates.
There are some studies trying to sample everyone in a geographic area (SF Mission census block), but the data isn't out yet because they're still conducting tests as we speak.
I guess we'll see. I don't share your certainty. Although I'd love to be able to go outside sooner.
Also, elsewhere in this thread it's mentioned that the Florida and Santa Clara results could be entirely explained by the tests' high false positive rate (type I error). The Florida test appeared to have a false positive rate of ~15% when independently validated, which is basically the infection rate the study found. In other words, this is a specific form of the base rate fallacy: when true prevalence is low, the positives you see are dominated by the test's error rate.
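You can see the problem with a bit of arithmetic. The 15% false positive rate is the figure from the independent validation mentioned above; the true prevalence and sensitivity here are hypothetical, picked just to show how false positives swamp the signal.

```python
true_prevalence = 0.02      # hypothetical low true exposure rate
false_positive_rate = 0.15  # ~ the independently validated figure
false_negative_rate = 0.10  # hypothetical sensitivity shortfall

# Fraction of the sampled population that tests positive:
# true positives among the infected + false positives among the uninfected.
apparent_positive_rate = (
    true_prevalence * (1 - false_negative_rate)
    + (1 - true_prevalence) * false_positive_rate
)
print(round(apparent_positive_rate, 3))  # ~0.165, dominated by false positives
```

A 16.5% "measured" rate from a 2% true rate: almost all of the apparent infections are test error, which is why a ~15% result from a test with a ~15% false positive rate tells you very little.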
> It will also show an undercount of at least an order of magnitude.
It seems pretty likely that the data will come out showing at least 10%, so it's literally impossible for it to undercount by an order of magnitude: a 10x undercount from 10% would imply a true exposure rate over 100%.
How do you think a random sample of inhabitants would be off by a whole order of magnitude, anyway? Can you explain the mechanism whereby that might happen? The only thing that comes to mind would be using a worthless test with a 90+% false negative rate.
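To spell out that mechanism with made-up numbers: with a random sample, the observed rate is roughly the true rate times the test's sensitivity, so a 10x undercount requires sensitivity below 10% (i.e. a 90%+ false negative rate).

```python
true_prevalence = 0.20  # hypothetical true exposure rate
sensitivity = 0.09      # i.e. a 91% false negative rate -- a worthless test

# With random sampling, the observed rate is approximately
# true rate x sensitivity (ignoring false positives, which would
# only push the observed rate UP).
observed = true_prevalence * sensitivity
print(round(observed, 3))  # 0.018 -- more than 10x below the true rate
```

Any test with even mediocre sensitivity can't produce an order-of-magnitude undercount from a genuinely random sample.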
That's definitely relevant to the question of why it's difficult to do such a study. However, it's not relevant to the question of whether such a study is necessary to make strong inferences about the population as a whole. The difficulty of making the right study does not change our ability to draw inferences from the wrong study. (We can't.)