I totally agree that the confirmed case counts are way too low, because even most people who were symptomatic weren't able to get tested (e.g. me), let alone random people who were asymptomatic.
But the 21% study is seriously flawed because it didn't do a random sampling of the population. We need that at a minimum to know with any certainty what the actual exposure rate is. The figures that are coming back from studies using random samples in other places have been much lower.
It did a random sampling. We should do followups to screen off possible biases people have proposed, but stopping random people in the grocery store is by any reasonable standard randomization.
No. It’s a random sampling of people who are out during lockdown. It can’t be extrapolated to the whole population when large parts of it are not leaving their homes.
You do both. This is why study design matters. And it's one of the reasons all of the early antibody studies have issues (the other being test accuracy).
There is no such list. No US state has a master list of all residents. The DMV has a fairly high percentage but even that tends to miss children, older people, undocumented immigrants, etc.
I would be willing to bet that if you combined all the different lists that New York State and its various agencies have (DMV, DOE, Department of Taxation and Finance, NYC ID, jury duty, voter registration, social services, etc.), you would easily get >99% coverage of all people who've resided here for at least one year.
This would be a much better list to sample randomly from than "go to a grocery store and test everyone who walks in".
I should point out that on /r/nyc, some local redditors saw the testing going on all week in the same location and posted about it, informing others. I suspect this led people who wanted a free test to actively seek it out, especially because it's so hard to get tested otherwise. I'm pretty sure I had it over a month ago and I still haven't gotten tested, so if I'd seen those posts in time I'd have headed over there myself. The point is, the sample is biased even further: word spread, and some number of the people tested there were self-selecting rather than being stopped at random.
Now you're talking about a huge legal issue just to get access to the data, followed by a huge record linkage issue to remove duplicates. So with the time pressure involved, your proposal is so completely impractical as to be ridiculous.
These are all state agencies. They're already sharing data with each other anyway (e.g. the jury selection tool is getting feeds from many of these other sources).
What huge legal issues? This is all the government. Of course it has lists of all of its citizens, and can and does use said lists.
You say "almost like", but scientific studies rarely sample the population this way because researchers generally don't have access to a list of all residents.
The state is running these studies. I guarantee you New York State has many good lists of people living in the state. Start with the jury duty list, for example. It pulls data from the DMV, voter registration files, state tax filers, non-driver's IDs such as NYC ID, and more. That covers all the adults. You can get a good list of adults residing in the state to pull your random sample from, and to include the children go get data from the school system and/or just test whatever children live with any given adult that you pick randomly.
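To make the idea concrete, here's a minimal sketch of what "merge the agency lists, deduplicate, and sample uniformly" looks like. The identifiers and lists are entirely made up for illustration; real record linkage across agencies is much messier than matching on a single shared ID.

```python
import random

random.seed(0)

# Illustrative only: pretend each agency list is a set of records keyed by
# some identifier assumed to be shared across agencies (hypothetical).
dmv        = {"p1", "p2", "p3", "p5"}
voter_reg  = {"p2", "p3", "p6"}
tax_filers = {"p1", "p4", "p6", "p7"}

# The union removes cross-agency duplicates, so each resident appears once
# in the sampling frame.
frame = sorted(dmv | voter_reg | tax_filers)

# Draw a uniform random sample from the combined frame.
sample = random.sample(frame, k=3)
print(len(frame))  # 7 distinct residents in the frame
```

Every resident on any of the lists then has an equal chance of selection, which is exactly what stopping people at a grocery store doesn't give you.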
Well, the type of information you're trying to gather is rather unusual. Usually we just wait for a virus to run its course, then test lots of people to see the resulting case counts. But we can't do that here. Normally you'd use a control group and a test group, but that doesn't work for estimating underlying infection rates.
There are some studies trying to sample everyone in a geographic area (SF Mission census block), but the data isn't out yet because they're still conducting tests as we speak.
I guess we'll see. I don't share your certainty. Although I'd love to be able to go outside sooner.
Also, elsewhere in this thread it's mentioned that the Florida and Santa Clara results could be entirely explained by the tests' high false positive rate (type I error). The Florida test appeared to have a false positive rate of ~15% when independently validated, which is basically the infection rate the study found. In other words, this is a specific form of the base rate fallacy: when true prevalence is low, the positives you see are dominated by the test's error rate.
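You can see the problem with a bit of arithmetic. The 15% false positive rate is the figure from the independent validation mentioned above; the true prevalence and sensitivity here are hypothetical, picked just to show how false positives swamp the signal.

```python
true_prevalence = 0.02      # hypothetical low true exposure rate
false_positive_rate = 0.15  # ~ the independently validated figure
false_negative_rate = 0.10  # hypothetical sensitivity shortfall

# Fraction of the sampled population that tests positive:
# true positives among the infected + false positives among the uninfected.
apparent_positive_rate = (
    true_prevalence * (1 - false_negative_rate)
    + (1 - true_prevalence) * false_positive_rate
)
print(round(apparent_positive_rate, 3))  # ~0.165, dominated by false positives
```

A 16.5% "measured" rate from a 2% true rate: almost all of the apparent infections are test error, which is why a ~15% result from a test with a ~15% false positive rate tells you very little.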
> It will also show an undercount of at least an order of magnitude.
It seems pretty likely that the data will come out showing at least 10%, so it's literally impossible for it to undercount by an order of magnitude: a 10x undercount from 10% would imply a true exposure rate over 100%.
How do you think a random sample of inhabitants would be off by a whole order of magnitude, anyway? Can you explain the mechanism whereby that might happen? The only thing that comes to mind would be using a worthless test with a 90+% false negative rate.
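To spell out that mechanism with made-up numbers: with a random sample, the observed rate is roughly the true rate times the test's sensitivity, so a 10x undercount requires sensitivity below 10% (i.e. a 90%+ false negative rate).

```python
true_prevalence = 0.20  # hypothetical true exposure rate
sensitivity = 0.09      # i.e. a 91% false negative rate -- a worthless test

# With random sampling, the observed rate is approximately
# true rate x sensitivity (ignoring false positives, which would
# only push the observed rate UP).
observed = true_prevalence * sensitivity
print(round(observed, 3))  # 0.018 -- more than 10x below the true rate
```

Any test with even mediocre sensitivity can't produce an order-of-magnitude undercount from a genuinely random sample.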
That's definitely relevant to the question of why it's difficult to do such a study. However, it's not relevant to the question of whether such a study is necessary to make strong inferences about the population as a whole. The difficulty of making the right study does not change our ability to draw inferences from the wrong study. (We can't.)