This is a fascinating discussion, to which I have little to add, except this. Quoting the article (including the footnote):
> [I]f you carefully craft random data so that it does not contain a Dunning-Kruger effect, you will still find the effect. The reason turns out to be embarrassingly simple: the Dunning-Kruger effect has nothing to do with human psychology[1].
> [1]: The Dunning-Kruger effect tells us nothing about the people it purports to measure. But it does tell us about the psychology of social scientists, who apparently struggle with statistics.
It seems to me that despite rudely criticizing a broad swath of academics for their lack of statistical prowess, the author here is himself guilty of a cardinal statistical sin: accepting the null hypothesis.
The fact that data resemble a random simulation in which no effect exists does not disprove the existence of such an effect. In traditional statistical language, we might say such an effect is not statistically significant, but that is different from saying that the effect is absolutely and completely the result of a statistical artifact.
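To see how the artifact arises in practice, here's a toy simulation (my own sketch, not the article's code): draw actual and self-assessed percentiles independently at random, so there is no real effect at all, then bin by actual quartile the way the classic D-K plots do.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# No real effect: skill and self-assessment are independent noise.
actual = rng.uniform(0, 100, n)     # actual test percentile
perceived = rng.uniform(0, 100, n)  # self-assessed percentile

# The classic D-K plot bins people by ACTUAL quartile, then compares
# mean actual vs. mean perceived percentile within each bin.
for q in range(4):
    in_bin = (actual >= 25 * q) & (actual < 25 * (q + 1))
    print(f"quartile {q + 1}: actual {actual[in_bin].mean():5.1f}, "
          f"perceived {perceived[in_bin].mean():5.1f}")
```

The bottom quartile's mean actual percentile is ~12.5 while its mean perceived percentile is ~50, so the "unskilled overestimate themselves" pattern shows up in pure noise: it's regression to the mean from conditioning on a noisy variable, not psychology.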
Later in the article the author points to a study that does systematically make the case that the D-K effect is probably not real. It does so by using college education level as an independent proxy for test skill, with self-assessment (an unrelated measure of skill) as the Y variable. So we can be pretty confident that the D-K effect is at least very small.
I'm probably nitpicking your language, but L1 regularization is precisely that: regularization. (See https://en.wikipedia.org/wiki/Regularization_(mathematics)#R....) In your typical linear regression setting, it does not replace the squared error loss but rather augments it. In regularized linear regression, for example, your loss function becomes a weighted sum of the usual squared error loss (aiming to minimize residuals/maximize model fit) and the norm of the vector of estimated coefficients (aiming to minimize model complexity).
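For concreteness, here's what that weighted sum looks like as code (a sketch with made-up names, assuming the standard lasso formulation):

```python
import numpy as np

def lasso_loss(X, y, beta, lam):
    """Squared-error loss augmented (not replaced) by an L1 penalty."""
    residuals = y - X @ beta
    fit_term = np.sum(residuals ** 2)          # minimize residuals / maximize fit
    penalty_term = lam * np.sum(np.abs(beta))  # minimize model complexity
    return fit_term + penalty_term
```

With `lam = 0` this reduces to the usual unregularized squared-error loss; larger `lam` trades fit for smaller (and, with L1 in particular, sparser) coefficients.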
Hey, I appreciate your correction! I wrote my comment late at night and definitely mangled the details. Your nice "nitpick" is a much-needed correction to my inaccuracy.
Releasing any data or statistic based on sensitive data--even once--bears a privacy risk. The primary purpose of differential privacy is to quantify that risk, both for a single release of data and over many releases of data.
As for the number of analyses you can run, that depends on what you mean. You're right that differential privacy won't allow you to set up a database of _confidential data_ that can be arbitrarily queried infinitely many times with any meaningful privacy guarantee, but this is in no way unique to differential privacy.
What you can do with differential privacy is release noisy statistics once and let researchers use those statistics for arbitrarily many analyses. This is what the 2020 US Census is doing, for example.
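A minimal sketch of the standard Laplace mechanism behind such a release (function names are my own; the Census actually uses a more elaborate system built on the same idea):

```python
import numpy as np

def laplace_release(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a statistic with epsilon-differential privacy via Laplace noise.

    sensitivity is how much one person can change the statistic
    (1 for a simple count). Smaller epsilon = more noise = more privacy.
    """
    rng = rng or np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise
```

Once released, the noisy statistic can be reused in arbitrarily many downstream analyses without any further privacy cost, since post-processing can't weaken the guarantee.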
One of the interesting insights in differential privacy is that to provide privacy protections that can't be reverse-engineered, the process has to be random rather than deterministic. The sort of algorithm that OP describes is really neat, but in addition to what dp_throw says, deterministic algorithms like this that choose how to anonymize things based on private data can reveal information about that private data in the very way that they format the final data. (This may be less relevant in the case at hand, but consider a setting where it would be sensitive to know if someone is in the database at all, e.g., a medical study.)
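To make the deterministic-leak point concrete, consider small-cell suppression (a hypothetical toy rule, not OP's algorithm), where the very format of the output depends on the private value:

```python
def suppress_small_counts(count, threshold=5):
    """Deterministic disclosure rule: hide any cell below a threshold."""
    return "*" if count < threshold else str(count)

# The formatting itself leaks: seeing "*" tells an observer with
# certainty that the true count lies in [0, threshold) -- there is
# no randomness for the sensitive value to hide behind.
```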
A few of these articles are suggesting that uploads be performed from public places (e.g., Starbucks) for the sake of anonymity/deniability. But it would seem that performing these actions in public would potentially reveal your identity, actions, and secret codename to any eyes or cameras around. As a question of general curiosity about anonymity, how does one weigh the benefits of using an open internet access point with the more literal visibility that using a public access point might entail?
When I wake my computer after a session of normal use and connect it to an open network, it immediately starts sending out messages to a bunch of different entities (Google, Dropbox, Evernote, etc.) that can be rather easily traced back to that access point. IMHO this should be an even bigger concern than identifying a public wifi user through surveillance images, and it's a point that articles like this one routinely ignore or gloss over:
> "Use as much caution and good sense as you can about distancing yourself from equipment and network locations you might be connected to."
Possibly an audience who has never heard of Tor before (the target for this piece) needs some more concrete advice about this than "use caution and good sense."
Hmm, yeah. The linked video from the Globe and Mail says to avoid surfing the web during the procedure, but avoiding explicitly going to Facebook and Twitter (their examples, IIRC) won't stop all identifiable network traffic from your device. I suppose that's where the suggestion of using a boot-from-USB OS might come in.
I consider myself to be quite savvy and I'd have zero confidence in my ability to reliably shut down all identifiable traffic, except by setting up a tool that blocks all traffic (like Little Snitch[1]) and then making an exception for Tor traffic.
And then there's the fact that most people's MAC addresses are ultimately tied to their identities in a way that a powerful actor could recover without too much difficulty.
This comment likely also misses the point, but the zoning laws are not really intended to cover situations where the entire city is destroyed and rebuilt. Realistically, if the city were rebuilt from scratch today, much of what people love about it now would be gone regardless of zoning laws.
It seems like it's theoretically possible for residents and visitors in a given city to enjoy its current building stock while simultaneously desiring different standards for new construction. Whether or not that is the case in Somerville, I don't really know, but a statement that the city couldn't legally be built from scratch the same way today doesn't necessarily imply (in my mind) that the zoning laws are illegitimate.
I get that there are other issues here too, of course.
Yes, the fact that you couldn't raze and rebuild Somerville does imply that the existing zoning laws are illegitimate.
Proponents of zoning talk about things like reducing congestion and preserving the character of the neighborhood. If you can't exchange like for like, these arguments fall flat. Changing the rules of the game after you've won, so to speak, smacks of crony capitalism. Cynically, I believe that zoning laws primarily exist to extend property rights over land that a person is unwilling or unable to buy.
Just to be clear, I didn't mean my comment as a broad defense of zoning laws. Rather, I think it's only fair to consider that there's a very large difference between the relevance of zoning laws when the city is in its infancy and zoning laws when the city is pretty much entirely built up. When the city is, say, 97% built up, any zoning laws should reflect how its residents want the last 3% developed—if at all. I do realize it's more complicated than that (what about redevelopment, for example?), and I realize zoning is fraught with issues and controversy, but I don't think it logically follows that if present zoning laws in an older built-up area don't reflect the existing building stock, that the laws are illegitimate or corrupt. You may not like the zoning laws or the general concept of zoning (I'm not sure I do either, frankly), but that's not the same argument.
> It seems like it's theoretically possible for residents and visitors in a given city to enjoy its current building stock while simultaneously desiring different standards for new construction.
Is it? Can you explain why this would be so? Or even better, provide any evidence that it's true in any of the locations people have been discussing (Montreal, Somerville, Manhattan, Portland, etc.)?
Let's drill down and focus in particular on laws about multi-family dwellings. What logic would make them good when they already exist, but bad if you wanted to build more, and even bad if you wanted to rebuild them after a natural disaster?
Talk to a typical owner of an old house, and you'll likely hear something about how they love its quirks. But would they build a house the same way today? Some probably would, but others would certainly not. This actually probably applies well outside of old homes too. I suspect something akin to the endowment effect (https://en.wikipedia.org/wiki/Endowment_effect) plays in here.
It also makes sense to me that someone would appreciate a historical landmark but place a lesser value on a modern replica of that landmark. Similarly, I could imagine a person admiring a historical neighborhood more than a modern recreation of it.
> Let's drill down and focus in particular on laws about multi-family dwellings. What logic would make them good when they already exist, but bad if you wanted to build more, and even bad if you wanted to rebuild them after a natural disaster?
I'm not sure the answer needs to be logical (see above), but even if it must be, I don't know why placing limits on something implies that it was never actually good in the first place. Natural disasters are another story for sure, but I would imagine that sort of massive rebuilding was not the primary motivation for the present zoning laws in Somerville.
> What logic would make them good when they already exist, but bad if you wanted to build more
A duplex is split between two families. They both think multi-family housing is for college students, the universal bane of Massachusetts housing debates.
Or "Notification and Federal Employee Antidiscrimination and Retaliation Act".
Does legislation that can't be summarized in a clumsy acronym ever get passed? I can just imagine cabinet meetings: "Sure, world peace is a nice idea, but we can't think of a terrible enough acronym for it, so we've decided against it."