"We have gone to considerable lengths to confirm that the information sent to us is accurate. We compared the tax data in our possession to other sources of the same information wherever we could find them, some of which were public (a tax return for a candidate for national office), others of which were private. In every instance we were able to check — involving tax filings by more than 50 separate people — the details provided to ProPublica matched the information from other sources."
I find it unlikely that even a state actor would have access to literally all the same private data that ProPublica has acquired over the years, and that they'd know what data ProPublica has and what can be safely manipulated.
Let's say a state actor had access to a whole pile of tax returns and wanted to manipulate them to change the conclusions ProPublica would draw. The state actor changes half the data points. Let's say ProPublica was able to check an average of 3 data points for each of the 50 individuals they reviewed. Each checked data point could have been either manipulated or untouched. I'd model this like a coin flip: if ProPublica didn't find the manipulation after checking 150 data points, it's like flipping 150 heads in a row, a 1 in 2^150 chance, or basically impossible.
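As a rough sketch of that coin-flip model in Python: this assumes every check is independent and that each checked data point was altered with probability 1/2. The function name is made up for illustration.

```python
from fractions import Fraction

def undetected_probability(manipulated_fraction, checks):
    """Chance that every one of `checks` independent checks lands on an
    untouched data point, given the fraction of data points manipulated.
    Assumption: checks are independent, like coin flips."""
    return (1 - manipulated_fraction) ** checks

# Assumed numbers from the comment above: 50 people x 3 data points each,
# half of all data points manipulated.
p = undetected_probability(Fraction(1, 2), 50 * 3)
print(p == Fraction(1, 2**150))  # the "150 heads in a row" figure
print(float(p))                  # a vanishingly small number
```

The result only holds if the checked data points really are independent of what the manipulator could see, which is the assumption the whole model rests on.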
Firstly, we have no idea how many of the 50 individuals' data were public. All public data can be discounted, since the state actor can just copy it.
Secondly, for the private data, the definition of ‘private’ is unspecified. It really just means not part of a published record. If ProPublica has access to it, then why couldn’t someone else?
I agree that if there were 150 separate sources with data not disclosed anywhere else, it would be impossible to guess.
But that’s just a made-up scenario.
There could be many correct records that are public, and one or two that are private but available to (or even provided through another channel of) the state actor.
As long as the fake records are not part of the public or private data ProPublica already has, there would be no way to verify them.
This of course assumes that ProPublica’s list of records itself is kept securely.
I’m assuming good faith on ProPublica’s part, that a reasonable amount of the data was private and that it was truly private. If I didn’t trust them I wouldn’t read their reporting.
This is a straw man. There is no reason they would need ‘literally all’ of ProPublica’s data. We don’t know how many data points were verified, but it need not be many.
>In every instance we were able to check — involving tax filings by more than 50 separate people — the details provided to ProPublica matched the information from other sources.
So we know it was at least 50 data points and not all of them were public. It was likely many hundreds of data points since it would be trivial to check more than one number if you already had 2 copies of a tax return pulled up.
If we take ProPublica's words to be accurate, then how would a state actor know exactly which 50 individuals ProPublica would be able to check? ProPublica would have a vast network of contacts, and it can and did ask the individuals involved to review the information it received and point out any inaccuracies.
Either these are real tax returns, ProPublica is lying, or the state actor has a crystal ball.
Still a straw man - how many of these 50 were public? It’s only the private ones that matter.
> It was likely many hundreds of data points since it would be trivial to check more than one number if you already had 2 copies of a tax return pulled up.
How is this relevant? Multiple points from public sources don’t show anything.
The only thing that matters is the number of sources who are both independent and private.
All an attacker would need to do is have access to a few of these private records and they could make their leak look genuine.
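To put rough numbers on this objection, here is a small Python sketch. The counts are invented for illustration and are not taken from ProPublica's statement; the function names are hypothetical. The idea is that only checks against sources that are both private and unknown to the attacker can catch a manipulation.

```python
from fractions import Fraction

def effective_checks(total_checks, public_checks, attacker_known_private_checks):
    """Checks that can actually expose a fake: those against private
    sources the attacker has not seen. Public and attacker-known records
    are assumed to be copied faithfully, so they reveal nothing."""
    remaining = total_checks - public_checks - attacker_known_private_checks
    if remaining < 0:
        raise ValueError("public and attacker-known checks exceed the total")
    return remaining

def undetected_probability(manipulated_fraction, checks):
    """Chance that all `checks` independent checks hit untouched data."""
    return (1 - manipulated_fraction) ** checks

# Invented example: 150 checks, 140 against public data, 8 against private
# records the attacker also holds, leaving only 2 that matter.
k = effective_checks(150, 140, 8)
print(k)
print(undetected_probability(Fraction(1, 2), k))
```

With only a couple of truly independent private checks, the chance of a manipulation slipping through is no longer negligible, which is why the split between public and private sources matters so much here.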