Data is verified probabilistically on the Sia network. The blockchain has access to the Merkle root of the data that the host is supposed to be storing. The blockchain will request that the host provide a randomly chosen 64-byte segment of the data along with a Merkle proof showing that the segment is part of the data committed to by that Merkle root.
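A minimal sketch of the segment-plus-Merkle-proof check described above. It assumes BLAKE2b-256 with one-byte leaf/node prefixes and zero-pads the data to a power-of-two number of 64-byte segments; Sia's actual hash function and tree layout may differ, so treat this as an illustration of the idea rather than the protocol's implementation. The point is that the verifier only needs the root, while the host needs the real data to answer.

```python
import hashlib
import secrets

SEGMENT_SIZE = 64  # bytes per leaf, as in the comment above


def _h(prefix: bytes, data: bytes) -> bytes:
    # Assumed hash: BLAKE2b-256 with a domain-separation prefix byte
    return hashlib.blake2b(prefix + data, digest_size=32).digest()


def leaf_hash(segment: bytes) -> bytes:
    return _h(b"\x00", segment)


def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01", left + right)


def segments(data: bytes) -> list[bytes]:
    # Split into 64-byte leaves, zero-padding to a power-of-two leaf count
    n = max(1, -(-len(data) // SEGMENT_SIZE))
    size = 1
    while size < n:
        size *= 2
    padded = data.ljust(size * SEGMENT_SIZE, b"\x00")
    return [padded[i:i + SEGMENT_SIZE] for i in range(0, len(padded), SEGMENT_SIZE)]


def build_levels(data: bytes) -> list[list[bytes]]:
    # levels[0] holds leaf hashes; the last level holds only the root
    level = [leaf_hash(s) for s in segments(data)]
    levels = [level]
    while len(level) > 1:
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(data: bytes) -> bytes:
    return build_levels(data)[-1][0]


def prove(data: bytes, index: int) -> tuple[bytes, list[bytes]]:
    # Host side: return the challenged segment and its sibling hashes, bottom-up
    levels = build_levels(data)
    proof = []
    i = index
    for level in levels[:-1]:
        proof.append(level[i ^ 1])
        i //= 2
    return segments(data)[index], proof


def verify(root: bytes, index: int, segment: bytes, proof: list[bytes]) -> bool:
    # Verifier side: recompute the path to the root from one 64-byte segment
    h = leaf_hash(segment)
    i = index
    for sibling in proof:
        h = node_hash(h, sibling) if i % 2 == 0 else node_hash(sibling, h)
        i //= 2
    return h == root


if __name__ == "__main__":
    data = secrets.token_bytes(1000)
    root = merkle_root(data)  # this is all the verifier keeps
    challenge = secrets.randbelow(len(segments(data)))
    segment, proof = prove(data, challenge)
    print(verify(root, challenge, segment, proof))
```

The proof is only log2(number of segments) hashes long, which is why a single random 64-byte challenge is cheap to check on-chain.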
If the host can provide the data and the proof, the host is rewarded as though they've demonstrated that they have all of the data. If the host cannot provide those 64 bytes along with a proof, the host is punished as though they are not storing any of the data.
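A worked sketch of why the all-or-nothing rule discourages storing only part of the data. It assumes a single uniformly random challenge per contract and that storage cost scales linearly with how much a host keeps; the reward, collateral, and cost figures are hypothetical, not Sia's parameters.

```python
from fractions import Fraction


def expected_profit(f: Fraction, reward: Fraction, collateral: Fraction,
                    storage_cost: Fraction) -> Fraction:
    """Expected profit for a host that keeps a fraction f of the segments.

    A uniformly random segment is challenged, so the host passes with
    probability f. Passing earns the full reward; failing forfeits the full
    collateral. Storage cost is assumed to be proportional to f.
    """
    p_pass = f
    return p_pass * reward - (1 - p_pass) * collateral - f * storage_cost


if __name__ == "__main__":
    # Hypothetical figures, chosen only to illustrate the shape of the incentive
    reward, collateral, cost = Fraction(10), Fraction(20), Fraction(4)
    for f in (Fraction(1), Fraction(9, 10), Fraction(1, 2), Fraction(0)):
        profit = expected_profit(f, reward, collateral, cost)
        print(f"keep {float(f):.0%} of data -> expected profit {float(profit):+.1f}")
```

Expected profit here is linear in f with slope reward + collateral - storage_cost, so as long as that sum is positive, keeping all of the data is the host's best strategy.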
How does punishment work? What stops one bad actor from agreeing to accept infinite data from a variety of sources and tanking both trust in data hosts and their profitability?
Also, what about bandwidth constraints on the host end?
When a host agrees to accept data, they put up money out of their own pocket as collateral. This makes it expensive for a bad actor to accept an infinite amount of data, because every additional piece of data requires the host to put up more collateral.
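A small arithmetic sketch of how linear collateral bounds what a host can accept. The per-TB-per-month collateral rate and the balance are hypothetical inputs; in practice hosts and renters negotiate these terms, and the real pricing model may be more involved.

```python
from decimal import Decimal

BYTES_PER_TB = Decimal(10**12)


def required_collateral(bytes_accepted: int, rate_per_tb_month: Decimal,
                        months: int) -> Decimal:
    # Collateral grows linearly with the amount of data the host agrees to store
    return Decimal(bytes_accepted) / BYTES_PER_TB * rate_per_tb_month * months


def max_acceptable_bytes(balance: Decimal, rate_per_tb_month: Decimal,
                         months: int) -> int:
    # A host can only accept as much data as its balance can collateralize
    return int(balance / (rate_per_tb_month * months) * BYTES_PER_TB)


if __name__ == "__main__":
    rate = Decimal("2")  # hypothetical collateral per TB per month
    print(required_collateral(5 * 10**12, rate, 3))   # 5 TB for 3 months
    print(max_acceptable_bytes(Decimal("300"), rate, 3))
```

Because the collateral needed has no upper bound as the accepted data grows, "accepting infinite data" would require infinite money up front, all of which is at risk if the host then fails its proofs.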
Before a renter creates a contract with a host, the renter will perform some measurements on the host and determine whether the host is suitable. A renter in China will choose different hosts than a renter in the US, because the latencies and throughputs they measure for each host will be different.
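A minimal sketch of renter-side host selection from measured latency and throughput. The `HostMeasurement` type, the thresholds, and the sort order (throughput first, then latency) are hypothetical choices for illustration, and the measurements are assumed to have been collected elsewhere; the actual renter logic may weigh many more factors, such as price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostMeasurement:
    host: str              # hypothetical host identifier
    latency_ms: float      # round-trip time as measured by this renter
    throughput_mbps: float # transfer rate as measured by this renter


def select_hosts(measurements: list[HostMeasurement], max_latency_ms: float,
                 min_throughput_mbps: float, count: int) -> list[str]:
    # Drop hosts that are too slow from *this* renter's vantage point
    suitable = [
        m for m in measurements
        if m.latency_ms <= max_latency_ms and m.throughput_mbps >= min_throughput_mbps
    ]
    # Prefer higher throughput, break ties with lower latency
    suitable.sort(key=lambda m: (-m.throughput_mbps, m.latency_ms))
    return [m.host for m in suitable[:count]]


if __name__ == "__main__":
    measured = [
        HostMeasurement("host-a", 30, 200),
        HostMeasurement("host-b", 250, 500),
        HostMeasurement("host-c", 40, 80),
        HostMeasurement("host-d", 20, 150),
    ]
    print(select_hosts(measured, max_latency_ms=100, min_throughput_mbps=100, count=2))
```

Since each renter feeds in its own measurements, two renters in different regions running the same selection rule can end up with different host sets.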
Is there somewhere I can read about the punishments in more detail? E.g., how often the quizzes happen, and what the penalty is for getting one wrong or not being available to answer?
If I recall correctly, the client pre-computes hashes of random fragments of each file then "quizzes" the hosts on that information periodically. If they're not actually storing a copy of the data, they won't be able to compute the resulting hash. (I may be misremembering some of the details, but I believe that's how it works in principle.)
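A sketch of the precomputed-quiz scheme as recalled in the comment above; note it differs from the Merkle-proof mechanism described in the first comment. It assumes SHA-256 and adds a random per-challenge nonce (an assumption, not stated in the comment) so the host cannot pass by storing only precomputed hashes. The names `precompute_challenges`, `host_respond`, and `check` are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    nonce: bytes     # random salt, revealed to the host only when quizzing
    offset: int      # start of the random fragment
    length: int      # fragment length in bytes
    expected: bytes  # hash the client computed before uploading


def precompute_challenges(data: bytes, count: int,
                          fragment_len: int = 64) -> list[Challenge]:
    # Client side, done once before upload; afterwards the client can drop its copy
    if len(data) < fragment_len:
        raise ValueError("data is shorter than one fragment")
    challenges = []
    for _ in range(count):
        offset = secrets.randbelow(len(data) - fragment_len + 1)
        nonce = secrets.token_bytes(16)
        fragment = data[offset:offset + fragment_len]
        expected = hashlib.sha256(nonce + fragment).digest()
        challenges.append(Challenge(nonce, offset, fragment_len, expected))
    return challenges


def host_respond(stored: bytes, nonce: bytes, offset: int, length: int) -> bytes:
    # Host side: the answer can only be computed from the actual stored bytes
    return hashlib.sha256(nonce + stored[offset:offset + length]).digest()


def check(challenge: Challenge, response: bytes) -> bool:
    # Constant-time comparison of the expected and returned hashes
    return hmac.compare_digest(challenge.expected, response)


if __name__ == "__main__":
    data = secrets.token_bytes(4096)
    quizzes = precompute_challenges(data, count=5)
    c = quizzes[0]  # each challenge should be used only once
    print(check(c, host_respond(data, c.nonce, c.offset, c.length)))
```

One trade-off of this design is that the client can only quiz the host as many times as it precomputed challenges, whereas a Merkle-root scheme lets a verifier pick fresh random segments indefinitely.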