Frequently, data produced by hand (by the developers writing the tests) or generated randomly (by a tool like QuickCheck) does not cover all the edge cases you would encounter in real data.
So there is some justification for having real data available. This is also why, in the finance space at least, developer machines that may hold such data tend to have more stringent restrictions placed on them than machines at many other organisations.