We have always solved this by having a large dev database machine, periodically syncing it from live, and connecting to it from the dev machines. Hardware is extremely cheap compared to dev time. (In the cases I've seen, it was infeasible to select a representative sample: we might need any of the database rows, so approximating the live database from just a sample was not an option. I understand that your case would be okay with not having specific data on hand, as long as the sample is representative of the full set?)
One of my old companies also did this. It worked OK most of the time, but you need to be a little more careful that your changes don't affect others, since it's a shared database.