Simulation only proves the theory; sampling and experimentation prove the implementation. Otherwise it's entirely possible that the packaging process produces exact matches at a far higher rate, and that the mix is NOT random.
Right, so he didn't really prove the implementation. For that he'd need to repeat his experiment a number of times, which is trivial in a simulation but less so if you have to buy and count them.
How would he know what distribution to use for the simulation without a sufficiently large dataset to use as an example? A simulation of a made-up system is pretty worthless.
But "fill bags with random colors" already bases your simulation on the assumption that the colors are uniformly distributed. And in fact, that seems to be one of the assumptions his actual experiment showed to be incorrect.
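To make that concrete, here's a rough Python sketch (not the original author's code) of how much the assumed distribution matters. Everything in it is made up for illustration: the five color names, the bag size of 10, the 2000 bags, and especially the skewed weights, which are not measured from any real product. It fills bags by independent draws and counts how often two bags end up with identical color counts.

```python
import random
from collections import Counter

# Hypothetical color list; not taken from any real product.
COLORS = ["red", "orange", "yellow", "green", "purple"]

def fill_bag(rng, size, weights):
    """Draw `size` candies independently using the given color weights."""
    return Counter(rng.choices(COLORS, weights=weights, k=size))

def match_rate(weights, size=10, bags=2000, seed=0):
    """Fraction of distinct bag pairs whose color counts are identical."""
    rng = random.Random(seed)
    seen = Counter()
    for _ in range(bags):
        bag = fill_bag(rng, size, weights)
        # Use the per-color counts as the bag's signature.
        seen[tuple(bag[c] for c in COLORS)] += 1
    total_pairs = bags * (bags - 1) // 2
    # n bags sharing one signature contribute n*(n-1)/2 matching pairs.
    matching_pairs = sum(n * (n - 1) // 2 for n in seen.values())
    return matching_pairs / total_pairs

# Assumption 1: every color equally likely.
uniform = match_rate([1, 1, 1, 1, 1])
# Assumption 2: an invented skew, purely for comparison.
skewed = match_rate([4, 2, 2, 1, 1])
print(f"uniform: {uniform:.4f}  skewed: {skewed:.4f}")
```

The two rates come out different, so a simulation's predicted match rate is only as good as the weights you feed it. Without real samples there's no way to pick between the two sets of weights, and independent draws are themselves an assumption the real packaging line may not satisfy.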