Passing dieharder doesn't mean anything at all with respect to cryptographic security. It's trivial to define a random bit generator that passes randomness tests and has no real security.
I would like to learn more about practical cryptographic issues, and I need some help: what tests can prove or disprove stronger guarantees for the cryptographic security of a PRG than diehard? The Wikipedia page doesn't give me much info about which test provides stronger guarantees, or by what criteria:
There aren't such tests, at least none that work like dieharder. You analyze a CSPRNG the same way you'd analyze a cipher construction; they are essentially the same thing. Often we draw our conclusions about the strength of a CSPRNG by noticing that it's built on ciphers and hashes, run in modes and constructions that have themselves been shown to be trustworthy, and thus inherits their formal security guarantees.
There are no automated tests for this, since cryptographic randomness requires unpredictability. A statistical test can only tell you when a random number generator is broken, but no statistical test can tell you whether a random number generator is cryptographically sound.
And so the only "test" is to review everything known for any method capable of predicting some portion of the output. If none exists, the process is considered random.
> what are the tests that can prove or disprove stronger guarantees for cryptographic security of a PRG than diehard?
None. The difference between a good CSPRNG and a broken one might not even be in the construction at all, but in who knows the seed. For example, a keystream generated using Chacha20 or AES-CTR makes for a good CSPRNG... except if the attacker knows the key.
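To make that concrete, here is a minimal sketch in Python, using SHA-256 in counter mode as a stand-in for ChaCha20/AES-CTR (neither of which is in the standard library): the stream looks random to any statistical test, but anyone holding the key reproduces it bit for bit.

```python
import hashlib

def keystream(key: bytes, nblocks: int) -> bytes:
    # Counter-mode keystream: hash(key || counter) for successive counters.
    # SHA-256 stands in for ChaCha20/AES-CTR; the predictability lesson is the same.
    return b"".join(
        hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        for i in range(nblocks)
    )

victim = keystream(b"secret key", 4)
attacker = keystream(b"secret key", 4)  # attacker who knows the key...
print(victim == attacker)               # ...recovers the entire "random" stream
```

The output would sail through dieharder, yet provides zero security against anyone who knows the key.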
The responses you've gotten so far are pretty bleak even if they are accurate. It's true that once you start mixing randomness, or get into algorithms the only thing statistical tests can really tell you is if it's broken.
Those tests can be used on raw sources to learn about the quality of those inputs. In this case, applying them directly to analogRead() on a specific piece of hardware (your entire circuit and manufacturing process will affect this, and it will even vary from board to board) can give you an estimate of how much entropy you can expect from each call.
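As a rough illustration (a sketch, not a NIST-grade estimator), a Shannon-entropy estimate over a histogram of raw readings gives a first feel for bits per sample. The `readings` values here are made up, and real min-entropy will be lower if successive samples are correlated:

```python
import math
from collections import Counter

def entropy_per_sample(samples) -> float:
    # Shannon entropy (bits/sample) from the empirical histogram.
    # Ignores correlation between samples, so it overestimates min-entropy.
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

readings = [512, 513, 511, 512, 514, 512, 513, 511]  # hypothetical analogRead() data
print(round(entropy_per_sample(readings), 2))        # ~1.91 bits per 10-bit read
```

Note how little of the 10-bit range actually carries entropy: a floating pin mostly hovers around one value, which is exactly why you need many samples and mixing.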
Understanding where that entropy comes from is significantly more important. Gate voltage breakdown, fluctuations from the pins acting as antennas, and the current temperature and humidity are largely what analogRead() on a floating pin is measuring. Other sources include radioactive decay and the timing of events outside the system (such as the time between a device being plugged in and the first time a person touches a key).
These all provide small amounts of entropy (except for radioactive decay; that's a really good one). The next step is mixing entropy. There is a lot of good math showing that, with proper mixing, even adding attacker-known inputs into an entropy pool doesn't decrease the entropy in the pool (it's no less random). If time isn't an issue, you can add in a large number of readings from the same source, though sampling faster than the source changes won't gain you anything.
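A hash-based pool update is the usual way to get that property. This is a sketch of the idea only (real systems use vetted designs like the Linux input pool or Fortuna): absorbing an attacker-known sample changes the state but cannot subtract entropy already in it.

```python
import hashlib

def mix(pool: bytes, sample: bytes) -> bytes:
    # Fold a new sample into the pool through a one-way hash. Even a fully
    # attacker-known sample can't reduce the entropy already absorbed.
    return hashlib.sha256(pool + sample).digest()

pool = b"\x00" * 32                        # empty pool: zero entropy
pool = mix(pool, bytes([514 & 0xFF]))      # noisy reading (hypothetical)
pool = mix(pool, b"attacker-chosen data")  # known input: pool stays unpredictable
print(len(pool))                           # state stays a fixed 32 bytes
```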
That mixing lets you get up to a minimum threshold of randomness (the seed), at which point you can use a cryptographically secure pseudorandom number generator (CSPRNG). These have proofs of a different type, showing that each input bit has an equal chance of modifying any bit of the output, which can then be mixed back into the seed, giving you a very large amount of effectively good randomness that can be used for keys and the like.
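The seed-then-expand step can be sketched like this. This is a toy, not a vetted DRBG (in practice use something like HMAC-DRBG, or just the OS's getrandom()): SHAKE256 stretches the state into output, and the output is folded back into the state.

```python
import hashlib

class SketchPRNG:
    """Toy hash-based generator: NOT a vetted DRBG, for illustration only."""
    def __init__(self, seed: bytes):
        self.state = hashlib.sha256(b"seed" + seed).digest()

    def generate(self, n: int) -> bytes:
        out = hashlib.shake_256(self.state).digest(n)           # expand state
        self.state = hashlib.sha256(self.state + out).digest()  # mix back in
        return out

rng = SketchPRNG(b"32 bytes of pooled entropy here")
a, b = rng.generate(16), rng.generate(16)
print(a != b)  # state ratchets forward, so successive outputs differ
```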
The trick here is that you're effectively at war with attackers: the more of your entropy sources an attacker can predict or control, the weaker your overall input to the CSPRNG is going to be. If they can narrow it down to a small possibility space, they can predict the input to the CSPRNG and in turn fully predict its output, which will reveal your keys.
If an attacker has a way to measure timings on the device a large number of times they may be able to infer the internal state of the system and once again get your keys.
So the quality of the final output isn't really the problem, yet that's largely what people doing these projects analyze with these tests.
One final bit I'd like to cover. These tests can give you some information about the final quality of the output (mostly whether it's broken or not), but even for that they're usually used incorrectly. If the CSPRNG is implemented correctly but, say, you always seed it with the value 0, it will still pass the tests with flying colors.
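You can see this with Python's (non-cryptographic) Mersenne Twister: seeded with the constant 0, its output is 100% predictable, yet a naive monobit/frequency check sees nothing wrong.

```python
import random

rng = random.Random(0)  # fixed seed: every run produces the identical stream
bits = [rng.getrandbits(1) for _ in range(10_000)]
ones = sum(bits)
print(ones)  # close to 5000: a naive frequency test says "looks random"

# ...but anyone who guesses the seed reproduces the stream exactly:
print(random.Random(0).getrandbits(64) == random.Random(0).getrandbits(64))
```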
Devices like these should be fully reset, made to output a small amount of randomness, fully reset, sampled again... thousands to millions of times. This will help you determine whether the range of possible inputs to the system is inherently narrow, and most projects I've seen (including this one) don't seem to do that.
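A simulated version of that protocol (a hypothetical device model, not any real board): if boot-time entropy only spans 256 states, repeated reset-and-sample runs enumerate every possible output no matter how strong the generator is.

```python
import hashlib

def first_output(boot_entropy: bytes) -> bytes:
    # Simulated boot: a hash-based generator seeded with whatever entropy the
    # device gathered at power-on; return its first 8 output bytes.
    return hashlib.sha256(b"seed:" + boot_entropy).digest()[:8]

# Suppose the device's boot entropy is really just one uncertain byte:
outputs = {first_output(bytes([b])) for b in range(256)}
print(len(outputs))  # 256: every possible "random" first output, enumerable
```

Each individual output would look fine to dieharder; only the reset-and-resample protocol reveals that the whole space of outputs is tiny.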