"He also claims he takes the fastest runtime of the queries over several runs, which is not how real tests are performed."
It isn't, but it should be. For any non-randomized run of the exact same work [1], by definition of non-randomized, any noise that appears across multiple test runs can only come from factors unrelated to the test itself, such as OS scheduling decisions. Taking the minimum is the most honest way to factor those out. A possible exception is the first run, before CPU caches are warmed up.
[1]: Ponder that closely; a lot of the knee-jerk reactions to what I just said are handled by it. E.g., if the second time around something is cached by the algorithm itself that was not cached the first time, that's not the same work. Conceptually you can imagine that I'm shooting for the exact same sequence of CPU instructions being run on the exact same data values, though in practice that can be a tall bar to leap even before considering concurrency; as with almost everything in life, if you look closely enough the boundaries get fuzzy.
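To make the idea concrete, here is a minimal sketch of min-of-N timing. The `work` callable, run count, and warm-up policy are all placeholders, not anything from the benchmark under discussion:

```python
import time

def bench(work, runs=5):
    # Warm-up run, discarded: lets CPU caches and lazy
    # initialization settle before we start measuring.
    work()
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        work()
        times.append(time.perf_counter() - start)
    # The minimum is the run least polluted by unrelated noise
    # (scheduler preemption, other processes, etc.); for a
    # deterministic workload it is the best estimate of the
    # work's true cost.
    return min(times)
```

For a truly fixed workload, the minimum converges toward the noise-free cost as runs increase; averages, by contrast, fold the noise back in.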