
Thank you. Is there any way to evaluate a model statistically — something like testing a model?


You could generate a set of random vectors that span the entire input space, exercise the system with those vectors, and publish some sort of "accuracy" (e.g. draw random vectors through i.i.d. uniform R.V.s over the input space, evaluate f(input), and use the successes in a hierarchical binomial distribution). Remember, though, that most of the time we build a model to verify _edge cases_; after all, the "happy path" of the program is simple to test, and edge cases are, by their nature, rare occurrences.

As a trivial example, consider the boolean function f(x, y) = x & y. f evaluates to 0 for every value of (x, y) except (1, 1). If we were to model this function as f_model = 0, f_model would appear to agree with f 75% of the time. With a sufficiently large input state space, it would be quite feasible to hide essential edge cases in very small tail probabilities (e.g. < 0.5%).
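A minimal sketch of what this random-sampling evaluation looks like for the toy f above (function names here are illustrative, not from any particular library):

```python
import random

def f(x, y):
    # Reference implementation: boolean AND.
    return x & y

def f_model(x, y):
    # Toy "model" that always predicts 0.
    return 0

def estimated_accuracy(n_samples=100_000, seed=0):
    # Exercise both functions with i.i.d. uniform random inputs and
    # report the fraction of agreements (a binomial proportion).
    rng = random.Random(seed)
    agreements = sum(
        f(x, y) == f_model(x, y)
        for x, y in ((rng.randint(0, 1), rng.randint(0, 1))
                     for _ in range(n_samples))
    )
    return agreements / n_samples

print(estimated_accuracy())  # ~0.75, yet the model misses the one interesting case
```

The reported accuracy hovers around 0.75, exactly as the argument predicts — the sampled "accuracy" looks respectable while the model never gets the (1, 1) edge case right.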



