Evaluating the accuracy of a model is an unsolved problem with both formal and informal approaches.

Formally: we could ask multiple separate model/proof assistants to generate separate models from the same underlying specification, and then attempt to find discrepancies between their predicted results. This really just punts the responsibility: now we're relying on the accuracy of the abstract specification, rather than the model(s) automatically or manually produced from it. It's also not sound; it just allows us to feel more confident.
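The cross-checking idea above amounts to differential testing: run independently produced models on the same inputs and flag any disagreement for human inspection. A minimal sketch, with two hypothetical stand-in models (not from any real tool):

```python
def model_a(x, y):
    # Hypothetical model produced by one tool from the spec.
    return x & y

def model_b(x, y):
    # Hypothetical model produced independently by another tool.
    return min(x, y)

def discrepancies(inputs):
    # Inputs where the two models disagree; each discrepancy points
    # at a bug in at least one model, or an ambiguity in the shared
    # specification they were both derived from.
    return [v for v in inputs if model_a(*v) != model_b(*v)]

space = [(x, y) for x in (0, 1) for y in (0, 1)]
# These two particular models happen to agree on the whole space,
# which raises confidence but, as noted, proves nothing.
```

Agreement here only means the models share behavior, not that either matches the intended specification.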

Informally: we can have a lot of different people look at the model very closely, and produce test vectors for the model based on human predictions of behavior.



Thank you. Is there any way to statistically evaluate a model — something like testing it?


You could generate a set of random vectors that span the entire input space, exercise the system with those vectors, and publish some sort of "accuracy" (e.g., draw the vectors from i.i.d. uniform random variables over the input space, evaluate f(input), and treat the successes under a hierarchical binomial model).

Remember, though, that most of the time we build a model precisely to capture _edge cases_; the "happy path" of the program is easy to test, and edge cases are, by their nature, rare occurrences. As a trivial example, take the boolean function f(x, y) = x & y. f evaluates to 0 for every value of (x, y) except (1, 1), so the constant model f_model = 0 appears to match f 75% of the time. With a sufficiently large input space, it would be quite feasible to hide essential edge cases in very small tail probabilities, with the model appearing to agree with the system more than, say, 99.5% of the time.
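The sampling scheme and the pitfall described above can be sketched in a few lines. This estimates agreement between the trivial f(x, y) = x & y and the constant model f_model = 0 by Monte Carlo over uniform random inputs (function names are illustrative, from the comment's example):

```python
import random

def f(x, y):
    # The real system: a boolean AND.
    return x & y

def f_model(x, y):
    # A deliberately bad model that always predicts 0.
    return 0

def estimated_agreement(system, model, sample, n, seed=0):
    # Monte Carlo estimate: draw n i.i.d. uniform inputs and count
    # how often the model's prediction matches the system's output.
    # The match count is binomially distributed, so a confidence
    # interval on the agreement rate follows directly.
    rng = random.Random(seed)
    matches = sum(
        system(*v) == model(*v)
        for v in (sample(rng) for _ in range(n))
    )
    return matches / n

# Uniform sampling over the 2-bit input space {0, 1} x {0, 1}.
sample_bits = lambda rng: (rng.randint(0, 1), rng.randint(0, 1))

rate = estimated_agreement(f, f_model, sample_bits, n=10_000)
# The model disagrees with f only on (1, 1), so the estimate
# lands near 0.75 — a high "accuracy" that still misses the one
# edge case the model exists to capture.
```

With a larger input space the same effect is far worse: a single wrong corner among 2^64 inputs is essentially invisible to uniform sampling, which is why random testing alone can't certify a model.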



