This experiment is well out of my depth (and I don't know whether that's what's going on here, <shrug>), but I think what you're describing is an interesting phenomenon, which I'll try to explain.
I think it's important to keep in mind that in those cases specific models are being tested. In good science, you do experiments with specific models in mind, including models of your apparatus and machines themselves, and then you compare the results for consistency. You have to be very careful with any addition to your statistical analysis that wasn't generated by those models (say some kind of 'epistemic uncertainty' or 'procedural uncertainty') bolted on after the fact to correct the results, because I believe such an addition can by itself invalidate the base models.
For example, say you measure gravity at sea level with one apparatus and report 9.9 +/- 0.1. Then you get a second apparatus and measure 9.2 +/- 0.1 (i.e. something went wrong); the difference is significant. You realize there must be some error, so you add an 'experimental error parameter' that you can tune and that affects both measurements: you adjust it until the uncertainties become compatible (which is what consistent experiments should give you), arriving at, say, 9.9 +/- 0.6 and 9.2 +/- 0.6 for the first and second experiment. This new parameter clearly doesn't belong in the model, and there's no model for the parameter itself: there's no explanatory mechanism involved, only a new free parameter. What you could honestly say is that there is experimental error in one or both of the experiments, or that the base models are significantly wrong. But you can't take an average of both results and say gravity is 9.55 +/- (..), because the existence of either experimental error or base-model error (at least to a few sigma of certainty) invalidates this procedure -- that is, unless you just want a rough guess for some immediate practical application and the "experimental error" is acceptable.
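To make the numbers concrete, here's a quick Python sketch of that example (the values are just the made-up ones above): the two results are roughly 5 sigma apart, inflating both error bars with a tuned 'experimental error parameter' of 0.6 makes them look compatible, and the inverse-variance weighted average comes out to 9.55 with a misleadingly small uncertainty.

    import math

    # Two hypothetical gravity measurements (value, 1-sigma uncertainty),
    # taken from the made-up example above, not from any real experiment.
    m1 = (9.9, 0.1)
    m2 = (9.2, 0.1)

    def tension_sigma(a, b):
        """How many combined standard deviations separate two measurements."""
        return abs(a[0] - b[0]) / math.hypot(a[1], b[1])

    print(tension_sigma(m1, m2))            # ~4.9 sigma: clearly inconsistent

    # Inflate both error bars with an ad hoc 'experimental error parameter' s,
    # tuned until the two results look compatible (about 1 sigma apart).
    s = 0.6
    print(tension_sigma((m1[0], s), (m2[0], s)))   # ~0.8 sigma: "compatible"

    # Inverse-variance weighted average -- the step the text warns against,
    # since it assumes both results are unbiased draws from the same model.
    w1, w2 = 1 / m1[1]**2, 1 / m2[1]**2
    mean = (w1 * m1[0] + w2 * m2[0]) / (w1 + w2)
    sigma = (w1 + w2) ** -0.5
    print(mean, sigma)                      # 9.55 +/- ~0.07, misleadingly precise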
Another common and well-known effect in experiments is knowing the result you want to get and trying different "adjustments", or redoing the analysis, until (subconsciously or not) it yields a result that agrees with previous observations. Feynman describes this in his books. I believe some modern experiments guard against this with, among other things, blind analysis: you don't see the result until you're sure the experiment and analysis are sound, so you can't fine-tune them to reproduce known results.
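For what it's worth, here's a minimal sketch of one common blinding scheme (a hidden additive offset). The function names and the offset range are my own illustration, not any particular experiment's procedure.

    import random

    # The offset is generated once and kept secret from the analysts.
    _blind_offset = random.uniform(-1.0, 1.0)

    def blinded(measurement):
        """Return the measurement shifted by the secret offset.

        Analysts develop and debug the full analysis on blinded values, so
        they can't (subconsciously or not) tune it until the answer matches
        expectations.
        """
        return measurement + _blind_offset

    def unblind(blinded_value):
        """Remove the offset -- done only once, after the analysis is frozen."""
        return blinded_value - _blind_offset

    print(blinded(9.81))   # what the analysts see while developing the analysis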