IIRC there was also a paper analyzing how often results in some NLP conference held up when a different random seed or hyperparameters were used. It was quite depressing.
In fields that rely less on relatively small numbers of cases (small samples being typical in medicine), there is also less reliance on marginal but statistically "significant" findings.
So areas such as biochemistry, chemistry, and even some animal studies are less susceptible to over-interpretation or massaging of data.
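The instability of those marginal findings is easy to demonstrate yourself. Here's a rough sketch (all numbers are made up for illustration: a small true effect of 0.3 SD, samples of 20, and a crude z-approximation in place of a proper t-test) showing how often a "significant" result appears or vanishes purely as a function of the random seed:

```python
import math
import random

def two_sample_p(a, b):
    """Two-sided p-value for a difference in means, using a crude
    z-approximation (stand-in for a proper Welch t-test; fine for
    illustration only)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    z = (ma - mb) / se
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical setup: tiny true effect (0.3 SD), n=20 per arm --
# the regime where "marginal but significant" findings live.
results = []
for seed in range(20):
    rng = random.Random(seed)
    control = [rng.gauss(0.0, 1.0) for _ in range(20)]
    treated = [rng.gauss(0.3, 1.0) for _ in range(20)]
    results.append(two_sample_p(control, treated) < 0.05)

print(f"significant at p<0.05 in {sum(results)}/20 seeds")
```

Run it and you'll see the same "experiment" flip between significant and not, depending on nothing but the seed -- which is exactly the failure mode small-n studies (and seed-sensitive ML benchmarks) share.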