If the measurement is flawed, you will not know whether the therapy works. Sorry.
And the measurement is flawed! There really is a reliability crisis in modern research.
(Perhaps that is why so many end up trying to build castles out of dough...)
Flawed is not the same as completely useless. Even a flawed, p-hacked measurement can efficiently distinguish large effects.
Besides, even if all the previous measurements were totally useless, that doesn't mean we should give up and stop trying to measure soft stuff.
No, quite the opposite. Instead of funding many small, underpowered experiments and studies, we should spend on fewer, larger, well-designed and well-run ones. (Even if that is naturally harder.)
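To make the "big effects vs. underpowered studies" point concrete, here is a minimal sketch of how statistical power scales with effect size and sample size, using a standard normal approximation for a two-sample test. The effect sizes and sample sizes below are hypothetical illustrations, not figures from this thread:

```python
# Normal-approximation power for a two-sided two-sample test.
# Illustrative sketch; the d and n values are hypothetical.
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power(d: float, n_per_group: int, alpha_z: float = 1.96) -> float:
    """Approximate power of a two-sided two-sample z-test at alpha = 0.05.

    d: standardized effect size (Cohen's d)
    n_per_group: observations in each group
    alpha_z: two-sided critical value for alpha = 0.05
    """
    noncentrality = d * sqrt(n_per_group / 2.0)
    return normal_cdf(noncentrality - alpha_z)

# A big effect is detectable even by a small, imperfect study...
print(f"d=0.8, n=20 per group:  power = {power(0.8, 20):.2f}")
# ...while a small effect stays invisible at that size,
# and only a much larger study has a real chance of seeing it.
print(f"d=0.2, n=20 per group:  power = {power(0.2, 20):.2f}")
print(f"d=0.2, n=400 per group: power = {power(0.2, 400):.2f}")
```

The asymmetry is the whole argument: small noisy studies can still sort large effects from noise, but detecting subtle effects reliably requires the fewer, larger, well-designed studies argued for above.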
So are you arguing that useful measurement is impossible for certain topics, or just that flawed research is less useful than it could be (which it obviously is)?
I can recommend the book "How to Measure Anything" to get a sense of how to apply measurement techniques to "soft" contexts.