Personally I prefer my agents not to run random commands on my machine without me telling them to first.
Imagine you just cloned some random project from GitHub and fired up Claude Code in that folder, but it turned out to be malicious and running 'npm test' stole all your files.
Tests have dependencies. Crawling all of those dependencies to check for malicious code could require inspecting millions of lines of code, if you could even obtain the code.
It's also beginning to sound like needing to solve the halting problem.
Look, I know you have a lot invested in this project but I don't see why you think it is somehow unreasonable to expect an AI agent to run tests in a repository. You don't need super intelligence for that.