a) Pick a single measure that is very hard to game (life expectancy, for example). But that's harder than it sounds: even something like life expectancy can be gamed by choosing who's included in the stats, by moving the starting point (e.g. the date of a cancer diagnosis), by ignoring quality of life, etc.
b) Use a robust mix of different measures that is harder to game. Continuing the example above, one common metric is the "disability-adjusted life year" (DALY), which combines length of life (years of life lost to premature death) with quality of life (years lived with disability, weighted by severity).
c) Socialize your workers to believe in "the cause", so they are more likely to do what you mean rather than what you measure (hard to do unless you're a government or a political organization). One of the big motives of the early Soviet purges (back when they just meant being expelled from the Party) was to remove people for whom this didn't work.
d) Not exactly "fair", but often effective: investigate and punish people when they do counterproductive things to optimize their stats. This was another reason for those early purges of the CPSU membership rolls.
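To make b) concrete, here is a rough sketch of how a composite like the DALY can expose one kind of gaming that a single measure hides. It assumes the simplified definition DALY = YLL + YLD (years of life lost plus years lived with disability, with no age weighting or discounting). The function names and all numbers are hypothetical, made up for illustration: if a program halves deaths but leaves the survivors with a severe lasting disability, a mortality-only metric shows a 50% improvement, while the DALY total (lower is better) shows a much smaller one.

```python
# Simplified DALY sketch: DALY = YLL + YLD, with no age weighting or
# discounting. Function names and all input numbers are hypothetical.

def years_of_life_lost(deaths: int, years_remaining: float) -> float:
    """YLL: deaths times the life expectancy remaining at age of death."""
    return deaths * years_remaining


def years_lived_with_disability(cases: int, weight: float, duration: float) -> float:
    """YLD (incidence-based): cases times disability weight (0 = full health,
    1 = equivalent to death) times the years spent in that state."""
    return cases * weight * duration


def daly(deaths: int, years_remaining: float,
         disability_cases: list[tuple[int, float, float]]) -> float:
    """Total burden in DALYs; lower is better."""
    yld = sum(years_lived_with_disability(n, w, d) for n, w, d in disability_cases)
    return years_of_life_lost(deaths, years_remaining) + yld


# Baseline: 10 deaths with 30 years of life lost each, plus 100 mild cases.
baseline = daly(10, 30.0, [(100, 0.2, 5.0)])

# "Improved": half the deaths are averted, but those 5 survivors live
# 30 years with a severe disability (weight 0.6).
improved = daly(5, 30.0, [(100, 0.2, 5.0), (5, 0.6, 30.0)])

deaths_change = (10 - 5) / 10                   # mortality-only view
daly_change = (baseline - improved) / baseline  # composite view

print(f"baseline DALYs: {baseline:.0f}, improved DALYs: {improved:.0f}")
print(f"deaths down {deaths_change:.0%}, DALYs down {daly_change:.0%}")
```

With these made-up inputs the output is 400 vs. 340 DALYs: deaths are down 50%, but the total burden is only down 15%, because the composite still counts what the single measure ignores.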
In the (probably fictional) Russian nail factory example, if there were a real customer rather than a planned economy, the customer would simply reject the nails.
External metrics are harder to game. For example, many car companies pay attention to IIHS safety ratings, J.D. Power ratings, etc.
You still need internal metrics, but it's harder to have target drift when you also have external organizations measuring things.
In the NYPD example, you could imagine state and federal law enforcement oversight of the city. Or the city could track civilian complaints. At some point you have to rely on good faith; metrics don't do anything by themselves.