I would go a step further and say that uncontended shared_ptrs are typically fine. The problem comes in when you have shared_ptrs that are frequently copied in more than one thread. Every copy and every destruction atomically updates the reference count in the shared control block, and that requires synchronizing the shared state across cores, sometimes across NUMA nodes. That's slow.
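To make that concrete, here is a minimal sketch of the two patterns. `Config`, `sum_with_copies`, and `sum_with_one_copy` are hypothetical names made up for illustration. Both functions compute the same result; the difference is how often the threads touch the shared reference count.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical read-only payload shared between threads.
struct Config {
    int value = 1;
};

// Contended pattern: every loop iteration copies the shared_ptr, so all
// threads keep doing atomic increments/decrements on one control block.
long sum_with_copies(const std::shared_ptr<Config>& cfg,
                     std::size_t threads, std::size_t iters) {
    assert(cfg != nullptr);
    std::vector<long> partial(threads, 0);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t) {
        pool.emplace_back([&cfg, &partial, t, iters] {
            long local_sum = 0;
            for (std::size_t i = 0; i < iters; ++i) {
                std::shared_ptr<Config> copy = cfg;  // atomic refcount bump
                local_sum += copy->value;
            }                                        // atomic decrement per copy
            partial[t] = local_sum;
        });
    }
    for (auto& th : pool) th.join();
    long total = 0;
    for (long p : partial) total += p;
    return total;
}

// Uncontended pattern: each thread copies the shared_ptr once to keep the
// object alive, then works through a plain reference to the pointee.
long sum_with_one_copy(const std::shared_ptr<Config>& cfg,
                       std::size_t threads, std::size_t iters) {
    assert(cfg != nullptr);
    std::vector<long> partial(threads, 0);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t) {
        pool.emplace_back([cfg, &partial, t, iters] {  // one copy per thread
            const Config& ref = *cfg;                  // no refcount traffic below
            long local_sum = 0;
            for (std::size_t i = 0; i < iters; ++i) {
                local_sum += ref.value;
            }
            partial[t] = local_sum;
        });
    }
    for (auto& th : pool) th.join();
    long total = 0;
    for (long p : partial) total += p;
    return total;
}
```

How big the gap is depends on the hardware and the thread count, so it's worth timing both on your own machine rather than trusting a rule of thumb.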
The same thing goes for mutexes. Modern, uncontended mutexes are very fast. But they can start eating resources when multiple threads are trying to lock the same one at the same time.
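A rough sketch of the mutex version of the same idea, again with hypothetical function names. The first function takes the lock on every increment, so the threads fight over it constantly. The second accumulates into a thread-local variable and takes the lock once per thread.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Contended: every single increment locks the same mutex.
long count_locking_every_step(std::size_t threads, std::size_t iters) {
    std::mutex m;
    long total = 0;
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t) {
        pool.emplace_back([&m, &total, iters] {
            for (std::size_t i = 0; i < iters; ++i) {
                std::lock_guard<std::mutex> lock(m);
                ++total;
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}

// Mostly uncontended: count locally, lock once per thread to publish.
long count_locking_once(std::size_t threads, std::size_t iters) {
    std::mutex m;
    long total = 0;
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t) {
        pool.emplace_back([&m, &total, iters] {
            long local = 0;
            for (std::size_t i = 0; i < iters; ++i) {
                ++local;
            }
            std::lock_guard<std::mutex> lock(m);
            total += local;
        });
    }
    for (auto& th : pool) th.join();
    return total;
}
```

Both return the same count. The mutex itself isn't the problem in the first version; the problem is how often the threads collide on it.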