Didn't one of your benchmarks show that nodatacow on Btrfs resulted in a major performance improvement? But that might just show an issue with Btrfs's CoW implementation rather than CoW in general.
Yes, I've done some tests on BTRFS with nodatacow, and it improved the performance and behavior in general. Still slower than XFS/EXT4, but better than ZFS (with "full" CoW).
But as you mention, that does not say anything about CoW filesystems in general. It merely hints that the BTRFS implementation is not really optimized.
FWIW while I do a lot of benchmarks (both out of curiosity and as part of my job, when evaluating customer systems), I've learned to value stability and predictability over performance. That is, if the system is 20% slower, but provides stable and predictable behavior, it's probably OK. If you really need the extra 20% you can probably get that by adding a bit more hardware, and it's cheaper than switching filesystems etc. (Sure, if you have many such systems, that changes the formula.)
With EXT4/XFS/ZFS you can get that - predictable, stable performance. With BTRFS not so much, unfortunately.