NATS can saturate a network port with hundreds of thousands of messages per second with minimal CPU impact, it scales linearly with cores, and fail-over takes little to no effort. Whether across large geographic boundaries (e.g. AWS cross-region) or within the same rack, NATS clustering is reliable and fast.
Well said. The Rapidloop team's other NATS project (essentially gRPC over NATS) lists some benefits as they see them, and they match what you've pointed out: https://github.com/rapidloop/nrpc
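Ethernet or raw IP over TCP is slow (without some really creative hacks involving multiple TCP links) due to the double ACK problem: any TCP traffic inside the tunnel gets acknowledged and retransmitted by both the inner connection and the outer TCP link. Adding UDP would probably not be hard.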
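For anyone wondering what the UDP variant amounts to, here is a minimal sketch, not taken from the project: one Ethernet frame goes into one UDP datagram, with no ACKs or retransmission at the tunnel layer, so any TCP running inside the tunnel handles loss itself. The helper names (`make_frame`, `relay_frame_over_udp`) are made up for illustration, the frame is built by hand rather than read from a real TAP device, and everything runs over loopback.

```python
import socket


def make_frame(dst_mac: bytes, src_mac: bytes, ethertype: int, payload: bytes) -> bytes:
    """Build a minimal Ethernet II frame (no FCS), roughly what a TAP device hands over."""
    return dst_mac + src_mac + ethertype.to_bytes(2, "big") + payload


def relay_frame_over_udp(frame: bytes) -> bytes:
    """Send one frame as a single UDP datagram over loopback and return what arrives."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        rx.settimeout(2.0)          # do not hang forever if the datagram is lost
        # One frame per datagram; the tunnel itself never ACKs or retransmits.
        tx.sendto(frame, rx.getsockname())
        data, _addr = rx.recvfrom(65535)
        return data
    finally:
        tx.close()
        rx.close()


# Hypothetical broadcast frame with a locally administered source MAC.
frame = make_frame(b"\xff" * 6, bytes.fromhex("020000000001"), 0x0800, b"hello")
received = relay_frame_over_udp(frame)
```

A real overlay would read frames from the TAP device in a loop and would need to keep the frame size below the path MTU once the UDP and IP headers are added; none of that is shown here.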
I seem to recall other UDP-based L2 overlay projects that are both portable to other OSes (subject to the TAP requirement) and have no problem with multicast.
NATS is just a message broker, and it runs wherever Go runs. The limitations are in this project. It's a 105-line proof of concept; clearly the author didn't bother making it portable.