Yeah often the best way to tackle exactly-once delivery to a database is a uniqueness constraint, but that isn't free – there's the index cost, additional write cost, and the error needs to be handled when it's thrown back to the client on a collision (something many applications don't handle well).
Stateful services are far harder to scale than stateless ones. Typically a stateless service can be scaled out horizontally with relative ease, but when there's state storage involved this becomes harder. You can scale vertically, but only so far. You can scale horizontally, but then typically need to introduce some sort of sharding/consistent hashing/etc to ensure that the service has a consistent view of the world without needing to connect to every database instance.
Not sure where the expectation of things being free comes from.
If your stating point is stateless then you can consider the tradeoff of introducing state vs. processing the same request multiple times.
It's probably more accurate to say that toy systems can maintain the illusion of exactly-once for a while, but it doesn't scale. You can't keep a record of every message ever seen in a message based system. Message passing systems exist to handle rates of traffic that cannot be achieved by rolling your own event queue as a thin wrapper around other tools like databases. It's not just the storage, it's the throughput.
The first time I encountered RabbitMQ it could only handle 60 messages per second with delivery guarantees. We already had our own bespoke system that used a database to handle a couple multiples of that. So we ended up limping along with what we had.
Yes, with uniqueness constraint.
the service is no longer stateless too, which means scalability is a problem
Do you have a specific problem in mind?