> To defend against duplicate messages, we employ an embedded key/value store - namely, RocksDB. RocksDB contains an in-memory Bloom Filter that we use to check for, and ignore, previously applied messages.
Wait, what? Can anybody explain this part? I don't understand how you could use a Bloom filter for this. A Bloom filter has false positives by definition. When a false positive is encountered, wouldn't the message be ignored even though it has not been previously applied? I'm guessing the explanation is missing something, as this seems like a very straightforward issue to miss.
You're right - we intentionally omitted further discussion of that piece. The Bloom filter is mostly an optimization, even though it does the job most of the time. The IDs themselves are persisted to disk with RocksDB, so when the filter reports a possible match it can be checked against the stored keys. We periodically prune the IDs away once the messages are complete.
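To make the "filter as an optimization" point concrete, here is a minimal sketch of the general pattern, not the actual implementation discussed above. The `BloomFilter` and `Deduplicator` classes, their method names, and the in-memory `set` standing in for the IDs persisted to disk are all assumptions made for illustration. A negative answer from the filter means the ID is definitely new, so the lookup is skipped; a positive answer only means "maybe", so it is confirmed against the stored IDs before the message is dropped.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class Deduplicator:
    """Hypothetical dedup layer: the filter is a fast path, the store decides."""

    def __init__(self, bloom):
        self.bloom = bloom
        self.seen_ids = set()   # stand-in for IDs persisted to disk
        self.store_lookups = 0  # counts how often the slow path was taken

    def is_duplicate(self, message_id):
        if not self.bloom.might_contain(message_id):
            return False  # definitely never seen: skip the store lookup
        self.store_lookups += 1
        return message_id in self.seen_ids  # resolves false positives

    def apply(self, message_id):
        """Return True if the message is new and was applied, False if ignored."""
        if self.is_duplicate(message_id):
            return False
        self.seen_ids.add(message_id)
        self.bloom.add(message_id)
        return True
```

With a deliberately undersized filter (say `BloomFilter(num_bits=1, num_hashes=1)`), every ID after the first is a false positive, yet `apply` still accepts each new ID because the stored set has the final say. The false positives only cost an extra lookup; they never cause a message to be dropped.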