Hacker News

RyanZAG on Nov 12, 2015 | on: Onyx 0.8.0: Automatic State Management

> To defend against duplicate messages, we employ an embedded key/value store - namely, RocksDB. RocksDB contains an in-memory Bloom Filter that we use to check for, and ignore, previously applied messages.

Wait, what? Can anybody explain this part? I don't understand how you could use a Bloom filter for this. A Bloom filter will have false positives by definition. When a false positive is encountered, wouldn't the message be ignored even though it has not been previously applied? I'm guessing the explanation is missing something, since this seems like a very obvious issue to miss.
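To put a number on the concern: the textbook approximation for a Bloom filter with m bits, n inserted keys and k hash functions gives a false-positive rate of roughly (1 - e^(-kn/m))^k, which is never zero once anything has been inserted. The sketch below just evaluates that standard formula; the function names are illustrative and not part of Onyx or RocksDB.

```python
import math


def bloom_false_positive_rate(m_bits: int, n_items: int, k_hashes: int) -> float:
    """Standard approximation of a Bloom filter's false-positive rate:
    p ~= (1 - e^(-k * n / m)) ** k."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


def optimal_num_hashes(m_bits: int, n_items: int) -> int:
    """k ~= (m / n) * ln 2 minimizes the approximate false-positive rate."""
    return max(1, round(m_bits / n_items * math.log(2)))


# 10 bits per key, e.g. 1M message IDs in a 10M-bit filter.
k = optimal_num_hashes(10_000_000, 1_000_000)
rate = bloom_false_positive_rate(10_000_000, 1_000_000, k)
```

With 10 bits per key the optimal k is 7 and the rate comes out near 0.8%, so a filter-only dedup check would wrongly drop roughly 1 in 120 genuinely new messages.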

XPherior on Nov 12, 2015

We persist the IDs to disk with RocksDB itself when that happens, periodically pruning them away once the messages are complete. The Bloom filter is mostly an optimization, even though it does the job most of the time. You're right, though: we intentionally omitted further discussion of that piece.
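A minimal sketch of the pattern described in the reply, assuming a two-tier check: the Bloom filter answers "definitely new" cheaply, and only a "maybe seen" answer falls through to an exact record of applied IDs. Here a plain Python `set` stands in for the IDs persisted in RocksDB; `BloomFilter` and `Deduplicator` are hypothetical names, not Onyx's actual code. RocksDB's own filters work in the same spirit: a negative lets a read skip a file, while a positive still leads to a real lookup.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class Deduplicator:
    """Hypothetical dedup layer: Bloom filter as fast path, exact store as source of truth."""

    def __init__(self) -> None:
        self.filter = BloomFilter(num_bits=1024, num_hashes=4)
        self.applied_ids: set[str] = set()  # stand-in for IDs persisted in RocksDB

    def should_apply(self, message_id: str) -> bool:
        if not self.filter.might_contain(message_id):
            return True  # definitely never applied: no exact lookup needed
        # "Maybe seen" could be a false positive, so consult the exact store.
        return message_id not in self.applied_ids

    def mark_applied(self, message_id: str) -> None:
        self.filter.add(message_id)
        self.applied_ids.add(message_id)

    def prune(self, message_id: str) -> None:
        # Drop the ID of a completed message from the exact store. A plain
        # Bloom filter cannot delete bits, so later lookups for this ID just
        # fall through to the exact store.
        self.applied_ids.discard(message_id)


if __name__ == "__main__":
    dedup = Deduplicator()
    for msg_id in ["m1", "m2", "m1"]:
        if dedup.should_apply(msg_id):
            dedup.mark_applied(msg_id)
            print("applied", msg_id)
        else:
            print("skipped duplicate", msg_id)
```

Because every positive is confirmed against the exact store, a false positive only costs an extra lookup rather than a dropped message, which is why the filter can be "mostly an optimization".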