
In a time-series datastore, you may have to replace a set of invalid or corrupt events within an index. If your IDs are deterministic, derived in some known way from the source data, you can replace the invalid documents by ID simply by re-indexing that time period with your patch applied. Each re-indexed document overwrites the bad one with the same ID instead of creating a duplicate. This is the simplest and least risky solution, and it has minimal downtime.
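A minimal sketch of why this works, assuming a store where indexing a document under an existing ID overwrites it. A plain dict stands in for the index here, and the field names, make_id() and fix_negative_values() are all hypothetical:

```python
# A dict stands in for the index: document ID -> document.
# Assumption: writing a document under an existing ID overwrites it.

def make_id(event):
    # Deterministic ID built from source fields (hypothetical field names).
    return f"{event['type']}_{event['external_id']}_{event['timestamp']}"


def index_events(index, events, patch=None):
    """Index or re-index events; the same source event always maps to the same ID."""
    for event in events:
        doc = dict(event)
        if patch is not None:
            doc = patch(doc)
        index[make_id(doc)] = doc  # overwrite, never duplicate


def fix_negative_values(doc):
    # Hypothetical patch: clamp corrupt negative values to zero.
    doc["value"] = max(doc["value"], 0)
    return doc


# Source events for one time period; the first was ingested with a bad value.
source = [
    {"type": "click", "external_id": "a1", "timestamp": 1000, "value": -1},
    {"type": "click", "external_id": "a2", "timestamp": 1001, "value": 5},
]

index = {}
index_events(index, source)                              # original (buggy) load
index_events(index, source, patch=fix_negative_values)  # re-index the period with the patch

print(len(index))                       # 2: no duplicates
print(index["click_a1_1000"]["value"])  # 0: corrupt value replaced
```

The repair reuses the normal indexing path. The only difference is the patch argument, so there is no separate one-off script to get wrong.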

If the IDs are UUIDs, the easiest way to fix the values is to drop the index and re-create it. That makes all of the other data in the index unavailable while it is being recreated.

The less easy way with UUIDs is to select just the broken events, create patched copies of them, delete the old events, and insert the new ones into the right index. But to do this you'd have to branch off from your regular indexing logic, probably by writing a separate script. And if you make a mistake, you may end up with duplicate documents or lost data, compounding the original problem.
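A rough sketch of that separate script, again using a dict as a stand-in index. The helper names and the "negative value means broken" rule are hypothetical. It also shows why you can't just re-run normal indexing: with random IDs, every re-run adds duplicates.

```python
import uuid


def index_with_uuids(index, events):
    # Every call generates fresh random IDs, so a document's ID can't be recomputed later.
    for event in events:
        index[str(uuid.uuid4())] = dict(event)


def patch_broken_events(index, is_broken, patch):
    """One-off repair script: select, patch, delete, insert."""
    # 1. Select just the broken events.
    broken_ids = [doc_id for doc_id, doc in index.items() if is_broken(doc)]
    # 2. Build patched copies before touching the index.
    patched = [patch(dict(index[doc_id])) for doc_id in broken_ids]
    # 3. Delete the old events. If the script dies after this step,
    #    the broken events are simply gone (data loss).
    for doc_id in broken_ids:
        del index[doc_id]
    # 4. Insert the patched events under new random IDs. Skipping step 3,
    #    or running this step twice, would leave duplicates.
    for doc in patched:
        index[str(uuid.uuid4())] = doc


source = [
    {"type": "click", "external_id": "a1", "timestamp": 1000, "value": -1},
    {"type": "click", "external_id": "a2", "timestamp": 1001, "value": 5},
]

# Naively re-indexing the period duplicates every event.
naive = {}
index_with_uuids(naive, source)
index_with_uuids(naive, source)
print(len(naive))  # 4

# The separate repair script keeps the count right, but only if every step succeeds.
index = {}
index_with_uuids(index, source)
patch_broken_events(
    index,
    is_broken=lambda d: d["value"] < 0,
    patch=lambda d: {**d, "value": 0},
)
print(len(index))                                  # 2
print(sorted(d["value"] for d in index.values()))  # [0, 5]
```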

So I agree: use deterministic IDs, i.e. IDs that can be recreated from the source data with some known formula, for example documenttype_externalid_timestamp.
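A small sketch of that example formula. The function names are hypothetical, and the hashed variant is my own optional addition rather than part of the formula above. It is just one way to get fixed-length IDs if the raw keys get long.

```python
import hashlib


def document_id(doc_type, external_id, timestamp):
    # The example formula: documenttype_externalid_timestamp.
    return f"{doc_type}_{external_id}_{timestamp}"


def hashed_document_id(doc_type, external_id, timestamp):
    # Optional variant (an assumption): hash the same key for a fixed-length ID.
    key = document_id(doc_type, external_id, timestamp)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


a = document_id("order", "ext-42", "2024-01-01T00:00:00Z")
b = document_id("order", "ext-42", "2024-01-01T00:00:00Z")
print(a)       # order_ext-42_2024-01-01T00:00:00Z
print(a == b)  # True: same source data, same ID
```

One thing to watch: if a field can itself contain the separator, two different events can produce the same key (and therefore the same hash). Pick a separator that can't occur in the fields.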
