Hacker News

By backfilling I assume you mean retroactively applying labels to a time series? Can you link to the thread with more context if you have it handy? As a Prometheus user myself at a company with a 100+ engineer org, I've never had a use case for a flow that would require editing old time series, apart from having to restore data after a crash.



What I mean by that is, I have an exporter that (for example) might be reading a bunch of logs and exposing some inferred metrics through the exporter. Two days later, I realize that there's some useful information from those logs or another log file, or hell, another server, that wasn't inferred, and I want to go back and add it for the period already scraped. That way, in some monitoring analysis I'm running, both data points would be side by side. Prometheus doesn't allow for that, because opinions. Just a toy example, so please don't HN-pedanticize it :)

Not sure why restoring data after a crash isn't a bigger problem in your eyes, though. In the context of fully understanding system stability through Prometheus, that's a pretty giant gap in your post mortem analysis without any way to correct it. It forces you into either using two tools to be able to do that sort of analysis, or accepting that you're always going to risk losing valuable data.


Gotcha, it seems like you want to run backtests from within Prometheus for richer post hoc analysis. That sounds like a normal use case. I often do fairly non-trivial post hoc analysis in Prometheus (e.g. Z scores and stddevs of metrics) since the query language is quite well suited to it, so I can see why you want to leverage it more.
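
For anyone unfamiliar with the Z score idea, here's a rough sketch in plain Python rather than PromQL. It scores each sample against the mean and population standard deviation of a trailing window, which is roughly what you'd get by combining PromQL's `avg_over_time` and `stddev_over_time`. The function name, the window size and the sample values are all made up for illustration, and treating a zero-spread window as a score of 0 is my own assumption.

```python
from statistics import fmean, pstdev


def rolling_z_scores(samples, window):
    """Z score of each sample against its trailing window (current sample included).

    Hypothetical helper for illustration only; not part of any Prometheus client.
    """
    scores = []
    for i in range(len(samples)):
        # Trailing window ending at the current sample, shorter near the start.
        recent = samples[max(0, i - window + 1): i + 1]
        mean = fmean(recent)
        spread = pstdev(recent)  # population standard deviation
        # Assumption: a flat window has no meaningful Z score, so report 0.
        scores.append(0.0 if spread == 0 else (samples[i] - mean) / spread)
    return scores


if __name__ == "__main__":
    # Made-up latency samples with one obvious spike.
    latencies = [100.0, 102.0, 98.0, 101.0, 250.0, 99.0]
    for value, score in zip(latencies, rolling_z_scores(latencies, window=5)):
        print(f"{value:7.1f}  z={score:+.2f}")
```

The spike stands out because it sits far from the mean of its own window, which is the sort of thing that would be handy to run over backfilled data too.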

Re: restoration, we use facilities like EBS snapshots, so it's separated from Prometheus. It could theoretically be an issue for post mortem analysis, but data loss is relatively rare, so in practice the ability to run post mortems has never been affected. Generally we don't consider our metrics to be as "business-critical" as production data (e.g. we wouldn't consider the above acceptable for MySQL), so that's where the leniency comes in. I can see different orgs weighing this differently -- we're not really an "infrastructure" company so I think the tradeoff is right for us, but if I were at a PaaS or IaaS I would feel differently.



