OTel is flawed for sure, but I don't understand the stance against metrics and logs. Traces are inherently sampled unless you're lighting all your money on fire, or operating at so small a scale that these decisions have no real impact. There are kinds of metrics and logs which you always want to emit because they're mission-critical in some way. Is this a Sentry-specific thing? Does it just collapse these three kinds of information into a single thing called a "trace"?
I mean, when you're the one selling the gas to light that money on fire, you have a vested interest in keeping it that way, right?
I do agree that logs and spans are very similar, but I disagree that logs are just spans; similar isn't the same as identical.
I also agree that you can derive all your metrics from spans and, in fact, it might be a better way to tackle it. But it's just not financially feasible to do so, so you do need to have some sort of aggregation step closer to the metric producers.
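To make the "aggregation closer to the producer" idea concrete, here is a hypothetical sketch (names and bucket bounds are made up, not any real OTel API): instead of shipping every span, the producer folds span durations into a fixed histogram and exports only the aggregate counts, so the export volume stays constant no matter how many spans were recorded.

```python
from collections import defaultdict

# Upper bounds for latency histogram buckets, in ms; anything above
# the last bound lands in a +Inf bucket. These values are illustrative.
BUCKETS_MS = [10, 50, 100, 500, 1000]

def bucket_for(duration_ms):
    for bound in BUCKETS_MS:
        if duration_ms <= bound:
            return bound
    return float("inf")

class SpanMetricsAggregator:
    """Pre-aggregates span data at the producer instead of exporting raw spans."""
    def __init__(self):
        # (span_name, bucket_upper_bound) -> count
        self.histogram = defaultdict(int)
        self.errors = defaultdict(int)

    def record(self, span_name, duration_ms, is_error=False):
        self.histogram[(span_name, bucket_for(duration_ms))] += 1
        if is_error:
            self.errors[span_name] += 1

    def flush(self):
        # What actually gets exported: a handful of counters,
        # regardless of how many spans were recorded since the last flush.
        snapshot = dict(self.histogram), dict(self.errors)
        self.histogram.clear()
        self.errors.clear()
        return snapshot

agg = SpanMetricsAggregator()
agg.record("GET /bar", 300)
agg.record("GET /bar", 45, is_error=True)
hist, errs = agg.flush()
print(hist)   # {('GET /bar', 500): 1, ('GET /bar', 50): 1}
print(errs)   # {'GET /bar': 1}
```

The point is that the cost of the metric pipeline scales with the number of (name, bucket) pairs, not with request volume, which is exactly what raw span export can't give you.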
What I do agree with is that the terminology and the implementation of OTel's SDK are incredibly confusing and hard to implement/keep up to date. I spent way too many hours of my career struggling with conflicting versions of OTel, so I know the pain, and I desperately wish they would at least take to heart the idea of separating the implementation from the API.
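For readers unfamiliar with the "separate implementation from API" idea, here is a minimal sketch of the pattern in plain Python (the names here are hypothetical, not the actual OTel interfaces): library code depends only on a tiny stable interface with a no-op default, and whoever owns the application wires in a real SDK at startup. Instrumented code then keeps working even when no SDK is installed, and version conflicts are confined to the application, not every library.

```python
from typing import Protocol

class Tracer(Protocol):
    """The tiny, stable surface that instrumented code depends on."""
    def start_span(self, name: str): ...

class NoopSpan:
    def __enter__(self): return self
    def __exit__(self, *exc): return False
    def set_attribute(self, key, value): pass

class NoopTracer:
    """Default implementation: instrumentation is free when no SDK is wired in."""
    def start_span(self, name: str):
        return NoopSpan()

_tracer: Tracer = NoopTracer()

def set_tracer(tracer: Tracer) -> None:
    """Called once at application startup by whoever installs a real SDK."""
    global _tracer
    _tracer = tracer

def get_tracer() -> Tracer:
    return _tracer

# Library code depends only on the interface above, never on an SDK:
def handle_request(path):
    with get_tracer().start_span(f"GET {path}") as span:
        span.set_attribute("http.target", path)
        return 200

print(handle_request("/bar"))  # prints 200, even with no real SDK installed
```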
> Traces are inherently sampled unless you're lighting all your money on fire
You can burn a lot of money with logs and metrics too. The question is how much value you get for the money you throw on the burning pile of monitoring. My personal belief is that well-instrumented distributed tracing is more actionable than logs and metrics, even if sampled.
I actually take the opposite approach. In my experience, well-instrumented metrics and finely tuned logs are more actionable than distributed traces! Interesting how that works out.
I believe that might be correct on the infrastructure side. Within applications, that doesn't match my experience: in many cases the concurrent nature of servers makes it impossible to repro issues and narrow down the problem without tracing or trace-aware logs.
With only sampled traces, though, it's very hard to understand the impact of the problem. There are some bad traces, but are they affecting 5%, 10%, or 90% of your customers? Metrics shine there.
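A bit of back-of-the-envelope arithmetic shows why sampled traces are a poor impact gauge. Below is an illustrative sketch (the counts and the naive normal-approximation interval are assumptions for the example, not anything from a real system): at a 1% sampling rate you might see only 5 bad traces out of 100 sampled, and the resulting confidence interval on the true error rate is wide enough that you can't tell a minor blip from a serious incident.

```python
import math

def error_rate_ci(errors_seen, sampled_total, z=1.96):
    """Naive 95% normal-approximation confidence interval for the true
    error rate, given counts observed in a sampled subset of traffic."""
    p = errors_seen / sampled_total
    half = z * math.sqrt(p * (1 - p) / sampled_total)
    return max(0.0, p - half), min(1.0, p + half)

# With 1% trace sampling, 10,000 requests yield only ~100 sampled traces.
# Seeing 5 bad ones gives a point estimate of 5%, but a very wide interval:
lo, hi = error_rate_ci(5, 100)
print(f"estimated error rate: 5% (95% CI {lo:.1%}..{hi:.1%})")
# prints: estimated error rate: 5% (95% CI 0.7%..9.3%)
```

An unsampled counter incremented at the producer would pin that number down exactly, which is the "metrics shine there" point.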
Whether it is affecting 5% or 10% of your customers, if it is erroring at that rate you are going to want to find the root cause ASAP. Traces let you do that, whereas the precise number does nothing. I am a big supporter of metrics but I don't see this as the use case at all.
(not your OP) This is true, but I find that metrics are useful whether something is going wrong or not (metrics that show 100% success are useful in determining baselines and what "normal" is), whereas collecting traces _when nothing is going wrong_ is not useful -- it's just taking up space and ingress, and thus costing me money.
My typical approach in the past has been to use metrics to determine when something is going wrong, then enable either tracing or logs (usually logs) to determine exactly what is breaking. For a dev or team that is highly connected to their software, simply knowing what was recently released is enough to zero in on problems without relying upon tracing.
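The "metrics detect, logs diagnose" workflow described above can be sketched with nothing but the standard library. This is a hypothetical toy, assuming an error-rate metric and a threshold chosen by the operator: steady-state logging stays cheap at WARNING, and detailed DEBUG output is switched on only while the metric indicates trouble.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("myapp")  # hypothetical application logger

ERROR_RATE_ALARM = 0.05  # assumed threshold; pick per service

def adjust_verbosity(recent_error_rate: float) -> None:
    """Pay for detailed logging only while a metric says something is wrong."""
    if recent_error_rate > ERROR_RATE_ALARM:
        logger.setLevel(logging.DEBUG)    # incident: capture everything
    else:
        logger.setLevel(logging.WARNING)  # steady state: cheap and quiet

adjust_verbosity(0.12)
print(logger.level == logging.DEBUG)  # prints True
```

In practice the same toggle could enable trace sampling instead of (or alongside) verbose logs; the design choice is simply that the expensive telemetry is gated behind the cheap telemetry.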
Traces can be useful, but they're expensive relative to metrics, even if sampled at a very low rate.
Strange example; you'd think you'd want to fix this as quickly as humanly possible, no?
Also we don't sample traces, it's a fire hose of data aimed at the OTel collector. We do archive them / move them to colder and cheaper storage after a little time though, and we found that a viable money-saving strategy and a good balance overall.
How? Distributed traces show you what’s going on with a request in detail.
Metrics tell you zero detail, by definition. Logs show you whatever people have decided to log manually, which is usually very incomplete.
Hey, this isn't the sort of telemetry we are talking about with OTel.
About the only "privacy concern" with OTel is that you are probably shipping traces/metrics to a cloud provider for your internal applications. This isn't the sort of telemetry getting baked into Microsoft or Google products that is used to try and identify personal aspects of individuals; this is data that tells you "Foo app is taking 300ms serving /bar, which is unusual".
After I added OTel to an open source project I run, I spent a bit of time arguing with someone about telemetry - they kept saying they didn't opt in and that we need to inform our users about it, etc., and I kept saying no, that's not the same type of telemetry. I wonder how common this misconception is.
Client-side transport is pretty unusual with OTel. I think almost everybody is sending things from the server side, so I don’t think your concern is usually relevant.
I think you are more talking about RUM which isn't yet supported by OpenTelemetry. I think they are working on it.
I am not sure if it will support session replay like vendors such as Sentry or New Relic offer. Technically, I think session replay (rrweb etc.) is pretty cool, but as a web visitor I am not a fan.