FYI for all, the W3C TraceContext specification will become a Proposed Recommendation later this week. I'm one of the co-chairs of the group and am happy to answer questions about our W3C work or OpenTelemetry.



Just curious, what was the rationale for randomizing the spanId at each hop? (As opposed to a more structured format that could let you track the request tree without relying on another field like timestamp)


Existing tracing systems (Dapper, Zipkin, Dynatrace, Stackdriver, etc.) already randomize with each hop, and there was a desire to be consistent with the models that they already used. It's also more straightforward to implement.
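As a rough illustration of that per-hop randomization, here is a minimal sketch of building and forwarding a W3C `traceparent` header (`version-traceid-parentid-traceflags`): the trace-id stays constant across the whole request tree while the span-id is freshly randomized at each hop. The helper names are hypothetical, not part of the spec.

```python
import secrets

def new_traceparent() -> str:
    """Build a traceparent value for the root of a new trace."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars, shared by the whole trace
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars, unique to this hop
    return f"00-{trace_id}-{span_id}-01"

def next_hop(traceparent: str) -> str:
    """Forward the header: keep the trace-id, replace the span-id with a fresh random value."""
    version, trace_id, _old_span_id, flags = traceparent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

Because each hop only needs a random-number generator, there is no coordination between services beyond passing the header along.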

There's also a discussion about "correlation context" inside this W3C group, which maps to what you're describing. It'd be worth reaching out to Sergey (one of the other co-chairs) if you want to find out more.


Timestamps across distributed systems don't work well as correlation tools: clocks are rarely synchronized precisely enough to order application retries, and the same problem shows up with fan-out requests. You really want parent/child or follows-from relationships to collect and represent the graph correctly.

Source: Working on distributed tracing at Twilio and Stitch Fix
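The parent/child approach described above can be sketched as follows: each span records its parent's id, and the request tree is reconstructed from those links alone, with no reliance on timestamps. The span dicts and helper name here are illustrative assumptions, not any particular tracer's format.

```python
from collections import defaultdict

def build_tree(spans):
    """Reconstruct the request tree from explicit parent_id links.

    Returns (root_span_id, children_map) where children_map maps a
    span id to the list of span ids it spawned.
    """
    children = defaultdict(list)
    root = None
    for span in spans:
        if span["parent_id"] is None:
            root = span["span_id"]  # the entry-point span has no parent
        else:
            children[span["parent_id"]].append(span["span_id"])
    return root, dict(children)

# Example: a root span fans out to two children, one of which retries downstream.
spans = [
    {"span_id": "a", "parent_id": None},
    {"span_id": "b", "parent_id": "a"},
    {"span_id": "c", "parent_id": "a"},
    {"span_id": "d", "parent_id": "b"},
]
```

Even if spans "b" and "c" carry identical timestamps, the graph is unambiguous because the structure lives in the `parent_id` field.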



