FYI for all, the W3C TraceContext specification will become a Proposed Recommendation later this week. I'm one of the co-chairs of the group and am happy to answer questions about our W3C work or OpenTelemetry.



Just curious, what was the rationale for randomizing the spanId at each hop? (As opposed to a more structured format that could let you track the request tree without relying on another field like timestamp)


Existing tracing systems (Dapper, Zipkin, Dynatrace, Stackdriver, etc.) already randomize with each hop, and there was a desire to be consistent with the models that they already used. It's also more straightforward to implement.
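As a rough illustration of that per-hop randomization, here is a minimal sketch of building and forwarding a W3C `traceparent` header (`version-traceid-parentid-traceflags`): the trace-id stays constant across the whole request tree while the span-id is freshly randomized at each hop. The helper names are hypothetical, not part of the spec.

```python
import secrets

def new_traceparent() -> str:
    """Build a traceparent value for the root of a new trace."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars, shared by the whole trace
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars, unique to this hop
    return f"00-{trace_id}-{span_id}-01"

def next_hop(traceparent: str) -> str:
    """Forward the header: keep the trace-id, replace the span-id with a fresh random value."""
    version, trace_id, _old_span_id, flags = traceparent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

Because each hop only needs a random-number generator, there is no coordination between services beyond passing the header along.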

There's also a discussion about "correlation context" inside this W3C group, which maps to what you're describing. It'd be worth reaching out to Sergey (one of the other co-chairs) if you want to find out more.


Timestamps across distributed systems don't work well as correlation tools: clocks are rarely synchronized precisely enough to order application retries, and the same problem shows up with fan-out requests. You really want parent/child or follows-from relationships to collect and represent the graph correctly.

Source: Working on distributed tracing at Twilio and Stitch Fix
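The parent/child approach described above can be sketched as follows: each span records its parent's id, and the request tree is reconstructed from those links alone, with no reliance on timestamps. The span dicts and helper name here are illustrative assumptions, not any particular tracer's format.

```python
from collections import defaultdict

def build_tree(spans):
    """Reconstruct the request tree from explicit parent_id links.

    Returns (root_span_id, children_map) where children_map maps a
    span id to the list of span ids it spawned.
    """
    children = defaultdict(list)
    root = None
    for span in spans:
        if span["parent_id"] is None:
            root = span["span_id"]  # the entry-point span has no parent
        else:
            children[span["parent_id"]].append(span["span_id"])
    return root, dict(children)

# Example: a root span fans out to two children, one of which retries downstream.
spans = [
    {"span_id": "a", "parent_id": None},
    {"span_id": "b", "parent_id": "a"},
    {"span_id": "c", "parent_id": "a"},
    {"span_id": "d", "parent_id": "b"},
]
```

Even if spans "b" and "c" carry identical timestamps, the graph is unambiguous because the structure lives in the `parent_id` field.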



