Capturing a consistent snapshot of all goroutines requires stopping the world (STW). The stop itself can be very quick, though, since the GC relies on the same mechanism.
The bigger problem is capturing the stack traces for all goroutines. Rhys added a patch to Go 1.19 [1] that moves most of this work outside of the critical STW section, which greatly reduces the overhead. Unfortunately, this improvement only applies to the official goroutine profiling APIs, and those do not provide details such as goroutine IDs. This means fgtrace has to use runtime.Stack(), which returns the stack traces as text (yikes) and isn't optimized like the other goroutine profiling APIs.
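To make the runtime.Stack() point concrete, here is a rough sketch of that text-based approach. It is not fgtrace's actual code: the helper names allStacks and goroutineStates are made up for illustration, and the regexp assumes the usual "goroutine N [state]:" header line that the runtime prints by default.

```go
package main

import (
	"fmt"
	"regexp"
	"runtime"
	"strconv"
)

// goroutineHeader matches the first line of each goroutine in the text dump,
// e.g. "goroutine 18 [chan receive]:". The header layout is an assumption
// based on the runtime's default traceback output, not a documented format.
var goroutineHeader = regexp.MustCompile(`(?m)^goroutine (\d+) .*?\[([^\]]+)\]:`)

// allStacks (hypothetical helper) returns the text dump of every goroutine.
// runtime.Stack truncates silently, so the buffer is doubled until the dump fits.
func allStacks() []byte {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true) // all=true stops the world while formatting
		if n < len(buf) {
			return buf[:n]
		}
		buf = make([]byte, 2*len(buf))
	}
}

// goroutineStates (hypothetical helper) recovers goroutine ID -> state by
// parsing the text, since the dump is the only place the IDs show up.
func goroutineStates(dump []byte) map[uint64]string {
	states := make(map[uint64]string)
	for _, m := range goroutineHeader.FindAllSubmatch(dump, -1) {
		id, err := strconv.ParseUint(string(m[1]), 10, 64)
		if err != nil {
			continue // skip headers whose ID does not parse
		}
		states[id] = string(m[2])
	}
	return states
}

func main() {
	block := make(chan struct{})
	go func() { <-block }() // an extra goroutine so the dump has more than one entry
	for id, state := range goroutineStates(allStacks()) {
		fmt.Printf("goroutine %d: %s\n", id, state)
	}
	close(block)
}
```

Both the formatting inside the runtime and the parsing afterwards scale with the number of goroutines and the depth of their stacks, which is where the overhead comes from.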
There are various ways the implementation details of fgtrace and the Go runtime could be improved for this use case (wallclock timeline views), and I'm hoping to work on contributions in the coming months.
[1] https://go-review.googlesource.com/c/go/+/387415