Hacker News

>A profiler can tell you when you should drop asynchronously

Is there any profiler that does this today?

What are the drawbacks with asynchronous drops?




> Is there any profiler that does this today?

It requires interpretation, but yes.

> What are the drawbacks with asynchronous drops?

You pay some overhead for enqueuing work for later. Taken too far, lock contention / false sharing can make performance way worse.

The allocators and destructors must be thread safe if offloading to a worker thread.

The core running your thread is likely to have some of this data in L1/L2/L3 cache, which might not be true for whichever core dequeues the work for asynchronous dropping.

When profiling, it can also be harder to attribute the costs of dropping to the right code if everything gets mixed into a single work queue drained by opaque worker threads.

If you don't use some kind of backpressure mechanism, allocations can potentially outpace deallocations and run you out of memory.

----

So, concrete example: using Telemetry - a flamegraph-style realtime profiler requiring invasive annotations - I was able to track down the cause of a framerate hitch in a game I was working on: the sudden release of several graphics resources. During events that would significantly restyle the look of some of the terrain, we'd eat tens to hundreds of milliseconds of overhead freeing things - more than enough to cause us to miss vsync. It would've stuck out like a sore thumb in any profiler capable of giving you a rough idea of the stack(s) involved in a 100ms timeframe that you can correlate to a vsync miss / missed frames.

D3D9 isn't thread safe (although freeing resources might've been?), but I didn't need to offload the work onto another thread just to amortize the cost over a few frames. Instead, a simple work queue did the trick. Problem solved! New problem: level transitions took significantly longer when doing mass frees of the same resources - more than doubling the cost of deallocation IIRC, for reasons I never did fully understand. Cache thrashing of some sort? We were still maxing out the core running the main thread with mostly cleanup logic...

The final code we shipped used a hybrid solution that would choose between synchronous (high-throughput) and asynchronous (non-stalling) cleanup logic depending on what was happening in-game. Worked like a charm. Of course, this logic was hideously project-specific, and not something the programming language could have chosen automatically for you...




