Fantastic story. It reminded me of similar issues we saw back in the day, when we weren't monitoring memory usage as closely as we should have. Because we deployed daily or weekly, the services kept getting restarted and life was good. Once things stabilized and we deployed less often, a slow memory leak surfaced that took weeks to grow to the point where the service would crash.
We used a paging/alerting product that, I guess, was designed to be run from the command line, so everything got cleaned up at the end of each run when the process exited.
Then they added library (DLL) support so we could call it from a script. So the script ran and ran and ran, sending alerts, and crashed at random from time to time.
Why was it crashing? The stand-alone version worked fine! Well, they weren't cleaning up their handles, so eventually we hit the per-process handle limit and died!