I definitely agree on the long-term benefits. At the moment, though, I think we are still changing too many things too fast to maintain a non-core set of tests. We are actually using our friends from RainforestQA to do the bulk of our testing, which has been really helpful, but some things still fall through the cracks, like an AJAX script that breaks because of a new caching implementation and only starts to fail after enough events have taken place. That's when it's really helpful to know that a large number of your users are offline, so they are not all going to trip up at the same time.