I really wish this went into more detail. You are notified when a query is slow. You can EXPLAIN it in chat so everybody can see. What happens next? Are slow queries treated as high priority? Is there any tooling around debugging complex queries? Basically, what makes GitHub's process different to the decades-old "grep the slow query log and run an EXPLAIN"?
> Basically, what makes GitHub's process different to the decades-old "grep the slow query log and run an EXPLAIN"?
Sam's chatops tools open this process up to a lot more people who otherwise wouldn't be comfortable logging on to servers to read the slow query logs (assuming they even have access to the servers). It's a great way for app developers to level up their SQL skills by learning from more experienced coworkers.
The impact of slow queries determines their priority. How frequent are the slow queries? Is it from a background job, or does it cause exceptions on important pages or API calls?
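To make those questions concrete, here is a rough sketch of impact-based triage. It is purely illustrative and not GitHub's actual tooling: the `SlowQuery` fields, the `SOURCE_WEIGHT` values, and the 10x multiplier for exceptions are all made-up assumptions.

```python
from dataclasses import dataclass


@dataclass
class SlowQuery:
    fingerprint: str         # normalized query text (hypothetical field)
    calls_per_hour: int      # how frequent the slow query is
    source: str              # "web", "api", or "background"
    causes_exceptions: bool  # does it surface as errors to users?


# Hypothetical weights: user-facing traffic counts more than background jobs.
SOURCE_WEIGHT = {"web": 3, "api": 3, "background": 1}


def priority(query: SlowQuery) -> int:
    """Score a slow query by frequency and by where it hurts."""
    score = query.calls_per_hour * SOURCE_WEIGHT.get(query.source, 1)
    if query.causes_exceptions:
        score *= 10  # assumed multiplier for queries that break pages or API calls
    return score


def triage(queries: list[SlowQuery]) -> list[SlowQuery]:
    """Return the queries ordered from most to least urgent."""
    return sorted(queries, key=priority, reverse=True)
```

With weights like these, a frequent background job can still rank below a rarer query that throws exceptions on a user-facing page, which matches the kind of judgment described above.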
I don't know if this process is streets ahead of what other companies have, but it's made a hugely positive impact on our MySQL infrastructure.
When it's easy to extract, well-documented, and has a clear team of maintainers, we try to open source. Sometimes it's difficult to nail one or all of those bullet points, though.
So, maybe a toy example but I can see the query included a join. Curious how you guys clone enough tables (and their keys) to troubleshoot things like that? Seems like it gets a lot more complex than the example suggests pretty quickly. Wondering if you have neat tools for that.
[Edit: Just noticed poster is author. Hi Sam and welcome to HN :)]
"Once we have decided if we want to modify our schema we can perform an incremental rollout across our cluster. I will cover this more in another post." - looking forward to that.
Thanks for posting this! I like hearing about internal tooling. Is there more on the query tagging? How do you guys bubble query annotations through the stack?
I don't know if you're at liberty to discuss further, but how has it been scaling a giant Rails app? Have there been any pushes to break it up into smaller components? I.e., fast-moving stuff stays in Rails, core infra moves to something statically typed?
The query annotations show up as a MySQL comment next to the query. I don't know if we have any automatic indexing of the annotations themselves.
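For readers unfamiliar with the technique, here is a minimal sketch of tagging a query with a trailing SQL comment. It is an illustration only, not GitHub's implementation: `annotate_query` and the `controller`/`action` tag names are hypothetical.

```python
import re


def annotate_query(sql: str, **tags: str) -> str:
    """Append caller metadata to a SQL string as a trailing comment."""
    if not tags:
        return sql

    def clean(value: object) -> str:
        # Drop characters (including "*" and spaces) that could close the
        # comment early or smuggle extra SQL into the statement.
        return re.sub(r"[^\w.#:/-]", "", str(value))

    body = ",".join(f"{clean(k)}:{clean(v)}" for k, v in sorted(tags.items()))
    return f"{sql.rstrip().rstrip(';')} /* {body} */"


print(annotate_query("SELECT * FROM repositories WHERE id = 1",
                     controller="repos", action="show"))
# prints: SELECT * FROM repositories WHERE id = 1 /* action:show,controller:repos */
```

Because the comment travels with the statement, it appears wherever the query text does, such as the slow query log or the process list, so a slow query can be traced back to the code path that issued it.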
We try to stick to Ruby/Rails since so many people are comfortable in that environment. We try to balance the desire to break pieces out against the fact that doing so lowers the number of devs qualified to work on them.