How We Work on Queries at GitHub

wldlyinaccurate · on Aug 29, 2014

I really wish this went into more detail. You are notified when a query is slow. You can EXPLAIN it in chat so everybody can see. What happens next? Are slow queries treated as high priority? Is there any tooling around debugging complex queries? Basically, what makes GitHub's process different to the decades-old "grep the slow query log and run an EXPLAIN"?

technoweenie · on Aug 30, 2014

> Basically, what makes GitHub's process different to the decades-old "grep the slow query log and run an EXPLAIN"?

Sam's chatops tools open this process up to a lot more people who otherwise wouldn't be comfortable logging on to servers to access the slow query logs (assuming that they even have access to the servers). It's a great way for app developers to level up their SQL skills with help from more experienced coworkers.

The impact of slow queries determines their priority. How frequent are the slow queries? Is it from a background job, or does it cause exceptions on important pages or API calls?

I don't know if this process is streets ahead of what other companies have, but it's made a hugely positive impact on our MySQL infrastructure.

samlambert · on Aug 30, 2014

If we get a spike in slow queries, we get alerted via pager etc. If a query is slow enough to be killed, Hubot tells us: https://twitter.com/isamlambert/status/502818333914566656

I am working on some query linting as a side project.

revskill · on Aug 30, 2014

Why doesn't GitHub open source their products/libraries regularly? There are too few open source projects from GitHub on GitHub.

holman · on Aug 30, 2014

We do!

https://github.com/github https://github.com/libgit2 https://github.com/boxen

When it's easy to extract, well-documented, and has a clear team of maintainers, we try to open source. Sometimes it's difficult to nail one or all of those bullet points, though.

nacs · on Aug 30, 2014

You forgot https://github.com/atom

joshmn · on Aug 30, 2014

Haystack looks interesting ;)

NicoJuicy · on Aug 30, 2014

You can't expect them to open source everything. I think it's awesome they open sourced Hubot!

famousactress · on Aug 29, 2014

So, maybe a toy example but I can see the query included a join. Curious how you guys clone enough tables (and their keys) to troubleshoot things like that? Seems like it gets a lot more complex than the example suggests pretty quickly. Wondering if you have neat tools for that.

[Edit: Just noticed poster is author. Hi Sam and welcome to HN :)]

samlambert · on Aug 29, 2014

Basically you can hit /mysql clone for any table and it will make its way to an isolated db for the user running the command.

It would be cool to be able to pass the script a query and have it clone all the tables.
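A rough sketch of that idea, assuming a simple regex is good enough to find the tables a query touches. `tables_in_query` and `clone_commands` are hypothetical names, and the `/mysql clone <table>` argument form is a guess at how the chat command is invoked:

```python
import re

# Matches the identifier that follows FROM or JOIN, with optional backticks.
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+`?([A-Za-z_][A-Za-z0-9_]*)`?",
    re.IGNORECASE,
)


def tables_in_query(sql):
    """Return table names referenced after FROM/JOIN, in first-seen order."""
    seen = []
    for name in TABLE_REF.findall(sql):
        if name not in seen:
            seen.append(name)
    return seen


def clone_commands(sql):
    """Build one chat command per table (the command syntax is assumed)."""
    return [f"/mysql clone {table}" for table in tables_in_query(sql)]
```

A regex like this misses comma-separated table lists and schema-qualified names, and can misfire on expressions such as `EXTRACT(YEAR FROM ...)`, so a real version would probably want a proper SQL parser.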

samlambert · on Aug 29, 2014

Hey! Thank you for the welcome :)

nathantotten · on Aug 29, 2014

Now, does anyone have the time to figure out how many users GitHub has in their users table? :) http://dheera.net/projects/blur

minimaxir · on Aug 29, 2014

At least 3,815,207 users (who have performed some meaningful GitHub action).

Via BigQuery:

   SELECT COUNT(DISTINCT actor) FROM [githubarchive:github.timeline];

petercooper · on Aug 29, 2014

The mosaic effect they used doesn't appear to be real, as no number has a horizontal gap between two parts.

However, GitHub user IDs are (or always used to be) sequential, so you can create a new account and see what its user ID is. It's around 8.6 million at the moment.

joshmn · on Aug 29, 2014

Nice read Sam, thanks for the writeup.

Thought I'd let you know you have an error in your JSON - section.name (http://i.imgur.com/CoHol0f.png)

;]

samlambert · on Aug 29, 2014

haha thank you :)

simonw · on Aug 29, 2014

"Once we have decided if we want to modify our schema we can perform an incremental rollout across our cluster. I will cover this more in another post." - looking forward to that.

techdebt5112 · on Aug 30, 2014

My guess is a thin wrapper around pt-online-schema-change (PTOSC).

blaincate · on Aug 29, 2014

From http://ghtorrent.org/downloads.html, at least:

   4,151,457 repos
   2,480,478 users

Popular repos:

   twbs/bootstrap 40662
   jquery/jquery 34633
   joyent/node 34522
   mbostock/d3 30247
   h5bp/html5-boilerplate 28736

Popular users:

   visionmedia 10712
   torvalds 9984
   paulirish 5885
   schacon 4431
   mattt 4053
   pjhyett 3732

Source code: https://github.com/akuchlous/githublike

yRetsyM · on Aug 29, 2014

Do all GitHub internal services use a dark colour scheme? Good way of telling internal from external, I suppose.

Caged · on Aug 30, 2014

There are no hard rules on it. When I started working on it a couple of years ago, I was fond of dark color schemes for monitoring interfaces.

samlambert · on Aug 30, 2014

I'm not sure tbh, I think it makes it easier for graphs to stand out. I do love the interface of Haystack.

ckluis · on Aug 29, 2014

Awesome. I’m going to share this to see if we can do something similar where I work. Neat idea.

albertoleal · on Aug 29, 2014

Is Haystack open sourced anywhere?

samlambert · on Aug 29, 2014

Unfortunately not. It is so closely tied to our applications.

possibilistic · on Aug 30, 2014

Thanks for posting this! I like hearing about internal tooling. Is there more on the query tagging? How do you guys bubble query annotations through the stack?

I don't know if you're at liberty to discuss further, but how has it been scaling a giant Rails app? Have there been any pushes to break it up into smaller components? I.e., fast-moving stuff stays Rails, core infra moves to something statically typed?

technoweenie · on Aug 30, 2014

The query annotations show up as a MySQL comment next to the query. I don't know if we have any automatic indexing of the annotations themselves.

We try to stick to Ruby/Rails since so many people are comfortable in that environment. We try to balance the desire to break pieces out with the fact that it lowers the number of devs qualified to work on it.

samlambert · on Aug 30, 2014

There is a little more info on query comments here: http://samlambert.com/posts/the-power-of-query-comments/

grk · on Aug 30, 2014

Looks like they're using https://github.com/basecamp/marginalia for the tagging.
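For anyone unfamiliar with the technique, here is a minimal sketch of query tagging via SQL comments. It is not marginalia's code or GitHub's: `annotate` is a hypothetical helper, and the `key:value` comment layout is an assumption chosen for illustration.

```python
def annotate(sql, **context):
    """Append request context to a query as a trailing SQL comment."""
    parts = []
    for key in sorted(context):
        # Strip comment delimiters so a value cannot close the comment early.
        value = str(context[key]).replace("*/", "").replace("/*", "")
        parts.append(f"{key}:{value}")
    if not parts:
        return sql
    return f"{sql} /*{','.join(parts)}*/"
```

For example, `annotate("SELECT * FROM users WHERE id = 1", controller="users", action="show")` returns `SELECT * FROM users WHERE id = 1 /*action:show,controller:users*/`. Because the comment travels with the statement, it shows up wherever the query text does, such as the slow query log, which is how a slow query can be traced back to the code that issued it.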