How We Work on Queries at GitHub (samlambert.com)
82 points by samlambert on Aug 29, 2014 | 29 comments



I really wish this went into more detail. You are notified when a query is slow. You can EXPLAIN it in chat so everybody can see. What happens next? Are slow queries treated as high priority? Is there any tooling around debugging complex queries? Basically, what makes GitHub's process different to the decades-old "grep the slow query log and run an EXPLAIN"?


> Basically, what makes GitHub's process different to the decades-old "grep the slow query log and run an EXPLAIN"?

Sam's chatops tools open this process up to a lot more people who otherwise wouldn't be comfortable logging on to servers to access the slow query logs (assuming that they even have access to the servers). It's a great way for app developers to level up their SQL skills by learning from more experienced coworkers.

The impact of slow queries determines their priority. How frequent are the slow queries? Is it from a background job, or does it cause exceptions on important pages or API calls?
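As a rough illustration of the "how frequent are the slow queries?" question, here is a minimal sketch that groups slow-log entries by query shape and ranks them by count. The sample log text, the `fingerprint` helper, and the `rank_slow_queries` function are all hypothetical, written for this example; nothing here describes GitHub's actual tooling.

```python
import re
from collections import Counter

# Hypothetical excerpt in the style of a MySQL slow query log.
SAMPLE_LOG = """\
# Query_time: 2.104  Lock_time: 0.000  Rows_sent: 1  Rows_examined: 90000
SELECT * FROM issues WHERE repository_id = 42;
# Query_time: 3.511  Lock_time: 0.001  Rows_sent: 1  Rows_examined: 120000
SELECT * FROM issues WHERE repository_id = 7;
# Query_time: 1.250  Lock_time: 0.000  Rows_sent: 10  Rows_examined: 5000
SELECT id FROM users WHERE login = 'octocat';
"""


def fingerprint(sql):
    """Collapse literals so queries that differ only in values group together."""
    sql = re.sub(r"'[^']*'", "?", sql)  # quoted strings
    sql = re.sub(r"\b\d+\b", "?", sql)  # bare integers
    return re.sub(r"\s+", " ", sql).strip()


def rank_slow_queries(log_text):
    """Return (fingerprint, count, total_seconds) tuples, most frequent first."""
    counts = Counter()
    totals = Counter()
    query_time = None
    for line in log_text.splitlines():
        match = re.match(r"# Query_time: ([\d.]+)", line)
        if match:
            # Remember the timing header; the query text follows on the next line.
            query_time = float(match.group(1))
        elif line.strip() and not line.startswith("#") and query_time is not None:
            key = fingerprint(line)
            counts[key] += 1
            totals[key] += query_time
            query_time = None
    return [(key, counts[key], totals[key]) for key, _ in counts.most_common()]


ranked = rank_slow_queries(SAMPLE_LOG)
for key, count, total in ranked:
    print(f"{count}x  {total:.3f}s  {key}")
```

A real log has multi-line queries and more header fields, so this single-line parsing is only a starting point.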

I don't know if this process is streets ahead of what other companies have, but it's made a hugely positive impact on our MySQL infrastructure.


If we get a spike in slow queries, we get alerted via pager etc. If a query is slow enough to be killed, Hubot tells us: https://twitter.com/isamlambert/status/502818333914566656

I am working on some query linting as a side project.


Why doesn't GitHub open source their products/libraries regularly? There are too few open source projects from GitHub on GitHub.


We do!

https://github.com/github https://github.com/libgit2 https://github.com/boxen

When it's easy to extract, well-documented, and has a clear team of maintainers, we try to open source. Sometimes it's difficult to nail one or all of those bullet points, though.



Haystack looks interesting ;)


You can't expect them to open source everything. I think it's awesome they open sourced Hubot!


So, maybe a toy example but I can see the query included a join. Curious how you guys clone enough tables (and their keys) to troubleshoot things like that? Seems like it gets a lot more complex than the example suggests pretty quickly. Wondering if you have neat tools for that.

[Edit: Just noticed poster is author. Hi Sam and welcome to HN :)]


Basically you can hit /mysql clone for any table and it will make its way to an isolated db for the user running the command.

It would be cool to be able to pass the script a query and have it clone all the tables.
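The "pass it a query" idea could be sketched roughly as below: pull the table names out of a query and emit one clone command per table. This is only an illustration. The regex is deliberately naive, and the `/mysql clone <table>` argument form, the helper names, and the sample query are assumptions rather than the real script's interface.

```python
import re

# Naive pattern: an identifier that follows FROM or JOIN. It ignores
# subqueries, comma joins, and schema-qualified names.
TABLE_PATTERN = re.compile(
    r"\b(?:FROM|JOIN)\s+`?([A-Za-z_][A-Za-z0-9_]*)`?", re.IGNORECASE
)


def tables_in_query(sql):
    """Return the distinct table names referenced, in order of appearance."""
    seen = []
    for name in TABLE_PATTERN.findall(sql):
        if name not in seen:
            seen.append(name)
    return seen


def clone_commands(sql):
    """Build one chat command per table (command syntax is assumed)."""
    return [f"/mysql clone {table}" for table in tables_in_query(sql)]


# Hypothetical query with a join, similar in spirit to the one discussed.
query = """
SELECT issues.id
FROM issues
JOIN repositories ON repositories.id = issues.repository_id
WHERE repositories.owner_id = 1
"""

for command in clone_commands(query):
    print(command)
```

A proper SQL parser would be the safer choice for anything beyond simple joins.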


Hey! Thank you for the welcome :)


Now, does anyone have the time to figure out how many users Github has in their users table? :) http://dheera.net/projects/blur


At least 3,815,207 users (who have performed some meaningful GitHub action).

Via BigQuery:

   SELECT COUNT(DISTINCT actor) FROM [githubarchive:github.timeline];


The mosaic effect they used doesn't appear to be real, as no number has a horizontal gap between two parts.

However, GitHub user IDs are (or at least always used to be) sequential, so you can create a new account and see what its user ID is. It's around 8.6 million at the moment.


Nice read Sam, thanks for the writeup.

Thought I'd let you know you have an error in your json - section.name (http://i.imgur.com/CoHol0f.png)

;]


haha thank you :)


"Once we have decided if we want to modify our schema we can perform an incremental rollout across our cluster. I will cover this more in another post." - looking forward to that.


My guess is a thin wrapper around PTOSC (pt-online-schema-change).


From http://ghtorrent.org/downloads.html:

At least:

   4,151,457 repos   

   2,480,478 users

popular repos :

twbs/bootstrap 40662

jquery/jquery 34633

joyent/node 34522

mbostock/d3 30247

h5bp/html5-boilerplate 28736

popular users:

visionmedia 10712

torvalds 9984

paulirish 5885

schacon 4431

mattt 4053

pjhyett 3732

Source code: https://github.com/akuchlous/githublike


Do all GitHub internal services use a dark colour scheme? Good way of distinguishing internal from external, I suppose.


There are no hard rules on it. When I started working on it a couple of years ago, I was fond of dark color schemes for monitoring interfaces.


I'm not sure tbh, I think it makes it easier for graphs to stand out. I do love the interface of Haystack.


Awesome. I’m going to share this to see if we can do something similar where I work. Neat idea.


Is haystack open sourced anywhere?


Unfortunately not. It is so closely tied to our applications.


Thanks for posting this! I like hearing about internal tooling. Is there more on the query tagging? How do you guys bubble query annotations through the stack?

I don't know if you're at liberty to discuss further, but how has it been scaling a giant Rails app? Have there been any pushes to break it up into smaller components? I.e., fast-moving stuff stays Rails, core infra moves to something statically typed?


The query annotations show up as a MySQL comment next to the query. I don't know if we have any automatic indexing of the annotations themselves.
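To make the comment-annotation idea concrete, here is a minimal sketch that appends tags to a query as a trailing SQL comment and reads them back. The `key:value` comma-separated layout, the tag names, and both helper functions are assumptions for illustration, not a description of how the annotations are actually produced.

```python
import re


def annotate(sql, **tags):
    """Append tags as a trailing SQL comment, e.g. /*action:show,controller:issues*/."""
    parts = []
    for key, value in sorted(tags.items()):
        # Drop anything that could close the comment early or inject SQL.
        safe = re.sub(r"[^\w.-]", "", str(value))
        parts.append(f"{key}:{safe}")
    return f"{sql.rstrip().rstrip(';')} /*{','.join(parts)}*/"


def read_annotation(sql):
    """Recover the tags from a trailing comment written by annotate()."""
    match = re.search(r"/\*(.*?)\*/\s*$", sql)
    if not match:
        return {}
    pairs = (item.split(":", 1) for item in match.group(1).split(",") if ":" in item)
    return {key: value for key, value in pairs}


# Hypothetical tags naming the code path that issued the query.
tagged = annotate(
    "SELECT * FROM issues WHERE id = 1", controller="issues", action="show"
)
print(tagged)
print(read_annotation(tagged))
```

Because the comment travels with the query text, it appears wherever the query is logged, which is what lets a slow query be traced back to the code that issued it.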

We try to stick to Ruby/Rails since so many people are comfortable in that environment. We try to balance the desire to break pieces out with the fact that doing so lowers the number of devs qualified to work on them.


There is a little more info on query comments here: http://samlambert.com/posts/the-power-of-query-comments/


Looks like they're using https://github.com/basecamp/marginalia for the tagging.



