Advice for new questions: instead of contrived ones, pick a problem you've actually encountered in your job and ask the candidate to solve it. I've never had to implement a datastore like this -- so unless that's what you're actually doing, it doesn't seem particularly relevant. Another plus is that if you're doing interesting work, you have an almost endless supply of questions to choose from. Candidates also seem to respond better when you wrap up the question with "this is something I actually worked on a few weeks ago."
FWIW, Ning turned me down back in April 2008 so feel free to ignore my advice :)
Asking relevant questions at an interview is a good idea, but I've noticed that a lot of technical interviewers ask questions it took them weeks to understand and fix when they use this approach. Understanding a problem is often more difficult than solving it ... knowing that the interviewer understands the problem and constraints is even more of a challenge.
I once had an interviewer give me the kobayashi maru without telling me. I thought the guy was a jerk. He could just have pulled a gun on me and demanded my wallet if he wanted to judge my personality. I declined the second round.
Yeah, but it wasn't technical. The point of the question was to assess my entrepreneurial IQ, or so I was told after. Generally if an unemployed person's applying for a job, their entrepreneurial IQ is low at the moment and they need money. Nevertheless: I was to acquire a desk to work at, which needed to have certain dimensions for under a certain dollar figure. The internet is down, the local office supply company doesn't have it, etc, etc, until I'm starting to think the interviewer is nuts. OK, I'll sit on the floor. His point was that you need to generate options, which isn't bad advice. But it was a dumb interview question.
The point of the Kobayashi-Maru test isn't to test whether they're capable of passing impossible tests, it's to test their character facing death. If you're choosing fellows to go to war with, you want to choose ones that won't shit their pants or otherwise make your last moments alive together awkward.
You can look up the Star Trek reference, but in an interviewing context, it refers to an question that is essentially impossible to answer and/or basically never occurs in real life and yet is somehow meant to judge a candidate's ability to "think outside the box" or under extreme pressure.
As well as, of course, to showcase the interviewer's vast wealth of experience and keen insight into the human condition that enables him/her to assess as much in 5 minutes from candidate's response to a completely contrived question.
I feel the same way about little puzzles that companies like to present to the candidates, especially the ones with a "trick" answer.
I never worked in a place where I had to solve tons of small little puzzles all day, or have to memorize a set of obscure language features, and can't look them up online if I need to.
As you said, the best interviews are the ones by a white-board, working together to solve something resembling a real problem. Because, eventually that is what I will end up doing if I work there.
As someone who was recently on the job market, I will say I wish more employers did this as well. As a candidate it gives you much more information about what you will actually be doing at the company too. There is just as much buzzword packing in job descriptions as there is in resumes these days and sorting through the interesting from non-interesting jobs is not always easy.
True. During my latest interview, Heroku asked me to design a billing/invoice system. I won the position and my first assignment was to work on the billing / invoice system. When I have the opportunity of conducting an interview, I will most certainly use this approach.
I truly apologize in advance - this is not meant to be flame bait or anything of that nature - but it bothers me when someone states they "won" a position. The interview process should be used to find a good match, both for the company as well as the candidate. When I am interviewed I am evaluating the company as well and will have no reservations turning them down if I feel I wouldn't be a good fit. If I have to win to get in to the company, then that alone would make me back off and possibly lose interest in the company.
Upvoted. Just for once, I'd like for an interviewer to present a question that they themselves don't know how to solve (or didn't know how to solve in the first few times they thought about it).
An idea I've tried a bit in the past (and would like to try more, given the chance): ask a question that you don't know the answer to. This shifts the power balance of the interview - you don't automatically have the answer over the candidate. You also don't have your mind closed by your expected answer(s). Both of these make the resulting discussion more comparable to how you will work with a colleague, which is what you are trying to test the candidate for, after all.
Doing this requires more effort and guts from the interviewer, but it's still easier than being on the other side of the desk. Finding a supply of suitable questions is probably the trickiest part - you could try looking up some problems from competitions just prior to the interview, or you could perhaps take a real problem you are trying to solve at that point in time (abstracted as required).
I have done this before, usually if I am working on something interesting, and for which I don't like any solution I have yet hit on. In this case, I do the best I can to describe the problem, thinking thus far, etc, and turn the interview into a design session at the whiteboard.
I can only recommend doing it with a candidate you are more selling to then evaluating, and whom you think will be a good fit.
A strong candidate will jump in and you'll have a great session (I had one in particular with a great young erlanger whom we wound up losing to someone else, but the interview left a lasting impression and we kept in touch long afterwards).
This was the approach my current employer chose. They basically were like "Here's the challenges we expect you to tackle once you start. Let's discuss how you would approach each."
Essentially a high-level design session where they got to see my thought process and I got to ask lots of questions about the firm. The best part is that when I started they were pretty comfortable with my approach to certain projects as it was previously covered, and I knew what I was getting into. No big surprises for either side.
That's a very good idea. I would go further and tell the candidate that I don't know the solution myself, thus hopefully putting him/her at ease and making it clear that I am more interested in the thought and design process, rather than figuring out the "trick" in the question.
These types of questions are terrible interviewing techniques. An interviewer who, even jokingly, goes into the room with an interrogative mindset is fundamentally failing at the core purpose of interviewing: finding good talent, preferably relatively undervalued talent. An interviewer should not be looking for someone who mirrors their way of thinking or or their approach to solving a problem. They should be looking for people with skills that complement the team, not duplicate it. Questions like these are typically signs of bad interviewing technique, but not always depending on how they are presented.
More often than not, however, people "design" interview questions to cause problems for prospective employees. You might think this is clever. You might think this helps weed out people who don't possess the ability to think quickly or react to feedback on their feet. All you really are doing is artificially limiting the prospective candidate pool.
Every dork with a chip on their shoulder can manufacture an a question to be unanswerable. Deep down inside what most of these bad interviewers really want is to design a question that creates one of two atmospheres in the interview room: (a) a submissive one, where the candidate is forced to placate the interviewer and reinforce their apparent intellectual superiority; or, (b) a confrontational one where the candidate and interviewer battle it out for dominance as alpha-geek. Either (a) or (b) can be seen as a positive depending on the type of interviewer that goes in. Ultimately, any interviewer that uses this technique is a bad one and any input they make into the hiring process should be seriously discounted.
The interviewer should adapt their interview style to the candidate based on reading their body language, communication skills, and overall demeanor in the interview room. Its the interviewer's responsibility to create the best interview experience possible. Some people are nervous; some don't think on their feet well; most rely on Google heavily to point them in the right direction; lots of them know more than they can demonstrate in the 5 minutes you give them to react to your stupid question.
Your job is find diamonds in the rough. People who are the best, not people who act, think, behave, or answer the way you think they should. Questions like the one in this blog post are tell-tale signs of alpha-geek interview dominance gone wild, and out of that you'll typically only get one style of candidate that makes it through the gauntlet, certainly guaranteeing you're not getting the best talent available.
The question presented is a simple, open-ended question with a number of acceptable solutions. It's not even remotely "unanswerable". It's at least mildly interesting as indicated by the number of responses at a similar question on StackOverflow: http://stackoverflow.com/questions/2573653/given-a-1tb-data-... . It's not far from the problems Ning solves. It's not an "aha" problem that you either get or you don't get. It's complex enough to reveal the candidate's thought process and problem solving methodology, but it's small enough for good candidates to get to a reasonable solution in the 45-60 minutes of an ordinary interview.
Nothing about the article indicates that the author has a chip on his shoulder, nor that he's a dork who wants to "battle it out for dominance as alpha-geek."
Nothing about the article indicates that author is looking for a specific solution reflective only of his particular prejudices in programming.
What in the world is the basis for your entire diatribe here?
Seeing as you worked at Ning and likely participated in these types of interviews it's not surprising you don't get it. Surely Ning was a great place to work and I'm sure you found really top-notch employees through your well known tough hiring process, but I'd be willing to wager they all look very much the same when you blur your eyes a little bit.
My diatribe, as you so eloquently put it, is based on the article. The list of things that the author expects to see in the answer. The way the question is presented. The lead off with "interrogation". The "gotcha" follow-up regarding how things might fail. These questions are great at finding a very specific type of candidate, but the entire point of my comment is that this one candidate type is typically not indicative of the best engineer. Of course you'll disagree, I'm sure, seeing as you were part of this process.
Asking how it will fail is a gotcha? To my knowledge and experience thus far, all systems fail. The question is "given this one, that we now have a good shared understanding of, which you designed with the tools you are most familiar with, how is it going to fail?
Not one candidate I can think of has been either surprised by this question, or felt stuck, fwiw. Again, I may be completely self delusional, but I hope not!
Well, I'm eager to hear what you think the candidate type of the best engineer is, why that type of candidate would fail to come up with a reasonable train of thought given the cited question in an interview, and how you select for that candidate type.
If questions like the one posted find a type of candidate who can think well under mild pressure, who can quickly brainstorm sensible approaches to technical problems, and who can communicate their ideas and reasoning to their peers, then that sounds like a pretty good outcome to me.
Wow. You seem angry, or frustrated. Did I say something that offended you?
I only participated in Ning's interviews as an interviewee. I worked for Ning for six weeks before the layoffs.
The only thing I'd notice when I blur my eyes a little bit is that Ning employees were better engineers than most I'd previously worked with. The same applies at my current employer, also known for its rigorous interviews. The similarities ended there.
You complain here about the things the author expects to see in an answer. He's looking to see what they think of that scale of problem and how their thoughts comport with their resume. He's looking to see what questions they ask to get more detail. He's looking to see that the solution is in reasonable bounds for hardware--that the candidate understands the limitations of the silicon we have to work with. He's looking to see whether they would reinvent the wheel or use proven technologies. He's looking to see what languages and technologies a candidate would choose to use. He's looking to see how well the candidate understands those tools. He's looking to see how cognizant candidates are of the failure modes of the system they're designing.
Maybe I'm dense, but I don't see any issue with the list of things the author expects to see expects to see. Can you be more specific, perhaps?
Likewise I don't understand the problem with the way the question is presented. He describes the data you have and the load you need to support. Is there something wrong with that presentation?
The crossed-out "interrogation" is obviously a joke. I'm certainly not known for my intuitive perceptiveness and I still saw that (and no, I never met brianm while working at Ning, to my knowledge).
How is asking about failure modes a "gotcha"? Every good engineer needs to consider how his designs and systems can fail. What in that question do you expect trips up good engineers?
The only "specific type of candidate" this question seems oriented toward finding is "good engineers." He's not asking for particular technologies. He's not even looking for a particular solution. He's just asking people an open-ended question about how they would design a system. All the best engineers I know would be able to answer that question in 30 minutes.
I don't disagree because I was part of the process, I disagree because I can't understand your objections. I hate to sink to personal attacks like you have, but have you considered that your opinion on questions like this may have been formed by your abilities? It seems a more likely possibility than that questions like this are fundamentally flawed in some way that you've as yet been unable to describe. Both companies I've worked with that have had rigorous interview questions of this sort have hired the highest quality talent I've ever had the pleasure of working with.
I would say that having "interrogation techniques" crossed out is a pretty good indication that the author's state of mind in an interview situation is aggressive and domineering.
I would have said the same thing until I read the rest of the article. In context I think it's meant primarily to be funny.
The author clearly maintains some expectations about the level of response from the candidate, but overall it's designed to be interactive. Chances are that unless the candidate gives a really superior answer, it's not going to be the only question asked.
What is the meaning, in an essay, of a cross-out word?
Generally, I would say that any time an author takes the extra trouble to write in an extra word and cross it out, they are making some kind of meta-comment deliberately, rather than straightforwardly revealing that the crossed-out word is representative of their true state of mind. The crossed-out word is a meta-comment on what the author thinks about the shared idea-space of the author-plus-audience, usually to say "hey I know that we all know that this concept exists", usually to also say "and that this concept is an unmentionable", and often to also say "and I think that's a bit funny".
In this case, we all know that some people conduct interviews as interrogations (and some people think that indeed they should do so), and on the other hand some people think that this is offensive. And brianm wants you to know that he knows we all know it.
MAYBE it's on his mind because he's an aggressive domineering jerk who likes to interrogate people, but he's sensitive enough to be a little defensive about it. MAYBE it's on his mind because he's NOT aggressive or domineering, and he's frustrated by how easily interviews become interrogations, and his whole deal is trying to prevent that without making the whole interview meaningless. MAYBE something else. How're you going to find out? Read the article!
My point, tl;dr: If you see a word crossed-out, by an author you don't already know, and you think "Oh this author really means the crossed-out thing", I think that's a poor comprehension strategy.
> Questions like these are typically signs of bad interviewing technique, but not always depending on how they are presented.
Given that the author presents the exact question and includes a moderately lengthy discussion on how he uses the question, what is your evaluation in this specific case, rather than the general one?
Consider, for example, this comment from the article:
I’m looking for quite a few things as we go through the question. The first is their opinion of “a terabyte of data” and “5000 lookups per second.” Do they consider this to be a lot of data, or a fairly boring amount, same with the lookups per second. Leaning either way isn’t a failure, it is just information gathering for me, and referencing it against how you represented yourself in your resume, cover letter, and phone screen.
Your reaction says lot more about your attitude than it does about the OP's. I found the article interesting and informative, with none of the passive/aggressive overtunes you mention.
Your lip service about finding diamonds in the rough and catering to candidates who are nervous or can't think on their feet leads to just one comment: "Put up or shut up."
The OP has given us a very good, concrete example of his interview process, the traits it reveals in candidates, and how he tailors it to meet individual candidate reactions. Until you do the same your comment here is nothing but five paragraphs of whining about how some people don't interview well.
To which I reply, "Duh."
We all know the 45-60 minute interview is imperfect, and that both interviewer and candidate are just trying to make the best of a very contrived interaction to begin with. So, please, lose the attitude and give us something we can learn from rather than bashing the OP with vague generalities.
I appreciate the feedback, but don't think I fall into, nor describe in the linked blog post, the interviewer category you describe (though it is certainly possible I do and just don't realize it).
Do you consider this question to be unanswerable? As I stated, I'll accept pretty much any viable answer, my goal is to get folk to design something ~realistic (okay, this is a simplistic system, but we have finite time in an interview slot).
Honestly, I think the question I blogged about is pretty straightforward, and exists for purposes of framing a discussion about real systems and constraints. There is no answer I was fishing for at any point I used this question.
"Questions like the one in this blog post are tell-tale signs of alpha-geek interview dominance gone wild," is so far off base from what I am trying to convey that I think I utterly failed to convey my point, which was quite literally the opposite :-(
Part of what's problematic about this interviewing approach is that while your (secret) internal mindset may be "I'll accept pretty much any viable answer", the full-court-press interviewing style effects the exact opposite impression on the candidate -- they're left thinking, "gosh, I have to get this right, dammit, no room for slack, can't leave out any details AT ALL or it's back to another month of ramen and job boards trolling."
So yes, with your interviewing technique (orthogonal to the technical content of the question itself) you've succeeded in conveying a stance and mindset that is quite literally the opposite of what you intended.
From the article. The flip side of the fact that the interviewer has had time to carefully prepare while the interviewee has not is that what you get from that inherently contrived, lopsided discussion, kind of a long, drawn-out game of "gotcha." Fun for the <strike>interrogator</strike> interviewer, but tedious and buzz-killing for the interviewee.
It provides insight, yes, but a certain cost: many candidates fine this approach simply alienating, and more to the point, it upsets their thought processes (such that they don't really get a chance to shine).
As a candidate you can train for this kind of confrontational approach, of course, but I think lrm242 nailed what's most alienating about this interviewing style -- that ultimately the candidate is put into a position that's basically uncomfortable to be in, where they have to _placate_ their interlocuter according to some hidden criteria set -- and mostly they just want that particular thread to be overwith as soon as possible.
I just think we can get the same insight into candidates from different ways, where things are more on an even keel, and there's more "flow."[1] It takes more tact and creativity (and perhaps a bit more time), but it's not that hard, is more mutually enjoyable and (I think) ultimately provides a lot more insight.
[1] "flow" being of course a very important concept; e.g. in the sense of Murakami.
Yes. Given a little time I could come up with 20 or more so ways of doing this. As to which one I would use that's not really answerable in an interview. The problem with questions like this is two weeks is a fairly long period of time. If it's literally all your working on then researching a few options and then asking some more questions is the best place to start. aka Is the access pattern random or would caching be useful? Is the data private? etc.
I don't know about you, but when I am interviewing people I am not trying to find "diamonds in the rough", I'm trying to find the best person to join the team and contribute. That may or may not include experience, but it certainly includes the ability to solve problems.
This isn't an unanswerable question. I've asked and gotten great answers to a very similar question about handling data at scale. And what I'm looking for includes the opposite of a chip on the shoulder.
Man, I can't up vote this enough. I would also add that this is also usually symbolically equal to what VCs seem to do in their "interviewing"/wooing process. Well said!
Did you read the rest of the article anyway? The "interrogate" wording may present a bad first impression but the remainder of the article is pretty reasonable.
I'm interested in why he's so against the "custom solution". Almost everything a DB will try and add in for key-value lookup is predicated on the idea that the requests aren't randomly distributed. The DB index will probably be based around b-trees, which will do a logarithmic-time search for the top few levels cached in RAM, but will fall over fairly miserably with multiple seeks as it has to page in leaf nodes and likely the nodes a level above those: I don't see it as realistic to expect a DB to do it all in a single seek.
On the other hand, if you have a perfect hash function for the keyspace into a hash addressing space (you'll want something a bit larger than 2^32), you could simply choose the device with the first few bits, then do a single seek into a (large) file on disk with the position indicated by the remaining bits of the hash addressing space, shifted over appropriately for the 128-byte stride of values. You don't even need a perfect hash: with a merely good hash function, you can cheaply read in more than 128 bytes from disk (OS will probably be reading in 4K anyhow), and use open addressing with linear probing.
You'll need enough devices to spread the load to get good latency for random seeks; probably a combination of mirroring and what amounts to sharding, using a portion of the key hash to select the device. But that's a relatively cheap numbers game.
But I'm not really seeing what a DB buys you here, in this specific scenario.
EDIT: Further investigation of tc and cdb (as linked in the article) suggests that tc (Tokyo Cabinet) may work in its hash form, but I'd rule out cdb for using two seeks, which I would expect to double the number of devices required.
I would imagine that for a certain number of candidates, they answer with a custom solution because they believe this is what the interviewer is expecting to hear (and asking for) when the question contains "you need to design, build, and deploy...". I think to some people, the immediate thought is that the interviewer would deduct points for originality if you said "well let's just use this tool off of the shelf".
This is why interviews are so tricky - it's hard as a candidate to juggle answering the question and trying to decipher what traits the interviewer is really trying to expose with this question.
For example with this one: is the interviewer trying to see if I understand my data structures and the size of this data to the point where I could design such a system from the ground up? Or are they trying to see how practical I can be given real-world time constraints?
I agree, and I try to make clear that I really want to understand how the candidate would approach it -- if they genuinely would try to find a perfect hash and keep a hash -> offset map in memory, then mmap a big data file, that is fine. I will ask how they will find the perfect hash, etc. If you can describe well the approach you take, then this is fine.
Ofttimes, though, candidates have started designing a general purpose on-disk hash database. That is fine and dandy, and we always need a better database, but it is pretty orthogonal to the immediate problem.
Actually, ofttimes engineers have implemented general purpose on-disk hash databases as part of j. random project, so that path certainly does occur on real projects :-)
Were I asked this, my general approach would be: (1) Given the available building blocks (disks, ram, cpus, etc) what strategies are workable? An example of a strategy would be "store it in a btree and keep all but the last two levels in ram; therefore we'll need enough ram for the top of the tree and 2x seeks for each lookup." (2) Find existing software that implements a workable strategy if at all possible, or at least the complicated components of it and code the rest yourself.
You can think of two extreme answers to this as "buy oracle" and "buy some transistors". Those are both bad answers. The first is a bad answer because, unless the candidate can answer the follow-up question "how does oracle work?" it probably just amounts to wishful thinking and they almost certainly won't be able to say when it will fail. The second is a bad answer because you can't do that in two weeks by yourself.
Take for example, Foursquare's recent outage. From what I read online, it seems like they went with "use MongoDB" as a strategy without actually understanding what Mongo was doing under the hood. Presumably they didn't figure out how it would fail (the author's follow-up question, natch) and thus they discovered that the hard way. So they chose bad answer one. On the other hand, had they decided to implement their own database, they'd know all about its characteristics, but they wouldn't have launched yet.
I have gotten solutions which implement custom stores which I have had faith that the candidate could build, but not many. I am not dead set against it, but I will press for an estimate and a good explanation of it if they want to go down that route.
I try to emphasize that I want to know what they would actually do, on the job. Other questions (typically other interviewers) will explore algorithmic stuff, and folks I work with have used this question with that intent -- I use it to see how they think about building systems.
Another thing I didn't mention in the blog post is that up front I tell them exactly what I am looking to learn. I am not going on a fishing expedition, or trying to trick anyone. I find that kind of "second guess the interviewer" thing to be asinine, so strive to avoid doing it myself.
How often do you decide exactly what you're going to do to solve a problem for your job in the span of one hour? Oh yeah, and make sure you provide an estimate and a good explanation, within that hour, as to why you've made that decision. Give me a break man, you're looking for people who tell you what you want to hear because then you've found someone who thinks like you--and that, in your mind, is the only correct answer.
The idea is that you don't have to write a DB - it is already there, usually even deployed and monitored by your ops team. Even if it gives absolutely no other value, thats a big plus.
In terms of randomly distributed requests:
I'm surprised the topic of "working set" is not part of the question. Even with huge amounts of data, if the working set can be stored in RAM and only rarely swap data in and out, its a much different problem than 5000 disk reads a second. Lots of databases are optimized to efficiently maintain cached data in memory and swap as little as possible and as efficiently as possible.
I must say the problem bores me to death. I don't think I'd want to work where they ask me to solve this. However, I am confident I'd design a solution that's near perfect in the two weeks given, it just would require some experimentation on the data storage side - the code is trivial.
I'm confused... why specify a 1 TB disk? It doesn't seem that the problem's solution has any dependence on the aforementioned disk aside from the data it contains unless I'm misreading how he's explaining the answer?
Yeah, I also thought he expected to serve it all from a single disk. Hopefully in real life it gets clear in the discussion.
It's a problem when asking interview questions: there are many ways in which the question may be unclear and the interviewer is mislead that the applicant can't answer. Another problem may be that the interview interferes with the thinking style of the applicant: you try to discuss it with him while he needs time to think by himself, or just the opposite: he thinks better in a discussion and you stay quiet while he sweats in stress.
> there are many ways in which the question may be unclear and the interviewer is mislead that the applicant can't answer.
yeah, when I was interviewed for my university, they asked me to design an algorithm to work out the n'th term of the fibonnacci sequence. They repeatedly said I didn't need to use recursion or loops.
When I finally gave up, they said: well here's one way:
f(n) = f(n-1)+f(n-2)
I just looked at them and quietly said "well... yeah... I mean... obviously". By that point I was too depressed to say "but that's frickin' recursive, wtf?!?" but that's what I was screaming in my head.
Amazingly I still got offered a place.
edit: still bugs the crap out of me. Did I mis-hear them? Were they being deliberately misleading? Was it some kind of test to see how I reacted to an impossible problem[1]? Were they really expecting me to come up with some mathematical algorithm to calculate the nth term without recursion or looping? Argh!!!
I'm pretty sure I asked for clarification several times... It's possible I suppose.
Here's the problem it started from:
Eleven lily pads are numbered from 0 to 10. A frog starts on pad 0 and wants to get to pad 10. At each jump, the frog can move forward by one or two pads, so there are many ways it can get to pad 10. For example, it can make 10 jumps of one pad, 1111111111, or five jumps of two pads, 22222, or go 221212 or 221122, and so on. We'll call each of these ways different, even if the frog takes the same jumps in a different order. How many different ways are there of getting from 0 to 10?
How the applicant seeks clarification is just as important as how he answers.. he's going to be presented with problems all the time.
When I pop the open-ended technical questions, there really is no right or wrong answer - it's just there to judge someone's real ability - I have some questions you can't fake your way through. You don't have to have every little detail right, but I can tell from how you answer whether or not you are afraid to ask for clarification, and whether you know the material, and to a good degree how well you know the material.
It's a conversation - not a one-way thing (at least in my case). I can tell if someone's nervous - and that's okay, too, but the job is going to be stressful, so after I explain not to be nervous and don't worry about getting every detail right.... but if you can't even begin to describe the systems I'm asking about, you don't belong on the team... same goes for if you don't seek clarification.
yep, that totally threw me too. It kinda implies that you're going to be handed a physical disk with some sort of k/v store in it and you need to write some software to access it.
I immediately started thinking about seek times and elevator algorithms and that kind of stuff, which really threw me off course.
Because when you specify a disk, you realise he's guiding you in the direction of a specific answer.
Use the filesystem. It's what it's there for, particularly if the data never changes. Nice big cache, xfs, nested hierarchical datastore using folders based on keynames, 20 lines of code, job done. 5000 lookups a second is trivial throughput for a decent disk with a decent readahead cache.
I don't know of any hard drives that can do a seek+read in 0.2 milliseconds. Keep in mind that the reads are distributed randomly - you're almost certainly going to need more than one disk if you have less than a terabyte of RAM for your datastore.
Why have SSDs killed this question? Seems like you could tweak the amount of data on disk and/or the size of the values and/or the requests/second requirement and keep using it. Also, there are no 1TB SSDs available at this point, so you'd still have to assume spinning platters for this, no?
I haven't thought about it deeply, but my first guess would be that the seek time of hard disks is what makes serving 5000 requests per second difficult; consumer-grade hard disks can perform something in the range of a few hundred (randomly distributed) IOPS at best due to seek time. SSDs make seeks (almost) free, so even one decent SSD should be able to service 5000 requests per second, assuming you can get a big enough one.
If multiple storage devices (hard disks or otherwise) are allowed, RAID (or just splitting the data across multiple disks) would ameliorate the seek problem somewhat, since seeks can then happen in parallel. I wouldn't see this problem as requiring a distributed solution, unless the bottleneck is the bus or storage controller rather than the single-disk seek performance.
Multiple disks, across different physical servers (instead of RAID) would be good too.
You could also try increasing the throughput by having an in memory cache. The index/hash table can also be in main memory. Depending on the keys, locality can play a crucial role in prefetching data.
I think that the parent thought that the system had to use a single SSD. The first time I read the question I thought that I had to use the hard drive that the data came on, and that it was all I was allowed to use. I didn't realize that the point was to design a system to simply serve the data that was given to us. After reading it, it becomes pretty clear that a simple raid5 with a few SSD's makes this problem moot.
Goshdarnit. I just sat down and started talking to myself about hash functions, filesystems, the wire protocol, and partitioning scheme for half an hour instead of making dinner. Nerd sniping?
I love to ask something that is simple to start with, and then build on it interactively with the candidate.
One of my favorite questions is - Give me a normalized database structure that you'll implement if you were to build gmail - incorporate conversations, messages, multiple message participants and labels.
Then, depending on the candidate, I build upon the question,and go into various optimizations possible, the ways caching would be implemented, sharding/splitting/de-normalization would be done with load, etc. etc. With good candidates, its always a very interesting discussion. And even the bad ones can leave the interview thinking they know something and don't need to feel bad :)
This. A million times this. I do the same thing starting with a simple kernel, but keep growing it, throwing in monkey-wrenches, see how the candidate reasons and adapts.
Good candidates will start recognizing and acknowledging trade-offs, whereas bad candidates will settle on one clear path.
Good candidates will speak about the ideas they're trying to get across, whereas bad candidates will drop into talking about specific brands of technologies.
One candidate when faced with some questions on how he'll handle load said he'll switch to Oracle :) Promptly asked to leave.
Good ones clearly show that they have the ability to deal with multiple tracks of ideas, know that trade-offs need to be taken care of, in general, have an approach that with some ingenuity, almost all problems are solvable. Thats the attitude I (and I guess you too) look for.
Jon Bentley's "Programming pearls", definitely. This problem is somewhat comparable to the first "pearl" from this book. Anyway one of the best ever programming books :)
I like this question, but I'm wondering what a good interview looks like if someone is completely blind-sided by the question because they don't have a lot (or any) systems experience? I'm imagining an engineer who's experience ends with MySQL so knowledge of stuff like tcb is a definite no. Maybe that's not an issue, because they wouldn't be interviewing for this job if that was the case.
... not an issue if the questions are relevant to the job. On the other hand, recruiting is hard and it's easy to let a great candidate slip by for any number of reasons.
Because some of us went on to better places :) Ning was recognized for its selectivity in hiring; though it did come as a shock, everyone that I knew in my short time there landed on their feet at similar or better operations (Google, Amazon, Etsy, etc).
In the intervening months since the layoffs, there's also been some attrition (as there always is) in those who weren't laid off. I know of at least two people who weren't laid off, but left for Facebook shortly after the layoffs.
I also know that they've already hired back at least one of my former coworkers since the layoffs.
This guy is taking a lot of heat for bad interviewing technique, and this may or may not be the case - but it's hard to say, given that we don't know what kind of a position he's interviewing for. Given the job requirements of the position, this may be a perfectly reasonable question - i.e. if the job requires someone to build and maintain large custom data stores, it's pretty acceptable practice to ask questions that reveal how much they know about large custom data stores. Personally I wouldn't be over the moon crazy about this question, but it's still a hell of a lot better than being peppered with random "what is X?" type questions that test nothing about your ability as a developer, and only what you currently directly know. That is to say, those questions where you end up looking like an idiot if you didn't happen to read the same man pages as the interviewer. In the types of questions this guy is asking, you have to basically guess what they're looking for - so it can end up being a bit of a mind reading exercise, which is why I wouldn't be terribly crazy about it - but it's still by far the lesser of several evils, simply because it's a direct demonstration of how a developer thinks about development, instead of how much rote minutae they've mastered or whether they can find a way to stick eight kittens into six holes without any broken legs of flying fur, etc....
> In the types of questions this guy is asking, you have to basically guess what they're looking for
Don't guess, ask. Having asked questions like this before, I can assure you that a reasonable interviewer will actually be pleased that you understand the alternatives.
(And, of course, if you get an unreasonable interviewer, you probably don't want to work there anyway.)
I've never been asked this question in an interview so when I saw it I started thinking about how I would answer it.
The first thing I did was work out the actual bytes/sec (throughput) needed for this many requests. Also given the 2 week timeframe I would question really hard what the data set looks like. Since the requests aren't randomly distributed does that mean you can narrow it down to a small enough portion to fit inside RAM.
I think the question is a good one because it does touch on a lot of aspects of infrastructure and systems design.
Expecting a good answer to this question in 30 minutes without any warning is totally unreasonable. Expecting anyone to implement a deployable, generalizable, tested solution to this in two weeks is totally ridiculous. I wouldn't work at a place that asked questions like this or made development calendars with schedules like that.
Yet, companies who can pull off those feats are the ones who survive. A 1 TB database, even 5 years ago, shouldn't take more than a week or two to build properly. The solution I've used in the past, a Netapp file server with hashed directories of key-named files, takes zero programming and a couple hours to install. (You can easily read 5000 files / sec over NFS from one of their file servers.)
I worked on it on average two hours per day. It's not that ridiculous to believe that it would be (near) production quality in two weeks of fulltime work.
Hmm my first thought was ... well ... WTF
Then I thought - man I'd like to work for this corp. but alas I think I'm to stupid for that.
Some moments of reflection later I'm now convinced that I don't want to work there at all.
The question is somewhat similiar to
"give me the fastest sorting algorithm and prove that it is"
Everyone with a CS-degree or something similar will know this prove and Qsort from his very first years and I claim almost everyone will have forgotten the prove.
(I can remember that you can prove this by looking at the "descision" tree passed on the 1on1 comparision which will have depth of about the needet size ;) - but this is it)
I think most people will get the prove (even without google or wiki) in a resonable time (some hours?) but within interview-time and in this situation? In my case no way.
It's everybody's favorite interviewing technique: Ask a fairly esoteric question that you are thoroughly familiar with and sit there and feel superior while the candidate struggles to get to an answer you will accept. Three kinds of candidates do well: those who are smarter than you are and spit out answers that surprise you; those who have heard the question and act like they're thinking it through for the first time; those who are unfamiliar with the area but have a personality you find congenial, so you coach them to a solution. In the end, it's all about ego.
Using general purpose database in such case is just an approximation of a solution, ie. if the database supports index-optimized type of table and smart enough to cache index header blocks quickly enough.
Because of extreme primitiveness of this case, custom solution would work better here than general purpose db. Sort the pairs by key and place in the tree, with the root node containing 2^11 keys (512K), second level - 2^11 nodes of 2^11 keys each (512K each node size), 3rd level - 2^22 nodes of 2^10 keys (256K each node size). Keep root and all the second level nodes always in memory (1G RAM required). Given a key to find, look it up through the root and the second level nodes in memory - this would result in the index of the 3rd level 256K node to be read from disk. Time-wise reading a 256K block from disk is just one IO - pretty much the same time as reading 1 byte.
5000 lookups/sec at 1 IO/lookup - 20 disk array of 250 IO/sec disks - 15K ones would do it. 2 arrays of 400GB disks is still ~2 times cheaper than 1 TB SSD array. RAID5(6)-ing the disks will leave us with 6TB. Short-stroke the first 1TB (70Gb x 20) to get much better speed than 250 IO/sec (or just use less than 20 disks to start with) and enjoy the rest 5TB for non-IO intensive purposes.
I'm trying to work out if this is a thinko on my part or a typo on your part:
First you write:
> a system that lets you look up the value for a given key
and then later on you write:
> if they think the problem of looking up a key by value has been solved before
Did you mean to say "looking up a value by key" or am I misunderstanding it?
(Given that no one else has commented on it makes me question my question but I'm trying to get over my concern of being wrong on the internet and asking to be sure. :) )
Hello,
I would probably make a very poor candidate, but I wanted to ask this - the question gives a 1TB hard drive, and fills it literally to the brim with 4 billion sets of 256 bytes. And then you say that 4 or 5 years ago this relied on some sort of distributed system, but now it can be done on one box. What changed? If the drive is one terabyte, and is used to store exactly one terabyte of data, aren't you going to need to put your database overhead somewhere? Am I completely misunderstanding the question? I find it interesting though. It's an easy problem without the storage space limitation. With it, it's fascinating. Are we thinking to compress the values and/or the keys? Or am I off the mark?
It seems to me that the problem could actually be solved by using a 64GB SSD. Given that the storage is 1TB and each of the keys and values are 128Bytes (and hence the disk is full to capacity), and assuming the keys are unique, which it would need to be to have all the keys unique, then we can safely say that every possible value in the 2^32 space is used. Therefore, we could read the data from the 1TB disk, and for each key we find, use the value of the key as the address of a memory location on the 64GB SSD. Net effect is we essentially remove all the key info from the hard disk and compress the storage into 16GB. ie, no need for the hard disk, just a 64GB SSD.
If you have more data than, say, GDBM, can swallow in one bite, how could you break this data apart and then find the right chunk amongst multiple servers when given a key?
Also, what does the interviewer mean by this:
> I generally ask these folks if they think the problem of looking up a key by value has been solved before, especially given the two weeks to be live in production requirement.
I'm in recruiting myself. If I would ask questions like this we would have been out of new employees very soon. Questions like this only intimidate the applicant and are much to complex for interviews. Also you seem to have the idea that you know the only "truth". This is ridiculous.
Great thought question. Being a junior developer, I haven't the slightest clue on how to answer it. Having been exposed to merely algorithms in school, would it be safe to assume that this wasn't really targeted at software engineers but more towards system designers?
Sorry, I'm not a hacker, I wouldn't know the first thing about setting up an interface to a hard drive. However, the first thing that pops out at me is that you just described a 512 GB addressing scheme. You want half your disk to be unaddressable?
A picture is a useful abstraction of reality. It forces people to think about what they'll 'answer' before they do. Some people might draw a diagram, some might draw a story, who knows. It tells a lot about people and how they think.
For some people this may seem like a fun task and they'll get excited about it, some may not find it useful and argue about it and others (the ones you want to avoid) will frown upon it and say 'Fine. I'll do it anyway'.
Most of the people I've interviewed are well accomplished engineers, but that doesn't always mean I want to work with them.
Why do you think it is NOT useful to draw a picture?
I think it's not useful because you're not hiring me to draw pictures. I'd pass on any interview where crayon drawings of my future had any impact whatsoever on the likelihood of my hire.
It's not useful because it has nothing to do with what you're hiring them for. Frankly, if I were asked to draw a picture, I'd probably walk out. If that means you don't want to work with me, then I probably wasn't going to enjoy working with you either.
I've been hiring game programmers for a long time now. I think there are plenty of other ways to evaluate the non-tech personality/team-player side of candidates. Start with talking to them.
I ask very blunt questions in this area. Will provide examples if folks are interested.
FWIW, Ning turned me down back in April 2008 so feel free to ignore my advice :)