If your company does coding tests during the engineering interview process, (how) do you measure the long term effectiveness of the tests? Do you keep internal metrics comparing candidates' score to their long term impact/success at the company? If yes, what have you learned from the results and how have those learnings impacted your hiring process?
I don't know if my current company does, but when I first implemented them for a company I worked for ~15 years ago we definitely did.
At that company (a ~200-engineer, privately held software company) we found a few things:
- in person tests were less predictive than take home tests.
- tests that did not provide automated test cases as examples were less predictive than those that did.
- there was virtually no predictive power to 'secret test cases' that we ran without providing to the candidate.
- no other part of the interview pipeline was predictive at all. Not whiteboarding, not presenting, not personality interviews, not culture fit testing, not credentials, or where experience came from, nothing. That was across all interviewers and candidates.
A few caveats about this:
- this was before take home testing had become widespread and many companies screwed it up. At the time we were doing this it was seen as novel and interesting by candidates, not as just one more painful hoop they had to jump through.
- we never interviewed enough candidates to get true statistical relevance.
- false negatives were our biggest concern; they are extremely hard to measure (and trying to can potentially open you up to lawsuits). The best we ended up doing was making our pipeline less selective to account for them. This did not seem to reduce employee quality.
In a more meta sense, that experience led me to believe that strict hiring pipelines are largely not useful. Bad candidates still get through and good candidates don't. Also, many other things have a far bigger impact on productivity than whether a candidate was 'good'. It turns out humans do not produce at consistent levels all the time, and things outside of what you can interview for matter more (company process, employee health, life events, etc. all have way more impact on employee productivity than their 'score' at interview time).
> no other part of the interview pipeline was predictive at all. Not whiteboarding, not presenting, not personality interviews, not culture fit testing, not credentials, or where experience came from, nothing
Did you test predictive power for individual interviewers? At a company I worked at previously we did, and this was by far the best overall predictor: some interviewers just did a much better job than others at identifying those likely to succeed. That could explain why you didn't see much predictive power when you looked across those other items over all interviewers: the variance between interviewers essentially "swamps" any smaller differences between the interview techniques themselves.
Note this didn't surprise me that much, as you see this dynamic in lots of other "person-to-person" endeavors. For example, when looking at whether one type of psychotherapy intervention is better than another, most of the data that I've seen shows that by far the most important factor is the skill and "match" between therapist and client, far more important than any individual modality.
We did. There were small differences between them, same with which questions they asked, but nowhere near as predictive as the code work. I suggested removing interviews entirely, but was never able to get that approved.
Again, we didn't have enough data points for real statistical validity, so it could be that, but I became convinced that it didn't matter who was interviewing or what the format was. Some candidates are good at interviewing and some aren't, but that didn't carry over to the job.
> there was virtually no predictive power to 'secret test cases' that we ran without providing to the candidate.
This brings back some unpleasant memories of a take-home I got from a FAANG.
Basically, I was given a loose spec to implement, with no real data or test cases (and was told that none would be provided when I asked). After submitting my work I received a terse rejection with zero constructive feedback for my 6 hours of work. Uncool.
In one interview, I was given a timed HackerRank problem over a screen share with two interviewers. The sample tests passed, and the real tests passed except for two (from what I remember) that timed out on large data sets. Before the tests were run, I had already highlighted the part of the code that was the bottleneck and asked if I could copy the code into Visual Studio (the test was in C#), because the standard library has a data structure for this use case that I hadn't used in a long time, and I couldn't get the code to compile on HackerRank. I wasn't allowed to use the IDE, and I was also denied access to the standard library documentation (in front of them, through the screen share). I couldn't implement the data structure within the time limit. I failed the interview. I still wonder what the point of that test was.
I always feel like that type of coding interview is a sort of engineering hazing. I know I am often consulting documentation, especially when working in a new problem-space or less familiar programming language!
I always try to give candidates the benefit of the doubt on silly things like syntax, since it's not like I'm interviewing for a live coding performer!
Same experience. This is why I will no longer do take home tests that take more than 90 minutes or look like they'll take more than 90 minutes (even if the company misjudges it).
The only exception I've made is if the company pays for the time.
FWIW, at that job we had an explicit goal of 60 minutes or less, and we tested that against engineers we'd already hired.
I've heard guidance that said up to 4 hours was fine. That might have been true back then, before employers abused the system and made coding tests just another hoop on top of interviewing rather than a replacement for it.
I've recently encountered similar assessments. I asked for feedback or the test cases but got none. What do you think the best option is to learn from the projects?
post it on stack overflow or reddit for feedback :D
Typically they tell you not to post your solutions publicly, though I don't know why you'd be inclined to respect their wishes after such an experience unless you're dying to work there in the future; the main thing I learned from the project is that I didn't.
The problem with keeping these stats is that it only tracks engineers that were hired. I don't think coding interviews are a good predictor of performance, and that's not why I use them.
The point of a coding interview is to eliminate, as fast as possible, people who simply can't code. I'm being completely serious here. They can even have a CS degree (or will claim to but if you look closely they were in an easier program to get into and took CS electives) but cannot write a simple program on the board in an hour.
It's also why I don't like take-homes. First, it's trivial to cheat (I don't mean looking stuff up online, I mean flat out having someone else do the work), and because of that the final stage would still have to be an in-person whiteboard (or pair programming over Slack, which still has an engineer spending 40+ minutes with the candidate).
That was the purpose of the original fizzbuzz but for whatever reason it seems to have morphed into “spend all your spare time on leetcode so that you can answer whatever arbitrary problem is thrown at you” and they have the audacity to call that a meritocracy.
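For context, the bar the original FizzBuzz set really was this minimal, on the order of the following TypeScript sketch (the language choice is mine; the classic statement of the problem allows any language):

```typescript
// The classic FizzBuzz screen: multiples of 3 print "Fizz",
// multiples of 5 print "Buzz", multiples of both print "FizzBuzz",
// everything else prints the number itself.
function fizzbuzz(n: number): string {
  if (n % 15 === 0) return "FizzBuzz";
  if (n % 3 === 0) return "Fizz";
  if (n % 5 === 0) return "Buzz";
  return String(n);
}

// Print the first 15 values, as the original exercise asks.
for (let i = 1; i <= 15; i++) {
  console.log(fizzbuzz(i));
}
```

The point was never the puzzle itself; a candidate who can write this loop at all clears the bar the test was designed for.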
We used the same takehome for years, and eventually there were a few solutions online that were easy to find. But for some reason, they all sucked, so we never had to worry about unqualified candidates copying them.
The clear requirements of take home tests make them my favorite. They allow me to express how I work: get a list of reqs, walk away and think about them, make some decisions on directions, then let the code lead me.
I aim for the style the role requires. I capitalize on opportunities to make decisions I can discuss: "I used tape instead of jest because this example product will be distributed to many developers. The reduced API surface area keeps us focused on the hows, not the whats."
I tone that down if the role seems more rote-work like, at which point I try to highlight my ability to solve problems and learn quickly. For example, a comment above some network call: "// I was getting a cors error and found out I can run my own proxy for this"
Trouble is, unless more of the industry starts doing them so they're unavoidable, I'm going to skip companies that put these anywhere other than at the tail end of their process.
I'm not putting in half a day of work for zero pay to help you with your first-pass weed-out phase before we bother to make sure we align otherwise and this looks like a good fit. Thanks, bye, next (employer) candidate.
I like them because I have a far higher success rate with them. Take home tests cater to my strengths. Of course, I'm selective about the companies I apply for, and sometimes the test itself reveals something about the company, and I've rejected assignments after receiving them.
I agree, and I generally say I'm willing to do an alternate, live coding approach instead. I'm not putting hours and hours into some random take-home that may or may not be discussed down the line. Been there and done that so many times. Most of the time it didn't even align with the job.
I think we should use interviews for basic screening purposes only and skip determining "who is awesome from just signals". Instead, shift to a flexible hours paid trial period where potential colleagues get a better assessment. Measure by doing and interacting around doing, not by guessing, hazing, trivia, interrogating, or whiteboard hand-waving.
If you are talking about short term trials many devs are bound by anti-moonlighting employment agreements that either outright bar working for someone else or require notification.
For long-term trials you severely limit your hiring pool, because that is in effect temp-to-hire, which many devs simply will not do.
The first issue could be fixed legally. Just like California makes non-competes unenforceable, it could pass a law that says short-term moonlighting clauses can't be in employment agreements. This way, you could take a week off from your current job and actually work for a week at an employer you are interested in. Fully paid.
This would mean you would have to work a ton of extra hours, waste significantly more time than a day with the current interview approaches, or be "interview hopping" with no steady job for an extended period if nobody hired you. Which could mean gaps between "moonlight" sessions. Which could mean you end up broke.
This would maybe work if every employer did it and it was easy to pick up a new trial quickly, but the reality is that the time from application to hire can be weeks if not months at most companies!
No way I could risk having to find another job if the trial went poorly.
We compared their performance review scores. I was always leery of this given how fraught performance reviews are, but that’s how the company judged employee ‘worth’ so it made sense to align there.
I honestly don't remember. It's been a long time, and I wasn't privy to the assessments, just the scores (and they were anonymized before I saw them). I do remember they were several 5-point scores.
It wouldn’t surprise me if they were just manager assessments.
We don't use coding tests in this way. We use coding tests as a screening process to ensure the candidate is in the correct ballpark.
If we are recruiting a senior, we would expect them to easily complete basic technical tests. If they are more junior we might use them only as an indicator of their ability.
I don't particularly expect a strong correlation between how well they did in the tests and their long-term ability since their value is made up of many things, only one of which is their ability in the tests.
Yes. I think we all know that interviews are not perfect so we won't be overly strict on anything. Do we think they will get on with others? As long as they are not a definite "no" then that's fine. Do they seem interested in the company? Same thing.
There was only one time I was still unsure and didn't want to waste the candidate's time, so rather than telling him sorry, I set him a paid coding test to develop a microservice, in order to judge his style, how long he took, what questions he asked, etc. I didn't think the result was good enough, but because we paid him, we parted on good terms and he got some useful feedback.
We do this. We have a scale of how in-depth we expect people to get based on their level. We ask candidates questions about specific projects they've worked on and we ask them to be more specific until we feel we have a good understanding of how deep and wide their understanding is. We dig deeper for more senior folks. It seems to work pretty well.
I'd like to point out that success in a role depends on more factors than the technical interview.
I have found that investing the time to correctly onboard new team members makes a huge difference. Correctly onboard an average/good hire and they go on to produce solid output and often thrive. On the other hand, you could have a great new hire but because of no/poor onboarding they "sink" instead of swim.
The company would have to be fairly large (>100 employees) and long-lived (>10 years) to generate an amount of data with any hope of statistical significance. Employee "success" depends on many factors, and an employee who can seem to be a failure in the short-term may end up becoming very successful (or vice-versa), simply because of external circumstances---the nature of the projects, the clients, colleagues, etc.
Setting the bar at Fisherian statistical significance is letting perfect be the enemy of good.
This seems like one of those occasions where improving your reliability by just a few percent (even if far from statistical significance) can massively reduce costs in the long run. (Maybe Kahneman even used interviewing as a concrete example of this in his latest book?)
And maybe a follow-up question (to measure the false negatives):
Do you check the applicants who were denied based on their test and see where they ended up working? E.g., you are a mid-tier startup that rejects someone who ends up working at Amazon as a high-level engineer: do you mark that as a failure?
Let's think about this for a second: if I apply to be a mid-level engineer at Billy Bob's software development firm, but I'm capable of getting a job as a senior engineer at Amazon, odds are I'd only stay at Billy Bob's until I could land the Amazon job. Considering that onboarding a software engineer can easily take six months, that means you only get six months or so of actual work out of this person. If that; they might work at Billy Bob's for three months until they get their FAANG offer, and then just leave Billy Bob's off the résumé.
I'd be careful to call that out as a negative; if the culture fit wasn't right, and the candidate would have been a net negative to the team, it shouldn't matter where they end up next, unless (of course) the candidate that was actually hired ends up being an even worse fit (ergo the need to fix your hiring process).
> I'd be careful to call that out as a negative; if the culture fit wasn't right, and the candidate would have been a net negative to the team, it shouldn't matter where they end up next
I'd be careful to presume you can know these things from an interview.
> unless (of course) the candidate that was actually hired ends up being an even worse fit (ergo the need to fix your hiring process).
Total lack of self awareness in the corporate world really is an amazing thing to behold. I suppose this is "iterating" (in HR speak, not code speak): taking a set of criteria which generates a wrong conclusion, and then applying all that to ancillary things to find more wrong answers.
On the other hand, "they would not have been a good fit" sounds suspiciously like a blanket, non-falsifiable denial of failure. In other words, bullshit.
See also: "you don't have enough experience," one which I most recently heard myself after four interviews and a technical assessment, in which my (passing) solution included a bugfix to the test itself.
"Culture fit" has become like the currency of recruitment: supposedly what you may have to pay with for a technically great engineer whom the hiring team didn't feel comfortable with. I think the original question is a great one. Do we actually put these intuitions to the test?
For developers, coding tests that include deployment / infrastructure components (e.g., deploy your solution to a cloud container, or build and compile your solution for desktop platform testing) are uniformly consistent with long-term impact / success. Problem solving at the algorithmic layer may be inversely correlated with success if a candidate lacks a production skill set.
Unless one's focus is research and development, there is a non-zero cost to training for production skills, so it's best to start with someone who understands the delivery process.
Linear metrics are probably less useful, inasmuch as it will become rather obvious as to which employees are self-starting and work well with others, versus those that require motivation or are staunch individualists.
As a developer with an arts background (studied music, happened into tech work after college through years of mostly self-taught hobbyist tendencies) I'd agree that the design of these tests themselves is a factor that's thoroughly understated.
Timed algo challenges encompass a slew of antipatterns in terms of how good code is actually written and shipped. To begin with, pitting someone against a clock and hidden test cases (and a foreign editor) is actively optimizing against solutions that are readable to other human beings -- or to the person writing them, a year from now. The nature of running them in a browser means that it can't evaluate a person's capacity to actually use tools outside of core language functionality. Never mind that building the entire exercise around predetermined test cases precludes any way to gauge whether the person taking it has an understanding of writing tests.
And that's assuming your test environment doesn't add obnoxious and arbitrary restrictions of its own. Like telling you that using documentation is cheating. (Btw imagine listening for ctrl+t here, but not ctrl+n.) Or offering you "the language of your choice," but then throwing API call exercises at you while limiting your choice of JavaScript runtimes to a bare installation of Node -- the only one still in active development, out of a list that also includes every browser you would use to actually access the test -- that doesn't support Fetch.
With the original question being about measuring correlation between interview performance and actual long-term performance on a job, I'd love to see the numbers you are basing your opinion on regarding testing for "production skills" and long-term job performance.
- Boolean yes/no qualification by a manager at 2 years of tenure compared to a boolean yes/no qualification upon performing the interview task (yes, they showed skills that relate to deployment)
- Performance review ratings over time (I would assume those cover "team interaction" and "declared responsibility")
- Length of tenure ("lasted more than 6 months", ..., "lasted more than 5 years", "quit themselves"...).
...
None of these are perfect, but they are pretty unambiguous even if sometimes subjective.
As long as you measure while fully understanding that you're unlikely to achieve statistical significance soon, you can still measure stuff! And while you should not take trends that show up as gospel, they could still influence your interviewing.
Or, you could simply decide not to measure anything since you'll never get perfect results. But at least don't put your opinion out as fact, at least not in a thread about measuring stuff.
The more fundamental question: is your company meaningfully able to measure the long term impact/success of its employees? If so, how?
The submitted question seems to just brush over this aspect, but so far when I've tried to evaluate interviewing techniques that has been the primary obstacle; people just can't agree on what success means once employed, so anything that tries to correlate interviewing to that will be an equal amount of junk.
I think my favorite story about code tests is the one where one interviewer presented the test, gave me 24 hours to complete it, and I was then supposed to be "graded" by a second team member. The second guy obviously didn't understand the requirements of the coding test (despite presumably receiving the same written instructions I did), so he rejected me outright. Which I guess gets to my thinking on coding tests: you often learn a lot about companies from the crappy "tests" they think have merit.
I interviewed hundreds of technical people in my career, across dev, test, and ops skill sets. I saw limited correlation between tests and aptitude. If you talk to someone about a project they've done, you know pretty quickly:
1) Can they communicate technical ideas?
2) Can I develop a rapport with this person and work together?
3) Do they understand what they built? Can they talk about tradeoffs they made? Did they learn anything from the experience?
A fizz buzz test isn't a terrible idea, but you also have to have an interviewer that understands how to administer it within the wider context of the interview. If the interviewer themselves doesn't understand it, they aren't qualified to actually administer it.
There is no score or measurement. The task is to write a stopwatch in any technology you want and explain it along the way. Then we put in some bugs and ask for troubleshooting. It's all about the approach to the problem.
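For readers curious what that task might look like, here is a minimal sketch of a stopwatch model a candidate could start from (TypeScript here, though the task allows any technology; all names are my own, not part of the actual exercise):

```typescript
// Minimal stopwatch: tracks accumulated elapsed time across start/stop cycles.
class Stopwatch {
  private startedAt: number | null = null; // wall-clock ms when started, or null if stopped
  private accumulatedMs = 0;               // time banked from previous runs

  start(): void {
    if (this.startedAt === null) this.startedAt = Date.now();
  }

  stop(): void {
    if (this.startedAt !== null) {
      this.accumulatedMs += Date.now() - this.startedAt;
      this.startedAt = null;
    }
  }

  reset(): void {
    this.startedAt = null;
    this.accumulatedMs = 0;
  }

  elapsedMs(): number {
    const running = this.startedAt !== null ? Date.now() - this.startedAt : 0;
    return this.accumulatedMs + running;
  }
}
```

A troubleshooting round could then, say, swap the `+=` in `stop()` for `=` and ask the candidate to explain why earlier runs stop counting.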
I've never worked for a company that did (18 years in the industry this year). Of the 8 companies I've worked for, only one had interviewing figured out, and they didn't track or measure metrics on coding tests, challenges, etc. They did allow the challenges to evolve and they were tailored to the position that the tests/challenges were for.
The big FAANGs do, for what it's worth. They have entire ML pipelines looking at hiring. The following isn't about interview effectiveness, but it is one example of the analysis done:
Not empirically, but my manager focuses primarily on the engineering experience for our team and potential hires. This results in explicit feedback gathering and modifying our process accordingly.
We have short, standardized, broad interviews. We look for what can be added to the team rather than poking holes, and we're still trying to improve.
The only way this could even conceivably be done in a scientifically valid way is randomized controlled trials, which would mean not giving the same interviews to all candidates, which is only possible if hiring at a large enough scale as to even be able to sample meaningfully from multiple "interview type" groups, and it would of course require it to be legal to give different interviews at random, which I'm not sure is true. I guess as long as it's actually random, you're not discriminating against any specific group, but it isn't exactly fair and you risk killing goodwill of your employees when they realize you're running experiments on them.
Of course, it's really not possible at all to do this at the level of rigor expected of, say, clinical trials. Each new hire will know what type of interview you put them through, and there is no reliable way you can prevent them from telling others.
I would say anything having indirect correlation has no easy way to measure. Ultimately a company is either looking for product market fit, customer growth or revenue/profit/cash flow depending which stage the company is in.
On top of being hard to measure, the data points generated through hiring are just too few, and the data collection process is too long and subjective.
Just ask your team if they like the new hire and whether they can make progress together. Things like: do you like working with the new hire? Is the new hire bringing new insights to the team? Is the new hire easy to work with? Is the new hire learning new things?
And most importantly, can the team let go of mismatch fast enough.
Overall I would say it is just not worth trying to measure hiring.
We tried that at one company I worked for and it worked well enough. Our contract with the consulting firm said if we dumped the contractor within 90 days we didn't pay a cent for any of their time. This resulted in the consulting firm only sending us candidates that had good reviews in prior engagements. And good reviews from prior engagements strongly correlates with good reviews in future engagements.
I want to say it was TEKSystems. Some caveats here. We may have had to sign an exclusive deal with them, that we would not hire through anyone else. Also, we were a big client with an extremely good relationship with them, they made a lot of money from us. Also, it could've been 30 or 60 days on the limit. But the concept was in place.
The main thing that matters is training managers properly. Training management to be clearer about how they communicate goals and how transparent they are. The fault is not with candidates. Making sure a candidate can communicate clearly and effectively and has some passion for the position is all that you can really do at the interview level. The rest is quite frankly having better management and a culture of being helpful.
Metrics on your org should be about how clear are the processes and planning toward goals and how well do they get communicated and executed. I worked for a MAAAN company and this one didn’t get it right. I figured they just made the decision that it is better to crank through people than actually grow them - since they were never short of candidates. This was pretty clear from their promotion culture and assessments that rewarded selfishness.
Bottom line... train managers. Build the scaffolding to grow competent, empathetic, managers. Communication and clarity and empathy wins over everything else. F** programming test hazing. Commit to the people in the organization. Done.