Unfortunately I can't share the exact questions we ask, but here's a smattering of the sorts of pitfalls I encounter, in no particular order:
* Number one is probably no Big-O performance considerations, or Big-O is an afterthought. For the love of God, please please please don't do a linear search on an unsorted array anywhere inside a nested loop. If you're going to do any appreciable number of lookups, preprocess your data into a hashmap, hashset, whatever.
* No or insufficient considerations for edge cases. What if the input array is empty? What if I pass in invalid arguments? What if the strings aren't ASCII? What if the numbers you're multiplying are very large? What if the input is coming from an untrusted source?
* No questions about the size of the input. Some of my favorite questions are those whose solutions bifurcate on whether the input can fit in memory, or whether it's coming in on a network stream. Some questions involve multiple arrays, and I always like to ask "you assume A is much bigger than B, but what if A is much smaller than B?"
* We place a great deal of weight not just on the solution, but on justifying why that solution is the most appropriate one. More often than not, this hinges on the Big-O performance and the system on which we're running. It's a mistake to dive right into your solution, because it robs you of the opportunity to demonstrate that you have multiple fully-formed ideas knocking around in your head and need to choose which one to code up. Otherwise I might get the impression that you only have one way to solve a problem.
* Sorting the entire array when I ask you for the top N elements! Getting the top handful of elements is a linear-time operation, people!
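A minimal Python sketch of the first and last points above (the helper names are mine, not actual interview questions):

```python
import heapq

def top_n(values, n):
    # heapq.nlargest runs in O(len(values) * log n) time -- for a small
    # fixed n this is effectively linear; no full sort required.
    return heapq.nlargest(n, values)

def common_elements(a, b):
    # Preprocess b into a set so each membership test is O(1) on average,
    # instead of a linear scan of b inside the loop over a.
    b_set = set(b)
    return [x for x in a if x in b_set]

print(top_n([5, 1, 9, 3, 7], 2))              # [9, 7]
print(common_elements([1, 2, 3], [2, 3, 4]))  # [2, 3]
```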
Bonus points:
* No consideration for parallel processing. I've run maybe like two serial production workflows in my entire career. Try to parallelize your solution to handle bigger inputs.
* Don't forget cache performance! I usually let candidates working in Python or other high-level languages slide on this one, but the odd C/C++ candidate has a major opportunity to impress me by optimizing their memory access patterns for cache performance.
* I let this one go for intern and entry-level candidates, but testing should never be an afterthought. The moment you finish your implementation you should throw input/expected output pairs on the whiteboard. You don't even need to walk through them super carefully, just show me that you know how to design test cases.
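For the testing point, something like this is enough on a whiteboard (the palindrome function is just a toy stand-in for whatever you implemented):

```python
def is_palindrome(s):
    # Toy implementation to have something to test against.
    return s == s[::-1]

# Input/expected-output pairs, the kind you'd jot next to your solution:
# cover the happy path, a near-miss, and the obvious edge cases.
cases = [
    ("racecar", True),   # odd-length palindrome
    ("abba", True),      # even-length palindrome
    ("abc", False),      # non-palindrome
    ("", True),          # edge case: empty input
    ("a", True),         # edge case: single character
]

for given, expected in cases:
    assert is_palindrome(given) == expected
print("all cases pass")
```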
And I hate to say it, but this right here is the problem with the industry.
You're expecting a junior dev to know and apply details of edge cases, complex character sets, runtime and space complexity, parallel processing and behavior of caching.
This is not junior level knowledge. You're looking for a mid-level to senior developer with zero experience. Yes, it's possible to find, but that's not junior level knowledge. Most of these are things that are easily learnable on the job. You're expecting a junior dev to be familiar with, and easily able to apply, all aspects of software development without ever having done it. This is absurd.
A junior dev should be able to handle bug fixes, well scoped and defined features, and have a small area of ownership. Guess what, they're working for you, and they need to find the 5 largest elements in an array. They sort the entire thing and then take the top 5. They send their code for code review. You have a 5-minute conversation with them about how you don't need to sort the full thing. They say "ah, that's cool, I hadn't seen that trick before". They implement it and know it for life.
This absurd idea that you wouldn't hire this person instead of investing in a 5-minute conversation with them is a large part of the problem with the industry.
My thoughts kinda echo this, but I don't think it's outlandish for a junior dev to know a lot of this - what's less reasonable is knowing it with enough depth that they can bring it all to bear without prep in an interview. They've had fewer opportunities to put these things into practice, so I'd only expect them to demonstrate proficiency if they knew ahead of time to study those fairly specifically.
Exactly. In a junior interview I accept passing mentions and hand-wavey verbal solutions. What I'm referring to here is a complete absence of awareness of these things.
I should be clear, I'm not expecting someone to write up a bulletproof solution to every possible edge case on a whiteboard in 35 minutes. I expect them simply to be familiar with the things that can go wrong. Add a check at the beginning to return false if the array is empty. Tell me your solution will get hairy if I expect you to handle Unicode, have a three-sentence exchange with me as to why, and implement an ASCII-only solution as a first pass.
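Concretely, a first pass might look like this (an illustrative function of my own, not an actual question we ask):

```python
def most_common_char(s):
    # First pass: ASCII-only, with the obvious edge cases handled up front.
    if not s:
        return None  # empty input: nothing to count
    if not s.isascii():
        # Punt on Unicode for now; doing it properly means thinking about
        # normalization and combining characters, which we can discuss.
        raise ValueError("ASCII-only first pass; Unicode needs more care")
    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1
    return max(counts, key=counts.get)
```

Two `if` statements and a sentence of justification go a long way.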
My time, and the time of everyone on my team, is far too valuable to be spent on teaching new hires things they should have learned on their own in junior year of college. Onboarding people means bringing them up to speed on the complexities and specifics of our own systems. We're not in the business of hiring people and holding their hand for a year until they know enough CS to be useful.
Also:
> This is not junior level knowledge. You're looking for a mid-level to senior developer with zero experience.
Zero professional experience, maybe. Zero experience period, no. This stuff is offered in almost all undergrad CS programs. If you didn't go to college or have a degree in a different field, you can find tons of lists of topics for self-study. If you've ever done any project on the side, you're bound to have encountered or at least imagined a couple of these issues. Something as basic as finding the largest N elements in an array isn't a trick, it's something that should be painfully obvious to anyone who's coded for more than a month.
> Something as basic as finding the largest N elements in an array isn't a trick, it's something that should be painfully obvious to anyone who's coded for more than a month.
Not a trick? Of course it's a trick.
The efficient way to do it is to heapsort the array, but just stop after you've popped N elements from the heap.
You can do much worse and still stay in pure linear time (though as I noted in another response to you, the difference in polynomial order between O(n) and the O(n log n) that would be required to finish your heapsort is ε, where ε is larger than zero but smaller than any real number): just scan the array N times, looking for the largest element that's smaller than your shrinking upper bound. I'd rather see a solution that sorted than one that took N passes to pull N elements -- sorting will be faster -- but the N passes approach is pure linear time.
Then again, naive N passes will fail when the array contains duplicates. To avoid that, you'll need to allocate a data structure to hold your N answers, and implement a way of adding values in and having it appropriately discard the lowest value when you add to it while it's full. That's starting to look like a trick. It also cuts you down to one pass.
Is the "not a trick" answer you're looking for implementing an N-running-maxes data structure, or remembering how to heapsort? Do you really care about getting your results in slow linear time instead of fast "so close to linear you can't even tell the difference" time?
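For what it's worth, the N-running-maxes structure is a few lines with Python's heapq (my sketch, not the parent commenter's):

```python
import heapq

def top_n(values, n):
    # Keep a min-heap of at most n elements; once full, replace the current
    # minimum whenever a larger value arrives. One pass over the input,
    # O(len(values) * log n) time, and duplicates are handled naturally.
    heap = []
    for v in values:
        if len(heap) < n:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)

print(top_n([4, 4, 9, 1, 9, 7], 3))  # [9, 9, 7]
```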
I see where you're coming from on some points, such as complex character sets and caching. But runtime and space complexity are basic concepts which should be taught in any respectable CS degree program. Looking for edge cases and writing good tests ought to be taught as well. Also I'd hope parallel algorithms would get some time in a CS degree, but that one could be somewhat negotiable.
Can you define what a 'respectable' CS program is? I'm currently taking a core CS class online for fun at a major public university, and it's nothing earth-shattering. In fact, I'd go as far as saying that I get more out of blogs and study aids like 'Cracking the Coding Interview' than this class.
Somehow I got into the tech industry without a fitting degree and found myself in a smallish (30-person) company as the only sysadmin, exploring, setting up, and maintaining every technical piece this office needs: multiple Linux VMs with different services like smb, Confluence, ...; AD and Terminal Windows Servers; device management of all the MacBooks in use; and so on.
I've also always liked to code, and I've always found a way to achieve what I was trying to solve. But of course I've never built anything big, or even thought about things like super-efficient ways to do it. I always figured that if I keep going and keep building my small personal projects, one day I could maybe start as a junior dev, even without the degree. But your post seems to say that a junior dev has to think about edge cases even before they occur. As a sysadmin I also have to do this. But I always imagined that the developer's job is no one-man show, and that these edge cases would be found together. Code gets reworked when a more experienced team member finds a pitfall.
First off, congrats on landing in your role! Sysadmin and development are completely different skillsets in my mind, and I have as much admiration for sysadmins as they seem to have for me.
My company doesn't really do small projects anymore, so we take a lot of care to make sure we hire people with the discipline and ability to handle large systems. There are certainly a lot of organizations that don't have that requirement, though, so some of this assessment may be a little harsh.
As for code review catching errors, that sounds great in theory, but in practice it's just rarer than you'd think. Think about it, your reviewer is always going to spend less time on your code than you did, and they're always going to have less context. I find that unless the error is egregious, it's almost never the code reviewer that catches it. You're not really a one man show, but you're pretty close.
Unit testing is by far a better approach. I catch easily ten times as many bugs in my code by unit testing it as by manual inspection and code review combined.
Most of these things are not “junior” level at all. There are exceptional people though and I suppose it makes sense for you to try and hire those. But large amounts of disappointment are very much part of the process then.
>Number one is probably no Big-O performance considerations, or Big-O is an afterthought. For the love of God, please please please don't do a linear search on an unsorted array anywhere inside a nested loop. If you're going to do any appreciable number of lookups, preprocess your data into a hashmap, hashset, whatever.
Big O should be an afterthought - the MO of any developer ought to be to get it right and then refactor - and worry about speed after profiling. Premature optimization is an antipattern you should be selecting against, not for.
> Sorting the entire array when I ask you for the top N elements! Getting the top handful of elements is a linear-time operation, people!
Unlike your other points, this one seems unlikely to cause any trouble in practice. From a polynomial-order perspective, a factor of log N is literally infinitesimal, essentially free.
From what you asked, I can deduce you work in an environment where performance is important.
For my startup (and I'd say at least a non-negligible number of companies), half of what you ask is just not important.
Background for what follows: a small 5-year-old startup with 2 developers, which doesn't do anything "complicated" like ML or heavy computation (mostly apps, with a CRUD back-end).
I'll try to rephrase as what I'd expect instead:
* Big-O is not important there. What you do need is some knowledge of how to leverage your DB to do things, instead of looping yourself over data you just queried from it. And even that's not number one.
* Edge cases are actually a valid concern for every programmer, no matter the field
* it might be useful to have a rough idea of the input size you expect in order to offer a proper solution, but it's generally pretty obvious
* justifying a solution is also a valid concern for every programmer, no matter the field
* manually reimplementing operations already handled by your framework/library is a red flag, barring very specific situations that require an explanation. Be it sorting, filtering, or whatever.
Bonus points:
* bringing up manual parallel processing reeks of premature optimization (we're an environment where performance isn't generally a problem, remember). It needs a real justification, and the other obvious optimizations have to be done already. So far I've never reached that point in my professional career (the tools I use DO use parallel processing, I just don't roll it myself; that would be NIH syndrome)
* don't forget caching! If the cache performs badly, it's time to start investigating why, but you don't need to worry about cache performance if you don't do crazy things, for the most part.
* Testing is also important wherever you go, but I also let this one slide, because entry-level candidates aren't taught that in school, unfortunately.
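Going back to the first bullet, "leverage your DB" looks something like this (a toy sqlite3 schema of my own invention):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10.0), (2, 25.5), (3, 7.25)])

# Let the database do the filtering and aggregation, instead of
# SELECT * and looping over the rows in application code.
total, = conn.execute(
    "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE amount > ?", (8,)
).fetchone()
print(total)  # 35.5
```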
The thing about performance is that it doesn't matter until it does. A young organization can easily get away with hiring people who don't know or don't care about performance for years. I get it: features need to get pushed, the business needs to move on. That's fine in the short term, and for some organizations even longer.
Sooner or later, though, that approach will come back to bite you. Milliseconds matter to the user because the longer they wait for an action to complete, the more likely they are to stop caring. They matter in settings where you're performing multiple RPCs because a millisecond of delay here and there adds up over multiple calls. Not to mention the fact that memory, CPU, and disk all cost money. And when those factors come up, you'll suddenly find yourself surrounded by people who are (at worst) incapable of addressing the situation or (at best) will need to be ramped up on the issue.
And that's just constant-factor improvements. Can you imagine choosing an O(N log N) solution over an O(N) one? Everyone thinks the difference is trivial, but in real terms the linear version is roughly a 10x speedup per thousand inputs and 20x per million inputs. That's huge.
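Those 10x/20x figures fall straight out of the base-2 log of the input size (assuming comparable constant factors, which is admittedly a big assumption):

```python
import math

# The ratio between an O(n log n) algorithm and an O(n) one grows like
# log2(n), if the constant factors are comparable.
for n in (1_000, 1_000_000):
    print(n, round(math.log2(n)))  # 1000 -> 10, 1000000 -> 20
```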
More importantly: in interviews, a lack of care for performance suggests to me that a candidate is willing to cut corners. Once they have any old solution they pat themselves on the back and move on. We're a very self-driven organization, and in my experience if a new hire is struggling to meet expectations 6-12 months after joining, it's usually because they don't constantly ask themselves how they can improve their work and our codebase, and instead expect to be told what to do.
Minor pet peeve: parallel processing is not premature optimization! If you think it is then I suggest you exercise it more. It's not as hard as people make it out to be, and at a certain point it becomes second nature. Then all of a sudden terabyte datasets start looking trivial...
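A sketch of what "parallelize it" can look like in plain Python (the chunk size and the toy workload are mine; real pipelines would split along natural record boundaries):

```python
from multiprocessing import Pool

def process_chunk(chunk):
    # Stand-in for real per-record work; here we just sum squares.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    # Split the input into independent chunks...
    chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]
    with Pool() as pool:
        # ...and process each chunk in a separate worker process,
        # combining the partial results at the end.
        total = sum(pool.map(process_chunk, chunks))
    print(total)
```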