Asking candidates to come up with this kind of solution in an interview setting where they are under all kinds of pressure is honestly dehumanizing.
There's a lot of good insight in the article about the correct way to approach the problem, but asking anyone to come up with it on the spot is unrealistic. You have the benefit of having seen the problem before with time on your side to reflect on it. They haven't.
When I do interviews like this, I prefer to talk them through the problem together, like we were actual teammates working on a problem together. That more closely relates to life on the job, which to me is the point of interviewing someone.
Many people (especially those from big tech backgrounds) treat interviews as "the time for the candidate to prove that they are good enough to work at my company". I, like you, prefer to use the time for collaborative problem solving to try to get as much signal as possible about whether it would be fruitful for us to work together, while also trying to figure out if we would want to work together.
The "is this person good enough for me" interview allows geniuses who are assholes through. I prefer to filter for good teammates.
If this is considered dehumanizing, I'm concerned for the future of our industry. Anyone with even a first undergraduate algorithms course should immediately spit out "sort by customer and page, then stream the output".
I wouldn't expect everyone to see the optimizations -- you can sort the days individually, you can drop all but a small number of pages per customer -- immediately, but I would be disappointed if someone couldn't be hinted there by the end of a 30 minute interview.
Failing to even get the O(n log n) solution tells me that someone should never have graduated.
> If this is considered dehumanizing, I'm concerned for the future of our industry. Anyone with even a first undergraduate algorithms course should immediately spit out "sort by customer and page, then stream the output".
Well yes, but that's not what this is about. It's about social pressure, not ability.
I could program this for you right now if I wanted to; most "optimisations" seemed the "obvious ones" to me, so I guess I would have passed. I could also program it in a hurry at ludicrous speed under pressure because all of production is down or whatever.
But ... I wouldn't be able to program this in an interview setting.
Last time I was asked a bunch of even easier things (in a timed interview, no less), where I literally have some exact code on my GitHub. I just couldn't finish it. Not because I can't "get" it, but because programming while two strangers are judging your every keystroke and commenting on what you're doing is just nerve-wracking.
> Failing to even get the O(n log n) solution tells me that someone should never have graduated.
This is something I wonder about in my comment sidethread - to me, the natural solution is the O(n) one in O(n) space.
I see that the O(n log n) solution requires O(1) space to run. But it requires O(n) space to return a result! Is that actually better than O(n) time and O(n) space?
With the sorting solution, you can stream the result directly to a file _without_ holding the entire file in memory. So it can be better if you are memory constrained (e.g. if the files are bigger than what you can hold in memory), or if you want to hold the working set inside the CPU L3 cache as 'cperciva is alluding to.
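For concreteness, here is a minimal Python sketch of the sort-then-stream idea. The line format ("customer_id,page_id"), one file per day, and the "both days, at least two unique pages" rule are assumptions, not from the thread; this in-memory version only shows the shape, and for files bigger than RAM the sort step would be handed to an external sort such as sort(1).

    # Sketch of "sort by customer, then stream the output".
    # Assumed input: one file per day, each line "customer_id,page_id".
    import csv
    from itertools import groupby
    from operator import itemgetter

    def loyal_customers(day1_path, day2_path):
        records = []                          # (customer_id, page_id, day_index)
        for day, path in enumerate((day1_path, day2_path)):
            with open(path, newline="") as f:
                for customer, page in csv.reader(f):
                    records.append((customer, page, day))

        records.sort(key=itemgetter(0))       # O(n log n); swap in an external sort for huge files

        for customer, group in groupby(records, key=itemgetter(0)):
            days, pages = set(), set()
            for _, page, day in group:
                days.add(day)
                pages.add(page)
            if len(days) == 2 and len(pages) >= 2:
                yield customer                # results stream out; nothing is accumulated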
Either of those solutions is ok from a junior developer.
From a senior developer, I would expect a discussion of in-core vs. out-of-core operations, and the fact that hash tables (even if they fit into RAM) aren't really O(1) time per operation since random access to large address spaces runs into e.g. TLB misses.
Given that the depth of the page table tree is fixed at 3/4/5 depending on the architecture and kernel, I think most people would be confused if asked whether the average complexity of a hash table lookup is always really O(1).
Even senior developers that understand the TLB would likely think of it as "O(1), but with a big constant factor".
On a side note, one thing that the article doesn't mention is the possibility of using a Bloom filter. It trades an extra scan of a file for smaller hash tables (or a smaller set to sort with an external sort).
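For illustration, a rough Python sketch of how that could look; the file names, the "customer_id,page_id" line format, and the filter parameters are all made up, and the filter here is only used to prune the day-2 working set.

    # Rough sketch of using a Bloom filter to shrink the day-2 working set.
    import hashlib

    class BloomFilter:
        def __init__(self, size_bits=8 * 1024 * 1024, num_hashes=4):
            self.size = size_bits
            self.k = num_hashes
            self.bits = bytearray(size_bits // 8)

        def _positions(self, item: str):
            # Double hashing: derive k bit positions from one SHA-256 digest.
            digest = hashlib.sha256(item.encode()).digest()
            h1 = int.from_bytes(digest[:8], "big")
            h2 = int.from_bytes(digest[8:16], "big") | 1
            return [(h1 + i * h2) % self.size for i in range(self.k)]

        def add(self, item: str):
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def might_contain(self, item: str) -> bool:
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(item))

    # First pass: remember (approximately) which customers showed up on day 1.
    day1_customers = BloomFilter()
    with open("day1.log") as f:               # hypothetical file name
        for line in f:
            customer, _page = line.rstrip("\n").split(",")
            day1_customers.add(customer)

    # Second pass: only customers that are probably in day 1 need exact tracking.
    candidates = {}
    with open("day2.log") as f:
        for line in f:
            customer, page = line.rstrip("\n").split(",")
            if day1_customers.might_contain(customer):
                candidates.setdefault(customer, set()).add(page)
    # A final exact pass over day 1 for just these candidates settles the
    # false positives and the "at least two unique pages" check.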
Yup, I was surprised Bloom filters weren't even mentioned in passing. I suck, but still, it's crazy to see that some people would genuinely have a problem getting the O(n) solution when they've been in the industry for more than 5-10 years.
Whether you get a cache miss or a hit -- or whether those bytes are fetched over a WAN -- is part of 'c'. Big oh of one just means lookup time is a constant and is independent of data size. The big oh doesn't change if you move from an SSD to a tape drive.
Thus, those operations are really O(1).
Big oh is, roughly, a bounding function: f is big-oh of g if there is a constant C such that, for all sufficiently large N, C times the big-oh function is at least the runtime of the algorithm. (Paraphrasing there a bit, and on a phone, apologies for not pasting the exact definition.)
In a formal sense, anything you can compute on a computer is O(1) since there's only a finite number of states. But we tend to hand-wave that away.
Hash table lookup is absolutely bounded by a constant, but going from L1 to L2 to L3 to RAM with anywhere between 0 and 5 TLB misses is far from constant.
> In a formal sense, anything you can compute on a computer is O(1)
I don't think the definition of Big-Oh supports that conclusion.
"Big O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity." [1]
"For example, the time (or the number of steps) it takes to complete a problem of size n might be found to be T(n) = 4n^2 − 2n + 2.... As n grows large, the n^2 term will come to dominate,... say that the algorithm has order of n^2 time complexity" [1]
> Hash table lookup is absolutely bounded by a constant, but going from L1 to L2 to L3 to RAM with anywhere between 0 and 5 TLB misses is far from constant.
Indeed, but I can choose a value "C" that is large enough to negate any of the differences between cache miss or hit.
Let's say "N" is the number of lines in one of our text files. Once we have put that data into a hashtable, the claim is then that lookup time is O(1). You counter, and state, no, the lookup time is potentially 30ms, or even 100ms (making up the numbers). For my claim that the lookup time of O(1) to be true, I can pick any value of 'C'. So I choose 1 million. I'm basically saying, lookup times will always be under 1 hour. Given our precondition that we have built this hashtable, no matter how we get the data, yes - it will be under 1 hour. If we used carrier pigeon with USB stick, we could just choose a larger value for C and still be correct.
The idea is basically, if you have something like N^2 + N^1, eventually when N is large enough, n^2 will become so dominant that you could multiple by it by a fixed value to wash away all the other terms in the expression.
Thus, while the actual lookup time can and will vary, it is still bounded. Keep in mind as well there are different functions for best case and average case performance. These notations do take into account that the per-instance lookup time can vary.
> Hash table lookup is absolutely bounded by a constant, but going from L1 to L2 to L3 to RAM with anywhere between 0 and 5 TLB misses is far from constant.
Wait, don't the inner nodes of the page table contain the _physical_ addresses of their child nodes? Wouldn't that make it at most one TLB miss per hash table lookup (assuming key and value are adjacent in the hash table bucket)?
Bounded by a constant does not mean constant lookup time.
Let's say the runtime of some function is (x1n + x2n + x3n), where x1, x2, and x3 represent the L1, L2, and L3 cache lookup times.
A function F is called big-oh of a function G if there exists a value C such that C times G(N) is at least F(N) for all sufficiently large values of N.
For the above example, let's say I choose one billion for C. Is it true that one billion times n will be greater than the sum (x1+x2+x3)n for large n? The answer is yes; thus, (x1+x2+x3)n is big-oh of n. In other words, at a certain size of n, we can pick a constant multiple of the simpler function that dominates the other terms. This is in part why asymptotic complexity is independent of hardware. Even if L3 cache were actually virtual and fetched over a network, I could still just choose a larger constant. If you can't choose such a constant, then chances are you need a faster-growing function like n-squared.
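For reference, the textbook definition being paraphrased, with the example above worked through (this is the standard statement, not the commenter's exact wording):

    f(n) = O(g(n)) \iff \exists\, C > 0,\ N_0 \ \text{such that}\ f(n) \le C\, g(n) \ \text{for all}\ n \ge N_0.

    % Applied to the example: the per-level costs x_1, x_2, x_3 are constants, so
    f(n) = (x_1 + x_2 + x_3)\, n \le C\, n \ \text{with}\ C = x_1 + x_2 + x_3, \ \text{hence}\ f(n) = O(n).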
It depends how you define "TLB miss". Read it as "paging table reads" if that's clearer.
If you have a 64-entry TLB and 4kB pages, then you can do random memory accesses (aka hash table accesses) up to 256 kB without any paging table reads; up to ~16 MB with one paging table read per operation; and up to ~1GB with two paging table reads per operation. (This, of course, is why large databases use superpages when possible; but while that pushes the limits back it doesn't eliminate them.)
In virtual machines there's further overhead since you have "virtual physical" addresses...
I'd expect talking about TLB misses as a junior engineer thing. That's stuff you learn in school.
I'd expect senior engineers to talk about how they will monitor this system, and to bring up things like customers we know we don't trust.
A senior engineer might tell you that running into TLB issues suggests that your approach is wrong and it's about time to move the solution to your data warehouse.
I have a non-traditional background (not comp sci) and while I understand complexity I wouldn’t have immediately arrived at the solution like this.
I think that rather, I would have done well because I know after doing hundreds of these both as the interviewer and the interviewee that using some sort of Map along with some vague rumblings of “space/time complexity tradeoff” is very often the solution for these types of questions. So I would have immediately gone there first by instinct.
The question asked doesn’t involve any computer science algorithm knowledge at all; it's nowhere near leetcode complexity.
Only the basics that are close to everyday programming work: write a for-loop, know what a Map/dict is, and Google for “how to read a file”. If a candidate can’t do that, they can’t really program. ChatGPT can probably code this answer.
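For concreteness, roughly the kind of 20-line solution being described, as a hedged Python sketch (the "customer_id,page_id" line format, one file per day, and the "both days, at least two unique pages" rule are assumptions):

    # Minimal sketch of the straightforward hash-map solution.
    from collections import defaultdict

    def loyal_customers(day1_path, day2_path):
        day1_pages = defaultdict(set)             # customer_id -> pages seen on day 1
        with open(day1_path) as f:
            for line in f:
                customer, page = line.rstrip("\n").split(",")
                day1_pages[customer].add(page)

        loyal = set()
        with open(day2_path) as f:
            for line in f:
                customer, page = line.rstrip("\n").split(",")
                pages = day1_pages.get(customer)
                if pages is None:
                    continue                      # never seen on day 1, can't be loyal
                if len(pages) >= 2 or page not in pages:
                    loyal.add(customer)           # both days plus at least two unique pages
        return loyal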
Agree. Our interview technique is a trimmed-down but somewhat realistic coding exercise. We do it as a pair programming exercise and we tell them up-front that this should be an interactive and positive experience; we're not trying to trick them or put them under pressure. It'll test the candidate's ability to actually do the job they'll be doing if we hire them, which is to write code in an IDE. We do it with them, so they can ask questions about libraries they're not familiar with and so on. Obviously we're not going to solve the problems in the exercise for them, but if they do get stuck, we'll help move them forward.
The coding exercise involves writing tests, design of classes, working with floating point arithmetic, working across thread barriers, some fairly standard maths, and so on.
I actually can't say we've hired a bad candidate yet, and they all report that the interview is a positive experience.
In my fairly long career I've been through about 10-15 interview processes. A few of them were quite toxic and those were the "just cram leetcode for months" type. I never want to inflict that on someone else.
While I like your approach better than his by a considerable margin... I would say his approach has merits, since he's a principal engineer at Google (maybe hiring there) and was an engineer hiring at Amazon. The problems they face would definitely involve large volumes of data needing to be worked on efficiently.
I actually think the question from the blog is a good one and I've used similar at previous roles. It may make up one part of the interview process. I think in hindsight I was more responding to the overall ethos where some interviews are conducted in an unduly abrasive manner.
> I prefer to talk them through the problem together, like we were actual teammates working on a problem together.
If a colleague came to me to discuss a question this basic, I'd want them PIPed out. This is miles from the complexity of problems that require collaboration. It's a shell pipeline that I'd write with sort/uniq in less than five minutes. I just did it to make sure my estimate wasn't wrong.
Maintainable? No. Done iteratively in less than five minutes? Yes.
If your perspective is that it's unrealistic to ask anyone to come up with an answer to this question on the spot, you should take a long, hard look at yourself and the quality of your colleagues.
As mentioned over and over, this isn't about the ability to do it. This is about making it less of a hazing ritual that generally only serves to give the interviewer a sense of importance.
Sadly, I know plenty of seniors with 15 or so years of experience who would write the brute force version and it still wouldn't work right.
Senior in a perfect world, sure, but the reality is that for decades people have been desperate for talent, so you don't have to do much to get by.
I knew a popular principal engineer at a Fortune 50 company who was loved by execs. He would check in 5 or 10 copies of the same file. The opposite of the DRY principle.
I don’t know about that. I mean, if you’re going to be in a position where this level of craftsmanship is necessary, then why shouldn’t the interviewer require that you demonstrate your ability to do so? It’s not a trick question, it isn’t unrealistic or impractical, it just spotlights a particular programming focus: performance.
> After a candidate puts forth the O(n²), I smile politely and I wait. I am really hoping the next words that come out of their mouth are “…but the complexity of this is O(n²) so can I do better?”
> Occasionally, a candidate will think they’re done at this point. 90% of the times that a candidate was done without questioning the quadratic nature of that solution, the final outcome of the loop was No Hire. So that’s another signal for me.
I would have been one of those candidates. The author said they didn't like tricky questions and wanted to get a signal on how the candidate might approach real world problems. Well, this is indeed tricky -- unless you drop a bunch of constraints in the beginning, for a real world project I would just use all the resources I can access to finish it. I am not going to go the extra mile to optimize it in all possible ways. Premature optimization can be evil. I provided the solution, it works and meets all your requirements, so I am done.
Want me to make it fast/memory efficient? You have to say it. Forgot to mention it in the first iteration? No problem, cut me a ticket and I will see if I can sneak it into my next sprint.
Describing 'throwing a hashmap' at the problem as 'premature optimisation' is a bit reductionist.
It wouldn't even occur to me to go with the naive O(n^2) solution because it has such obvious drawbacks with large inputs.
And it's an interview question... yes, you're getting a No-Hire if you just leave it there. Although I personally would prompt you if you're not interviewing for a senior position.
Yep. I'm surprised anyone hears a problem like this without immediately thinking of maps and sets. I was expecting the naive/"poor candidate" solution to just be memory-inefficient uses of those data structures. The O(n^2) solution is something that you might do when you're just starting to learn programming, but anyone ready to apply for jobs should see that it's not only extremely inefficient, but IMO requires more cognitive effort than just throwing the data into maps and letting the standard library do the heavy lifting (even if that is still not optimal). This is basically an Algo 101 "what not to do" introductory example
> Yep. I'm surprised anyone hears a problem like this without immediately thinking of maps and sets.
If you want to import a day's worth of access logs into a hash map, you need to take into account not only the time but also the space complexity of this sort of solution.
Depending on how much traffic a service handles and also on the misguided arbitrary beliefs of a reviewer, importing a day's worth of accesses into a hash map can be a gross mistake and deemed a no-hire with prejudice.
This exercise is even dumber if we take into account that importing the access logs into something like sqlite, querying the data, and discarding the database might be just as efficient (if not more) and can take a handful of lines of code to implement.
Also, personally, I'd consider it a sign of incompetence to complain about using a proper database to ingest and query data efficiently while also believing that the time complexity of a task that's expected to run once per day is a hire/no-hire deal-maker. Computational complexity matters for code running in a hot path. Beyond that, either you're playing trivial pursuit or failing to understand the basics of what you're doing.
> If you want to import a day's worth of access logs into a hash map, you need to take into account not only the time but also the space complexity of this sort of solution.
Good thing space complexity is routinely discussed in these interviews.
And the idea that computation complexity only matters for 'the hot path' is way too dismissive.
For instance, a quadratic loop over a 10 million line log file is just a catastrophe (100 trillion iterations), even for something run once a day. Such a log file would happily fit into memory (it's mere gigabytes) and a hashmap would be perfectly adequate.
> Depending on how much traffic a service handles
Yes, it depends. The interviewee is expected to ask, or at least state what assumptions they're making.
I'm sorry, I thought of sorting as my first approach, and didn't think of maps at all. But what do I know? I only taught Algorithms and Data Structures (CS3) 3 times a year for 10 years.
Exactly. At least you have to show that you know what you're doing and it's deliberate. Depending on the seniority, I expect some kind of justification in the comments, like "it's O(n^2), but since the input will be very small, it's ok".
In real life people do a lot of O(n^2) code without realizing, and usually it's just some unnecessary loop inside another loop. I want to know that you care about some things.
I have grinded leetcode and worked at FAANG, so I understand the common signals interviewers look for. But that's only because I spent a lot of time on FAANG-style interview prep. I don't know why the interviewer had to give that awkward silence -- the candidate, who is likely under abnormal pressure, may start thinking about a bunch of other things: "is there a bug in my code? did I miss some edge case?" I am one of those people who become very nervous in a live coding interview and usually can't perform as well as I do in my daily work. There was one interview where I embarrassingly forgot some basic syntax of a language that I had used every day for years.
I don't understand why this has to be a red flag. What's wrong with just saying "it works, but can you make it faster"? As an interviewer, I say this a lot and I never judge candidates on this at all.
Not even large inputs. More like medium or even the larger end of small. A quadratic-time solution can often start to be noticeably slow with just a few thousand items.
A realistic dataset for the problem they described could easily be tens of millions of records even if you're not a Google-sized site.
With an attitude like that, I think the author would be absolutely justified in not wanting to hire or work with you. This is the sort of attitude you’d expect from a disaffected minimum wage teenager, not from anyone who took pride in their work or cared about the quality of what they delivered. There are certain things that, even if they aren’t explicitly stated as requirements from the outset, should be reasonably assumed. Avoiding quadratic algorithms is one of those. That’s not what “premature optimization” means; optimization is about tweaking something after the fact but choosing an efficient algorithm is a decision you make at the very beginning.
> Want me to make it fast/memory efficient? You have to say it. Forgot to mention it in the first iteration? No problem, cut me a ticket and I will see if I can sneak it into my next sprint.
It’s probably a lot easier to just not hire people who deliver poor quality work in the first place. Then you don’t have to worry about whether or not they can go back and fix it later.
Absolutely not; in the real world you work with constraints and you can make simplifying assumptions.
"How large can the file potentially be" would be my first question. Depending on the size I might even throw sqlite, an RDBMS, or even a full text search engine at the problem. Or I might not, it depends on the actual scenario.
But if your description is to keep it simple I'm going to do that in good faith.
I've heard O(n²) described as the most dangerous asymptotic complexity, because it seems fine for small testing inputs but falls down when you throw real-world -sized data at it.
I generally agree with this, as an interviewer. You need to be explicit about what you're screening for, unless you're trying to hire a mind reader.
E.g. "this is a one-off report" and "ok, how would you do it differently if we wanted to turn this into a feature at scale?"
It's great if an engineer gets there on their own, but there are so many tangents one could go on that I won't ding someone for picking the wrong one, or asking for the most relevant one.
I would expect seniors not to need that level of guidance on the job, with the caveat that expectations should be reasonably set in design docs, tickets, various bits of context, etc.
> No great engineer should ever settle for an O(n²) algorithm, unless bound by memory or some other unmovable constraint.
What if this is a one-off to produce a business report? Would it make sense to use programmer time to create an O(n) structure in memory, or just loop through the files line by line and let the CPU take a minute or five, or thirty? What is the programming language - something that has a library for this or something very low level where we’d read the file byte by byte?
If we’re dealing with the latter, a small amount of data, and a one off report, I don’t care at all in my work whether an engineer I’m managing somehow writes it in O(n^3).
It’s interesting how quick the author is to judge - ask this question for points, don’t even think about that one, don’t mention arrays because they’re fixed size (despite implementations of dynamically allocated arrays totally existing, and the candidate might be coming from one of those), and so on. Some humility would be nice.
Although I think what they wrote is very valuable, as this is how many interviews go. And I have to at least appreciate the author’s approach for trying to start a conversation, even if he still takes a rather reductive approach to evaluating candidates.
There's places where writing bad shitty code doesn't matter but frankly I'd rather be at places and rather have colleagues and an environment where we don't write bad shitty code.
The attempts to circumstantially excuse away shitty answers go against the desire to just not be shit.
The author here seemed able and willing to talk through constraints & issues. Their first paragraphs practically begged for it, as a sign of maturity. Rather than just excuse away shitty solutions, my hope is, even if you are not a super competent can-do coder, you at least can talk through and walk through problems with people that are trustable, to arrive at reasonably competent capable answers.
I would argue that in most cases where performance isn’t a constraint, the first algorithm that comes to mind is probably the optimal choice. He even says:
> About 80% of the candidates go for the naive solution first. It’s easiest and most natural.
The “naive solution” will be easier to understand and maintain. Why make it harder if it doesn’t add value?
The nearly optimal solution is effectively just as easy to understand and maintain and frankly should come even more naturally to an engineer than the O(n^2) version.
But even more importantly, with a slightly better solution I don’t get woken up in the night once a week because some buffoon left behind a hundred of these little performance landmines that worked great when the table had ten rows on their dev box but causes an outage in prod the second we hit some critical threshold of data.
This takes all sorts of forms from people using quadratic time or quadratic memory to my personal favorite: pulling entire database tables across the network to do basic analytics.
The authors always have the same excuse, that “it worked good enough for what was needed at the time”, ignoring the simple fact that the version which wouldn’t have caused an outage would have been just as easy from day one.
Of course, using the correct data structure is the least an engineer can do. But the point is that it doesn’t matter quite often. You’re describing a situation where it matters. Which it doesn’t always.
The problem is even if this is "known" upfront things always end up changing.
Unless the code is truly throw away code and not checked into source control.
I'm maybe guilty of being too harsh with this sometimes but the amount of times half-assed code has woken me up has trained me to be very critical of assumptions like this.
It doesn't always, but I'll be blunt: if you find yourself frequently writing O(n^2) or O(m * n) algorithms where n, m aren't fundamentally small values, you are doing a massive disservice to your coworkers. There is very rarely any meaningful cost to using something better, and that better thing should come more or less instinctively for the overwhelming majority of cases by the time you have even a little experience as a software engineer. These types of algorithms should immediately jump out at you as at least an orange flag if not a red one.
If you aren't doing better because you don't care, you're inflicting the costs of your apathy on everyone else around you. If you aren't doing better because nested for loops are the most ergonomic solution for virtually every problem in your language of choice, you need to reevaluate your choices.
Everything doesn't have to be overengineered to death, but that's a far cry from being okay with regularly and intentionally choosing the actual worst solution that is all but guaranteed to blow up under any kind of growth.
I said “regularly or intentionally”, because I wanted to differentiate that that is what I have a particular problem with.
And yes, a really one-off business report is fine. As long as it runs and produces an accurate answer, that is fine. Which is why I distinguished that if you find yourself regularly writing these kinds of algorithms and not having alarm bells going off in your head, that is almost certainly a problem.
I have never in 25 years encountered someone who spent noticeably too long optimizing running time for software that wasn’t performance critical. I have had to respond to at least two to four cases per year where somebody assumed a small case would remain small forever (or just didn’t think about it at all) and caused an outage, a severely bloated AWS bill (unbounded autoscaling + quadratic), or some other completely avoidable hell.
The thing is, compilers are pretty amazing beasts. CPUs are pretty amazing beasts. "10,000" is a very small value of "N", given those two factors.
I've worked with a lot of engineers that considered anything O(n^2) to be a red flag, and half the time the actual performance profiling favored the naive method simply because the compiler optimized it better.
That means that if you actually care about performance, you've got to spend 30 minutes profiling for most real world scenarios. Yeah, O(n^2) is obviously a crazy bad idea if you ever expect to scale to ten million records, but the vast majority of software being written is handling tiny 10K files, and a very large chunk of it doesn't care at all about performance because network latency eclipses any possible performance gain.
Unless you are in a very hot path and know with absolute certainty that n will remain very low, I'd say you are doing clear premature optimization by comparing and choosing the O(n^2).
I say very small because to me, n=10_000 sounds like a number that could easily and quickly grow higher, since you are past a basic enumeration of a few choices.
And this is how we end up with situations like GTA5 taking several minutes to parse a JSON in production, because nobody did actually test it with real-world values of n (or at least didn't expect it to increase over the product lifetime)
The right lesson to learn here is "manage and maintain your software as it's used in the real world, as that will tell you where you need to invest time" not "prematurely optimize every algorithm just in case it becomes a problem".
No, the right lesson here is that quadratic algorithms are a particularly nasty middle ground of seeming to work for the small n of development and initial rollout but falling over for the medium to large n of living in production.
Luckily they’re typically easy to notice, and better approaches are often as easy as sorting your input first or caching items in a hash table.
An engineer should therefore catch these things when writing them or in code review and make sure a slightly better option is used in any situation where the size of input has a reasonable chance of growing.
The cost is nearly nothing and the benefit is not shipping code that will virtually guarantee slow response times and bloated scaling costs at best and humans being woken up at night for outages at worst.
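A small illustration of the "as easy as caching items in a hash table" point, with hypothetical record shapes (customer, page) pairs: the hash-based version is barely more code than the quadratic one.

    # Quadratic: for every day-2 record, scan the whole day-1 list.
    def customers_on_both_days_quadratic(day1_records, day2_records):
        return {c for c, _page in day2_records
                if any(c == c1 for c1, _ in day1_records)}           # O(n * m) comparisons

    # Same result: build a set once, then do O(1) membership checks.
    def customers_on_both_days_with_set(day1_records, day2_records):
        day1_customers = {c for c, _page in day1_records}            # O(m)
        return {c for c, _page in day2_records if c in day1_customers}  # O(n)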
I used a brute force-y approach that meets the requirements, saves millions in operational costs (vs hiring engineers to build and maintain the complex non-brute-force solution).
Unfortunately people don’t think about actual engineering cost of “optimal” solutions. Engineering costs are part of operational costs and need to be juxtaposed against compute.
You can get a lot more mileage out of running the largest EC2 instance for a year vs hiring a junior engineer.
Why would only the non-bruteforce solution require hiring an engineer to maintain it? Does the brute force solution spontaneously manifest and maintain itself?
It's not just hiring one engineer, it's hiring a whole engineering department because that solution involves symbolic execution plus a few other things that I can not speak about without compromising my employment. The two solutions are in entirely different leagues in terms of engineering complexity.
The bruteforce solution is simple enough that it can be maintained by junior engineers.
Add a few mid and senior engineers, and it can become significantly more efficient, without requiring the resources of the optimal solution while still being classified "brute-force".
It is yet another way the bitter lesson manifests [1].
The highest core count machine you can get on EC2 is like 45k per annum, which is peanuts compared to the cost of the team required to build the perfect solution.
The issue here is not brute-force vs optimized. It is over engineering. Why spend 10 minutes optimizing your program when you can spend two weeks over engineering it.
Had I been in your shoes, I'd accept that there are things I am not privy to, and therefore I can't have the full picture and thus can't exercise judgement. Instead, I'd try to understand what the other person is telling me - especially on HN - instead of jumping to conclusions.
A complete and efficient solution would take 2+ years plus a very senior, very specialized engineering team chasing a moonshot.
A brute-force approach that can leverage compute is far more efficient in engineering time, operational costs (eng time vs compute), and opportunity cost.
This isn't something that can be optimized in 10 minutes, and insinuating that while digging further suggests that you haven't dealt with truly complex engineering problems.
Even if you are doing some one-off analysis, having to wait several minutes can be annoying. And the non-quadratic solutions are not harder than the quadratic one.
> In theory you could write a pretty simple SQL query, and sure, Big Tech companies have giant data warehouses where you can easily do this sort of thing. But for the scope of a coding interview, you wouldn’t want to. Since this is not a distributed systems problem and the data fits in memory, why introduce the additional complexity and dependencies of a database for something that you can solve with 20 lines of simple code?
My first thought on an actual implementation was, if this is a one-off request, to import it into sqlite. No need to set up a big system, and I think it would be easier/faster than writing those 20 lines of code. Also a hell of a lot easier to iterate on minor spec tweaks like the unique pages overall vs per day clarification. And probably less likely to have off-by-one type of bugs, since the simple logic is handled by the database itself. Bonus, it does handle the case where the dataset is larger than memory!
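For illustration, a sketch of that sqlite route in Python; the file names, the CSV layout, and the two-day loyalty rule are assumptions here.

    # Sketch of the "just load it into sqlite" approach.
    import csv
    import sqlite3

    conn = sqlite3.connect(":memory:")           # or a throwaway file on disk
    conn.execute("CREATE TABLE visits (day INTEGER, customer TEXT, page TEXT)")

    for day, path in enumerate(("day1.csv", "day2.csv"), start=1):   # hypothetical files
        with open(path, newline="") as f:
            conn.executemany(
                "INSERT INTO visits VALUES (?, ?, ?)",
                ((day, customer, page) for customer, page in csv.reader(f)),
            )

    loyal = conn.execute("""
        SELECT customer
        FROM visits
        GROUP BY customer
        HAVING COUNT(DISTINCT day) = 2
           AND COUNT(DISTINCT page) >= 2
    """).fetchall()

Tweaking the spec (per-day uniqueness, three days, a different threshold) then only means editing the HAVING clause.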
Yep. I estimate if the files are over 10 million lines, I'd rather use SQLite. They'd probably still fit into memory, but that's where I'd put up with the up-front hassle of SQLite so that I don't have to reparse CSV in Python / Lua / whatever I'm writing this script in
But a point lost because you may need to go out of your way to avoid hitting row limits. “I’m using a proprietary tool that has serious issues with more than 2^20 rows” is not so awesome.
> I’ve actually expressed the problem in an ambiguous way.
So when other people do it, hard and tricky questions are bad, but when you deliberately set your candidate up for failure by withholding concrete information, that's clever and insightful. Got it.
Or more productively put: The author obviously enjoys tearing down simple questions with complex implications (one often does in the sw field) and reflects over their candidates, but seemingly lacks the self-reflection to understand what makes questions hard or tricky and why interviewers like to pick them.
Viewpoint as someone whose primary job right now is recruiting: my observation on what differentiates Fine software engineers from Great ones is how they understand the problem, engage with it, and go get what they need to solve it. Fine software engineers need to have the problem pre-digested; Great ones can take ambiguous problems, figure out what's missing, and hunt it down.
I don't see the author's withholding of all problem parameters as tricky at all, but an attempt to accurately mimic the ambiguities of the real world to see what the candidate does with it.
I do interviews for my team, and I'm very aware that framing questions the wrong way quickly leads people into the trappings of their own obsessions. Be that a need for optimising the wrong things (e.g. optimising minute but easy to optimise logic next to large, network-heavy calls), or assuming the wrong perspective on a task (e.g. getting tasked with an accidentally math-heavy problem, and assuming it's about solving the math instead of their overall approach to new requirements) or simply lacking the confidence to question obvious problems in the task they were given. Interviewing is a stressful to downright dehumanising process that is hard to get right, even with years of experience.
I don't see it. Usually the bad tricks are CS puzzle gotchas, not a designed-in, well-spottable, management underspecification, which is actually testing for an everyday-used skill in the actual job.
Except it's not an "actual job" setting. It's an interview.
Especially for more junior people it's selecting on confidence, because especially in a hiring setting they don't want to "seem stupid" by asking questions. I guess this is also a problem for more senior people. It's just a different setting than "actual job".
If you want to ask if anything could be clarified then ask.
> So when other people do it, hard and tricky questions are bad, but when you deliberately set your candidate up for failure by withholding concrete information, that's clever and insightful. Got it.
To me the worst part is the nonsense rationale to argue away using a database to store and query this data. Taken from the article:
> In theory you could write a pretty simple SQL query, and sure, Big Tech companies have giant data warehouses where you can easily do this sort of thing. But for the scope of a coding interview, you wouldn’t want to. Since this is not a distributed systems problem and the data fits in memory, why introduce the additional complexity and dependencies of a database for something that you can solve with 20 lines of simple code?
The blogger's problem statement explicitly mentions the system needs to track business metrics. Business metrics include things like clickstream metrics and the task of analyzing business metrics is exploratory in nature and requires dashboarding. Only an utter moron would look at those requirements and say "hey, this is a task for logs dumped onto a text file". There is no way in hell that this task does not involve in the very least a database.
This is yet another example of a technical recruiter evaluating candidates, and rejecting them, because they did not present the wrong solution to the wrong problem that the recruiter somehow convinced himself was the right one for some reason.
"Recruiter" usually refers to a person whose role is sourcing talent. Their responsibilities are more like identifying candidates, salesmanship toward the candidate, and minor filtering that doesn't require too much subject matter expertise (resume review, canned questions with correct/incorrect answers)
"Interviewer", by contrast, is much more often an actual practitioner whose main role is something else (it rarely makes sense to keep a stable of people around who are solid enough engineers to give interviews and not have them do actual engineering).
Ok. If that is what you meant. I thought you meant a recruiter as in a tech recruiter who finds candidates from different sources and then sets up the actual interview loop after a brief initial call that might involve a simple technical question.
Here we see the most classic interviewing error: not understanding that there's a difference between a test and what's being tested.
Will the data fit in memory? Well, of course it will, it's an interview... oh you expected me to ask you anyway?
I should obviously load both files into the hashmap, that way it works for an arbitrary amount of files instead of just two... oh, you expected me to write a solution for literally the exact problem you stated without considering its practicality? Even though before you wanted the opposite, when you asked about the algorithmic complexity?
An interview could reasonably ask for an out-of-core solution.
But you are right that optimizing specifically for two files seems wrong, especially as the data contains a timestamp, so you could simplify the problem and ignore the number of files completely.
> Now, given two N log files we want to generate a list of ‘loyal customers’ that meet the criteria of: (a) they came on ALL days, and (b) they visited at least L unique pages.
while keeping linear time complexity and
O(num records in first file * L)
memory complexity. It is not too different from the solution given in the article (just use 3 maps instead of two).
That means that multiple files doesn't need out-of-core if the maps for one file at a time fit memory.
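One hedged sketch of that generalization in Python, using a candidate map plus a per-day "seen" set rather than literally three maps; the "customer_id,page_id" line format is an assumption. Only customers from the first file can ever qualify, so memory stays within (first-file customers × L).

    # A customer must appear in every file and accumulate at least L unique pages.
    def loyal_customers(paths, L):
        candidates = {}                       # customer -> set of pages, capped at L
        with open(paths[0]) as f:
            for line in f:
                customer, page = line.rstrip("\n").split(",")
                pages = candidates.setdefault(customer, set())
                if len(pages) < L:
                    pages.add(page)

        for path in paths[1:]:
            seen_today = set()
            with open(path) as f:
                for line in f:
                    customer, page = line.rstrip("\n").split(",")
                    pages = candidates.get(customer)
                    if pages is None:
                        continue              # missed an earlier day, can never qualify
                    seen_today.add(customer)
                    if len(pages) < L:
                        pages.add(page)
            # Anyone absent today is out; memory never exceeds first-day customers * L.
            candidates = {c: p for c, p in candidates.items() if c in seen_today}

        return [c for c, pages in candidates.items() if len(pages) >= L]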
The three solutions (n^2 time 1 space, n lg n time k space and n time n space) are basically the three strategies to perform a join in a relational database: a full table scan, a merge-join and a hash-join respectively.
"explain select" is a cool source of interview questions :)
You know, since this assignment is basically a boss asking once for a single report about "loyal customers", complexity doesn't matter. Let's say the worst case scenario is that it takes 30 minutes to run. Who cares about complexity? The business value is in the answers from the data.
If you're consistently getting reliable answers and finally decide to build a system for these types of reports, clearly this guy's real world experience at Amazon's Clickstream product is going to be far more valuable than whatever anyone who is brand new to the problem can come up with, even if they choose the "correct" algorithm from the start.
Because I bet you that for most real world products that create more than a single fixed-format report, you actually want your data set up in a completely different way than what you initially thought. You'll probably learn, for example, that you want to aggregate data per week instead of per day, or perhaps you want to link this data to an internal users database, or perhaps your boss wants a notification when new data is added. Or perhaps you'll learn that loading it into a single 1GB SQLite DB solves your problem without even needing to think about any algos.
My team builds out a lot of one-off projects for customers, and a lot of them don't need to be especially performant. That has always been the case for reporting/analytical projects.
You know what does make a difference though? Amount of development effort spent on the solution. The most performant solution can very easily cost the company much more in terms of development time spent. So in that sense, it's sub-optimal
And my favorite answer to questions like this is, "Can I just use grep (or shell commands in general)?"
Grep, uniq, wc, and a few others can be treated as pipeline data transformers to answer questions like this interview question. As long as you make some smart decisions about the order of operations, you can usually get performance on par with what you might write custom code for.
And one of the appropriate questions would be, “Is this a one-off or rare request, or does this need to be productionized? And how sensitive is the response time? And if it must be fast and frequent, then why are we not using some form of indexed db”
Don't get me wrong - it's a totally fair question, frankly one I would have been happy to receive when I was interviewing for those roles.
I'm also a fan of whiteboarding coding interviews in general as a way of evaluating talent so no objections there.
There was just something about this specific question that struck me as boring, soulless - like, who cares? I think my objection might be that it too closely resembles a menial task I might actually be given - something that I hope to God the upcoming LLM advances automate away.
Yeah, the unfortunate "hard studying student who eats algorithms for breakfast" versus "boring reality" situation. I'm sure there's a fancy data structure for this. In reality, I'd make three buckets (one for each day, plus an "isLoyal" byte buffer) and update them as I scan along. O(N) time, O(N) space.
"they don’t know the size of the data upfront", okay. I spend a scan finding the highest customer number and probably make some 10MB index-assossiated buffer. If I'm fancy I find the range and use offset indices to reduce the overall size. You already said it fits in memory and I'm not a distsrubuted programmer. Space is cheap in boring reality
I guess it's one of those cool brain teasers that gets you excited to use your skills from college. Not many get to in reality. Or they prefer other domain-specific skills.
OP's solution seems a little clunky to me. He does this thing where he loads the first day into a hashmap, and then he queries it as he loops over the second day. But why? That just overcomplicates your algorithm, and the memory used by O(2) days is roughly the same as O(1) days. It's also overly specialized to solving the very precise problem that's posed in the interview; what if tomorrow Big Boss wants you to aggregate over 3 days?
A cleaner solution would be to load both days 1 and 2 into the same hashmap. Then you can iterate the map and count whatever condition you want.
I agree. I hate how these interviews always focus so much on algorithmic time at the expense of flexibility of the code. I agree with the first part about avoiding the n^2 algorithm by using hash maps. However, making your algorithm use half the memory but hardcoding the requirement that a user visited on two days is bad design, especially when it's only halving the memory used. Also not only does it hardcode the requirements, it also makes it much more complex logic wise as you need that "if pages from day 1 >= 2 or first page from day 1 != page from day 2".
My design was to create two hash maps, one for customer to a list of days and one for customer to list of pages, though after reading the article I realized my lists should really be sets. Then you can easily account for any change to the definition of a loyal customer, as all you need to do is use two O(1) lookups and then check the size of the lists. Easy, flexible, and little room for error.
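A short Python sketch of that two-map design, with the same assumed "customer_id,page_id" line format; the thresholds are parameters so the loyalty definition can change without touching the ingestion loop.

    # Two maps keyed by customer: one to the set of days seen, one to the set of pages seen.
    from collections import defaultdict

    def loyal_customers(day_files, required_days=2, required_pages=2):
        days_seen = defaultdict(set)     # customer -> {day indices}
        pages_seen = defaultdict(set)    # customer -> {page ids}

        for day, path in enumerate(day_files):
            with open(path) as f:
                for line in f:
                    customer, page = line.rstrip("\n").split(",")
                    days_seen[customer].add(day)
                    pages_seen[customer].add(page)

        return [c for c in days_seen
                if len(days_seen[c]) >= required_days
                and len(pages_seen[c]) >= required_pages]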
I've become partial to interview questions where the interviewee just has to build something like a rock paper scissors game or command line to do list app. Simple prompt, easily extendable as well. It's jarring how many people with 5+ years of experience completely fail on this kind of interview.
Any experienced engineer should have no trouble with it. There's no hiding here: the candidate just needs to deliver something working, something relatively clean, and be reasonably pleasant to pair with. No leetcode grinding necessary, though I have found that those who did well on this problem also generally got high scores from my colleagues who do ask LC questions.
I somehow ended up responsible for the coding interview part at a small startup, and I'm pleased to say that it doesn't involve any niche algorithm memorization or specific language knowledge.
It's really just a scenario with some mockup third-party API docs, where the applicant needs to write some pseudocode that checks different conditions, arranges the data, and ties together the different calls.
It might not be testing every possible skill the applicant has, but at least it's in-line with one of the tasks we actually expect them to perform regularly.
Ah yes, where have I heard that before: "oh, my question is so non-tricky that you just have to think rationally, and I am only looking for your willingness and ability to converse, and you will do fine. But most people fail it because they can't have normal conversations".
"Let us hire this person because they dint solve the problem but were a great conversationalist problem solver" - said nobody at a standardized hiring committee!
I worked with someone who was hired because they were likeable, even though they didn’t meet the technical bar. It was a really good decision. They improved the overall atmosphere of the office and made it a nicer place to work.
I think likeability is way too penalized these days (bias). Likeability is actually a useful trait in a team for cohesion and empathy as long as you have processes and culture to identify/manage coasting, sociopathy and toxic behavior.
It wouldn't surprise me if the O(n log n) sorting solution is faster than the O(n) hashing solution, because of better memory locality.
The first answer that popped into my head was a shell pipeline, "cat file1 file2 | sort -k [pattern for customer ID] | awk -f ..." where the awk part just scans the sort output and checks for both dates and two pages within each customer-ID cluster. So maybe 10 lines of awk. It didn't occur to me to use hash tables. Overall it seems like a lame problem, given how big today's machines are: 10,000 page views per second for 2 days, crunching each record into 64 bits, means you can sort everything in memory. If 10 million views per second then maybe we can talk about a hadoop cluster. But 10k a second is an awfully busy site.
I actually had a real-life problem sort of like this a while back, with around 500 million records, and it took a few hours using the Unix command line sort utility on a single machine. That approach generally beat databases solidly.
It uses whatever amount of RAM you tell it to. I think the default is 1MB, which is way too small. It uses external sorting which means it uses O(1) RAM and O(N) temporary disk space. Oversimplified: it reads fixed sized chunks from the input, sorts each chunk in RAM and writes each sorted chunk to its own temp disk file, then merges the sorted disk files. If there are a huge number of temp files, it can merge them recursively, converting groups of shorter files into single longer ones, then merging the longer ones. I'd set the chunk size to a few GB depending on the amount of ram available.
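A toy Python version of that chunk-sort-merge procedure (it merges all runs in a single pass and assumes every input line ends with a newline; the real sort(1) also handles recursive merging, locale-aware comparison, temp-file cleanup, and so on):

    # External sort: sort fixed-size chunks in RAM, spill each to a temp file,
    # then stream a k-way merge of the sorted runs to the output.
    import heapq
    import itertools
    import tempfile

    def external_sort(input_path, output_path, chunk_lines=1_000_000):
        runs = []
        with open(input_path) as f:
            while True:
                chunk = list(itertools.islice(f, chunk_lines))
                if not chunk:
                    break
                chunk.sort()                              # in-RAM sort of one chunk
                run = tempfile.TemporaryFile(mode="w+")
                run.writelines(chunk)
                run.seek(0)
                runs.append(run)
        with open(output_path, "w") as out:
            out.writelines(heapq.merge(*runs))            # keeps only one pending line per run in memory
        for run in runs:
            run.close()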
That is basically how everything worked back when 1MB was a lot of memory. The temp files were even on magtape rather than disk. Old movie clips of computer rooms full of magtape drives jumping around, were probably running a sorting procedure of some type. E.g. if you had a telephone in the 1960s, they ran something like that once a month to generate your phone bill with itemized calls. A lot of Knuth volume 3 is still about how to do that.
These days you'd do very large sorting operations (say for a web search engine indexing 1000's of TB of data) with Hadoop or MapReduce or the like. Basically you split the data across 1000s of computers, let each computer do its own sorting operation so you can use all the CPU's and RAM at the same time, and then do the final merge stage between the computers over fast local networks.
I've used the Unix sort program on inputs as large as 500GB and it works fine with a few GB of memory. It does take a while, but so what.
For fun, I fed this interview question to GPT-4 with aider. See the chat transcript linked below.
The data structures look sensible and it did most of what the interviewer wanted on the first try.
It did make the wrong initial assumption that we wanted 2 unique pages per day. When prompted with a clarification, it made a sensible fix.
When asked to optimize, it went for big hammers like parallel processing and caching. As opposed to saving memory by only storing one file in the data structure as the author discussed.
What is very funny is that, in all the big and small companies I have worked in or with, no one ever used or discussed Big-O. It only shows up in interview technical tests.
Lots of good software engineers will have the instinctive knowledge of good or costly solutions without mapping it to BigO.
Also, what is funny is that, in my opinion, BigO is used and required by people that want to look smart but are not necessarily so.
Because what we do with Big-O is really limited. Almost nowhere will the discussion go further than O(1), O(log n), O(n), O(n^2), because beyond that the maths becomes hard to understand.
But in my opinion, algorithm complexity goes way beyond that when you use it in real life.
I kind of agree, the track record of people obsessed with BigO is not good in my book. The last person that valued it would still get the smallest element in a collection by... sorting it and then selecting the first element.
Sounds like you don't understand it and are doing the usual mental gymnastics to justify your ignorance. People always say this about math. If you understand it you'll use it all the time because you'll recognize situations where you can leverage it. If you don't understand it you'll blissfully make do, unaware that you could have done things a better way. And then you'll proudly proclaim "math is pointless, I've never needed to use math after I graduated!".
Fact is you did need it, you need it all the time. You just don't realize because you can't use it.
>Poor candidates load the contents of both files into memory.
I suppose this is the step where I become a "poor candidate". I think it's important to acknowledge changing client requirements at this point. Sure, loading both files into memory is less memory efficient, but it's much easier to tweak this algorithm later if you do it. You can change it to count over 3 different days, or 2 days in a 5-day time period, or any number of other things. You can save some memory if you don't, but you'll arrive at a solution that is much less flexible.
I mean, the real solution is to load all the data into a database of course, but even given the constraints of the problem I'd still argue for loading each entire file into memory as the more general and flexible solution, for when our pretend clients inevitably change their pretend minds.
>you don’t need to actually keep every single page from Day 1 in the Map, just two, since the problem is “at least two pages” so a Set of size 2 or even an array of size 2 will use less memory than an unbounded Set.
And with this I think we've crossed over from the practical to leetcode. At this point you're asking the candidate to add a bunch of new code paths (each one should be tested) and make their solution a lot less general. Going from a pretty general algorithm that can be tweaked pretty easily to something super specific with a bunch of new sources of bugs.
>Or, if you’ve already determined that a customer is loyal, you don’t need to waste CPU cycles going thru the logic again next time you encounter that customer in Day 2.
No, please load it all into your data structures properly, even if you "waste" a bit of time. All these weird little conditionals sprinkled throughout your code when you're ingesting the data are going to be sources of problems later. They might save a bit of memory and a few cycles, but they significantly increase the complexity of the code, and make refactors or tweaks much, much harder.
If this developer started doing stuff like that in an interview with me, well it would raise some red flags.
>If you want to make the problem even more interesting, you can add a third file. I will leave that as an exercise for the reader as well!
See, our imaginary customers imaginary minds did end up changing. Bet you wish you had loaded both files into memory now.
It's a good question, and the author explains it and the logic really well.
As someone going through this style of interview at the moment (but not having interviewed at Google, Microsoft or Amazon), two things jump out at me:
- If you're going to ask this question and get it done in 1 hour, does the code really matter? I'd argue that if you can get to a good or optimal solution, 99 times out of 100 you can write the code. If I got this question and didn't know better, I'd be stressing about writing the code within an hour. Knowing that we wanted to spend most of the time discussing the algos and data structures would be really useful to me. Maybe Google/Amazon/Microsoft interviews really stress this in their preamble, I don't know.
- The big "issue" I see with this question is that it relies on the interviewer knowing exactly how to steer the conversation. I think I could get to this solution with the hints, and the author seems to imply that it's ok to need a few hints. But an interviewer that doesn't know the right hints to give (or phrases them poorly) is going to turn this question into a train-wreck. This isn't an issue for the author, they clearly know this questions backwards and forwards. But giving this question as a 'standard' question that others will deliver? I think it could easily end up being too conservative and cutting out a lot of otherwise smart developers.
In general, that's my criticism of this style of question: they all claim that they're about 'seeing how you think'. But I think expecting interviewers to be able to elicit a conversation that really shows 'how a candidate thinks' is much more on the interviewer rather than the interviewee. You're expecting people whose primary job is writing software to be really good at delivering interviews.
Instead, you're going to have candidates who most of the time will do well if they can pattern-match against problems they've seen in the past, and poorly otherwise. I can see how questions like this seem good on paper, and I'm glad this question works for the author. But it's the combination of interviewer and question that makes it effective, not just the question alone. A better title for this post might be 'My favourite way of interviewing candidates', because this post is mostly to do with the author's mental model of how to run an interview with this question.
Well, you're holding it in a log file. Either the data is important enough we'd like to keep it, so we should put it in a database or elasticsearch, or the data isn't important enough, and I'd like to get further clarity on what you're trying to achieve.
Yeah, I'm disagreeing with some approaches. For example, in real life the engineer usually knows the constraints of the problem and can add them himself. Moreover, asking clarifying questions about memory versus speed will quite often draw blanks from a less technical consumer. Some aspects of the problem seem artificial: do we need exactly two days of visits to be loyal? Exactly two unique pages? What is one going to do tomorrow, when these requirements change? And that would affect the design chosen.
I feel that the author has filtered out lots of great candidates over this problem, which might be something to pause over. On the other hand, interviewing to get a good signal is indeed a tricky business, so I can sympathize.
You can do a little better than that. Each item in your map has exactly 3 states:
- We’ve seen this customer visit one unique page with (xx) url on the first day
- We’ve seen this customer visit two unique pages - but only on the first day.
- We’ve seen the customer visit one unique page (xx) and they’ve visited on both days.
In the second state you don’t actually care what the URLs are. And I think the logic for the 3rd state is identical to the logic for the 1st state - since you only add them as a “loyal customer” by visiting them again on the second day. So I think you can get away with using an Option<Url> to store the state instead of a Set or a list. (Though I’d probably use a custom parametric enum for clarity).
It’s a great problem to model using a state machine.
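For concreteness, a minimal sketch of that state machine in Python (my own illustration, not the poster's code; it uses a sentinel instead of a parametric enum and assumes each parsed log row is a (customer_id, page_id) pair):

    MULTIPLE = object()  # sentinel: this customer saw >= 2 unique pages on day 1

    def loyal_customers(day1_rows, day2_rows):
        # Pass 1: collapse day-1 visits into either a single page id or MULTIPLE.
        day1_state = {}
        for customer, page in day1_rows:
            seen = day1_state.get(customer)
            if seen is None:
                day1_state[customer] = page          # first page for this customer
            elif seen is not MULTIPLE and seen != page:
                day1_state[customer] = MULTIPLE      # second unique page observed
        # Pass 2: a day-2 visit makes them loyal if day 1 already had >= 2 unique
        # pages, or if the day-2 page differs from the single day-1 page.
        loyal = set()
        for customer, page in day2_rows:
            seen = day1_state.get(customer)
            if seen is not None and (seen is MULTIPLE or seen != page):
                loyal.add(customer)
        return loyal

The None / single-page / MULTIPLE values correspond to the three states described above.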
IMO, this interview question is going to get you amazing developers who fail to build anything of value.
I don't like the trick of failing candidates if they don't ask a question. 90% of this style of interview want candidates to rifle through solutions. If you want to talk about requirements, be explicit about it.
I'm really amazed that this "best interview" question really just boils down to leetcode for a _Senior Staff_ level interview. I don't know about y'all, but the _Senior Staff_ and _Principal_ developers I've worked with aren't wasting their time on shit like this. They're ironing out requirements. They're working with stakeholders. They're architecting systems. They're figuring out how to deliver the value the customer wants - and they're ensuring that it's actually what the customer wants.
-----
There's a place for performance, but the fast running turd is still a turd.
> I'm really amazed that this "best interview" question really just boils down to leetcode for a _Senior Staff_ level interview.
A good interview should involve more than just a coding problem. But it should absolutely require at least one coding problem. It’s mind boggling the number of “senior” people with good resumes I’ve screened out in interviews over the years because, simple as a problem like this is, they really had no idea how to even start solving it.
I don’t know about the poster, but when I’ve done interviews - especially for senior people - there are a lot of different types of assessment I’d want to do before hiring them. I’d also want to assess their social skills somehow (eg get them to present to the team about something interesting). And ask some high level systems architecture questions, talk about their background, and more.
My username at a large org is my first name. In any situation where someone wants to link to me or mention me, typing my username brings up a list of every person at the org with my name, alphabetically. My last name inevitably sorts me down toward the bottom.
You would think that an exact literal username match would have priority, but no. Typing any prefix of my name similarly sorts everyone else before me too.
Amazon’s VP of Search used to be Udi Manber, who literally wrote the book on Search. The crappiness of their search is deliberate, it is there to serve Amazon’s business objectives, not your needs.
> Same with Microsoft's search in the start menu. Doesn't find Excel if I type "exc".
What do you get? I tried just now, and `E` gives me Excel. In fact, I typed "Esc" first on accident, and I got something different for "Es" but "Esc" gave me Excel too.
Ah, maybe the difference is that I disabled web search completely so it's limited to locally-installed stuff. If it's suggesting web results, I wonder if that's a deliberately-bad result for advertising reasons
What if a candidate notes that web hits can come from all parts of the planet and every timezone and therefore "same day" for a particular end user does NOT overlap with the "same day" of the log files and thus immediately throws into question the meaning of '(a) they came on both days'. Many users could visit the site multiple times in one of their days, but recorded as two separate days in two separate log files. This certainly adds some complexity to the question and might not even be something the interviewer considered in the first place.
Exactly. It's a trick interview question since the minute you deliver this report on 'Loyal Customers' someone will want to increase the complexity of the 'unique' page visitation constraint to include information that will never be available in the log file.
How isn't remotely relevant to being able to accomplish the task.
I only know how certain SQL engines implement queries because I was at one point tasked with increasing the speed of the queries, and went deep into debugging-level detail to understand the exact cost of each action.
But I haven't done that in 7 years, and couldn't tell you much more than the tools used to figure it out.
I cannot remotely understand the requirements people make up for our jobs that have nothing to do with doing our jobs.
I agree with most points in the article except for the database part.
> Since this is not a distributed systems problem and the data fits in memory, why introduce the additional complexity and dependencies of a database for something that you can solve with 20 lines of simple code?
Because this is a question about getting the right data, and SQL databases are extremely good at filtering, sorting, and grouping data. Besides, every page visit from every client is a unique observation, and the principle of tidy data suggests that every observation gets its own database row.
Why solve this with 20 lines of code, when you can solve it in a 4 line SQL query?
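For what it's worth, a sketch of that using Python's built-in sqlite3 (the table and column names are just illustrative assumptions, not from the article):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (day INTEGER, customer_id TEXT, page_id TEXT)")
    # ... bulk-load both log files into `visits` here, tagging each row with its day ...

    loyal = conn.execute("""
        SELECT customer_id
        FROM visits
        GROUP BY customer_id
        HAVING COUNT(DISTINCT day) = 2
           AND COUNT(DISTINCT page_id) >= 2
    """).fetchall()

The query itself is the four lines of GROUP BY / HAVING; everything else is plumbing.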
The venn diagram of people who blog about their coding interview questions and the people I want to interview with does not overlap. Nothing personal, but they always come across as "here's how I make monkeys dance for my amusement" or "here's a clever question and the more people ask me about it in the interview (despite it being perfectly clear to begin with) the more it strokes my ego and the more I am likely to interview them"
This is a great way to hire people who all meet a homogenous standard of behavior. It’s absolutely efficient from an engineering standpoint to eliminate weak links who can’t solve basic engineering problems.
It creates a team of people who are good at tests, and good in testing environments.
However, there are so many things that make an engineer great that have nothing to do with how they solve problems but who they are, how dedicated to improvement they are, etc.
But they may not be someone who:
- thinks quickly on their feet
- finds this type of situation tolerable
And they may be someone who:
- has disabilities one can’t see that would make this kind of interview difficult for them
- has personality challenges or anxiety in social situations that make interviews like this impossibly difficult
The list of reasons that “whiteboard testing interviews” don’t work well is long.
I don’t think there’s anything wrong with this approach if that’s the kind of organization you want to build. But it does tend to create homogeneity and act as a gatekeeper for “those who do not fit in.”
Some of the very best engineers I have ever hired would never make it through this interview. But they were amazing engineers who did world class work.
This article mentions "good" and "great" candidates many times.
How is the author determining which candidates are great? Do "great" candidates answer the questions the best, or is the interviewer following up 1-2 years after hire and examining the impact of the person?
Great candidates aren't those who can answer DS & algorithms questions the best, but it seems that the author thinks that way.
> For example, you don’t need to actually keep every single page from Day 1 in the Map, just two, since the problem is “at least two pages” so a Set of size 2 or even an array of size 2 will use less memory than an unbounded Set.
That seems overcomplicated. For each customer on day 1, you either have multiple pages or you have a single page. If you see them on day 2 and they had multiple pages on day 1, then they are loyal. Or if they had a different page on day 1 than day 2, they’re loyal. (Or two different pages on day 2, but this comes along for free.)
So the data structure can be:
map<customerid, (MultiplePages | pageid)>
Where MultiplePages is a choice in a sum type that doesn’t store any associated data. Or you can do:
map<customerid, optional<pageid>>
Where the none state of the optional means there are multiple pages, but this is a bit of an odd use of an optional.
You could get to something working and relatively bug free in 15 minutes with a shell script consisting of little more than cut, head, sort -u, and grep.
Sorting, merging, then just a clever unique was basically the only thing I'd consider.
If I used this question I'd add to it.
"Pretend you have to do the work on an Arduino uno, which has very little resources. Your uno can request input and produce output from the disk where these are stored at whatever offset you wish. The log files are 100GB each and sit on a desktop computer with a modern Linux on it. Each log line is 512B. You can create files if you need to through some unspecified protocol with the desktop computer. But the desktop computer must be dumb. It will only write to and read from disk. You can send it any disk system call you wish. Step 2; Now do it without sorting or something absurdly slow"
The point is to ask for actual creative solutions instead of the pattern recognition problems that most of these problem formats are.
You want something that a weekend of drilling won't change the result of
1. Find names appearing in both log files. From this create a Set of customers that appear in both. In Python it's just creating a set from the first file, creating another set from the second file, and intersecting them. O(N)
2. Concatenate both files and treat them as one. Create a Map with key: Customer and value: Set[Page]. This is basically iterating through the log: when you see a customer, add the customer_id to the map if it isn't already there, and add the page to that customer's set. O(N)
3. Filter the map for all customers with length(Set[Page]) > 1. To get the Set of all customers that visited more than one page. O(N)
4. Intersect the set of customers who visited multiple pages with the set of customers that appeared in both log files to get the final result. O(N). You can do this with the intersection operator in Python.
The irony is I'm more able to do this in Python than I am in SQL. For SQL stuff at work I just look it up; I never remember the exact syntax.
Total runtime: O(N)
Total memory: O(N)
This is just your basic hashmap stuff for fast lookup and aggregation (a rough sketch below). Usually FAANG questions are harder than this.
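For concreteness, here is that four-step recipe sketched in Python (my own illustration; it assumes each parsed log row is a (customer_id, page_id) tuple):

    def loyal_customers(day1_rows, day2_rows):
        # Step 1: customers appearing in both files (set intersection).
        both_days = {c for c, _ in day1_rows} & {c for c, _ in day2_rows}
        # Step 2: treat both files as one stream and map customer -> set of pages.
        pages_by_customer = {}
        for customer, page in list(day1_rows) + list(day2_rows):
            pages_by_customer.setdefault(customer, set()).add(page)
        # Step 3: customers who visited more than one unique page.
        multi_page = {c for c, pages in pages_by_customer.items() if len(pages) > 1}
        # Step 4: loyal customers satisfy both criteria.
        return both_days & multi_page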
If the job is log file processing, it's an entirely reasonable question.
But most jobs are not log file processing.
It's ridiculous to generalise this sort of thing:
"We are cooking soup here at Amazon Soup Kitchen. My favorite interview question is to ask candidates to bake a cake, that's the real test of any cook."
The question is not about log processing; that is just a framing device. The interviewer does not care whether you can process log files, specifically; the interviewer cares whether you understand basic data structures and know how to make reasonable tradeoffs between time and space complexity, generally.
The people who criticize these questions do so out of insecurity - they know they aren't at the level required to solve it, which is embarrassing because it's pretty simple stuff, so they try to poke holes in it and justify their ineptitude with comments like the one you're responding to.
First they say they don't like tricky questions, and then go on to admit the "spec" is ambiguous. True, candidates are permitted to ask questions, but perhaps they are trusting of the interviewer and expect the question to be actionable as is? Or just the same, the candidates don't trust the interviewer and don't ask questions because they fear that'll result in a penalty? If you come from an environment where questions aren't rewarded - and there are plenty of those - then silence is likely.
Finally, it's worth mentioning: while the question + answer might correlate well with the hiring decision, there's no mention of how well it predicts future performance. That said, there's survivorship bias at play, so using it against performance might be iffy.
Frankly, I don't want to hire people who are too timid to ask a simple clarifying question in an interview.
A big part of this job is dealing with ambiguity and communication. If you feel the requirements of your task aren't clear, then go to the person who made the task and clarify them.
What's the alternative, exactly? Staying silent and waiting? Wasting time implementing the wrong solution?
Then make it clear that it's "real world" and that they can and should ask questions about aspects of it that they need more information on, rather than just hoping they do so.
Usually all interviewers will say “please take some time to consider the problem and ask any clarifying questions if you have any”. This is standard for all Amazon interviews.
That feels very formulaic, which is I guess par for the course for behemoth corporations.
If it were me, I would try and frame it like "this is an attempt to be a bit more 'real-world' where we've received some initial direction on what the results are supposed to look like from management, but don't consider any of it set in stone and I'm happy to talk you through it as if we were working together...". IDK the exact verbiage, but something that gets people in that mindset without straight up telling them "this is meant to be ambiguous, wink wink".
You're assuming the nervous stressed candidate can read your mind? That's not going to yield the best candidates. It's going to yield the candidates who are best "tuned" for the process.
I've already explained simple and obvious scenarios where the context can impact the candidates in sub-optimal ways.
If someone says, "Solve this" and the candidate attempts to do so, but that's not what was really expected? That's trickery (and foolish). On the other hand, if the interviewer wants, "Here's a problem, let's discuss..." then *that* is what the interviewer should lead with.
So context doesn't matter? And they're to blindly trust you? Because you said so [1]? Because they've never been screwed by an interviewer before? My sense is, you don't get out enough (i.e., be the candidate).
I hope the next time you do an interview the candidate asks, "Why should I trust you?" or "When was the last time you were the interviewee?"
[1] That's not how trust works. It is, by definition, earned.
It just sounds like you're projecting your poor interview experiences onto me.
If you can't take a statement as unambiguous as "feel free to ask questions at any time" at face value, then why even bother interviewing?
And for the record, I've been the candidate in plenty of interviews. I've done the 5-hour FAANG interviews and know how shitty they are. It's why I specifically designed the coding interview process at my company to be much more candidate friendly.
I think what the author meant is that they dislike questions that require too specialized knowledge, so once you know the trick the question is easy to answer, but whether the candidate knows that particular trick is not otherwise correlated with candidate skill.
I have practically never seen a PRD that did not have some ambiguity. I realised spotting ambiguity and asking questions is an essential and invaluable skill that needs to be selected for.
One of my teams was very surprised that I rejected a PRD for being too vague and put my foot down that it would not be picked up until specific questions were answered. They were, I would not say meek, but resigned to the inevitability of product managers pushing poorly thought-out PRDs and not having any say in the matter.
I took my time training them to say no and spot ambiguities, and I like to think they have all become better developers and product managers for it. I always pick questions that have more than one obviously correct interpretation to see if the candidate notices it.
The idea that you trust your interviewer to provide a directly actionable question is strange. I would expect that in campus hirings, and for entry level fresh grads, but not to anyone with even a year of experience. At senior levels, it becomes more and more important to spot ambiguities and clarify them before they result in misunderstandings, wasted efforts, and worse. I trust a good interviewer to have a question that can provide them useful data points on candidates experience, skill, thought process and attitude.
Whether spotting ambiguities in the question has any correlation with future performance is harder to answer, but methodical people with attention to detail are preferable to the alternative.
If the candidate comes from an environment where questions are penalised, they would be a bad fit for a team that values and expects questioning. It is somewhat unfair, but either way, the interviewers are selecting for their preferred qualities.
I agree with this... you're asked to do something and the interviewer is purposefully holding back information wanting you to come out and ask about it. That feels a bit tricky to me.
I do agree, the real job would have to deal with ambiguity. But this is the interview. The interviewee has a completely different mindset going in. "Playing games" to see how they respond... that's for the likes of the NSA, CIA, etc.
"We can't find good candidates"? Nah. Your hiring process sucks. Get a mirror.
The skill under test for that part is “can solve ambiguous problems”, and what you want to see is that the candidate is able to recognize that a problem is ambiguous.
I think the hard part is recognizing that a problem is ambiguous. Telling someone that a problem is ambiguous kind of defeats the point. IMO, it’s not about reading someone’s mind, but recognizing that there are multiple interpretations to what somebody has SAID. That seems less like a mind-reading technique and more like, you know, a communication skill.
I have gotten lots of ambiguous problems during my career, it seems only fair to have them appear during an interview.
The interviewer is expecting the interviewee to violate standard interview protocol? While the interviewee is perhaps desperate for a job? And the interviewer is - in his/her mind - thumping their chest about what a great and experienced interviewer they are?
I can't see how that's a success-minded plan for anyone involved.
I am baffled that several contributors to this thread seem to find this question difficult, some even calling it "dehumanizing". This is a very easy and basic question and I wouldn't want to work with someone who couldn't solve it efficiently in a few minutes.
I agree that "throw everything in a hashmap" should be straightforward and is a good interview test. However, his further steps to "optimize" it by saying "Poor candidates load the contents of both files into memory" are terrible. Yes, that might optimize resources, but first it hardcodes the requirement that there are exactly two days, breaking the solution if the requirement changes, and second it adds a bunch of finicky, fragile code about "if there are two pages or more from day one or if the first page from day one is different from the page from day two".
Great candidates treat software like a business with changing requirements and code that is read by multiple people; poor candidates treat software like a math challenge where the only goal is to use as few resources as possible.
The problem is that once you start asking algorithm questions for top tech companies people will optimize for knowing them deeply instead of exploring other fields of CS.
This is what happens with Codeforces/ACM-ICPC. Suddenly everyone is hyper-driven to crank them out day after day under pressure, and much more interesting areas, like databases or turning a business idea into a usable app, get neglected.
Let's not fool ourselves: no one is going to solve a hard or medium LC problem under time pressure in 30 minutes unless they've seen a similar problem before, which leads to hiring worse engineers who pass interviews.
Once a metric becomes the goal it ceases to be a good metric.
I have a similar task in my Python course: given 2 text files with a similar format, one holding persons and the CDs each of them has, another holding songs and which CDs they are on, answer which songs a particular person has, and which persons have a particular song. Yeah, a join.
Nothing about O(blah) though; that's way too specialized, optimization land.
That said, coding != thinking.
Some people cannot think of a solution at all, 50% go for 3 nested loops copy-pasted 2x2 times, a few do the simplest thing with 2 maps and 2-3 funcs, and once a "master" invented a class with quite a few methods containing copy-paste inside :/ Probably one could do some analysis of who goes which way and why, but I never bothered.
It is also good ground to show the different ways to add a key-value pair to a map if it's not there, and their pros/cons (one can hook some O(blah) stuff here; I don't. There's just simpler vs. faster, with variants).
And yes, having a (subtly) unclear requirement (that should be identified and asked about) is an important part of the learning; that's 90% of cases in life.
Data structure questions heavily favor previous use. For example, in the past I've used a multi-dictionary for certain problems, so it would have been an easy reach for me; but maybe not for someone with a different coding history.
Personally I prefer something like fizzbuzz which is a pure code question, it applies to candidates of all levels and tells you if they can reason through problems.
The question looks more or less OK, except for the ambiguity part. However, perhaps after "25 years in Big Tech" one shouldn't invent a process that discriminates against "loyal" customers living in timezones unaligned with the logging server. As usual, users come last.
I think unaligned customers are discriminated in favor of: they're more likely to be browsing around the server's midnight and get marked as visiting on two days.
I don't think you can decide if it's a net positive/negative discrimination with any degree of certainty without knowing the date-stamping machine's location and the demographics of the userbase. However, there most certainly will be a non-zero amount of users discriminated against (browsing during a span around their local midnight that falls onto a single date on the servers), and that's what matters most in my opinion.
At the end of the article he elaborates the problem in a way he seems to feel is too complex to mention in the interview itself: what if there's a third day?
If there are three server days of logs and a loyal customer continues to be defined as one who visits on two different days, the timezone problem essentially goes away. More fully:
- if a loyal customer is one who visits regularly;
- and we sample several days of visits;
- then loyal customers will be detected regardless of their timezone.
If you want the concept of a "loyal customer" to match the detection threshold, the weirdness related to timezones will still exist, but if you think of the detection threshold as a tool that's good enough to detect loyal customers, then it won't.
You do not need to sort for the preprocessing-based solution; you only need a K-way partition, which, if I'm not mistaken, can be done in two linear passes. I don't remember if it can be done in place easily, though.
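A rough sketch of that partitioning idea (my own interpretation, not the poster's code; rows are assumed to be (customer_id, page_id) tuples, and in practice the buckets would be spill files on disk rather than in-memory lists):

    K = 64  # number of partitions, chosen so each bucket fits in memory

    def partition(rows, k=K):
        # One linear pass: route each row to a bucket by a hash of the customer id.
        # hash() is fine within one process; files on disk would need a stable hash.
        buckets = [[] for _ in range(k)]
        for customer, page in rows:
            buckets[hash(customer) % k].append((customer, page))
        return buckets

    def loyal_customers(day1_rows, day2_rows):
        loyal = set()
        # Matching buckets from both days contain the same customers, so each
        # bucket pair can be solved independently with the in-memory approach.
        for bucket1, bucket2 in zip(partition(day1_rows), partition(day2_rows)):
            day1_pages = {}
            for customer, page in bucket1:
                day1_pages.setdefault(customer, set()).add(page)
            for customer, page in bucket2:
                pages = day1_pages.get(customer)
                if pages and (len(pages) >= 2 or page not in pages):
                    loyal.add(customer)
        return loyal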
Love how he says he does not like tricky questions, and then the first thing he describes is a trick hidden in the question designed specifically to fool candidates.
Yet he still thinks it is not a tricky question.
But great article and I learned something from it.
But this "trick" tests for something very real: many times I've seen people around me mindlessly jump to coding a PM-written ticket without clarifying important details or even doing a basic sanity check on the feature. The end result is often a disaster for both parties, and could be avoided with a bit of thought upfront. It's not like anyone is lying or misleading here; such details get omitted all the time in real life, but you need to collect them for a proper implementation.
I don't think the solution for that is dropping perfectly capable people in the interview.
There should be a process for clarifying tickets with the team, because you don't just drop tickets on developers "out of nowhere" and expect them to ask questions, especially if the company culture is "drop the ticket on the dev and let him figure it out".
Someone has to either write the ticket in enough detail that you can hand it to a dev without them needing to ask questions, OR hold a refinement session where the team asks questions, so you get input from multiple people, because a single person is not able to understand all the details of the system.
Hot take: The optimal solution is the most obvious one for people who are more used to UNIX commands than programming languages, because the primitives in UNIX are more powerful (they give you a beautiful solution if you can make your problem fit in them; if not, the shell becomes atrocious).
Here's my reasoning for the problem: we have 2 files, one for each day, and we want to see who came both days on different pages. That's a job for join, with some sort|uniq in the middle.
- for each day, we want unique CustomerId -> PageId "mappings", but in UNIX land that's just 2 columns on the same row:
cat dayX | awk '{print $3, $2}' | sort | uniq > dayX_uniq
- now I have two lists, I join them
join day1_uniq day2_uniq
this gives me one line per (customer id, day-1 page, day-2 page) combination. Customers who came only on one day are not in the output, and a customer can appear on several lines.
- now I want to see if there are at least 2 pages and those 2 pages are different. There's no easy UNIX way to do this because it's all on a single line, so we'll use awk to build a hashmap. We don't need to build a map of all pages, we only need to see if there are at least 2 different pages
cat both_days | awk '{delete pages; for (i = 2; i <= NF; i++) {pages[$i] = 1; if (length(pages) == 2) {print $1; next}}}' | sort -u
(Note: length() on an array, and delete of a whole array, are not POSIX but gawk supports them)
Result: a list of all customers having visited at least 2 different pages across both days. Everything is dominated by the initial sort. I haven't run this, it's a rough draft.
The best answer to the interview question is not mentioned in the article. It’s something like “I’m not interested in working here because I don’t want to use my art to spy on people.”
that looks like I am a good candidate for Amazon, lol.
But on a serious note, the good solution is sort of obvious, and in general you encounter much more interesting problems on a daily basis. But then again, I am working in Clojure, where working with data structures is much easier and more straightforward than in Java.
I am sorry if that sounds condescending; in reality I question my engineering ability almost every day. There are a lot of things I know horribly less about than any engineer worth their salt.
> I’ve actually expressed the problem in an ambiguous way.
> Did I mean 2 unique pages per day or overall?
The author needs a course in logic. "They visited at least two unique pages." is not ambiguous. Visiting page A on day 1 and visiting page B on day 2 makes the sentence true.
Most people typically do not form sentences based on pure logic, and many things could reasonably be misinterpreted even if, purely logically, it's unambiguous. Language is not math.
Assuming the quoted wording is the actual question he gives, it's not ambiguous. With the given wording, a person hearing it and understanding that it's stated precisely without ambiguity shouldn't be dinged for not questioning the interviewer's ability to recognize the requirements are unambiguous. Because it's an _interview_ by a technical professional who thought about the problem beforehand and not someone who you should be worried about not understanding what they are asking. In fact, I can imagine another interviewer that would ask you the exact same thing and ding you for asking a stupid question.
Interviews are just not the same thing as real-life requirements gathering, so people's thought processes will not be the same. Even if you try to roleplay as someone who doesn't understand how to state requirements precisely, all of the normal procedures and thought processes for sussing that out are compromised by the interview setting. And your ability to assess those processes is compromised by how familiar you are with the question and how much you've deconstructed it.
It's not the time to be tricky (though the author somehow simultaneously believes it is and is not a trick question). There are so many more interesting things that could be gleaned from a person's interview performance that "is this interviewer playing fuck-fuck games with me" doesn't rate whatsoever.
Does visiting pages A and B on day 1 and visiting page A on day 2 also make the sentence true? I think that's the source of ambiguity (or maybe it's ambiguous to me only because English is not my native language).
The user has visited A and B on day 1, and A on day 2.
So the total page hits is (A, B, A). Remove duplicates and you have (A, B) which makes the sentence true.
The noun is not the issue but rather the scope of uniqueness:
>Now, given two log files (log file from day 1 and log file from day 2) we want to generate a list of ‘loyal customers’ that meet the criteria of: (a) they came on both days, and (b) they visited at least two unique pages.
It appears to me that the requirement could be interpreted as either:
"(visit on day 1) AND (visit on day 2) AND (total unique pages count > 2)"
a clearer way to put it would be "visited at least two unique pages in total"
or
"(visit at least two unique pages on day 1) AND (visit at least two unique pages on day 2)"
a clearer way to put it would be "visited at least two unique pages on each day"
> I don’t mind getting the naive solution first, but I really want to see my candidate having that aha moment that O(n²) is probably never good in any problem. And I want that aha moment to come pretty quickly and without hints. No great engineer should ever settle for an O(n²) algorithm, unless bound by memory or some other unmovable constraint.
But that is purely a cultural convention. O(n²) is great for many important problems. For example, general parsing of a natural-language sentence takes O(n³) with the standard algorithms; getting it down to O(n²) would be evidence that the candidate was a deity.
Why select for familiarity with the details and conventions of the interviewing process? How is that supposed to be helpful?
> Candidates then switch it around to have CustomerId as the Key, and PageId as the Value of the Map. But that’s not particularly great either because it overlooks the fact that you can have many pages per customer, not just one. Some candidates have the intuition that they need a Collection of pages as the Value of the Map
But this is wrong. You're completely fine having a map from CustomerId to a single PageId, because the problem statement specifies that you're collecting customers who have visited more than one page. If you process a record that says CustomerId = X, PageId = Y, and you look up CustomerId in your map, these are the possibilities:
1. map[CustomerId] has no entry. You write Y as the new entry for that CustomerId.
2. map[CustomerId] is already Y. You do nothing.
3. map[CustomerId] has an entry that is not Y. You now know that CustomerId represents one of the customers you're trying to find.
At no point did you need the map values to represent more than one page.
> The condition “customers that visited at least 2 unique pages” tends to be a little harder for candidates to get right, so if they’re stuck I throw a little hint: you have a Set of pages from Day1, and a single page from Day2… how can you determine that this is at least two unique pages?
> Poor candidates will loop through the elements in the Set to check if the page from Day2 is in there. This turns your O(n) algorithm into O(n²) again. The number of candidates who have done this is surprising.
> Better candidates will do a .contains() on the Set which is an O(1) operation on a hash set. But there is a catch with the logic.
> The intuition to get this right is this: If you are inside that If loop and the customer visited at least two pages in Day1, and they visited any page in Day2, they’re loyal, regardless of which page they visit in Day2. Otherwise, they only visited only one page in Day1, so the question is: is this a different page? If so they’re loyal, else it’s a duplicate so you don’t know and should keep going. So your If statement has an Or:
> [code sample involving the interviewer's terrible solution]
> There’s a need for attention to detail, like using “>” instead of “>=” or missing the “!” in the second statement. I saw these fairly often. I didn’t worry. Great candidates spotted them quickly as they double-checked the algorithm when they were done. Good candidates spotted them after a little bit of hinting. That gave me a good signal on debugging skills.
Why in the world is this presented as a desirable solution? You have a Set of visited pages from day 1 and a single visited page from day 2. You want to know whether the total number of visited pages is more than 1.
Add the page from day 2 to the Set [O(1)], and then count the Set [also O(1)].
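In Python terms, a sketch of that check (it mutates the day-1 set in place, which is fine here since the customer has already been seen on day 2 and the running union is what you want going forward):

    def is_loyal(day1_pages: set, day2_page) -> bool:
        # Called only for customers seen on day 1 who now appear on day 2.
        day1_pages.add(day2_page)      # O(1) amortized insert into a hash set
        return len(day1_pages) >= 2    # len() on a set is O(1)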
> What if you pre-processed the files and sorted them by CustomerId, then by PageId?
> If the files are sorted, then the problem is easier and it’s just a two-pointer algorithm that you can execute in O(n) with O(1) of memory.
> Since the second sort key is by PageId, you follow another two-pointer algorithm to determine that there are at least two unique pages. So it’s a 2-pointer algorithm within a 2-pointer algorithm. It’s kind of a fun problem! I’ll leave the actual implementation as an exercise for the viewer.
> If you want to make the problem even more interesting, you can add a third file. I will leave that as an exercise for the reader as well!
If you're preprocessing the files, why not concatenate them before (or while) sorting them? The asymptotic resource requirements are the same and you end up with one file that can be processed in O(n) time and O(1) space. (Though the result of the algorithm necessarily takes up O(n) space, so I'm not sure how much this should count as an improvement in terms of space requirements...)
This additional preprocessing step makes the generalization to three files trivial. The algorithm is identical: concatenate the files, sort the monofile, and then walk through the sorted entries.
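A rough sketch of that streaming scan in Python (my own illustration; sorted() stands in for the external sort over the concatenated, day-tagged records, and rows are assumed to be (customer_id, page_id) tuples):

    from itertools import chain, groupby
    from operator import itemgetter

    def loyal_from_sorted_stream(day1_rows, day2_rows):
        # Tag each row with its day, then sort the combined stream by customer id.
        tagged = chain(((c, p, 1) for c, p in day1_rows),
                       ((c, p, 2) for c, p in day2_rows))
        ordered = sorted(tagged, key=itemgetter(0))
        # Walk the sorted records one customer at a time, streaming out results.
        for customer, records in groupby(ordered, key=itemgetter(0)):
            days, pages = set(), set()
            for _, page, day in records:
                days.add(day)
                pages.add(page)
            if len(days) == 2 and len(pages) >= 2:
                yield customer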
Awful solutions TBH. They are quite hard to implement (the requirements are coupled and can't be composed; lots of logic-heavy details), very specific, and not extendable for requirement changes in any way. You don't want this kind of code in real life.