"and candidates who haven't usually don't even have a chance at solving them"
I am one of those candidates, and I don't know why it is called Dynamic programming. To me a very naive understanding of DP is this: it's just a simple cache mechanism to store intermediary results so that you don't have to recompute them and waste resources.
In the real world we always think about and do such optimizations, be it I/O, disk access or db access. I would love to understand how DP is any different.
> [...] I don't know why it is called Dynamic programming.
According to Richard Bellman, who came up with the name:
An interesting question is, Where did the name, dynamic programming, come from?
The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word research. [...] What title, what name, could I choose?
In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various reasons. I decided therefore to use the word "programming"
[...] it's impossible to use the word "dynamic" in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It's impossible.
Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to.
The problem is that text copied and pasted here sometimes starts paragraphs with two or more spaces, which gets rendered as preformatted code. And maybe the commenter does not know why it formats weirdly.
> We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word research.
Amusingly, Engine Charlie [1] was an engineer. You'd think an engineer understands the value of research :)
DP is a general problem solving approach. It's about recognizing which parts of a naive solution have repetition which can be eliminated to improve the time complexity of the solution. Another poster aptly described it as finding states which can be collapsed into one. However you want to think about it,
- DP is a general problem solving approach
- DP is not limited to any particular implementation detail (like my parent poster attempts with "it's just a simple cache mechanism")
- DP is not a "trick" that you can learn and then solve any DP problem
If we have a DP solution, we can write it in many ways. We can write it top-down with recursion and memoization, or we can write it bottom-up without memoization. Whether we use memoization in our solution or not is a minor implementation detail; both solutions are DP. Other posters seem to make a distinction between these two solutions as if they are entirely different and as if the recursive solution is not DP, which is also incorrect. The challenge in DP tasks is improving time complexity by recognizing repeated sub-calculations.
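For concreteness, here's a minimal sketch of both write-ups of the same DP (Python and the function names are my choice), using Fibonacci as the stock overlapping-subproblems example:

    from functools import lru_cache

    # Top-down: plain recursion plus memoization. This is DP.
    @lru_cache(maxsize=None)
    def fib_top_down(n):
        if n < 2:
            return n
        return fib_top_down(n - 1) + fib_top_down(n - 2)

    # Bottom-up: solve subproblems in dependency order; here only the last
    # two values need to be kept. This is also DP, same O(n) time.
    def fib_bottom_up(n):
        if n < 2:
            return n
        prev, curr = 0, 1
        for _ in range(n - 1):
            prev, curr = curr, prev + curr
        return curr

    assert fib_top_down(30) == fib_bottom_up(30) == 832040

Same recurrence, same complexity; only the order of evaluation and the bookkeeping differ.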
In "Algorithms" [Dasgupta, Papamdimitriou, Vazirani] they state that the memoization (e.g. querying a hash table) can have significant overhead leading to a large constant factor in the big O analysis. On the other hand, the bottom up approach using a table solves all the possible subproblems including ones that are not needed and ones that the recursive approach would avoid. So from a big O perspective the recursive top-down and the table bottom-up approaches are the same but there can be significant constant factor differences.
Then mention this in the interview. I ask an algorithms question that would likely be hated on HN, but when candidates tell me the practical differences (API cleanliness, maintainability, performance, extensibility) between approaches that ultimately don't impact asymptotic runtime, I love it.
Yeah, I thought recursion+memo wasn't actually "DP" until I looked it up. Recursion without the memo is not DP, however, since you hit exponential running time/memory, and the whole point of DP is to avoid that.
The tricky thing with dynamic programming is realizing that it fits—you have to understand how the problem you're solving decomposes into overlapping subproblems that are worth caching. Once you know what the subproblems are, implementing the algorithm is mostly a matter of getting some fiddly indexing logic right.
Take string edit distance for example: the key to the dynamic programming problem is seeing how the exponential blow up of possibilities actually has a rich overlapping structure that you can cache in a table. If you didn't know string edit distance was amenable to dynamic programming, you might not realize it at all even if you are constantly thinking about caching results. Once you see the structure, the algorithm becomes relatively straightforward.
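For readers who haven't seen it, here's a sketch of the standard edit-distance table (the Wagner-Fischer formulation, in Python; variable names are mine):

    def edit_distance(a, b):
        # dp[i][j] = edit distance between a[:i] and b[:j]. Each cell depends
        # only on its left, upper, and upper-left neighbours -- that's the
        # overlapping structure being cached.
        m, n = len(a), len(b)
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            dp[i][0] = i  # delete all of a[:i]
        for j in range(n + 1):
            dp[0][j] = j  # insert all of b[:j]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                               dp[i][j - 1] + 1,         # insertion
                               dp[i - 1][j - 1] + cost)  # substitution
        return dp[m][n]

    assert edit_distance("kitten", "sitting") == 3

The exponential tree of edit sequences collapses into an (m+1) x (n+1) table of subproblems.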
Absolutely, but I attribute the ability to decompose a problem into overlapping subproblems to an understanding of recursion, so I had implicitly assumed such an understanding.
It's a DP problem, and you understand how to recurse just right, and you understand memoization, so you'll be able to solve it, right? Link your solution when you reply, please.
In my "real world" we normally don't care about things like big-O complexity. We worry about doing dumb things and not taking more time than we have available. I'm not saying big-O is useless or CS wizards are never helpful. It's just that you need one or two of them for a large team of normies, IME.
I have a problem with this notion that knowledge of algorithms is required to be a good engineer though. Case in point: Senior algorithm nerd on my project is going nuts over algorithmic complexity and delaying an early delivery to another team so they can begin using our new feature. In our case, n is fairly small no matter what, so 2n vs n^2 really doesn't matter to us. The code he's optimizing isn't even the long pole in the tent. It calls other routines which take 2/3 of the total time anyway. We could just deliver now and improve later when we have real world data on how people want to use our feature, but nope, we're wasting time on endless rewrites in search of perfection which may not matter if the other team can't deploy our stuff in time.
>> Senior algorithm nerd on my project is going nuts over algorithmic complexity
This is me, but luckily where I work I have people who can keep me in check because we generally do design reviews before anything big is built.
However, I have been in situations at previous companies where big-O was ignored to take shortcuts up front because the "data was small", and suddenly scaling to even just 100 users starts to break things because of poor design decisions when it gets into production.
I guess the lesson here is more the importance of design reviews. Also, n^2 is HUGE even for small data if there are I/O or API calls involved. Any public API you provide that is n^2 is not a good idea, because you never know who may end up using it for what.
Right. In my case, the operation was an extra memory comparison. For something already in the cache.
Sure, constraints can change and your assumptions about n<10k may prove unwise, but that's our call to make as engineers. YAGNI. If you know n is never going to grow, then why waste time on it? We're not paid to write pristine code. We're paid to solve problems while hopefully not creating new ones. Pragmatism and all that.
> In my “real world” we normally don’t care about things like big-O complexity. We worry about doing dumb things and not taking more time than we have available.
Big-O is just a formal word for how much time you’re going to take, a way to figure out if you’re likely to take more time than available. So, it sounds like you do care about it, you’re just saying it doesn’t matter if you’re formal about it?
If your team’s n^2 worst case really is in the noise, then you’re absolutely right, the senior is prematurely optimizing.
But without more context, I have to assume there could be reasons your senior “nerd” might be right. Is there a noticeable difference to the user in run time with the largest n you can possibly have? Is the other team you’re delivering to going to call your n^2 routine from one of their own, one that has its own O(n) or O(n^2) outer loop?
I’d say big-O notation isn’t difficult, doesn’t take long to learn, doesn’t require school, and knowledge of it doesn’t make someone a wizard. It’s easy. It’s a low enough bar, that I do expect everyone on my team to feel comfortable looking at a function with loops and being able to tell whether something is O(n^2) or O(n^3).
The difficult cases are determining complexity of an architecture full of lots of interconnecting O(1..n) functions, when the complexity isn’t all in one place. I watch unnecessary O(n^3) algorithms get made even on teams of people who all know big-O.
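A hypothetical illustration of how that happens: each line looks innocent, but a linear helper hiding inside a double loop quietly turns O(n^2) into O(n^3).

    def has_duplicate_cubic(items):
        # Looks like O(n^2), but items.index() is itself a linear scan,
        # so the whole thing is O(n^3).
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                if items.index(items[j]) == i:
                    return True
        return False

    def has_duplicate_linear(items):
        # Same question answered in O(n) expected time with a set.
        seen = set()
        for x in items:
            if x in seen:
                return True
            seen.add(x)
        return False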
> Big-O is just a formal word for how much time you’re going to take, a way to figure out if you’re likely to take more time than available
This is a common misunderstanding about big-O. It's not about how much time you're going to take; it's a measure of how the running time grows as the input size grows.
"Big O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. "
The quote from Wikipedia is factually correct, but it seems you misunderstood my comment.
> It’s not about how much time you’re gonna take
It most definitely is about predicting running time, or memory consumption. Big-O is literally a straightforward way to write the formula for best/avg/worst case time or memory of a program, modulo a constant. The second part of your sentence contradicts the first part.
I’m not sure I buy that there’s any common misconception about Big-O either, I’ve never seen much evidence of that. Anyone who’s learned Big-O in school, or has used it at work, or read past the first sentence on Wikipedia, or commented about it on HN, knows what Big-O means and that for small n, O(n) can be bigger than O(n^2). It’s just not that tricky.
I like Wikipedia a lot, and I agree this one’s right. Why do you dislike it, and why is that relevant?
I'm not the person you're replying to, but it seems to me that there may be an example of a common misconception about Big-O in your comment here:
> Big-O is literally a straightforward way to write the formula [describing] best/avg/worst case time or memory of a program
Technically, Big-O notation should only be used to provide an upper bound on the best/avg/worst case[0]. And typically people only discuss the worst case or average case, just because talking about the upper bound on an algorithm when its inputs are restricted to its best possible inputs is usually much less useful.
But providing an upper bound is not quite "writing the formula" (giving a complete picture), because you haven't specified the lower bound (Big/Little-Omega) - very loosely speaking, if you only ever talk in terms of Big/Little[1]-O, you've only provided half the picture. Of course to be fair, Big-O of an algorithm is the side of the picture that is way more likely to bite you in production!
Disclaimer: not a mathematician, so salt to taste :)
The “worst case” is the part referring to the upper bound. Big-O is referring to the “order” of the run time, meaning the largest term in the formula for large n.
Again, everyone knows this if they know Big-O at all. I am intentionally not being picky and pedantic with my words, because the context of this thread was people complaining about unnecessary formality and unnecessary and unhelpful over-concern about academic correctness rather than practicality. There is no widespread misconception, but there are people who like to wax philosophical and try to demonstrate their superior knowledge...
My side intention was to detail why @rafiki6 might have been incorrect without knowing it, since they claimed correctness and complained about downvotes.
Not entirely sure why I am getting downvoted here. It's factually correct and relevant to the discussion. I'm starting to think someone is actively trying to downvote my comments.
If you really do need to know that stuff, you can read up on it when it's needed. I've been programming professionally for over 15 years and have had to use this stuff once that I can remember. Knowing how to design and index a database well is way more useful.
To give the other side of the coin here- you at least need to know the basic logic behind it if you're going to be able to recognize when reading up on it might be needed.
If you have never considered the benefits of caching certain computational results in a recursive algorithm, then you probably wouldn't be as quick to recognize when that technique would be useful.
Exactly. And also, even if you somehow recognized that "you need to use a DP solution here", it's not as easy as "reading up on it" (like GP says). DP is not a trick that you can just learn once and then apply everywhere. It's not an npm package you can install that solves your problem for you.
> In my "real world" we normally don't care about things like big-O complexity. We worry about doing dumb things and not taking more time than we have available.
Any work in any kind of scale setting has to implicitly deal with this. You might not realize it immediately, but if you are dealing with a network of 100k+ nodes, or some database with 10M+ entries, or anything of that sort, there is a huge difference between doing something in O(N) or in O(N^2) or O(2^N). Just a nested loop where for each node you need to gather information from each other node is completely out of the question (quadratic vs. linear). Or where you try all combinations of something (exponential). Or where you look up something (linear search vs. logarithmic). I deal with such problems every day. And my job isn't anything special. You may call those "dumb things", but under the hood it's just asymptotic complexity, aka "big-O".
It could also be that your "real world" does not contain scale settings. But then please don't generalize; the industry is full of scale problems to be solved on a day-to-day basis.
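To make that concrete with a hypothetical sketch (the names are mine): the same "which of these ids do we know about?" question, written the quadratic way and the linear way.

    def known_ids_quadratic(node_ids, wanted):
        # node_ids is a list, so each `in` check is a linear scan:
        # O(len(wanted) * len(node_ids)).
        return [w for w in wanted if w in node_ids]

    def known_ids_linear(node_ids, wanted):
        # Build a set once, then each lookup is O(1) on average:
        # O(len(node_ids) + len(wanted)).
        known = set(node_ids)
        return [w for w in wanted if w in known]

At 100k+ nodes the first version is the "dumb thing"; the second is the same code with the right data structure.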
> Any work in any kind of scale setting has to implicitly deal with this.
Obviously. And most experienced engineers know this. But a lot of us never deal with n > 10k or so. I've worked a lot in embedded, and even when n is large, it's usually just moving around a payload. Even when I've dealt with n >> 10k, say writing a network server, I've rarely been concerned with complexity. I focus on doing the minimum amount of work possible. It's basically the same thing, without the academic formalism. The main rule of engineering seems to be "don't do stupid things."
When n is guaranteed to be very small it does not matter. Even an exponential solution would be fine in these cases. The 'algo nerd' on your team ought to know this.
Sadly, perfectionists share a trait when it comes to building software: obsession with their craft at the expense of everything else, shipping times included.
I think we all derive joy from writing 'perfect' software...but then the rules are changed and perfection becomes garbage :)
Like the other piece of software I was working on for this project. Nice little test framework with a beautiful OOP structure... which became a noose as soon as I wanted to add a new feature. Now it's this frankenbeast with a weird appendage glued on.
DP means you map the problem to a grid, and your cache connections are your neighbors on the grid.
Memoization (aka caching) was invented later than DP, and it's easier and has the same complexity. You don't have to map your problem to a spatial layout.
One big difference is that DP often involves windowing your cache. While the problem might be on a 2d grid, the storage for your cache might be just two rows, and the algorithm will sweep down rows only once. So, a DP solution might be far less memory than a straightforward memoize, unless you’re doing work to clean up memoize as soon as it’s no longer needed.
Another big difference is that with DP you’re pre-allocating, while with memoize you’re typically using dynamic memory allocations by default. That obviously doesn’t have to be true, but makes a fairly naive DP solution faster than a fairly naive memoization.
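As a sketch of the windowing point (Python; longest common subsequence is my example, not the parent's): the table is conceptually (m+1) x (n+1), but row i only reads row i-1, so two rows of storage are enough.

    def lcs_length(a, b):
        # Full table would be (len(a)+1) x (len(b)+1); we keep a two-row window.
        prev = [0] * (len(b) + 1)
        for i in range(1, len(a) + 1):
            curr = [0] * (len(b) + 1)
            for j in range(1, len(b) + 1):
                if a[i - 1] == b[j - 1]:
                    curr[j] = prev[j - 1] + 1
                else:
                    curr[j] = max(prev[j], curr[j - 1])
            prev = curr  # slide the window down one row
        return prev[-1]

    assert lcs_length("dynamic", "programming") == 3

A memoized recursive version would hold on to all O(m*n) cached entries unless you explicitly evict them.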
What makes dynamic programming special is that it exploits optimal substructure, where a problem can be broken down into smaller sub-problems, the solution to which must be part of the optimal solution to the larger problem.
While caching can be applied anywhere, dynamic programming can only be applied where optimal substructure is present, for example, finding the shortest path between all pairs of points in a graph. By contrast, finding the longest path doesn't have optimal substructure, so dynamic programming cannot be applied in that case.
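One standard way to do that all-pairs shortest-path example is Floyd-Warshall; a minimal sketch in Python (the tiny graph is made up):

    INF = float("inf")

    def all_pairs_shortest(weights):
        # Optimal substructure: the best i->j path using intermediates {0..k}
        # is either the best path avoiding k, or a best i->k path plus a
        # best k->j path.
        n = len(weights)
        d = [row[:] for row in weights]  # don't mutate the input
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    if d[i][k] + d[k][j] < d[i][j]:
                        d[i][j] = d[i][k] + d[k][j]
        return d

    # 0 -> 1 -> 2 (cost 3) beats the direct 0 -> 2 edge (cost 10).
    graph = [[0, 1, 10],
             [INF, 0, 2],
             [INF, INF, 0]]
    assert all_pairs_shortest(graph)[0][2] == 3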
Yes, basically it is exactly that. (For why it is called like that, see other answers; but that's also somewhat irrelevant.)
But the main task is to identify how to iteratively / recursively solve the problem and what to cache (and then you should also be able to tell how efficient your solution is). This is the tricky bit. It doesn't matter if you call it DP or whatever. But it matters to be able to come up with efficient solutions for such problems. And this is what companies want to see.
> In the real world we always think about and do such optimizations, be it I/O, disk access or db access
That's the key. You've come across caching as a way to optimize for a given metric. Folks who haven't heard of DP aren't aware of using DP techniques such as memoization, for example, to speed up repetitive tasks [0]. Similarly, there are many other techniques you haven't come across [1][2]; how could you possibly solve those problems in the same way? For instance, QuickSort and its variants were PhD-thesis-worthy, once upon a time (if not now).
[0] Not everything is intuitive, and DP certainly isn't, imo. As a rule of thumb, naive solutions to a problem are oftentimes the intuitive ones.
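For anyone in that boat, the memoization machinery itself is tiny; here's a bare-bones, hypothetical sketch (a dict keyed by the arguments). The DP part is knowing which recursive function is worth wrapping and why its subproblems overlap.

    def memoize(fn):
        # The "simple cache mechanism": remember results keyed by the arguments.
        cache = {}
        def wrapped(*args):
            if args not in cache:
                cache[args] = fn(*args)
            return cache[args]
        return wrapped

    @memoize
    def ways_to_climb(n):
        # Hypothetical example: ways to climb n steps taking 1 or 2 at a time.
        # Without the memo this recursion is exponential; with it, O(n).
        if n <= 1:
            return 1
        return ways_to_climb(n - 1) + ways_to_climb(n - 2)

    assert ways_to_climb(10) == 89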