It is funny that the author chose the SUBSET-SUM problem. You may be amazed, but it is a special case of the KNAPSACK problem, for which we have:
Fully polynomial approximation schemes.
Pseudo-polynomial time algorithms.
In other words, you may find that for all practical purposes this problem can be solved so fast that, in some sense, it doesn't really warrant the NP badge. You could say the problem is an NP problem disguising itself as a P problem. It would be really fun to use David Pisinger's knapsack algorithm codes, http://www.diku.dk/hjemmesider/ansatte/pisinger/codes.html, and then bait the bidder into finding a set for which the algorithm is NOT fast (hint: that is also pretty hard :)
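To make the "pseudo-polynomial" point concrete, here's a minimal sketch (my own, not Pisinger's code) of the classic dynamic program for SUBSET-SUM. Its running time grows with the *magnitudes* of the numbers, not with their bit-length, which is exactly why it can be fast in practice without contradicting NP-hardness:

```python
# Hedged sketch: pseudo-polynomial dynamic program for SUBSET-SUM
# (does some non-empty subset sum to 0?). Work is proportional to
# n times the number of distinct reachable sums, i.e. polynomial in
# the magnitudes of the inputs, exponential in their bit-length.

def has_zero_subset(nums):
    """Return True if some non-empty subset of nums sums to 0."""
    reachable = set()  # sums achievable by non-empty subsets seen so far
    for x in nums:
        # every existing sum can be extended by x; x alone is also a subset
        reachable |= {s + x for s in reachable} | {x}
        if 0 in reachable:
            return True
    return False
```

On small-magnitude inputs this flies; feed it numbers with hundreds of digits and the `reachable` set blows up, which is where the hard instances live.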
Yes. There's even a conjecture that says that for basically all NP problems, all reasonable [1] instances will be solvable in expected polynomial time, even if P != NP.
[1] Add some hand-waving about probability distributions here.
Does that include factoring products of large primes? If a good approximation existed for that, wouldn't it make public-key encryption pretty useless?
Included in that handwave about probability distributions was a caveat about not being handed pathological instances of problems. Encryption harnesses pathological instances and wouldn't be covered by such a handwave.
(Is there ever any other reason to generate multi-hundred digit numbers and try to factor them? Honest question, if anybody's got a fun answer, though I am asking about something actually useful that you know about, not something hypothetical.)
Actually, it's quite hard to come up with good primes for RSA. They need to avoid lots of properties, because researchers have found efficient attacks on numbers with those properties.
Factoring numbers is already suspected to be much easier than NP-complete.
That is because factoring is in the intersection of NP and co-NP. Co-NP is the set of problems where a "NO" answer is easily checkable with a certificate. Whether NP = co-NP is as much an open question as whether P = NP. (I.e., there is no known proof either way, but everyone expects them to be unequal.)
Public-key encryption schemes would simply need to avoid reasonable circumstances. I suspect "product of two large primes" falls outside the "reasonable circumstances" for factoring.
It's even harder than that. There are lots of known attacks if you are not careful with your choice of primes. ('Careful' here means 'choosing pathological instances'.)
>You could say the problem is an NP problem disguising itself as a P problem. It would be really fun to use David Pisinger's knapsack algorithm codes, http://www.diku.dk/hjemmesider/ansatte/pisinger/codes.html, and then bait the bidder into finding a set for which the algorithm is NOT fast (hint: that is also pretty hard :)
This makes me think the post could be a phish for programmers with strong maths and comp-sci backgrounds - something Googlish?
That would be fun, but alas, I don't have time. One could simply submit Pisinger's "decomp" codes from the page I linked above, but if you do that, you are not the author of the work...
Why is finding a set for which the algorithm does not work (edit:) WELL so hard? I have admittedly not read anything about -- or indeed, heard of -- KNAPSACK, but my impression was that we could estimate running time based on various measures of a problem's complexity. Is that incorrect, or is measuring the complexity of NP problems simply incredibly difficult?
For all NP problems, there are trivial instances. For example, a given instance might have structure that effectively makes it a P-problem instance. What makes NP problems interesting is that there are also hard instances, where the search breaks down and you end up walking an enormous solution space. Most NP solvers "cheat" by looking at the structure of an instance in order to cut down the amount of searching needed. How good they are depends on how well they can cut down the solution space. A lot of research is going into this - which is really a bet on P not being equal to NP.
The KNAPSACK problem is one of the "weak" NP problems, in the sense that it has a pseudo-polynomial-time solution.
Intuitively, this means that KNAPSACK is still hard, but easier than most other NP problems. Pisinger's codes are so good that for most inputs the algorithm terminates quickly. There are still hard instances out there, but they are few and far between, so a search for a hard instance can become rather cumbersome. Further, all realistic instances are probably realistically solvable, so the instances we can't handle are more or less of theoretical interest.
So when we call NP problems hard, it is because it has been proven that there are some nasty instances among them for which even the most clever solver for the problem gives up and resorts to basically searching the whole solution space in exponential time.
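For concreteness, here's a sketch (mine, a textbook version, not Pisinger's) of what a pseudo-polynomial KNAPSACK solution looks like. The O(n * W) dynamic program is polynomial in the capacity W as a number, but exponential in the number of bits needed to write W down:

```python
# Hedged sketch: textbook O(n * W) dynamic program for 0-1 KNAPSACK.
# "Pseudo-polynomial" because W enters the runtime as a magnitude,
# not as a bit-length.

def knapsack(items, capacity):
    """items: list of (profit, weight) pairs; returns max total profit."""
    best = [0] * (capacity + 1)  # best[w] = max profit using weight <= w
    for profit, weight in items:
        # iterate weights downward so each item is used at most once
        for w in range(capacity, weight - 1, -1):
            best[w] = max(best[w], best[w - weight] + profit)
    return best[capacity]
```

Double the bit-length of the capacity and the table doubles in size again and again - that's the exponential blow-up hiding inside the "polynomial" bound.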
But WHY are they trivial? Because we can extract metadata that helps us prune the solution pool? And if that's the case for the KNAPSACK problem, does it hold for all such problems?
Yes, the idea is to look at the structure of the problem and use it to prune the solution pool aggressively. The usual method is branch-and-bound, but the Wikipedia article does not seem to convey the idea especially well, so dig around for a better reference.
Sometimes we are lucky, however, and there is more structure to a problem than what B&B algorithms usually exploit. For KNAPSACK, remember that we have a burglar with a knapsack who has just broken into a house. Each item in the house has a given profit and a given weight. The problem is to maximize the profit while keeping the total weight low enough to fit in the knapsack. In the 0-1 KNAPSACK problem, there is only one of each item, so we either have to leave it (0) or pick it (1).
The inherent structure is that we can order the items by the ratio p/w of profit over weight. Items with a high p/w ratio tend to be items we want in the knapsack, whereas items with a low p/w ratio tend not to be. It turns out you can preprocess and prune some items this way before you start the B&B algorithm.
But for this problem it turns out there is a critical item, the break item, and a window of items around it. If you can identify this window, solving it is enough to solve the whole knapsack problem; there is a theorem from the '80s stating this. It turns out that for knapsack problems with thousands of items, the window tends to be fairly small, some 20-30 items.
Pisinger's codes work by being extremely clever at finding the window. It is a smart B&B/dynamic-programming hybrid, and as soon as the window is found, it is almost done.
Of course, there are knapsack instances where the window is almost the whole item set, and then the algorithm fails. But it turns out to be rather hard to produce an instance from a real-world problem that exhibits the large-window structure. Hell, Pisinger's codes are often faster than sorting the items by p/w ratio - which makes for a great discussion on complexity :)
So what did I mean by trivial instances? In the SUBSET-SUM problem, this is a trivial instance: {-3,-2,-1,1,2,3}. This one too: {1,2,-3}. Or how about this one: {1,-1,337}? In the first two, the whole set works. In the last, it is easy to see that 337 can never be part of the sum, as its inclusion leads to a value so great we can never hope to get back to 0. Even if we construct extremely large sets, it is easy to construct them in a way such that they can be pruned down to a small set and then solved.
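The 337 argument can be turned into a mechanical pruning rule. Here's a small sketch (my own formulation of the idea above): drop any element whose magnitude exceeds the total pull of the opposite-sign elements, and repeat until nothing changes:

```python
# Hedged sketch of the pruning idea: an element like 337 can be discarded
# because even taking every negative element can't pull the sum back to 0.
# Applying the rule repeatedly can shrink a big instance to a small core.

def prune(nums):
    """Drop elements that no zero-sum subset could possibly contain."""
    nums = list(nums)
    changed = True
    while changed:
        neg = -sum(x for x in nums if x < 0)  # total pull downward
        pos = sum(x for x in nums if x > 0)   # total pull upward
        kept = [x for x in nums
                if (x <= 0 or x <= neg) and (x >= 0 or -x <= pos)]
        changed = len(kept) != len(nums)
        nums = kept
    return nums
```

For {1,-1,337} this immediately discards 337 and leaves the easy core {1,-1}.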
As an extensive (and generally happy) user of elance and similar sites, this really made me laugh! It is a fantastic example of the pitfalls of such sites.
Whatever the project, you get a bunch of replies (the majority from India) who either just say 'yes, we can do that' or come back with boilerplate drivel without understanding the requirement.
The Indian guy who responded with a $300 bid and said:
"I can do this for you in php or javascript. If you want this work in one language, I can charge less."
He could've done it in phpjs and killed two birds with one stone.
My bid was rejected and my account suspended for completing the work before my bid was accepted. I just hope those Clay guys see it so I can earn my $300.
Hrm... you know, there could be something here. If people kept posting variations of NP problems as jobs on sites like GetACoder as a honeypot, you could easily build a blacklist, from all the bids, of people you'd never want to hire for a real project.
That's actually kind of brilliant. I'm looking at this reply that reads "Hello, Easy job for me,please send me a PM ." and thinking, "you stay away from me." This one is also golden, "Read your requirement and everything has been understood. We have already delivered a couple of similar projects and given our strong capabilities. I am confident we can deliver a powerful end-result to you as well."
I think most of them just skim the description and bid on a lot of projects without really reading them and hope to get some of them awarded. Being non-native English speakers compounds the problem.
Or am I being too gentle? It's hard to miss the title if you know what it means.
You've got it right. Derek Sivers recommends doing this for any serious rentacoder/vworker/getacoder projects you post:
"""
At the end of your post, write something like, “VERY IMPORTANT: To separate you from the spammers, please write I AM REAL as the first line of your bid. We will delete all bids that do not start with this phrase, since most bidders never read the requirements. Thank you for being one who does.”
"""
Of course it's not a sufficient criterion by itself; Sivers' point was that a lot of them won't even do this, so you can at least save yourself the trouble of evaluating them.
This is why everyone who calls themselves a programmer or developer needs a strong computer science background -- university provided or not. Without it you can't accurately assess the difficulty of a problem, even seemingly simple ones like this.
"Without it you can't accurately assess the difficulty of a problem, even seemingly simple ones like this."
And with a background, you'd recognize hard problems?
Sure, you'll recognize very well-known NP-complete problems, but there are plenty of other problems that are equivalent to them, which you might stumble upon without even realizing it (I recently wrote a post about one such problem: http://www.loopycode.com/a-surprisingly-hard-problem-post-co...).
Post's Correspondence Problem is nice. As an exercise, you may try to figure out how to simulate a Turing machine in the correspondence system. If you've done so, you basically have a proof of the undecidability of this problem.
I worked on this when I was bored in theoretical computer science classes.
Well, in this case you mainly need strong common sense and the ability to use Google/Wikipedia. Any programmer that doesn't automatically Google something he doesn't recognise or understand isn't worth his salt either.
That's only useful if you happen to get the academic name of the problem when you're trying to solve it. What if it just turns out that the problem you're solving reduces to the knapsack or travelling salesman problems? You're not going to be able to Google TSP unless you already know what TSP is.
Since the post only specified polynomial time and placed no constraint on space, this is actually solvable (e.g., it's not too hard to come up with a polynomial implementation that uses a number of VMs that grows exponentially with the size of the set).
But the getacoder poster only specified that it had to be done in polynomial time, and didn't clarify whether that had to be total CPU time (i.e., "work") or wall clock time.
Given that underspecification, it's perfectly justifiable to use an exponential amount of resources to achieve the goal.
Still, regardless of whether P != NP, it is known that P is contained in NP, which is contained in PSPACE. These classes are defined for certain computational models; if you change the computational model, those definitions might not make sense.
In the real world, taking exponential space implies taking exponential time. The fastest you can transmit information is the speed of light, so it will take exponential time to communicate with the furthest pieces of your exponentially large system.
Depends on the problem and how you distribute data.
e.g., each node can generate its own subset to check, so you only need to distribute the program and the input. That can be done using a multicast tree of some kind, which brings distribution costs back down by a log factor.
But if you only have a finite amount of hardware available (which is probably the case), each node's data set grows exponentially with input size.
If your node population increases exponentially with input size, you're still running into the speed-of-light delay problem mentioned above (cube root of exponential is still exponential). If your solution is polynomial time with polynomially many nodes, your space cost is also polynomial.
This could be said about almost all "polynomial time" algorithms though. The bigger the loop, the larger the variables you'll need (in general), and the more time you'll need to communicate between the pieces.
I greatly appreciate that style of response over the "this is a joke, you idiots!" style or the twee exaggerated response that accomplishes the same end.
This is awesome. I will send him the program which runs every Turing machine simultaneously on his problem instance, and checks the output of each to see if it is a correct solution.
If he manages to prove that my submission doesn't solve his problem in polynomial time, I'll refund his money, but I'll also send his proof on to the Clay Institute.
You accept when you find a Turing machine that outputs something which turns out to be a polynomial-time-verifiable certificate for the NP problem.
You never reject, because the requirement was only to accept in polynomial time. (If you prefer, you could reject in exponential time by exhaustive search, of course.)
What am I missing? I can see the solution for those six numbers {−2, −3, 15, 14, 7, −10}. Or is that the point: the more numbers you add, the longer it takes to solve, and P = NP would mean you could still get the answer almost instantly?
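You're not missing much: for six numbers, brute force over all subsets is instant. A quick sketch (exponential in n, which is exactly the problem once the set gets large) confirms the solution you spotted:

```python
# Brute-force SUBSET-SUM check: enumerate all non-empty subsets.
# Fine for 6 numbers (2^6 - 1 = 63 subsets); hopeless for large n.

from itertools import combinations

def zero_subsets(nums):
    """All non-empty subsets of nums that sum to 0."""
    return [c for r in range(1, len(nums) + 1)
              for c in combinations(nums, r)
              if sum(c) == 0]
```

For {-2, -3, 15, 14, 7, -10} it finds {-2, -3, 15, -10}. Each extra number doubles the number of subsets, and P = NP would mean there's always a way to avoid that blow-up.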
To be fair, the description is bad and misleading. The submitter probably knew that the responders would miss that the input might be very (infinitely?) long.