On a faintly related note, I once was calculating partitions, using a relatively inefficient method (memoization: if f(n,k) is the number of distinct ways to express n as the unordered sum of integers no greater than k, then f(n,k) = f(n-k,k) + f(n,k-1)). My computer started to feel the strain in the thousands (this algorithm is O(n^2) in space and time). I then googled for the partition of, say, 1034, which is:
91363785902248291467082481888195
I figured that chances are that no website will have that integer on there by accident. Lo and behold, I found, among other results, a text file containing the partitions up to 10,000, presumably done with a more efficient algorithm (likely Euler's ridiculous pentagonal-number recurrence formula, which, memoized, is roughly O(n^1.5) in time and O(n) in space): http://oeis.org/A000041/b000041.txt
It should be pointed out here for everyone else that in general, if you want terms of a well-known integer sequence, OEIS (and the links section for longer tables like that) is a good place to check.
So there are numbers called pentagonal numbers. You may be familiar with the triangular numbers: 0 1 3 6 10 15 ..., with the formula n(n+1)/2, and the square numbers: 0 1 4 9 16 ..., with the formula n^2. In general, the k-gonal numbers begin with "0 1 k", and they are defined by a quadratic formula.
Incidentally, here's a fun little trick. If you have a sequence defined as the values of a polynomial at consecutive integers, then if you take the differences of consecutive terms of the sequence, you'll get a sequence that is the values of a polynomial with degree 1 less. For example, I'll think of a random quadratic. The sequence is:
We end up with a constant function. (And I guess taking differences on a constant function yields another constant.) Then, if we say that the sequence started at index 0 (i.e. f(0) = 3, f(1) = 12, and so on), then I can tell you, by looking at the leftmost elements of each row, that the polynomial was f(n) = 3·(n choose 0) + 9·(n choose 1) + 18·(n choose 2), which comes out to be 9n^2 + 3. [1] That is in fact the polynomial I picked. This method is what Martin Gardner called the "calculus of finite differences"; taking differences is an analogue of taking derivatives, and the result is an analogue of Taylor series.
Anyway, pentagonal numbers are defined like this [this is the only way that makes sense to me, anyway]:
0 1 5 ...
which gives us some differences:
0 1 5 ...
1 4 ...
3 ...
and we know that the bottom is a constant, because it's the second difference on a quadratic:
0 1 5 ...
1 4 ...
3 3 3 3 3 3 3
--> ;now the bottom row is the differences of the row above it
0 1 5 ...
1 4 7 10 13 16 19 22
3 3 3 3 3 3 3
--> ;and *that* is the differences of the top row
0 1 5 12 22 35 51 70 92
1 4 7 10 13 16 19 22
3 3 3 3 3 3 3
Incidentally, this is the method of calculating values of polynomials that Charles Babbage's Difference Engine was designed to do. It's nice and easy. Anyway, back to theory about pentagonal numbers. Using the pseudo-Taylor series method, we can read off "0·(n choose 0) + 1·(n choose 1) + 3·(n choose 2)" as our formula, which comes out to n(3n-1)/2. By the way, we can also plug negative numbers into this formula (or, equivalently, we could take the differences backward), which yields what are called the "generalized pentagonal numbers":
Now, we define P(n) to be the number of partitions of n, i.e. the number of distinct unordered ways to express n as the sum of positive integers. The first several terms look like this:
You could compute several more by hand. Then the recursive formula I originally described might occur to you, and you could make a computer calculate it, perhaps up into the thousands. Now, what if I told you this:
where that sequence "1 2 5 7 12 15" is the "generalized pentagonal numbers" I described above? How the hell are pentagonal numbers connected with partitions? That's what I mean by ridiculous. ... The Wikipedia article in the sister comment describes a proof of the connection. I still find it counterintuitive.
Thanks for the in-depth answer. This actually reminds me of Chebyshev polynomials, which I was obsessed with in a very simple way last spring. But I'm not advanced enough in maths to draw any concrete connection.
The double dot indicates that this is a ranged query.
11..22 would be to search for all integers between 11 and 22.
The effect is that the search is too broad. The problem is more likely to be that the range search is a mapreduce that performs a search for each item in the range.
You can imagine why that's a bad idea, and some aggressive timeouts are probably what stop it from going too far.
Plus the numbers in the range of the OP search are likely to fall in the ranges of sensitive numbers, credit cards being the most sensitive... which are likely to be explicitly blocked.
> The effect is that the search is too broad. The problem is more likely to be that the range search is a mapreduce that performs a search for each item in the range.
I'm a bit confused to see this so highly upvoted. I'm not sure you understand what MapReduce is for. MapReduce is not a technology designed for executing a search query in under 100ms, it's for massive data processing jobs. MapReduce is what you would use to do to prepare an index, not to query it.
> You can imagine why that's a bad idea, and some aggressive timeouts are probably what stop it from going too far.
As you yourself noted, this is blocked to prevent people from searching for SSN/credit card/etc. dumps.
The reason for this is that you can google ranges of numbers, and people a couple of years ago were using this feature to find credit card numbers that were posted online, eg searching for 4000000000000000..4999999999999999
You also cannot google some stuff commonly found in phpBB because of so many hacked sites out there.
Also remember how you used to be able to search for anything up to the 1000th item (10 pages of 100 results). Not anymore for a long time now. Google sucks it all in but won't share and play nice with others.
Why not allow such searches unless a bot is detected (too many pages too quickly, etc.)
"With all due respect John, I am the head of IT and I have it on good authority that if you type 'Google' into Google, you can break the Internet." -- The IT Crowd
I think this question is cool to think about and try to answer: What is the lowest integer that doesn't have any hits on google? Is there any reasoning that can help estimating the approximate magnitude it should be?
A problem might be that as soon as you discover that number, you can't tell anyone on the web about it. This for example has no results as of now: http://google.com/search?q=813115181452319
You could look at the hits returned on increasing numbers (eg, 1111 v 11111). A quick glance shows they seem to reduce towards zero by the time you reach 12 digits.
From there, it's probably trial and error - start with 12 1's in a row and fiddle with them. Even try deleting 2 random digits to see what happens (from 12 ->10 as anything with 11 numbers returns a UPS Package Tracking Link / Ad for me). Lowest I had is 111232111222 - perhaps someone should write a Wikipedia page on it?
The interesting number paradox was great, didn't know about it. Almost the same contradiction exists in my problem statement, as someone mentioned below. Would you be able to identify the lowest non-indexed number you probably couldn't keep it from being published on the web.
Can someone please explain me why that happens? Google says - "Our systems have detected unusual traffic from your computer network. Please try your request again later."
Years ago somebody discovered you could Google for credit card numbers that had been left on the web inadvertently, by searching for the right number range. I wonder if that is why it is unavailable.
I'm guessing there is some kind of grammar parser breaking horribly. There's grammar parsers that have extremely poor performance trying to parse stuff that looks like this. It might do:
search 1 and 111111..11111
search 11 and 11111..11111
...
search 1 and 1 and 11111..111
...
...
...
and google detects that you are DOSing its server.
OK, someone else has pointed out that this is a range search, and google may be protecting CC searches.
Reading the comments here gave me an idea....google your own credit card numbers to check if its already in some scammer's index. While no results might not necessarily mean you are safe, a positive match is a clear red flag.
Because what Google is preventing is the discovery of accidental credit cards in it's index.
You searching for credits cards (via that ranged query) is a flag that the search is possibly automated and they display the warning and catchpa as a mitigator.
Probably that string was used as a dork together with some more text to find vulnerabilities on web applications or so.
So it was reported as a honey
This does look like a string that could very much be generated by some poorly coded webbapp. An incorrect usage of a floating point number can easily generate such output. If it occurs on a critical part of an application it could very well been used as a dork. Just an hypothesis though.
Back in 2005 google tricks were at their peak. Many hackers experimented with search phrases in order to retrieve interesting/valuable/uncommon/dangerous?/sensitive info from the web. "index of/ .mp3" "apache server at port" being the absolute classic.
More and more people started to jump in the bandwagon, webmasters gradually became more aware of this, and google too. The natural reaction was google honeypot.
But it wasn't too long until google started to remove such features. These days one can hardly search for symbols on google. They the old tricks, most of them will not work, google simply ignores the details and returns a list of results based on the actual words contained in the query. It's becoming a QA machine. That's one of the reasons I switched to duckduckgo.
Bing doesn't really come to the rescue -- the X..Y format on Google searches for numbers in a range. Bing seems to just try to find that string as plain text.
In google the ".." between two sets of numbers indicates search for numbers between the range of the two.
So, if Bing is trying to "emulate" google in that way for more advanced queries, they are doing it wrong.
Since google is sort of the defacto standard for search right now, I would say that Bing's response would be out of the ordinary for what a user accustomed to doing those kinds of searches would expect.