Hacker News new | past | comments | ask | show | jobs | submit login
You can't google 9999999..99999999999999999999999 (google.com)
364 points by reg29 on Sept 1, 2011 | hide | past | favorite | 80 comments



On a faintly related note, I once was calculating partitions, using a relatively inefficient method (memoization: if f(n,k) is the number of distinct ways to express n as the unordered sum of integers no greater than k, then f(n,k) = f(n-k,k) + f(n,k-1)). My computer started to feel the strain in the thousands (this algorithm is O(n^2) in space and time). I then googled for the partition of, say, 1034, which is:

  91363785902248291467082481888195
I figured that chances are that no website will have that integer on there by accident. Lo and behold, I found, among other results, a text file containing the partitions up to 10,000, presumably done with a more efficient algorithm (likely Euler's ridiculous pentagonal-number recurrence formula, which, memoized, is roughly O(n^1.5) in time and O(n) in space): http://oeis.org/A000041/b000041.txt


It should be pointed out here for everyone else that in general, if you want terms of a well-known integer sequence, OEIS (and the links section for longer tables like that) is a good place to check.


For more general math there's also WolframAlpha. Eg.: http://www.wolframalpha.com/input/?i=partition+1034

They are also quite capable of blowing my mind sometimes. (Try "When will the ISS be visible?" or "coastline length of africa / length of stride")


It blew my mind when I wondered how many femtoseconds it had been since the start of the Roman Empire, and it obligingly told me: http://www.wolframalpha.com/input/?i=now+-+founding+of+the+r....

It also told me how many ergs are in a jellybean and what time the sun would set on April 2nd (http://www.wolframalpha.com/input/?i=sunset+april+2nd%2C+201...).

Of the many technologies I use, Wolfram Alpha is the one that feels the most like the future sometimes.


This is why I love Hacker News. The pure hack value of some comments is mind-blowing.


+1 Completely agree :)


As coincidence would have it, a friend just announced to me that he has written code to compute the number of partitions of 10^12 in 109 seconds.


Memoization-by-Google.


Ridiculous in what way? Ridiculously good compared to alternatives, or ridiculously complex or? Asking out of pure ignorance here.


So there are numbers called pentagonal numbers. You may be familiar with the triangular numbers: 0 1 3 6 10 15 ..., with the formula n(n+1)/2, and the square numbers: 0 1 4 9 16 ..., with the formula n^2. In general, the k-gonal numbers begin with "0 1 k", and they are defined by a quadratic formula.

Incidentally, here's a fun little trick. If you have a sequence defined as the values of a polynomial at consecutive integers, then if you take the differences of consecutive terms of the sequence, you'll get a sequence that is the values of a polynomial with degree 1 less. For example, I'll think of a random quadratic. The sequence is:

  3 12 39 84 147 228 327 444 579
And then the consecutive differences are:

  3 12  39  84  147  228  327   444   579
   9  27  45  63   81   99   117   135
You might notice that the second row is a straight-line function. If we take another set of differences...

  3 12  39  84  147  228  327   444   579
   9  27  45  63   81   99   117   135
    18  18  18   18   18   18    18
We end up with a constant function. (And I guess taking differences on a constant function yields another constant.) Then, if we say that the sequence started at index 0 (i.e. f(0) = 3, f(1) = 12, and so on), then I can tell you, by looking at the leftmost elements of each row, that the polynomial was f(n) = 3·(n choose 0) + 9·(n choose 1) + 18·(n choose 2), which comes out to be 9n^2 + 3. [1] That is in fact the polynomial I picked. This method is what Martin Gardner called the "calculus of finite differences"; taking differences is an analogue of taking derivatives, and the result is an analogue of Taylor series.

Anyway, pentagonal numbers are defined like this [this is the only way that makes sense to me, anyway]:

  0 1 5 ...
which gives us some differences:

  0 1 5 ...
   1 4 ...
    3 ...
and we know that the bottom is a constant, because it's the second difference on a quadratic:

  0 1 5 ...
   1 4 ...
    3 3 3 3 3 3 3
  -->    ;now the bottom row is the differences of the row above it
  0 1 5 ...
   1 4 7 10 13 16 19 22
    3 3 3  3  3  3  3
  -->    ;and *that* is the differences of the top row
  0 1 5 12  22  35  51  70  92
   1 4 7  10  13  16  19  22
    3 3  3   3   3   3   3
Incidentally, this is the method of calculating values of polynomials that Charles Babbage's Difference Engine was designed to do. It's nice and easy. Anyway, back to theory about pentagonal numbers. Using the pseudo-Taylor series method, we can read off "0·(n choose 0) + 1·(n choose 1) + 3·(n choose 2)" as our formula, which comes out to n(3n-1)/2. By the way, we can also plug negative numbers into this formula (or, equivalently, we could take the differences backward), which yields what are called the "generalized pentagonal numbers":

  n       ... -6 -5 -4 -3 -2 -1 0 1 2  3  4  5  6 ...
  pent(n) ... 57 40 26 15  7  2 0 1 5 12 22 35 51 ...
  -->
  pent(n) sorted: 0 1 2 5 7 12 15 22 26 35 40 51 57 ...
Now, we define P(n) to be the number of partitions of n, i.e. the number of distinct unordered ways to express n as the sum of positive integers. The first several terms look like this:

  P(0): 0 = <nothing> --> 1
  P(1): 1 = 1          --> 1
  P(2): 2 = 2, 1+1        --> 2
  P(3): 3 = 3, 2+1, 1+1+1     --> 3
  P(4): 4 = 4, 3+1, 2+2, 2+1+1, 1+1+1+1   --> 5
  P(5): 5 = 5, 4+1, 3+2, 3+1+1, 2+2+1, 2+1+1+1, 1+1+1+1+1   --> 7
  P(6): 6 = 6,5+1,4+2,4+1+1,3+3,3+2+1,3+1+1,2+2+2,2+2+1+1,2+1+1+1+1,<six 1's> --> 11
You could compute several more by hand. Then the recursive formula I originally described might occur to you, and you could make a computer calculate it, perhaps up into the thousands. Now, what if I told you this:

  P(n) = P(n-1) + P(n-2) - P(n-5) - P(n-7) + P(n-12) + P(n-15) - ...
where that sequence "1 2 5 7 12 15" is the "generalized pentagonal numbers" I described above? How the hell are pentagonal numbers connected with partitions? That's what I mean by ridiculous. ... The Wikipedia article in the sister comment describes a proof of the connection. I still find it counterintuitive.

[1] "Choose": http://en.wikipedia.org/wiki/Binomial_coefficient


Thanks for the in-depth answer. This actually reminds me of Chebyshev polynomials, which I was obsessed with in a very simple way last spring. But I'm not advanced enough in maths to draw any concrete connection.


See how http://en.wikipedia.org/wiki/Pentagonal_number_theorem strikes you ;-)

At the bottom there's a bit of Python code for the partition numbers, incidentally.


Thanks, shithead


The double dot indicates that this is a ranged query.

11..22 would be to search for all integers between 11 and 22.

The effect is that the search is too broad. The problem is more likely to be that the range search is a mapreduce that performs a search for each item in the range.

You can imagine why that's a bad idea, and some aggressive timeouts are probably what stop it from going too far.

Plus the numbers in the range of the OP search are likely to fall in the ranges of sensitive numbers, credit cards being the most sensitive... which are likely to be explicitly blocked.


If this were true, surely it should allow you to use quotes to force it to ignore the "range" operator...

http://www.google.com/search?q=%229999999..99999999999999999...


It ignores the quotes a lot of the time. They are less explicit than you may imagine.


yup, "11..22" still finds all numbers from 11 to 22.


But '11..22' yields 11,22 11:22 11/22 etc.

Seems the Unix Bourne shell is seeping in - single quotes are for real, double quotes allow param substitution ...


Then why does 100000..100000 get the same error? Not a huge span, and not likely to contain anything all too sensitive.


A six-digit lower end of the range seems to be the point at which they reject range searches (or it errors, perhaps by exception).

This range works: 99999..1000000

Increment that first digit and it will fail though.


> The effect is that the search is too broad. The problem is more likely to be that the range search is a mapreduce that performs a search for each item in the range.

I'm a bit confused to see this so highly upvoted. I'm not sure you understand what MapReduce is for. MapReduce is not a technology designed for executing a search query in under 100ms, it's for massive data processing jobs. MapReduce is what you would use to do to prepare an index, not to query it.

> You can imagine why that's a bad idea, and some aggressive timeouts are probably what stop it from going too far.

As you yourself noted, this is blocked to prevent people from searching for SSN/credit card/etc. dumps.


The reason for this is that you can google ranges of numbers, and people a couple of years ago were using this feature to find credit card numbers that were posted online, eg searching for 4000000000000000..4999999999999999

It was around the time that johnny's google hacking page became popular, iirc. (http://www.hackersforcharity.org/ghdb/)



care to explain?


I think the mobile version doesn't have the same limitation.


He might be referring to this:

Our systems have detected unusual traffic from your computer network. Please try your request again later. Why did this happen?

IP address: x.y.z.t. Time: 2011-09-01T19:03:40Z URL: http://www.google.com/search?q=9999999..99999999999999999999...


No idea why you've been downvoted --- this is what google says for me, too.


You also cannot google some stuff commonly found in phpBB because of so many hacked sites out there.

Also remember how you used to be able to search for anything up to the 1000th item (10 pages of 100 results). Not anymore for a long time now. Google sucks it all in but won't share and play nice with others.

Why not allow such searches unless a bot is detected (too many pages too quickly, etc.)


"With all due respect John, I am the head of IT and I have it on good authority that if you type 'Google' into Google, you can break the Internet." -- The IT Crowd


I think this question is cool to think about and try to answer: What is the lowest integer that doesn't have any hits on google? Is there any reasoning that can help estimating the approximate magnitude it should be?


A problem might be that as soon as you discover that number, you can't tell anyone on the web about it. This for example has no results as of now: http://google.com/search?q=813115181452319


As of now, it has 3 results (all 3 pointing in some way or the other to your own remark). Damn. That was fast!


Try (813115181452319 - 1) It's lower, has 0 results, and I'm not mentioning it here!


You could look at the hits returned on increasing numbers (eg, 1111 v 11111). A quick glance shows they seem to reduce towards zero by the time you reach 12 digits.

From there, it's probably trial and error - start with 12 1's in a row and fiddle with them. Even try deleting 2 random digits to see what happens (from 12 ->10 as anything with 11 numbers returns a UPS Package Tracking Link / Ad for me). Lowest I had is 111232111222 - perhaps someone should write a Wikipedia page on it?

http://en.wikipedia.org/wiki/Interesting_number_paradox


The interesting number paradox was great, didn't know about it. Almost the same contradiction exists in my problem statement, as someone mentioned below. Would you be able to identify the lowest non-indexed number you probably couldn't keep it from being published on the web.


What's the lowest integer that will never have any hits on Google? It must exist by induction, I think.


0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 :p


[deleted]


I know! That's part of the intrigue :)


Guess they don't want you to search for credit card numbers


Can someone please explain me why that happens? Google says - "Our systems have detected unusual traffic from your computer network. Please try your request again later."


Years ago somebody discovered you could Google for credit card numbers that had been left on the web inadvertently, by searching for the right number range. I wonder if that is why it is unavailable.


It seems to slow down, as you increase from:

111..111 11111..111111

and eventually crashes.

I'm guessing there is some kind of grammar parser breaking horribly. There's grammar parsers that have extremely poor performance trying to parse stuff that looks like this. It might do:

search 1 and 111111..11111

search 11 and 11111..11111

...

search 1 and 1 and 11111..111

...

...

...

and google detects that you are DOSing its server.

OK, someone else has pointed out that this is a range search, and google may be protecting CC searches.


Do you regularly search for 99999...999999? Sounds like it's unusual to me!


Reading the comments here gave me an idea....google your own credit card numbers to check if its already in some scammer's index. While no results might not necessarily mean you are safe, a positive match is a clear red flag.


Even better, LOCK DOWN YOUR GOOGLE ACCOUNT WITH DOUBLE AUTHENTICATION, then set Google alerts for your:

credit card, or if paranoid, partial credit card # "123456789"

Email address

Home address

Phone number

screen name (e.g.: "andrenotgiant" -site:ycombinator.com)

Bank acct partial

Password, Password partial "p@ssw0r"


Sweet, now someone just has to get into my google account settings and I'm screwed.


I imagine this is why the parent poster mention enabling two-factor authentication first.


Sweet, now someone has to get into parent's google account AND swipe his phone!


And my phone has my google account information.. :/


Are we too obsessed to ignore irony/humour?


Are these alerts over Gmail traffic or the whole internet?


Better yet, do this several times per day -- see how long it takes to get your numbers into Google Suggest :)


Doing that over HTTP (not HTTPS) is a fairly bad idea.



So I guess mikkohypponen is the responsible for those searches not to work. Interesting...


Funny how Duckduckgo lists all sites linking to this story.


Our systems have detected unusual traffic from your computer network. Please try your request again later. Why did this happen?


Because what Google is preventing is the discovery of accidental credit cards in it's index.

You searching for credits cards (via that ranged query) is a flag that the search is possibly automated and they display the warning and catchpa as a mitigator.


Neither

  1111111..11111111111111111111111
  11111111..1111111111111111111111
  111111..111111 // minimum case
or any other integer combination. 6 digits is the minimum amount that spawn the bizarre error.


000000..000000 //minimum case?

I know, the point is this happens with 6 digits minimum on each side of ..


100000..100000 Neither


There are 6 digits in that, which he mentioned was the minimum for getting an error.


However, you can still set Google Alerts for a search like that.


I see why it doesn't work, but why doesn't it work with quotes either?


Maybe because the value of the string is also large that can cause overflow? You mean you got the alert when you put in "999999999.....99999"?



Take out a few 9's out and Google still returned an error: http://www.google.com/search?q=9999999..999999


I tried also this: 9999999..999999 same results!


Probably that string was used as a dork together with some more text to find vulnerabilities on web applications or so. So it was reported as a honey

This does look like a string that could very much be generated by some poorly coded webbapp. An incorrect usage of a floating point number can easily generate such output. If it occurs on a critical part of an application it could very well been used as a dork. Just an hypothesis though.

Back in 2005 google tricks were at their peak. Many hackers experimented with search phrases in order to retrieve interesting/valuable/uncommon/dangerous?/sensitive info from the web. "index of/ .mp3" "apache server at port" being the absolute classic.

More and more people started to jump in the bandwagon, webmasters gradually became more aware of this, and google too. The natural reaction was google honeypot.

http://ghh.sourceforge.net/

But it wasn't too long until google started to remove such features. These days one can hardly search for symbols on google. They the old tricks, most of them will not work, google simply ignores the details and returns a list of results based on the actual words contained in the query. It's becoming a QA machine. That's one of the reasons I switched to duckduckgo.

For a proper reading on the subject check ou the vast website of the, now deceased, great hacker Fravia: http://www.searchlores.org/indexo.htm


I get this:

Our systems have detected unusual traffic from your computer network. Please try your request again later.


Change the last 9 to 8 and see what happens ?


Bing to the rescue: http://www.bing.com/search?q=9999999..9999999999999999999999...

Pretty similar URL format eh?


Bing doesn't really come to the rescue -- the X..Y format on Google searches for numbers in a range. Bing seems to just try to find that string as plain text.


Yes, it doesn't hit some strange error page though.


Correct.

Instead it returns some generally useless results that are likely not at all what the user desired.

Truthfully this seems about what I expect from Microsoft.


The point is that Google crashes and Bing returns somewhat useless results. It's better, though only incrementally.


It's not a crash. It's a wall preventing people searching for CC#'s without giving them a specific reason why they won't execute that search.


Instead it returns some generally useless results that are likely not at all what the user desired

If the user entered "9999999..99999999999999999999999" in the search box, what would you say he desired to get in return?

Serious question, cause I have no clue about why would anyone genuinely want to search for that ...


In google the ".." between two sets of numbers indicates search for numbers between the range of the two.

So, if Bing is trying to "emulate" google in that way for more advanced queries, they are doing it wrong.

Since google is sort of the defacto standard for search right now, I would say that Bing's response would be out of the ordinary for what a user accustomed to doing those kinds of searches would expect.


Garbage in, garbage out.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: