Mathematicians Discover Prime Conspiracy (2016) (quantamagazine.org)
117 points by yurisagalov on Feb 16, 2020 | 52 comments



> If Alice tosses a coin until she sees a head followed by a tail, and Bob tosses a coin until he sees two heads in a row, then on average, Alice will require four tosses while Bob will require six tosses (try this at home!), even though head-tail and head-head have an equal chance of appearing after two coin tosses.

OK, I verified it over 10,000 tosses and it works, but I cannot see why. Can somebody explain?

Edit: nevermind, it can be found in the previous discussion.


For anyone else interested, the way this was explained to me a while ago was to look at the failure mode after an H. For Alice (waiting for HT), a failure after the H means she flipped another H, so she is still only one T away from finishing. For Bob (waiting for HH), a failure after the H means he flipped a T, so he has to start over and flip another H before he can even try for the second H.
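A minimal simulation sketch (Python, my own quick check, not from the article) that reproduces the 4-vs-6 averages:

  import random

  def tosses_until(pattern):
      # Flip a fair coin until `pattern` (e.g. "HT") shows up; return the flip count.
      history = ""
      while not history.endswith(pattern):
          history += random.choice("HT")
      return len(history)

  trials = 10_000
  alice = sum(tosses_until("HT") for _ in range(trials)) / trials
  bob = sum(tosses_until("HH") for _ in range(trials)) / trials
  print(f"Alice (HT): {alice:.2f} tosses on average")  # ~4.0
  print(f"Bob   (HH): {bob:.2f} tosses on average")    # ~6.0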



Related and well-known by software developers: https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-sea...


I find number theory to be closer to a natural, investigative science akin to physics than to pure math such as calculus.

IANAM, but I doubt there was ever a modern calculus theorem that was first hinted at by a brute-force search. One can argue that calculus is "made up", and thus research in this field is finding deeper logical consequences of the initial axioms, while natural numbers just "have properties", and research in number theory is trying to uncover these properties by brute force and later trying to prove them, arguably quite unsuccessfully so far. It's an interesting tidbit in the philosophical question of "Does math exist only in our heads?"


Your intuition is not all that wrong, but it’s not entirely correct either.

Number Theory is basically the exploration of the deep consequences of a set of (ultimately arbitrary but apparently obvious) axioms. The need for exploration arises from the fact that those consequences are ultimately too plentiful and ramified to enumerate outright, and not all of them are even provable (per Gödel's incompleteness theorems and Turing's halting problem).


> Number Theory is basically the exploration of the deep consequences of a set of (ultimately arbitrary but apparently obvious) axioms.

Isn't that true for all fields of pure math?


Almost all fields of pure math use a much stronger set of axioms called ZF, and essentially everyone also accepts the axiom of choice (making it ZFC). The axioms of ZF are reasonable, but the axiom of choice is surprisingly controversial for an axiom: it has some unintuitive consequences, but there are even more unintuitive consequences without it or with its negation.

Number theory, by contrast, uses a much smaller set of axioms (Peano arithmetic).

It should be said that most mathematicians don't really worry about the logical foundations of their work while they are actually working, in the same way that most programmers don't worry about assembly language or transistors.


Not entirely arbitrary. The Peano axioms turn out to be a really good model for a lot of real-world processes.


That's what I meant by "ultimately arbitrary but apparently obvious": mathematically, they are just one combination in an infinite set of internally consistent axiom sets, but they are apparently obvious insofar as they very accurately model our universe. Whether other universes could exist where such axioms do not obtain is something of an imponderable question.


The study of calculus that you're referring to is real analysis. I never quite thought of it as pure math, though, since it's very applicable to many things - hence the "analysis."

Number theory is actually an example of pure math having real world impacts despite its nature. It’s very much about studying math for maths sake - though there are also people interested in it for their own applications.

People have gone the brute-force route with number theory because proving things in it is hard - hard in the sense that a difficult number theory problem can otherwise take centuries to solve.


> I find number theory to be closer to a natural, investigative science akin to physics than to pure math such as calculus.

And yet the man who gave us physics (Newton) also gave us calculus. Indeed, calculus was invented to help describe and model physics and the natural world.


Abstract: While the sequence of primes is very well distributed in the reduced residue classes (mod q), the distribution of pairs of consecutive primes among the permissible ϕ(q)^2 pairs of reduced residue classes (mod q) is surprisingly erratic. This paper proposes a conjectural explanation for this phenomenon, based on the Hardy-Littlewood conjectures. The conjectures are then compared to numerical data, and the observed fit is very good.

- https://arxiv.org/abs/1603.03720



LOL, after posting my other reply I visited your link... and I just realized I repeated myself from 3 years ago.

I'm not sure how to feel about that reminder of age.


Two days ago I found an elaborate response to something I was researching on Reddit, only to discover at the bottom that I was the person who wrote it, 5 years ago. It's a weird feeling indeed!


I've been getting Stack Overflow answers to my Node proxy settings question once in a blue moon for nearly 10 years. Weird feeling; it's nice to know we exist after we look away.


.. or a reminder of consistency ?


>This conspiracy among prime numbers seems, at first glance, to violate a longstanding assumption in number theory: that prime numbers behave much like random numbers. Most mathematicians would have assumed, Granville and Ono agreed, that a prime should have an equal chance of being followed by a prime ending in 1, 3, 7 or 9 (the four possible endings for all prime numbers except 2 and 5).

This seems like an odd assumption to me. Surely sexy primes are more common than twin primes, so at least for primes that are near each other there should be a higher probability for certain sequences of final digits. This is obviously not proof in itself, but it would certainly make me hesitate to assume there is an equal probability in the general case.
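For anyone who wants to poke at the data themselves, here is a rough sketch (using sympy; my own tally, not from the article) that counts the last-digit pairs of consecutive primes. The repeated pairs 1->1, 3->3, 7->7, 9->9 come out noticeably under-represented, which is the bias the article describes:

  from collections import Counter
  from sympy import nextprime

  pairs = Counter()
  p, q = 7, 11  # skip 2, 3 and 5 so every last digit is 1, 3, 7 or 9
  for _ in range(100_000):
      pairs[(p % 10, q % 10)] += 1
      p, q = q, nextprime(q)

  for (a, b), n in sorted(pairs.items()):
      print(f"{a} -> {b}: {n}")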


> Surely sexy primes are more common than twin primes, so at least for primes that are near each other there should be a higher probability for certain sequences of final digits.

I think that's conjectural, but prime constellations are also conjectured to be a negligible fraction of primes as a whole, much as primes are a negligible fraction of integers (asymptotically). I think the number of twin primes up to n is conjectured to grow like n / (log n)^2, while the number of primes up to n grows like n / (log n).

Besides, even if (p, p + 6) is significantly more common than (p, p + 2), if the final digit of p is uniformly distributed among 1, 3, 7, and 9, I don't think the statistics as a whole are affected.


I always wondered if prime numbers can be used as the basis of a compression algorithm.

You'd describe how to get your number in terms of how it factors into primes.

You cut up your data into chunks, and then find the prime factors of each chunk. Now you have a list of arithmetic descriptions to describe your data, and you can use Huffman to compress the descriptions.

Of course, this would be computationally intensive for both the compressor and the extractor, but it might save on payload size, allowing for more efficient data transfer.

And while we still use Huffman’s algorithm, I wonder if advanced aliens may have discovered this technique for data compression.

This is more of a sci-fi fantasy algorithm, than anything actually mathematically sound.


Interesting idea, but factoring a number does not compress it - the number of digits (binary or otherwise) stays roughly the same. You can see this using logarithms: a * b = c implies ln a + ln b = ln c. But ln x is proportional to the number of digits of x, so a and b together have roughly the same length as c.
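A concrete illustration of that point (two well-known primes, chosen just for the example): the bit lengths of the factors add up to roughly the bit length of the product, so writing down the factorization saves essentially nothing.

  p = 104_729      # the 10,000th prime
  q = 1_299_709    # the 100,000th prime
  n = p * q
  print(p.bit_length(), q.bit_length(), n.bit_length())  # 17 21 37
  # 17 + 21 = 38 bits for the factors vs 37 bits for the product.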


If this were true, that would provide a way to seemingly compress any number (with the exception of prime numbers, though even those could be compressed by representing them as their index in the list of all prime numbers!).

The problem is that there can't be a way to compress every number: there aren't enough (N-1)-bit numbers to compress the N-bit numbers into (since the former set is only half as big as the latter).


There is no way around the fact that representing a given number of distinct codes requires a corresponding number of bits. All you can do is spend fewer bits on the codes you care about and more on the ones you don't; using exotic representations doesn't change that. Huffman coding works with that fact: it represents frequently used codes with fewer bits.


There ARE a number of messages that can be represented as a product of prime powers; look up Gödelization. More generally, there are SOME very compact ways to write large numbers using functions, e.g. factorial and tetration: 12345! + 3^^3^^2 - 1 would be huge. The problem is that MOST numbers do not have compact representations, so encoding them this way would make them bigger. I do wonder how effective it would be to pad messages in-band so they align with the nearest compact representation.
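A toy sketch of that prime-power (Gödel-style) encoding, just to show the mechanism; the sequence and helper names are made up for illustration. Note that the encoded number is far larger than the sequence it encodes:

  from sympy import prime, factorint

  def godel_encode(seq):
      # Encode a sequence of small non-negative integers as prod p_i^(a_i + 1).
      n = 1
      for i, a in enumerate(seq, start=1):
          n *= prime(i) ** (a + 1)  # +1 so trailing zeros survive the round trip
      return n

  def godel_decode(n):
      exps = factorint(n)  # {prime: exponent}
      return [exps[prime(i)] - 1 for i in range(1, len(exps) + 1)]

  msg = [3, 0, 2, 5]
  code = godel_encode(msg)          # 2^4 * 3^1 * 5^3 * 7^6 = 705,894,000
  print(code, godel_decode(code))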

Funny you should mention scifi because there's a story that got me interested in this topic [1] where some folks encode a large, important message into a compact algebraic expression like this. Turns out they're pissed at the recipients and the decimal expansion would be enormous so it would take them a while to decode it.

1. Frederik Pohl, "Starburst", pp. 56, 58, 59.


The prime factors of a number take (about) the same amount of space to list as the number itself.


Notably, even listing the ordinal indices of the prime factors (e.g. that a number is the product of the first, fifth, and tenth primes) takes about the same amount of space, due to the prime number theorem.


> the average gap between consecutive prime numbers among the first N integers is roughly log(N)

So if you are compressing exceptionally large numbers, say a whole video file treated as one number, which is likely to have some very large factors, you could shave about log2(log(N)) bits for each factor of size N. Seems like the most impractical compression algorithm imaginable.
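A quick sanity check on that figure with sympy (hedged: a toy example on a smallish prime, not a serious benchmark). Storing a prime's index instead of the prime itself saves on the order of log2(log N) bits:

  from math import log, log2
  from sympy import primepi

  p = 1_299_709                 # the 100,000th prime
  index = int(primepi(p))       # 100000
  saved = p.bit_length() - index.bit_length()
  print(saved, log2(log(p)))    # 4 bits saved, vs log2(log p) ~ 3.8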


I would assume the intent is to use the fact that most numbers use the same prime factors as the encoding advantage. As in, you're transforming a list of dissimilar numbers into a list of similar ones.


The only way this might work is if you succeed in finding a nontrivial GCD for the sequence of numbers:

https://en.wikipedia.org/wiki/Greatest_common_divisor


That would tend to 1 for any reasonably sized sequence.
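A quick empirical sketch of that claim (my own check, not from the thread): the GCD of even a short sequence of random numbers is almost always 1.

  import random
  from functools import reduce
  from math import gcd

  trials = [reduce(gcd, (random.getrandbits(32) for _ in range(10)))
            for _ in range(1_000)]
  print(sum(g == 1 for g in trials) / len(trials))  # ~1.0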


paper: "Unexpected biases in the distribution of consecutive primes" http://arxiv.org/abs/1603.03720

explanation for non-experts: "The Last Digit of Prime Numbers - Numberphile" https://www.youtube.com/watch?v=YVvfY_lFUZ8


Is the Numberphile video based on the paper? Because the paper was out on 30 May 2016 and the video was out on 4 May 2016.


Read the paragraph with the heading "Submission history".


Classic example of following definitions or long-standing assumptions too closely... Another obvious one, which has historically been challenged a few times: the beginning of the prime sequence "should be" 1, 3, 5... very clean, very coherent, but it got stuck in definitional limbo, and now we have no 1 and the alien 2, which has to be special-cased in most algorithms... eh...


Why do you think 2 is alien? How could the fundamental theorem of arithmetic be stated without 2?


2 isn’t an alien, exactly... but it’s a little bit of an oddball amongst the primes. There are indeed a lot of theorems that start with “let p be an odd prime.”


2 is definitely no "alien". More like a legendary prime god amongst the countably infinite odd primes. 0 and 1 are two sides of an extreme that either generate a meaningful number system through their separation from one another, or no system at all if 1 = 0 is assumed true. They're like a Jekyll and Hyde god.


"Odd" is just a word that means "not divisible by two". There is truly nothing special about a prime not being divisible by two. I could equally say "3 is a legendary prime god amongst countably infinite non-divisible-by-three primes", "5 is a legendary prime god amongst countably infinite non-divisible-by-five primes"... The only difference is that we do not have special words that mean non-divisible-by-three, non-divisible-by-five, etc.

In this regard, I don't see 2 as a special prime. Saying "2 is the only even prime" is the same as saying "2 is the only prime divisible by two" - and by the same logic, every prime p is the only prime divisible by p.


Right. In the language of the article, if p is a whole number whose digits end in 2, then no later number ending in the digit 0 or 2 is prime. It's a seemingly weak statement, but it says a much stronger thing about the primes than the hypothesis of the article.

Anyway, just trying to react in tone to calling any number an “alien”. I think of numbers as being outside the realm of words like alien - I don’t even know what that would mean for a number. Hence my comment is hyperbole hinting at my love of mathematics. To me, it really is a beautiful area of study. Wish I could do it more.


I wonder what the implications are for cryptography, given the heavy reliance on 'random' prime numbers.

But one would have to delve into this, and off the top of my head there are no algorithms that make you pick two consecutive primes, so the implications - if any - aren't obvious. Does anyone here have more light or other angles to shed on this?


Why would this have any implications for cryptography? This does not make factoring an RSA modulus any easier.


Well, in RSA you have to choose two prime numbers, multiply them, and keep them secret: p and q, with pq = n, and n is made public. I wonder if this bias, maybe coupled with concrete implementations, narrows the set of candidates when guessing p and q. That would be it. I guess that since this only introduces 'restrictions' on consecutive prime numbers, it doesn't really help at all, given that p and q should be chosen at random - unless an implementation takes the shortcut of finding a random p and then taking q as the next prime after it. Hence my question to the community. But I only know that both RSA and DH rely on prime numbers.


That's a fair point. Yes, I thought the same thing too when I read this. It is only introducing restrictions on sequential prime numbers. I doubt it has any bearing on the factoring problem yet, at least not without more work that can connect this to the factoring problem.


Not sure I understand the significance of a last digit of 9 for a prime implying the next prime is likelier to end with a 1 and not a 9. After all, base 10 is arbitrary. Can someone explain whether similar such "conspiracies" occur for all bases?


Yeah, the article says they investigated in base 3 first.

In that base, the final digit of a number can be 0, 1, or 2 (its residue mod 3). Primes clearly aren't 0 mod 3, so all primes other than 3 end in 1 or 2. They found that consecutive primes "liked" to alternate between the two.
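A quick version of that base-3 check (my own sketch with sympy, not the article's code): tally consecutive primes by residue mod 3, and the mixed pairs dominate.

  from collections import Counter
  from sympy import nextprime

  counts = Counter()
  p, q = 5, 7  # start past 2 and 3, so every residue is 1 or 2 (mod 3)
  for _ in range(50_000):
      counts[(p % 3, q % 3)] += 1
      p, q = q, nextprime(q)

  print(counts)  # (1, 2) and (2, 1) come out well ahead of (1, 1) and (2, 2)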


I wonder if it's some extension or variation on Benford's Law [0].

[0] https://en.wikipedia.org/wiki/Benford%27s_law


I don't think this is a law and it's stupid. It is just an observation. Of course 1 will be more frequent.


Why "of course"?


Because going from 100 to 200 is a doubling, while going from 700 to 800 is not (it's only about a 14% increase).
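One way to see it empirically (a sketch, using powers of 2 as a stand-in for a multiplicatively growing quantity): the leading digit is 1 about 30% of the time, matching Benford's log10(1 + 1/d).

  from collections import Counter

  leading = Counter(str(2**n)[0] for n in range(1, 10_001))
  for d in "123456789":
      print(d, leading[d])  # digit 1 appears ~3010 times, digit 9 only ~458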


Carry on.. why does that make it obvious?


In base 30, a prime greater than 30 can end in only 8 of the 30 possible terminal digits (the residues coprime to 30). In contrast with base 10, where this statistic is 4 of 10.
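For the record, a one-line check of that count (my own, not the parent's): the admissible last digits in base 30 are exactly the residues coprime to 30.

  from math import gcd

  digits = [d for d in range(30) if gcd(d, 30) == 1]
  print(len(digits), digits)  # 8 -> [1, 7, 11, 13, 17, 19, 23, 29]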



