I used this in a C++ big integer library I wrote in the '90s. One issue I came across is how best to handle operands of different sizes. I asked on sci.math, but never found an answer. Ten years later, a similar question arose on gmp-discuss [1], also sans answer. The second message in that thread quotes my old sci.math post [2], which includes some experimental results suggesting there are patterns here, but the patterns escaped me.
The O(N log N) algorithms all have very large constant factors - IIRC the crossover point from Karatsuba or Toom-Cook to the FFT-based solutions (the O(N log N) algorithms) is in the thousands of digits.
The recent O(n log n) algorithm is only useful for numbers of size larger than 2^(2^(9^12)). Which is to say, this will never be of practical use in our universe. (Of course, conceivably someone else could invent a different algorithm of the same asymptotic complexity with a more reasonable constant factor.)
But it’s still nice to have a proof that this lower bound is attainable in theory. Makes it easier to write down the theoretical lower bound for a wide variety of other algorithms.
If it could go lower than n log n, that would have huge implications for signal processing. Multiplication, convolution, and FFTs are all so similar to each other.
Conventionally n is the length of the input, which is generally logarithmic in the numerical value of the input for algorithms dealing in integers.
Moreover, analyzing Russian peasant multiplication as O(log N) implies treating addition as constant-time, which it certainly isn't. (This becomes intuitively clear if we try applying the algorithm to sufficiently large integers!)
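To make that concrete, here's a quick Python sketch (my own illustration, not from the article): the loop runs roughly log2(b) times, but each iteration shifts and possibly adds full-size operands, so the cost per step grows with the operand length.

    # Russian peasant multiplication: ~log2(b) iterations, but each one
    # does a full bignum add/shift, so the work per step is not constant.
    def peasant_multiply(a: int, b: int) -> int:
        result = 0
        while b > 0:
            if b & 1:
                result += a   # O(n) for n-bit operands, not O(1)
            a <<= 1
            b >>= 1
        return result

    assert peasant_multiply(123456789, 987654321) == 123456789 * 987654321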
It was the first multiplication algorithm known to humans that is faster than long multiplication. Before 1960, few people thought it was possible to do better than O(n^2). Even Kolmogorov, one of the best mathematicians of the 20th century, believed in 1955 that long multiplication was probably asymptotically optimal, and at a 1960 seminar asked people to try proving this conjecture. Within a week, his student Karatsuba found this algorithm. It's also one of the earliest divide-and-conquer algorithms discovered in computer science. Since then, faster and faster algorithms have been discovered.
Although in practice, the ALU itself is almost never the bottleneck of computation on a modern computer. The setup cost of these algorithms is often greater, making them only useful for bignums (Karatsuba is only faster for numbers longer than a few hundred decimal digits?). It also explains why faster algorithms went undiscovered for the previous 2000 years.
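For anyone who hasn't seen it, the recursion itself is short. A rough Python sketch (the cutoff value is an arbitrary illustrative choice; real libraries tune it, and below it this just hands off to the builtin multiply; non-negative inputs assumed):

    CUTOFF_BITS = 1024   # illustrative only; real libraries tune this carefully

    def karatsuba(x: int, y: int) -> int:
        if x.bit_length() <= CUTOFF_BITS or y.bit_length() <= CUTOFF_BITS:
            return x * y                              # fall back below the cutoff
        m = max(x.bit_length(), y.bit_length()) // 2
        x1, x0 = x >> m, x & ((1 << m) - 1)           # x = x1*2^m + x0
        y1, y0 = y >> m, y & ((1 << m) - 1)           # y = y1*2^m + y0
        z2 = karatsuba(x1, y1)
        z0 = karatsuba(x0, y0)
        z1 = karatsuba(x1 + x0, y1 + y0) - z2 - z0    # the one "saved" multiply
        return (z2 << (2 * m)) + (z1 << m) + z0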
The linked Wikipedia article seems to contradict your assumption that Kolmogorov got upset: "Kolmogorov was very excited about the discovery; he communicated it at the next meeting of the seminar, which was then terminated."
I took a Computer Architecture class as an undergrad, and one of the pieces of coursework was to implement Karatsuba's algorithm in raw UAL ARM assembly, using some weird non-standard (almost Chen-Ho, but not quite) decimal encoding of numbers devised by the instructor to "make things interesting". The idea was for it to support arbitrarily large numbers encoded in this format, as opposed to the considerably more trivial requirement that it just support stdint-style integers.
It was considerably harder than it sounded (very few people in the class actually managed to get something working, since it was 'optional' but could still raise your grade), and I had daily nightmares about stack frame corruption throughout the project.
Python uses Karatsuba multiplication for its long integers. This works very well for moderately big long integers, but when they get huge, using the GMPy module is much quicker.
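Something along these lines shows the difference (gmpy2 is the current incarnation of GMPy; the exact crossover size and the timings are machine-dependent, and the operand sizes below are just illustrative):

    import timeit
    import gmpy2

    a = 3 ** 300_000                 # ~143k decimal digits
    b = 7 ** 300_000
    ga, gb = gmpy2.mpz(a), gmpy2.mpz(b)

    print("CPython int:", timeit.timeit(lambda: a * b, number=20))
    print("gmpy2 mpz:  ", timeit.timeit(lambda: ga * gb, number=20))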
The Fourier transform and its generalisations are so underappreciated. I've been trying to understand where the speed savings come from and suspect it has to do with coprime harmonics.
I hope someone smarter and more knowledgeable will correct my understanding, but as far as I can tell, the log N savings in harmonic analysis come from the exponential decay of harmonics in any real ringing system (a passive resonant chamber), because you can avoid "counting"[^1] the energy of subsequent harmonics as precisely as the fundamental frequency.
An excitation frequency of 400 Hz will produce resonant harmonics at 800 Hz, 1200 Hz, etc. at exponentially lower energies than the fundamental. So essentially, you can skip measuring the totatives of the frequency bin you care about.
For example, to find the cause of excitation at 9 Hz, you only need to count the energy at 1 Hz, 3 Hz, 6 Hz and 9 Hz. You can ignore the totatives of 9 (which are {1,2,4,5,7,8}), except for 1.
So if you detect 10 J at 2 Hz and 10 J at 4 Hz, you can be sure that there was an excitation at 4 Hz as well as at 2 Hz. Euler's totient function is multiplicative, so you win if you do this simultaneously for coprime frequencies.
This is not my area, so math or signal guys please tell me where I'm wrong.
[^1]: "counting" happens by cross-correlating the input with a reference signal, which in Fourier analysis is a cosine wave at the frequency bin.
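Just to illustrate the footnote (and not the harmonics/totatives speculation above): "counting" one frequency bin by cross-correlation is simply the definition of a single DFT coefficient, e.g. in plain Python:

    import math

    def dft_bin(signal, k):
        """Correlate the input with a cosine/sine pair at frequency bin k."""
        n = len(signal)
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        return complex(re, im)

    # a pure tone in bin 4 shows up there and (almost) nowhere else
    sig = [math.cos(2 * math.pi * 4 * i / 64) for i in range(64)]
    print(abs(dft_bin(sig, 4)))   # ~32 (= N/2); other bins are ~0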
Actually look at GMP or MPIR for the real dirt. There's a lot of detail hidden in that big-O notation, and IIRC Karatsuba is faster for certain ranges of integers.
Yes, for smaller integers it is necessarily faster... there are higher-order versions, the Toom-Cook algorithms, which fill the gap between them too...
There are subtleties that make this more difficult than it sounds. For a small number of digits N, you don't have to worry about rounding error in your FFTs, but as N gets large, you need to be careful.
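A toy illustration of where the rounding shows up (a floating-point FFT over base-10 digits, which is nothing like what GMP actually does): the convolution comes back as floats and has to be rounded to the nearest integer before carrying, which only works as long as the accumulated floating-point error stays below 0.5.

    import numpy as np

    def fft_multiply(a: int, b: int, base: int = 10) -> int:
        # non-negative inputs only; digits stored least-significant first
        da = [int(d) for d in str(a)][::-1]
        db = [int(d) for d in str(b)][::-1]
        n = 1
        while n < len(da) + len(db):           # pad so the circular convolution
            n *= 2                             # equals the linear one
        conv = np.fft.irfft(np.fft.rfft(da, n) * np.fft.rfft(db, n), n)
        digits = [int(round(c)) for c in conv]   # fails once FP error exceeds 0.5
        carry, out = 0, []
        for d in digits:
            carry, r = divmod(d + carry, base)
            out.append(r)
        while carry:
            carry, r = divmod(carry, base)
            out.append(r)
        while len(out) > 1 and out[-1] == 0:   # strip leading zeros
            out.pop()
        return int("".join(map(str, out[::-1])))

    assert fft_multiply(123456789, 987654321) == 123456789 * 987654321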
If someone is interested in things like this, they should also look up "Vedic Mathematics" (Vedic = something from the Vedas) and its formulas for faster multiplication, addition, square roots, etc.
[1] https://gmplib.org/list-archives/gmp-discuss/2003-January/00...
[2] https://gmplib.org/list-archives/gmp-discuss/2003-January/00...