I used this in a C++ big integer library I wrote in the '90s. One issue I came across is how best to handle operands of different sizes. I asked on sci.math, but never found an answer. Ten years later, a similar question arose on gmp-discuss [1], also sans answer. The second message in that thread quotes my old sci.math post [2], which includes some experimental results suggesting there are patterns here, but the patterns escaped me.
The O(N log N) algorithms all have very large constant factors - IIRC the crossover point from Karatsuba or Toom-Cook to the FFT-based solutions (the O(N log N) algorithms) is in the thousands of digits.
The recent O(n log n) algorithm is only useful for numbers of size larger than 2^(2^(9^12)). Which is to say, this will never be of practical use in our universe. (Of course, conceivably someone else could invent a different algorithm of the same asymptotic complexity with a more reasonable constant factor.)
But it’s still nice to have a proof that this lower bound is attainable in theory. Makes it easier to write down the theoretical lower bound for a wide variety of other algorithms.
If it could go lower than n log n, that would have huge implications for signal processing. Multiplication, convolution, and FFTs are all so similar to each other.
Conventionally n is the length of the input, which is generally logarithmic in the numerical value of the input for algorithms dealing in integers.
Moreover, analyzing Russian peasant multiplication as O(log N) implies treating addition as constant-time, which it certainly isn't. (This becomes intuitively clear if we try applying the algorithm to sufficiently large integers!)
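To make that concrete, here's a quick Python sketch (my own illustration, not from the article): the loop runs roughly log2(b) times, but each iteration shifts and possibly adds full-size operands, so the cost per step grows with the operand length.

    # Russian peasant multiplication: ~log2(b) iterations, but each one
    # does a full bignum add/shift, so the work per step is not constant.
    def peasant_multiply(a: int, b: int) -> int:
        result = 0
        while b > 0:
            if b & 1:
                result += a   # O(n) for n-bit operands, not O(1)
            a <<= 1
            b >>= 1
        return result

    assert peasant_multiply(123456789, 987654321) == 123456789 * 987654321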
It was the first multiplication algorithm known to humans that is faster than long multiplication. Before 1960, few people thought it was possible to do better than O(n^2). Even Kolmogorov, one of the best mathematicians of the 20th century, believed in 1955 that long multiplication was probably asymptotically optimal, and at a 1960 seminar asked people to try proving this conjecture. Within a week, his student Karatsuba found this algorithm. It's also one of the earliest divide-and-conquer algorithms discovered in computer science. Since then, faster and faster algorithms have been discovered.
Although in practice, the ALU itself is almost never the bottleneck of computation on a modern computer. The setup cost of these algorithms is often greater, making them only useful for bignums (Karatsuba is only faster for numbers longer than a few hundred decimal digits?). It also explains why faster algorithms went undiscovered for the previous 2000 years.
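For anyone who hasn't seen it, the recursion itself is short. A rough Python sketch (the cutoff value is an arbitrary illustrative choice; real libraries tune it, and below it this just hands off to the builtin multiply; non-negative inputs assumed):

    CUTOFF_BITS = 1024   # illustrative only; real libraries tune this carefully

    def karatsuba(x: int, y: int) -> int:
        if x.bit_length() <= CUTOFF_BITS or y.bit_length() <= CUTOFF_BITS:
            return x * y                              # fall back below the cutoff
        m = max(x.bit_length(), y.bit_length()) // 2
        x1, x0 = x >> m, x & ((1 << m) - 1)           # x = x1*2^m + x0
        y1, y0 = y >> m, y & ((1 << m) - 1)           # y = y1*2^m + y0
        z2 = karatsuba(x1, y1)
        z0 = karatsuba(x0, y0)
        z1 = karatsuba(x1 + x0, y1 + y0) - z2 - z0    # the one "saved" multiply
        return (z2 << (2 * m)) + (z1 << m) + z0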
The linked Wikipedia article seems to contradict your assumption that Kolmogorov got upset: "Kolmogorov was very excited about the discovery; he communicated it at the next meeting of the seminar, which was then terminated."
I took a Computer Architecture class as an undergrad, and one of the pieces of coursework was to implement Karatsuba's algorithm in raw UAL ARM assembly, using some weird non-standard (almost Chen-Ho, but not quite) decimal encoding of numbers devised by the instructor to "make things interesting". The idea was for it to support arbitrarily large numbers encoded in this format, as opposed to the considerably more trivial requirement that it just support stdint-style integers.
It was considerably harder than it sounded (very few people in the class actually managed to get something working, since it was 'optional' but could still raise your grade), and I had daily nightmares about stack frame corruption throughout the project.
Python uses Karatsuba multiplication for its long integers. This works very well for moderately big long integers, but when they get huge, using the GMPy module is much quicker.
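Something along these lines shows the difference (gmpy2 is the current incarnation of GMPy; the exact crossover size and the timings are machine-dependent, and the operand sizes below are just illustrative):

    import timeit
    import gmpy2

    a = 3 ** 300_000                 # ~143k decimal digits
    b = 7 ** 300_000
    ga, gb = gmpy2.mpz(a), gmpy2.mpz(b)

    print("CPython int:", timeit.timeit(lambda: a * b, number=20))
    print("gmpy2 mpz:  ", timeit.timeit(lambda: ga * gb, number=20))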
The Fourier transform and its generalisations are so underappreciated. I've been trying to understand where the speed savings come from and suspect it has to do with coprime harmonics.
I hope someone smarter and more knowledgeable will correct my understanding, but as far as I can tell, the log N savings in harmonic analysis come from the exponential decay of harmonics in any real ringing system (a passive resonant chamber), because you can avoid "counting"[^1] the energy of subsequent harmonics as precisely as the fundamental frequency.
An excitation frequency of 400 Hz will produce resonant harmonics at 800 Hz, 1200 Hz, etc. at exponentially lower energies than the fundamental. So essentially, you can skip measuring the totatives of the frequency bin you care about.
For example, to find the cause of excitation at 9 Hz, you only need to count the energy at 1 Hz, 3 Hz, 6 Hz and 9 Hz. You can ignore the totatives of 9 (which are {1,2,4,5,7,8}), except for 1.
So if you detect 10 J at 2 Hz and 10 J at 4 Hz, you can be sure that there was an excitation at 4 Hz as well as at 2 Hz. Euler's totient function is multiplicative, so you win if you do this simultaneously for coprime frequencies.
This is not my area, so math or signal guys please tell me where I'm wrong.
[^1]: "counting" happens by cross-correlating the input with a reference signal, which in Fourier analysis is a cosine wave at the frequency bin.
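Just to illustrate the footnote (and not the harmonics/totatives speculation above): "counting" one frequency bin by cross-correlation is simply the definition of a single DFT coefficient, e.g. in plain Python:

    import math

    def dft_bin(signal, k):
        """Correlate the input with a cosine/sine pair at frequency bin k."""
        n = len(signal)
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        return complex(re, im)

    # a pure tone in bin 4 shows up there and (almost) nowhere else
    sig = [math.cos(2 * math.pi * 4 * i / 64) for i in range(64)]
    print(abs(dft_bin(sig, 4)))   # ~32 (= N/2); other bins are ~0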
Actually look at GMP or MPIR for the real dirt. There's a lot of detail hidden in that big-O notation, and IIRC Karatsuba is faster for certain ranges of integers.
Yes, for smaller integers it is necessarily faster... there are higher-order versions, the Toom-Cook algorithms, which fill the gap between them too...
There are subtleties that make this more difficult than it sounds. For a small number of digits N, you don't have to worry about rounding error in your FFTs, but as N gets large, you need to be careful.
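A toy illustration of where the rounding shows up (a floating-point FFT over base-10 digits, which is nothing like what GMP actually does): the convolution comes back as floats and has to be rounded to the nearest integer before carrying, which only works as long as the accumulated floating-point error stays below 0.5.

    import numpy as np

    def fft_multiply(a: int, b: int, base: int = 10) -> int:
        # non-negative inputs only; digits stored least-significant first
        da = [int(d) for d in str(a)][::-1]
        db = [int(d) for d in str(b)][::-1]
        n = 1
        while n < len(da) + len(db):           # pad so the circular convolution
            n *= 2                             # equals the linear one
        conv = np.fft.irfft(np.fft.rfft(da, n) * np.fft.rfft(db, n), n)
        digits = [int(round(c)) for c in conv]   # fails once FP error exceeds 0.5
        carry, out = 0, []
        for d in digits:
            carry, r = divmod(d + carry, base)
            out.append(r)
        while carry:
            carry, r = divmod(carry, base)
            out.append(r)
        while len(out) > 1 and out[-1] == 0:   # strip leading zeros
            out.pop()
        return int("".join(map(str, out[::-1])))

    assert fft_multiply(123456789, 987654321) == 123456789 * 987654321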
If someone is interested in things like this, they should also look up "Vedic Mathematics" (Vedic = something from the Vedas) and its formulas for faster multiplication, addition, square roots, etc.
[1] https://gmplib.org/list-archives/gmp-discuss/2003-January/00...
[2] https://gmplib.org/list-archives/gmp-discuss/2003-January/00...