The problem with algorithms faster than conventional matrix multiplication is that they are defined over exact mathematical numbers.
E.g. Strassen's algorithm assumes something like:
(A + B)C - BC = AC
With the conventional floating-point representation this becomes a problem if B >> A, because adding A to B rounds away most of A's significant digits.
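To make the cancellation concrete, here is a minimal numpy sketch (my own illustration, not from the original argument; the scale 1e12 for B is an arbitrary choice). Both lines below are mathematically equal to AC, but the second is computed the way a Strassen-style recursion would compute it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))          # entries on the order of 1
B = 1e12 * rng.standard_normal((n, n))   # entries on the order of 1e12, so B >> A
C = rng.standard_normal((n, n))

direct = A @ C
cancelled = (A + B) @ C - B @ C  # adding A to B discards ~12 of A's ~16 digits

print(np.linalg.norm(cancelled - direct) / np.linalg.norm(direct))
# relative error around 1e-4 instead of near machine precision (~1e-16)
```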
Most algorithms faster than Strassen assume:
((fA + B)C - BC)/f = AC
and
(fA + B)C = BC + fAC
f is then chosen in such a way that fAC is negligible. But if that is the case, fA is much smaller than B and we automatically get numerical problems.
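A short sketch of that tradeoff (again my own illustration, with random test matrices): ((fA + B)C - BC)/f equals AC exactly in real arithmetic, but in float64 the recovered product loses roughly eps/f of its accuracy as f shrinks:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))
exact = A @ C

for f in (1e-2, 1e-6, 1e-10, 1e-14):
    recovered = ((f * A + B) @ C - B @ C) / f
    err = np.linalg.norm(recovered - exact) / np.linalg.norm(exact)
    print(f"f = {f:.0e}   relative error ~ {err:.1e}")
# the error grows from ~1e-14 to ~1e-2 as f shrinks: the smaller fA is
# next to B, the fewer of its digits survive the addition
```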
There is a procedure to remove the fAC terms (which would allow f to be chosen close to 1), but it only amortizes over enough recursion steps. For Coppersmith-Winograd (and derivatives like Le Gall's algorithm), each recursion step needs to divide the matrix into enough submatrices that the indices can be treated heuristically (let's say each index is split into 100 subindices per step). This implies the sides of the matrix must be at least of size 100^n_recursions. We will never need to multiply matrices of that size in practice.
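As a back-of-the-envelope check (using the illustrative figure of 100 subindices from above; the byte counts assume dense float64 storage):

```python
# Side length must be at least 100**depth before 'depth' recursion steps
# are even possible; the storage for a single matrix explodes accordingly.
for depth in range(1, 5):
    side = 100 ** depth
    gib = side ** 2 * 8 / 2 ** 30  # one dense float64 matrix of that side
    print(f"depth {depth}: side >= {side:,}  (~{gib:.1e} GiB per matrix)")
# already at depth 3 (side 1,000,000) a single matrix needs ~7.5e+03 GiB
```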