What Is a Schur Decomposition? (2022) (nhigham.com)
90 points by vector_spaces 10 months ago | 27 comments



For large, sparse matrices you can use the (restarted) Arnoldi method to compute a partial Schur decomposition AQ = QR, where Q is tall and skinny with orthonormal columns and R is a small upper triangular matrix with a few dominant eigenvalues on its diagonal (i.e., eigenvalues on the boundary of the convex hull of the spectrum).

MATLAB uses ARPACK's implementation of this when you call `eigs`.

I wrote my own implementation, ArnoldiMethod.jl [1], in Julia, which, unlike MATLAB/ARPACK, supports arbitrary number types, should be more stable in general, and is equally fast.

[1] https://github.com/JuliaLinearAlgebra/ArnoldiMethod.jl
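
(For the SciPy side of the fence, scipy.sparse.linalg.eigs is the corresponding ARPACK wrapper. A minimal sketch with a made-up bidiagonal test matrix whose eigenvalues are known; note eigs returns eigenpairs rather than the partial Schur factors themselves.)

    import numpy as np
    from scipy import sparse
    from scipy.sparse.linalg import eigs

    # Toy nonsymmetric sparse matrix (purely illustrative): diagonal 1..n
    # plus a superdiagonal coupling, so its eigenvalues are exactly 1..n.
    n = 1000
    A = sparse.diags([np.arange(1, n + 1), np.full(n - 1, 0.5)], [0, 1], format='csr')

    # ARPACK's implicitly restarted Arnoldi: a few eigenvalues of largest
    # magnitude, i.e. the dominant part of the spectrum.
    vals, vecs = eigs(A, k=6, which='LM')
    print(np.sort(vals.real))   # ~ [995, 996, ..., 1000]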


I guess it is a rite of passage to rewrite it. I'm doing the same for SciPy, together with PROPACK, in [1]. Somebody already mentioned your repo. Thank you for your efforts.

[1]: https://github.com/scipy/scipy/issues/18566


That is an ambitious goal in that issue. What's the reason to get rid of Fortran?


Basically a lack of expertise among maintainers, the F77 codebase, and never-ending packaging/distribution problems.


Why should it be more stable?


All Arnoldi methods basically grow and shrink a search space repeatedly until it contains good approximations to the Schur vectors. Shrinking (= restarting) is necessary to keep memory requirements constant, as well as for performance.

Restarting is somewhat non-trivial because it has to preserve the Arnoldi relation. ARPACK uses implicit restarting, which is a very beautiful trick based on "perfect shifts": a single iteration of the shifted QR algorithm, with the shift taken to be an exact eigenvalue, reorders the basis vectors (moving the ones you want to remove to the back, so you can truncate them) while preserving the Arnoldi relation. However, perfect shifts in the QR algorithm are good on paper but can lead to arbitrary loss of precision in finite-precision arithmetic, especially with nearly repeated eigenvalues.

I think newer implementations (like SLEPc) are often based on Stewart's Krylov-Schur paper. It's not as pretty but certainly more stable.

In short: give ARPACK a matrix with a cluster of almost repeated eigenvalues at the boundary of the convex hull, and it tends to fail.
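
(If anyone wants to poke at that claim, here's a rough sketch via SciPy's ARPACK wrapper; the test matrix is made up, and whether it actually fails will depend on tolerances and versions.)

    import numpy as np
    from scipy import sparse
    from scipy.sparse.linalg import eigs, ArpackNoConvergence

    # Diagonal test matrix whose three dominant eigenvalues are nearly repeated.
    n = 500
    d = np.linspace(0.0, 1.0, n)
    d[-3:] = [2.0, 2.0 + 1e-10, 2.0 + 2e-10]
    A = sparse.diags(d, 0, format='csr')

    try:
        vals, _ = eigs(A, k=3, which='LM', maxiter=2000)
        print(np.sort(vals.real))
    except ArpackNoConvergence as err:
        print("ARPACK stopped early with", len(err.eigenvalues), "converged eigenvalues")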


Thanks, very helpful. I’ve used ARPACK extensively and have had no complaints with it, but I haven’t tried this particularly pathological case.


It is a terrible shame that this excellent "What is ..." series has concluded; Higham died earlier this year, aged 62.


Oh no! I had no idea! I met Higham at a conference many years ago, and the lecture he gave was inspiring!

From https://ilasic.org/ilas-net-archive-2024/

  2024 January 22
  ILAS-NET Message no. 2515
  CONTRIBUTED ANNOUNCEMENT FROM:   Francoise Tisseur and Des Higham
  SUBJECT: Nick Higham (1961--2024)
With great sadness we report that Nick Higham, Royal Society Research Professor and Richardson Professor of Applied Mathematics at the University of Manchester, passed away on January 20, 2024, at the age of 62 after an 18 month struggle with a form of blood cancer. An obituary describing Nick's research and leadership contributions will appear in SIAM News in due course.

Francoise Tisseur and Des Higham


Wow, apparently he published his last post, "What Is the Spectral Radius of a Matrix?", just 8 days before his passing. Legend.

https://nhigham.com/2024/01/


It's chilling to see the blog archive end at "January 2024".


Remember: if you're not certain that your matrix meets the prerequisites for applying the Schur decomposition, you can always apply the Unschur decomposition instead.


Found the dad :)


The undad


Bravo


Explain matrix functions in more detail, and how computing a matrix function of an upper triangular matrix is an advantage.

How is this a fundamental tenet of numerical linear algebra?

Also, if I just want the eigendecomposition, why do I need the Schur decomposition?

The motivation needs more explanation.


Matrix functions (at least the ones I learned about way back when) are Taylor series representations of a function with the matrix plugged in. For example:

exp(A) = I + A + A^2/2 + ... + A^n/n!

For matrices, you can show these end up as finite series (i.e., it can be written with the highest n being the dimension of the matrix)

The A^n factors can then be computed with fewer flops than working directly with the dense matrix, I believe. The Q's don't involve much work since they're unitary and satisfy QQ* = I, so that for example

A^2 = (QTQ*)^2 = QT(Q*Q)TQ* = QT^2Q*

That's where the f(A)= Qf(T)Q* part comes from.
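
(That identity is easy to check numerically. A quick sketch with SciPy; the random test matrix is of course just an assumption for illustration.)

    import numpy as np
    from scipy.linalg import schur, expm

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))

    # Complex Schur form: A = Q T Q* with T upper triangular and Q unitary.
    T, Q = schur(A, output='complex')

    # f(A) = Q f(T) Q*, here with f = exp; exp(T) is again upper triangular.
    lhs = expm(A)
    rhs = Q @ expm(T) @ Q.conj().T
    print(np.allclose(lhs, rhs))   # True, up to rounding error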

It sounds to me like he's quoting what someone described as a "fundamental tenet of numerical linear algebra": "anything the Jordan decomposition can do, Schur can do better". He doesn't seem to be defending "this is a fundamental tenet", but is saying matrix functions illustrate the "Schur can do better" concept. Both the Jordan and Schur decompositions write A as a similarity transformation with an upper triangular middle factor (for Schur, the transformation is unitary, A = QTQ*), so I think he's justifying "Schur can do better" with his comments on the poorer stability of the Jordan form (i.e., the matrix function property only needs such a factorization, which both provide, but Schur has better numerical properties).

I don't think the post said you need the Schur decomposition if you want an eigendecomposition (which I'm taking to mean the full eigenvalue + eigenvector pairs). It just pointed out that the diagonal of T holds the eigenvalues, the first column of Q is the eigenvector for the first eigenvalue, and you can obtain the remaining eigenvectors from the rest of the factorization if you want them. I'm not sure what the typical approach is for doing eigendecompositions; it might not directly involve Schur decompositions at all.

So I partially agree that you could add a little motivation to make it more self-contained, but I also agree it's probably aimed at a less general audience, like students who'd have more context for the parts where the details are light.


> ...he's quoting what someone described as a "fundamental tenet of numerical linear algebra" as "anything Jordan decomposition can do, Schur can do better"...

I suspect a reason to keep this little slogan in mind is that the Jordan decomposition/Jordan canonical form is taught as a tool in various classroom settings. I first learned it in studying linear feedback systems of the form:

    x(t+1) = A x(t) + u(t)
where u(t) is a system input. These are used in control systems, audio filters, etc.

In this case, the Jordan canonical form of the n×n matrix A has many properties that explain intuitively what the system is doing.

For instance, if there are two sub-blocks in the Jordan decomposition, these can be interpreted as two smaller non-interacting feedback systems. So in that case the original n×n system is equivalent to two smaller systems.

Anyway, because the Jordan canonical form is remembered as a relevant toolbox item, sometimes people try to use it for numerical problems as well -- e.g., when the "A" above is estimated from data.

And it doesn't work well for that purpose because the decomposition is very unstable and therefore the above interpretations are similarly shaky.
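
(A tiny numerical illustration of that instability, with a made-up 2x2 example: perturb a Jordan block by 1e-12 and the eigenvalues move by about 1e-6, while the eigenvector matrix becomes badly conditioned.)

    import numpy as np

    # A 2x2 Jordan block and a tiny perturbation of its (2,1) entry.
    J = np.array([[1.0, 1.0],
                  [0.0, 1.0]])
    eps = 1e-12
    Jp = J.copy()
    Jp[1, 0] = eps

    # The eigenvalues split by roughly sqrt(eps) -- a huge amplification of
    # the perturbation -- and the eigenvector matrix is nearly singular.
    w, V = np.linalg.eig(Jp)
    print(w)                   # approximately 1 +/- 1e-6
    print(np.linalg.cond(V))   # on the order of 1e6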


> For matrices, you can show these end up as finite series (i.e., it can be written with the highest n being the dimension of the matrix)

I don't think that is quite accurate. It sounds like you may be referring to the Cayley–Hamilton theorem, which says that A^n can be expressed as a linear combination of A^0, A^1, ..., A^(n-1). But this is not the same as saying that the exponential's Taylor series is finite. Any 1x1 matrix is a counterexample. Another counterexample is a diagonal matrix: its exponential is the exponential of each diagonal element, but each of those requires an infinite series.

The statement is true for a certain class of matrices called nilpotent matrices. A nilpotent matrix has the property that A^k = 0 for some k. An example is

  A = [ 0  1 ]
      [ 0  0 ]
In this case A * A = 0, so e^A clearly reduces to a finite series.
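
(A quick check of that with SciPy's expm, using the same matrix:)

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0],
                  [0.0, 0.0]])   # nilpotent: A @ A = 0

    print(expm(A))   # equals I + A = [[1, 1], [0, 1]]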


You're right, I did mistakenly mix a few concepts together. The series representation of the function is infinite, like you say. What I was misapplying was the idea that a convergent infinite sum of matrix products (c0*I + c1*A + ... + cn*A^n + ...) can be written as a finite polynomial, which relies on Cayley–Hamilton like you say. So f(A), for any particular choice of A, can be written as a finite polynomial, but the coefficients of that polynomial will change as A does.

So writing

exp(A) = I + A + A^2/2 + ... + A^n/n!

like I did isn't correct. The concept I was thinking of was a Taylor series like this

exp(A) = I + A + A^2/2 + ... + A^n/n! + ... (to infinity)

has an equivalent finite polynomial like:

exp(A) = c0*I + c1*A + ... + c_{n-1}*A^(n-1)

(usually expressed as order n-1, not n like I said earlier). The c0, c1, ..., c_{n-1} will depend on A, so it's not as generally useful as what I was misremembering. It does let you build representations for things like exp(tA) (where t is a scalar multiple) in terms of a finite polynomial (the terms depending on t, like the "exp(lambda*t)" in your diagonal matrix example, end up in the c0, c1, ..., c_{n-1} coefficients).
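
(Concretely, for a 2x2 matrix with distinct eigenvalues you can get c0 and c1 by solving exp(lambda_i) = c0 + c1*lambda_i. A sketch with a made-up example matrix:)

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])           # distinct eigenvalues 2 and 3

    lam = np.linalg.eigvals(A)
    # Solve the 2x2 Vandermonde system exp(lam_i) = c0 + c1 * lam_i.
    V = np.vander(lam, increasing=True)  # columns: [1, lam]
    c = np.linalg.solve(V, np.exp(lam))

    poly = c[0] * np.eye(2) + c[1] * A   # a degree n-1 = 1 polynomial in A
    print(np.allclose(poly, expm(A)))    # True; note that c depends on A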

Thanks for the correction and sorry if my garbled attempt at explaining confused anyone. The point about matrix functions being based on series representations is independent of those mistakes.


> For matrices, you can show these end up as finite series (i.e., it can be written with the highest n being the dimension of the matrix)

Can you clarify what you mean here? Don't we want the function to e.g. agree with the real-valued one for 1*1 matrices?


My memory is fuzzy on the details, but there's a way to show that if you have a series representation of a function that converges to that function (i.e., an infinite sum of x^0, x^1, x^2, ..., x^n, ... that equals f(x), like the Taylor series of the exponential), then when you apply it to square matrices to represent a matrix function, there's an equivalent representation as a finite matrix polynomial. You can take the highest power of the equivalent finite polynomial as low as one less than the order of the matrix (so a matrix function of an N×N matrix can be written as a degree N-1 matrix polynomial).

In other words, for a square NxN matrix A and functions satisfying the appropriate conditions, this convergent infinite sum of increasing powers of A:

c0 + c1A + c2A^2 + ... + cnA^n + ...

can be written as the n-1 degree matrix polynomial:

b0 + b1A + b2A^2 + ... + b_{n-1}A^(n-1)

The coefficients b0, ..., b_{n-1} are different than the coefficients c0, c1, ... for the infinite series, but they evaluate to the same thing.

Critical to the point you're making, the b0, b1, ..., b_{n-1} are different for differing choices of A, because the bn coefficients depend on the eigenvalues of the matrix. So you really have b0(A) + b1(A) * A, etc. The coefficients of the infinite series (c0, c1, ...) do not have this dependence on A.

That makes the 1x1 case tautological, because

b0(A) = f(A)

That is, the finite series representation is the value of the function at its argument.

There's some info on some of that here:

https://en.wikipedia.org/wiki/Cayley%E2%80%93Hamilton_theore...


Ok, now I see what you are saying. Thanks!


From Wikipedia[1]:

When X is an n×n diagonal matrix then exp(X) will be an n×n diagonal matrix with each diagonal element equal to the ordinary exponential applied to the corresponding diagonal element of X.

Perhaps that's what GP meant?

[1]: https://en.wikipedia.org/wiki/Matrix_exponential
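
(Easy to confirm numerically; a throwaway example:)

    import numpy as np
    from scipy.linalg import expm

    d = np.array([1.0, 2.0, 3.0])
    X = np.diag(d)

    # expm of a diagonal matrix == elementwise exp on the diagonal.
    print(np.allclose(expm(X), np.diag(np.exp(d))))   # True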


"why Schur vectors": stability + they're always computed when you compute eigenvectors.

For non-symmetric matrices, the standard approach to computing an eigendecomposition is the QR algorithm, an iterative method that converges to a Schur decomposition AQ = QR. So it's always cheaper to compute only the Schur vectors. Eigenvectors are obtained by computing the eigendecomposition RY = YΛ of the triangular factor (easy by back substitution, since R is triangular): A(QY) = (QY)Λ is then an eigendecomposition of A.

The QR algorithm works well because it uses stable operations throughout: Householder reflections and Givens rotations. Those preserve norms, so they don't amplify rounding errors.

The Schur decomposition itself is useful for the same reason: the Schur vectors are an orthonormal basis for the subspace spanned by the eigenvectors. If you need to project something onto that subspace, it's more stable to work with Schur vectors than with eigenvectors.

The last step, RY = YΛ, is done implicitly, with a back substitution involving R. If that matrix is badly conditioned, the computed eigenvectors have large errors.
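
(The whole pipeline is a few lines in SciPy, if that helps make it concrete. The random test matrix is an assumption, and eig() on the triangular factor stands in for the back substitution LAPACK actually does.)

    import numpy as np
    from scipy.linalg import schur, eig

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))

    # Complex Schur form A = Q R Q* (the QR algorithm runs under the hood).
    R, Q = schur(A, output='complex')

    # Eigendecomposition of the triangular factor: R Y = Y diag(lam).
    lam, Y = eig(R)

    # Then A (QY) = (QY) diag(lam) is an eigendecomposition of A.
    X = Q @ Y
    print(np.allclose(A @ X, X * lam))   # True, up to rounding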


Well, either it needs more explanation, or the audience is people with certain prerequisite knowledge.


This needs a "visual explanation" treatment. The concepts are graspable by the average HN reader, but this presentation is just difficult to approach.

I tried using ChatGPT to decompose a lot of the terms, which actually helps, but it's hard to vouch for its accuracy: https://chat.openai.com/share/709f61b3-b4cb-48df-b97f-fa6877...



