Linear Algebra Done Wrong (2004) (brown.edu)



FYI, there is a newer version at https://sites.google.com/a/brown.edu/sergei-treil-homepage/l... . Google seems never to have learnt about this site, which is weird, as it is hosted on a Google server.


Something seems to be seriously wrong with google. By "something" I mean "a lot".

I know there are Google engineers here, possibly from search. Do you guys not also feel it? I mean, you dogfood, right? I know it is a hard problem, but boy has it become harder to use, especially in the last 6 months.


Like.. the very fact I typed "javascript" or "css" or "html" means you probably don't need to apply your "did you mean" logic to my searches. If I'm searching for something highly technical, I absolutely DO NOT NEED your help, Google.


You’re not the type of users they care about.


Well, I write software for the web, and at this point I consider Google to be an entirely negative company.

So now, to the maximum extent possible, I work to undermine their interests and to not participate with their products or addons or browsers.

Great strategy on their part.


You've oversimplified and introduced a major error. It's a common one though, so allow me to explain.

There is no single entity Google cares about. There is no "average user" (nor median). Trying to make the product best for an "average" user makes it bad for everyone. You have made the assumption that this solution space is smooth and relatively uniformly distributed.

Consider this: on a uniform distribution, if you interpolate between two points, those interpolations are representative of the distribution. But if you do so on a Gaussian distribution, this is not true, because the density is not evenly spread (that is what "uniform" in uniform distribution means). For more complicated distributions you need different interpolations. Many real-world distributions are agglomerations of several distributions (you probably have clusters and pockets).

The distribution of searches (or customers) is neither normal nor uniform.

To understand some further complexity, it is important to remember that the burden of success is the requirement for further nuance. Pareto applies: it may take few resources to get to "80%"/"good enough", but significant resources to gain another 10%. Think about a Taylor series (or a Fourier approximation, or your preferred example): the first-order approximation will only take you so far. It may be good enough to start, but higher-order (and thus more computationally expensive) approximations become increasingly necessary for precision, and non-linearly so. So simplify where you can, but don't forget what you've done. If you do, you'll be stuck. Or worse, you will compound your error (also very common).


>probably don't need to apply your "did you mean" logic to my searches

I wish it had a yes/no prompt that you could click and when you click no it would refresh with the results you wanted, and possibly use those answers to update whatever algorithm does the suggestions.


With how much they track us, you'd think that returning to the search page and selecting a different result would update their prior. You'd think that variations of the same search in a small time frame would be recognized as a signal of bad results.

I understand the problem is difficult, but it appears to me that the signals they have that can help are being ignored.


They may actually do that sort of improvement, and just be somewhat bad at it. At least if they let you check yes/no, you'd feel better about it, even if they ignored the answers.


I wish I could get it to respect advanced filters. I can understand how a query I type might look very similar to a simpler one, but there needs to be a way to signal differentiation. I know they track me, and you're telling me that clicking on the first 5 links, only to revisit the search in under a minute and try a new link, is not a signal that I'm not getting what I want? Is visiting page 2 not a further signal?

I don't need a different variation of the same result, I need a different result. While the former can certainly be useful, it is absurd to think this is all there is.

If you ever search anything with any bit of nuance, you'll have experienced this. I'll give an example: try searching "out of distribution" and try to find a result that is not about ML based OOD detection. They do exist and you'll find some better queries, but that result still dominates (and frankly, they are poor results).

For our example, the query `statistics out of distribution -"machine learning"` has a MathWorks page on neural networks as its top result, and every result on the front page is still ML work. "statistics" is ignored in favor of variations of "machine learning." The top few results for `statistics out of distribution -"machine learning" -"deep learning" -"artificial intelligence" -"neural network"` include an ICLR paper whose title ends in "Deep Neural Networks", and many results are still ML based...

How do I tell Google (or any engine) that I want to exclude results!?

And before someone says GPT/Claude or some other LLM, the problem persists. Yes, I've paid. Yes, for the most advanced models. Derailing these tools from the wrong track is difficult. Even harder the longer the conversation goes (i.e. if you don't recognize you're headed in the wrong direction). The nuance matters.


"Linear Algebra Done Wrong site:sites.google.com" finds the page you linked.

https://www.math.brown.edu/streil/index.html redirects to https://sites.google.com/a/brown.edu/sergei-treil-homepage/h...

But OP's link, https://www.math.brown.edu/streil/papers/LADW/LADW.html does not redirect. So it seems like a problem that the site owner can fix.


  > "Linear Algebra Done Wrong site:sites.google.com" finds the page you linked.
I think you have the wrong framing. It isn't whether you can create a p̶r̶o̶m̶p̶t̶ query that yields the correct result, it is whether you can do so without knowing the answer a priori.

As in, if you didn't know a second, newer version existed, would you find it? Certainly the newer version should be ranked above the older one.

FWIW, my query "Linear Algebra Done Wrong" yields the newer version (2017), but I've just interpreted OP's point as being illustrative and helpful to readers (and it looks like the link now does point towards the new version. Possibly this made my query work)


The newer version is 2021, not 2017, though the title on top of the site says 2017.


LOL well I guess point made. Thanks for the correction.


You can also get to the new site from one of the hits further down ( https://textbooks.aimath.org/textbooks/approved-textbooks/tr... ) if you pay attention. But it's far from the transparency I'd have expected googling an openly available and multiply linked website.


For a really intuitive introduction to linear algebra, I highly recommend Gilbert Strang's "Linear Algebra and Learning from Data": https://math.mit.edu/~gs/learningfromdata/



> Why should anyone read this book if it presents the subject in a wrong way? What is particularly done “wrong” in the book?

Does he ever answer this question? It's posed at the start of the book, but then immediately ignored.


The title is a jab at and a response to "Linear Algebra Done Right" (https://linear.axler.net) by Sheldon Axler.

Sheldon's book is more suitable for a second course in linear algebra. It's an excellent book that my more advanced students have enjoyed.

Sheldon, however, approaches the subject with some rigid dogma: Determinants are evil. His view is they're not practical to compute at scale. It can be good to learn theory with one's determinant hand tied behind one's back, just as it can be good to learn "constructive" mathematics, but this should not be one's only approach.

I used to attend an academic sponsor's day at the math institute MSRI (now SLMath). The afternoons were organized to keep us busy. One year Sheldon's wife demonstrated some academic teaching software, and my group was tasked with solving a linear algebra problem that Sheldon had provided. Unaware that everyone was watching us on the big screen, I solved it instantly, using determinants. I still don't understand Sheldon's longer solution.


His view is not just that they are not practical to compute at scale. (They actually are practical to compute. Just not with the formula. Use row reduction instead.)
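To make the row-reduction point concrete, here is a minimal sketch in Python (assuming numpy; `det_by_row_reduction` is just an illustrative name). It runs in O(n^3), versus the O(n!) of the cofactor formula:

  import numpy as np

  def det_by_row_reduction(A):
      # Gaussian elimination with partial pivoting; track the sign from row swaps
      A = np.array(A, dtype=float)
      n = len(A)
      det = 1.0
      for k in range(n):
          p = k + np.argmax(np.abs(A[k:, k]))    # best pivot in column k
          if np.isclose(A[p, k], 0.0):
              return 0.0                         # singular matrix
          if p != k:
              A[[k, p]] = A[[p, k]]
              det = -det                         # each swap flips the sign
          det *= A[k, k]                         # det is the product of the pivots
          A[k+1:] -= np.outer(A[k+1:, k] / A[k, k], A[k])
      return det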

His view is that they are presented as an arbitrary piece of magic that people really don't understand. And, exactly because they don't really understand the determinant, they wind up not properly understanding fundamental concepts in linear algebra. Such as Jordan normal form.

If your goal is actually to properly understand linear algebra, he's absolutely right. If your goal is to use linear algebra for a wide variety of practical purposes, though, determinants are often essential.


Since a significant amount of my knowledge/intuition about linear algebra comes from his book, I now feel not only that I don’t like determinants but that I’m allergic even to matrices. Everything seems so much nicer and friendlier if you think of linear maps between finite-dim vector spaces instead.

When I hear, for example, ‘upper triangular matrix’ I have to translate it in my head into something like ‘matrix representing a linear operator that preserves the standard flag’ in order to actually feel like I understand it.

Of course, I’m not a programmer or an engineer of any kind, so I have the luxury of not needing the computational efficiency.


What's a "standard flag"? Couldn't find anything about it...

I would think "upper triangular" would be a weird/bad notion to talk about in a basis-independent setting (because it depends on the basis).


A ‘flag’ is just a funny name for an increasing (and therefore increasing in dimension) sequence of subspaces. The ‘standard flag’ is the sequence spanned by the standard basis vectors: start with {0}, then things of the form (x,0,…), then (x,y,0,…), and so on. *

What I’m saying is that an upper triangular matrix preserves each of these subspaces because it sends each basis vector e_i into the span of e_0, …, e_i.

You’re absolutely right that upper triangularity is basis-dependent and so is somewhat ‘weird’/‘evil’ (in fact, not even well-defined) as a purported property of maps rather than matrices. What I meant to say was ‘triangularisable’, by analogy with ‘diagonalisable’: maps which are represented by a triangular matrix in some basis. Given a linear operator, its matrix representation in a basis B is triangular if and only if the operator preserves the flag associated with B.

* https://en.wikipedia.org/wiki/Flag_(linear_algebra)
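A quick numerical sanity check of the flag-preservation claim (a sketch, assuming numpy; indices start at 0):

  import numpy as np

  rng = np.random.default_rng(1)
  T = np.triu(rng.standard_normal((4, 4)))  # a random upper triangular matrix
  k = 2
  v = np.zeros(4)
  v[:k] = rng.standard_normal(k)            # v lies in V_k = span(e_0, ..., e_{k-1})
  w = T @ v
  print(np.allclose(w[k:], 0.0))            # True: T maps V_k into V_k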


> I now feel not only that I don’t like determinants but that I’m allergic even to matrices

Determinants are totally basis-invariant, though: similar matrices have the same determinant.


Yeah, and I think that’s enough motivation for determinants being important beyond as a computational trick. I was being somewhat hyperbolic.

Seeing the determinant as the unique solution to the problem of assigning a scalar to each linear map/matrix in such a way that several natural axioms are satisfied is a very nice way to motivate it. Together with the fact that it, like the trace, can be calculated from the eigenvalues (from which the basis-invariance is clear).
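Both facts are easy to check numerically (a sketch, assuming numpy):

  import numpy as np

  A = np.random.rand(4, 4)
  w = np.linalg.eigvals(A)
  print(np.isclose(np.prod(w).real, np.linalg.det(A)))  # det = product of eigenvalues
  print(np.isclose(np.sum(w).real, np.trace(A)))        # trace = sum of eigenvalues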


> Use row reduction instead

Or, if you're using exact arithmetic (e.g. cryptography), you should use something like the Bareiss algorithm - naive Gaussian elimination over the rationals can blow up the size of intermediate entries, so it is only polynomial if you use floats: https://en.m.wikipedia.org/wiki/Bareiss_algorithm
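A minimal sketch of Bareiss for integer matrices (`bareiss_det` is an illustrative name; the point is that every intermediate division below is exact, so no fractions ever appear):

  def bareiss_det(M):
      # fraction-free elimination; M is a square list of lists of ints
      M = [row[:] for row in M]
      n, sign, prev = len(M), 1, 1
      for k in range(n - 1):
          if M[k][k] == 0:                       # need a nonzero pivot
              for i in range(k + 1, n):
                  if M[i][k] != 0:
                      M[k], M[i] = M[i], M[k]
                      sign = -sign               # row swap flips the sign
                      break
              else:
                  return 0                       # zero column: singular
          for i in range(k + 1, n):
              for j in range(k + 1, n):
                  # the previous pivot always divides this product exactly
                  M[i][j] = (M[i][j] * M[k][k] - M[i][k] * M[k][j]) // prev
          prev = M[k][k]
      return sign * M[-1][-1]

  print(bareiss_det([[2, 3], [4, 7]]))  # 2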

The Leibniz formula is totally impractical for calculating determinants, but it is sometimes useful in proofs due to its symmetric form.

I guess if you just throw a definition of the determinant at people and leave it at that, I can understand that it doesn't make sense. But that's not how I was introduced to the concept (I learned about abstract vector spaces and transformations and how they are represented by matrices - and only then about determinants and their properties and different ways of computing them). That's why Axler's critique rings somewhat hollow to me - maybe it's different if you learn about linear algebra mostly in an engineering context as opposed to pure maths.

For theoretical purposes, determinants are important because they constitute a homomorphism between the general linear group and the multiplicative group of the underlying field, and the kernel of this homomorphism is the special linear group. This leads to some very short and elegant proofs.


Was V.I. Arnold wrong when he said everything a student needs to know about determinants can be derived from geometric considerations? It seems true to me that, outside of advanced study of linear or abstract algebra, this is the one perspective that can at least dispel the magic.


Yes, in real space the determinant is the signed volume of the image of the unit cube.

Volume is base x height. Height is a linear function in the vector that rises above the base; the usual properties of the determinant follow from this.
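A quick illustration of the base-times-height picture in 2D (a sketch, assuming numpy):

  import numpy as np

  u = np.array([3.0, 0.0])
  v = np.array([1.0, 2.0])
  A = np.column_stack([u, v])             # the unit square maps to the parallelogram on (u, v)
  base = np.linalg.norm(u)
  height = np.linalg.norm(v - (v @ u) / (u @ u) * u)  # component of v orthogonal to u
  print(np.linalg.det(A), base * height)  # 6.0 and 6.0; the sign tracks orientation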


This is very informative. Thank you for the explanation.


Related:

Linear Algebra Done Wrong [pdf] - https://news.ycombinator.com/item?id=999494 - Dec 2009 (12 comments)


Why does linear algebra elicit so much confusion despite being an elementary topic? I think part of the problem is that the connection between matrices, matrix operations, and their applications is not explained well enough. Linear algebra is typically one's first foray into abstract mathematics, and if taught poorly it sets one up for failure down the road.


I took LinAlg the same semester I took Computer Graphics, back in 2001. The first half of LinAlg was all about solving equations, transformation matrices, and that sort of stuff, and the first half of CG was all 2D drawing (Bresenham and such). The second half of CG was all about OpenGL, and I was able to apply all the stuff I'd just learned in LinAlg.

The second half of LinAlg was a bunch of stuff about Eigenthis and Eigenthat, with no particular explanation as to what you'd use it for or why you would care. I passed the class with an A and was even recommended by the instructor to work as a tutor for other students, but 23 years later I couldn't tell you the use of any of that stuff.

T-matrices, though - I've used that stuff my whole career, working in medical imaging and related fields. I still couldn't tell you what an Eigenvalue or Eigenvector is useful for, as I never had any reason to apply that stuff.

It's not particularly abstract if you can explain why and how this stuff is used - and I know it's heavily used in many fields - just not those that I've worked in.


Eigenvectors characterise a linear transformation by answering a simple question: What lines map to themselves after the transformation?

Say for example you rotate something in 3D. The rotation axis remains unchanged. That's an eigenvector. Or say you mirror something in 3D, then all the lines lying in the mirror plane remain unchanged (all eigenvectors with eigenvalue 1), and the line orthogonal to the mirror plane remains unchanged - or rather, flipped, so it's an eigenvector with eigenvalue -1.
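A small numerical version of both examples (a sketch, assuming numpy):

  import numpy as np

  t = 0.7  # rotation about the z-axis by angle t
  R = np.array([[np.cos(t), -np.sin(t), 0.0],
                [np.sin(t),  np.cos(t), 0.0],
                [0.0,        0.0,       1.0]])
  w, V = np.linalg.eig(R)
  print(V[:, np.isclose(w, 1)].real.ravel())  # the rotation axis, up to sign: (0, 0, 1)

  M = np.diag([1.0, 1.0, -1.0])               # mirror across the xy-plane
  print(np.linalg.eigvals(M))                 # [1, 1, -1], as described above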


If that's right, that's very good to know; eigenvectors were just procedure-following for me.


Yeah, eigenvectors and eigenvalues are abstract and best understood in terms of linear mappings between vector spaces and change of basis, but change of basis for a vector space, and rewriting a matrix in a new basis, get really complicated. Many people get lost there.


Because it's taught in the first year of most math courses, most people don't take it seriously and think it's "easy first-year math", when it's actually the base for a whole lot of maths down the line.

When I used to do interviews for trading/quant positions, I actually stopped asking questions about Cayley-Hamilton and Jordan form (both covered in year 1 of any decent BS math course in Europe) because so many people failed.


You can probably reduce it down to a single definition. As soon as matrix multiplication is introduced without careful motivation, almost everyone is lost. And those who aren’t should be.

It’s mostly because, as Axler explicitly tries to address, linear algebra is a subject that can be viewed through at least two different lenses: the intuitive, (possibly higher-dimensional) geometric lens, or the algorithmic and numerical lens. Of course they are equivalent, but almost every course in linear algebra teaches the second and barely even touches on the first. It’s like learning Euclid’s algorithm before you’ve learnt to count. No wonder everyone’s so confused.
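One way to make the multiplication point concrete (a sketch, assuming numpy): matrix multiplication is exactly what makes composition of linear maps work.

  import numpy as np

  A = np.array([[0.0, -1.0],
                [1.0,  0.0]])   # rotate 90 degrees counterclockwise
  B = np.array([[2.0,  0.0],
                [0.0,  3.0]])   # scale x by 2 and y by 3
  x = np.array([1.0, 1.0])

  print(A @ (B @ x))   # apply B, then A
  print((A @ B) @ x)   # same answer: A @ B is defined so that this holds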


> Why does linear algebra elicit so much confusion despite being an elementary topic

I'm not sure that linear algebra is an elementary topic, if "elementary" means elementary-school math. That said, I don't think there's much confusion about linear algebra either. It's more likely that linear algebra is substantially more abstract than high-school algebra, so many students have a hard time deeply understanding it, just as many students already bail out on high-school math. When I was a TA in college, I also observed that many students were not prepared for the fact that college-level math is fundamentally different from high-school math. College-level maths leaves far less time for students to grasp the concepts through sheer brute-force practice. It requires students to focus on intuitive understanding of the key concepts, so that they don't get bogged down by hundreds and hundreds of concepts, corollaries, and theorems. Of course, high-school maths requires intuitive understanding too, but high-school maths is so simple that many students get the intuition naturally, so they are not aware of how important such intuitive understanding is.

This book used to help my students build intuition: https://www.amazon.com/Algebra-Through-Geometry-Undergraduat.... It starts with 2D geometry to teach linear algebra, then moves to 3D, and then 4D. The author also chooses to use calculations and constructions to prove most of the theorems instead of a more abstract or algebraic approach.


Thinking back to high school and college, the biggest issue I had with math in both is how frequently it's taught in the absence of practical examples. Speaking personally (though I believe many feel similarly), I need to see real-world examples to develop a true grasp of new concepts in any reasonable amount of time, for reasons related to both motivation and intuition.


> how frequently it's taught in the absence of practical examples

Unfortunately this has been the debate for thousands of years. One of Euclid's students asked him what practical use geometry had. In response, Euclid instructed a servant to give the student a coin, saying, "He must make gain out of what he learns."

Legend aside, I've actually been surprised to see, many times, people on HN criticize the "absence of practical examples". If we compare the textbooks written in the US with those in China or Europe, there's a sharp contrast. The textbooks from the US are thick, full of discussion of motivations and practical examples across multiple domains (Thomas' Calculus, for instance). In contrast, Chinese and European textbooks are terse and much thinner. They focus on proofs and derivations and have only sparse examples.

Personally, I find maths itself practical enough. I'd even venture to say that those who can naturally progress to college-level maths should be able to appreciate high-school maths for its own sake.


I agree with this in theory, but I don't necessarily need the "real world"-ness.

A lot of the stuff you learn initially, about systems of equations and how they relate to the matrix inverse, are very interesting because it's clear how this can be applied.

But as you move forward into vector bases and change of coordinates there's a very long dry spell that you sort of have to slog through until much later when you start to see how it is actually useful. I'm not sure how to fix this -- maybe take a step back from the symbolic and do some numeric computations because that's where they start becoming useful again.
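For the change-of-basis stretch specifically, a small numerical experiment can help (a sketch, assuming numpy): the same map written in a different basis keeps its eigenvalues.

  import numpy as np

  rng = np.random.default_rng(0)
  A = rng.standard_normal((3, 3))          # some linear map, in the standard basis
  P = rng.standard_normal((3, 3))          # columns of P: a new (random) basis
  B = np.linalg.inv(P) @ A @ P             # the same map, written in that basis
  print(np.sort_complex(np.linalg.eigvals(A)))
  print(np.sort_complex(np.linalg.eigvals(B)))  # identical spectra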


Some of us do need the real world-ness though.

IMHO there should be two versions of linear algebra: one for computer science majors and one for mathematicians. I regularly run into stuff at work where I say to myself, "self, this is a linear algebra problem", and I have next to no idea how to transform the problem I have into matrices or whatever.

But I can write a really stinkin' fast matrix multiplication algorithm. So there's that I guess.

Modern CPUs with big ass SIMD registers are incredibly fast at slogging through a linear algebra problem. Lots of incredibly intelligent people (ie, not me) spend an incredible amount of effort optimizing every last FLOP out of their BLAS of choice. For several years the only question Intel asked itself when designing the next CPU was, "How much faster can we make the SPECfp benchmark?" and it shows. Any time you can convert a problem using whatever ad-hoc algorithm you came up with into a linear algebra problem, you can get absurd speedups. But most programmers don't know how to do that, because most of their linear algebra class was spent proving that the only invertible idempotent nxn matrix is the identity matrix or whatever.
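A toy benchmark of that gap (a sketch, assuming numpy is linked against a decent BLAS; exact numbers will vary by machine):

  import time
  import numpy as np

  n = 300
  A, B = np.random.rand(n, n), np.random.rand(n, n)

  def naive_matmul(A, B):
      # the textbook triple loop, no vectorization
      C = np.zeros((n, n))
      for i in range(n):
          for j in range(n):
              for k in range(n):
                  C[i, j] += A[i, k] * B[k, j]
      return C

  t0 = time.perf_counter(); C1 = naive_matmul(A, B)
  t1 = time.perf_counter(); C2 = A @ B
  t2 = time.perf_counter()
  print(f"naive: {t1 - t0:.2f}s, BLAS: {t2 - t1:.4f}s")  # often a 100-1000x gap
  print(np.allclose(C1, C2))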

Discrete math has the same problem. When I took discrete math in college, the blurb in the course catalog promised applications to computer science. It turns out the course was just mucking through literally dozens of trivial proofs of trivial statements in basic number and set theory, and then they taught us how to add two binary numbers together. The chapters on graphs, trees, recursion, big-O notation and algorithm analysis, finite automata? Skipped 'em.


Yes, I’m currently dealing with a text that has a line like “you will end up with ~2^32 equations and from there it’s just a trivial linear algebra problem” without further guidance (from the general number field sieve).

I get that 2^32 simultaneous equations may be a straightforward linear algebra problem, but I am now going deep to understand the exact mechanism to solve it.
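For context, the GNFS matrix step looks for a linear dependency over GF(2) among the relation rows. Real instances use sparse iterative methods (block Lanczos / block Wiedemann), but a toy dense version shows the idea (a sketch; each row is packed into a Python int, and `gf2_dependency` is an illustrative name):

  def gf2_dependency(rows):
      # Find indices of rows that XOR to zero, tracking which inputs combine.
      pivots = {}                        # pivot column -> (row, history)
      for i, row in enumerate(rows):
          hist = 1 << i
          while row:
              col = row.bit_length() - 1
              if col not in pivots:
                  pivots[col] = (row, hist)
                  break
              prow, phist = pivots[col]
              row ^= prow
              hist ^= phist
          if row == 0:                   # this row is a combination of earlier ones
              return {j for j in range(len(rows)) if hist >> j & 1}
      return None

  print(gf2_dependency([0b0110, 0b1010, 0b1100]))  # {0, 1, 2}: rows 0 ^ 1 == row 2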


One issue that comes to mind is the nature of the determinant. When one considers the determinant given by the recursive (cofactor) definition, it seems like a highly contrived object that is difficult to work with (as it is, from that definition!). Avoiding that confusion requires a lot more scaffolding (à la Axler in the "Done Right" book). Either way you have work to do: either untangle the meaning of the weird determinant, or get to the place where you can understand the determinant as the product of the eigenvalues.


Do you know why introductory textbooks don't define the determinant in terms of the exterior product? This is how some "real" mathematicians I've talked to define it. It also is more intuitive (in my opinion) to define determinants as "signed volumes" than as some sum of products multiplied by signs of permutations.

The product of eigenvalues definition is also somewhat intuitive to me ("How much does the matrix scale vectors in each direction? Now multiply those numbers together."), but it's harder to motivate the fact that adding rows together doesn't change the determinant, which is kind of important to actually computing the determinant.
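That row-operation fact is at least easy to check numerically (a sketch, assuming numpy):

  import numpy as np

  A = np.random.rand(4, 4)
  B = A.copy()
  B[1] += 2.5 * A[3]    # add a multiple of row 3 to row 1
  print(np.linalg.det(A), np.linalg.det(B))  # equal, up to float error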


Really? You want to teach freshmen exterior products?


It's hard to make the connection between matrices and linear mappings between vector spaces. A lot of the time, students in a linear algebra class do not have a great grasp of what a function is, so when talking about vector spaces and linear maps between them, they're lost. Then connecting that to a matrix is hopeless.


Linear algebra is definitely not elementary. There are layers and layers of linear algebra, and depending on the audience, you have to select which topics to emphasize.


We seriously need better terminology, notation, and pedagogy when it comes to linear algebra. In 2024, such old-style textbooks just don't cut it anymore.


What, specifically, do you think should be done better? What style of pedagogy do you think would be more appropriate? What notation? What terminology?

Apart from "argument by year number", what's actually wrong with the book?


As far as I know, this notation and terminology is still very standard.

In some ways I like the idea of replacing traditional indexing with something more like Einstein summation notation, moving away from the arbitrary feeling you get around how matrix multiplication is structured, and providing an immediate route into tensor theory.

But so many things that are very important in linear algebra, like matrix inverses, are awkward to express in this notation.


1. What's wrong with this specific book other than that it hasn't been updated in 7 years?

2. What are examples of better terminology, notation, or pedagogy that are improvements over this book? (Doesn't need to be total, could retain the same terminology and notation but just have better pedagogy for example)


I think interactive linear algebra could be superior to a pure textbook. So, a textbook with interactive visualisations that show a concept and allow manipulation. Something like this maybe: https://textbooks.math.gatech.edu/ila/


Its interactive elements do offer students a slightly better presentation, but otherwise it looks pretty standard for an undergrad linear algebra textbook. It has many sections that could use an interactive portion but have none at all.

However, I'll give you that interactive texts (or texts with supplemental interactive material) are often better for some things (and some parts of linear algebra) than a plain dead tree format.


Yeah, I didn't mean this book specifically, rather the presentation as an example.


... and another beauty of a math book written in LaTeX is that you can fit a 286-page book into just 1.3 MB. Good luck doing that in MS Word.


After my first linear algebra course, I came across the online course Linear Dynamical Systems by Prof. Stephen Boyd, of Convex Optimisation fame.

Every lecture was so eye opening. I couldn't believe that linear algebra could be taught in such a context with such a variety of application domains.

The lectures are all available online, with the assignments: https://ee263.stanford.edu/archive/


Boyd has published a related book which is a bit more elementary but still great: https://web.stanford.edu/~boyd/vmls/


I hadn't known about this, so thank you! Steven Brunton also has a book that covers dynamical systems (among other things) which is linear algebra adjacent, and the expositions are great [1].

[1] https://www.amazon.com/Data-Driven-Science-Engineering-Learn...





