That was an explanation from the perspective of someone acquainted with modern physics. As such, it will make sense to a physicist, but no sense to most everyone else, including mathematicians who don’t know modern physics.
For example, in the beginning, the author describes tensors as things behaving according to the tensor transformation formula. This is already very much a physicist’s kind of thinking: it assumes that there is some object out there, and we’re trying to understand what it is in terms of how it behaves. It also uses the summation notation, which is rather foreign to non-physicist mathematicians. Then, when it finally reaches the point where it is all related to tensors in the TensorFlow sense, we find that no reference is made to the transformation formula, purportedly so crucial to understanding tensors. How come?
The solution here is quite simple: what the author (and physicists) call tensors is not what TensorFlow (and mathematicians) call tensors. Instead, the author describes what mathematicians call “a tensor bundle”, which is a correspondence that assigns each point of space a unique tensor. That’s where the transformation rule comes from: if we describe this mapping in terms of some coordinate system (as physicists universally do), the transformation rule tells you how this description changes under a change of coordinates. This setup, of course, has little to do with TensorFlow, because there is no space that its tensors are attached to; they are just standalone entities.
So what are the mathematician’s (and TensorFlow’s) tensors? They’re basically what the author says, after a very confusing and irrelevant introduction about changes of coordinates of the underlying space. It is irrelevant because TensorFlow tensors are not attached as a bundle to some space (manifold) as they are in physics, so no change of space coordinates ever happens. Roughly, tensors are a sort of universal object representing multilinear maps: bilinear maps V x W -> R correspond canonically one-to-one to regular linear maps V (x) W -> R, where V (x) W is a vector space called the tensor product of V and W, and tensors are simply vectors in this tensor product space.
Basically, the idea is to replace weird multilinear objects with normal linear objects (vectors) that we know how to deal with, using matrix multiplication and stuff. That’s all there is to it.
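To make the "bilinear map becomes a linear map on the tensor product" claim concrete, here is a minimal NumPy sketch. The spaces V = R^3 and W = R^2, the matrix `M`, and the function names are all made up for illustration; `np.kron(v, w)` is used as the coordinate form of the pure tensor v (x) w.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: V = R^3, W = R^2, bilinear map B(v, w) = v^T M w.
M = rng.standard_normal((3, 2))

def bilinear(v, w):
    """The bilinear map B: V x W -> R."""
    return v @ M @ w

# The same coefficients, read as a linear functional on the
# 6-dimensional space V (x) W (row-major flattening of M).
functional = M.reshape(-1)

def linear_on_tensor_product(t):
    """The linear map V (x) W -> R that corresponds to B."""
    return functional @ t

v = rng.standard_normal(3)
w = rng.standard_normal(2)

lhs = bilinear(v, w)                                # evaluate B on the pair (v, w)
rhs = linear_on_tensor_product(np.kron(v, w))       # evaluate on the vector v (x) w
print(np.isclose(lhs, rhs))
```

Note that a general element of V (x) W is a sum of such Kronecker products, not necessarily a single one; the linear map is defined on all of them.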
> Instead, the author describes what mathematicians call “a tensor bundle”, which is a correspondence that assigns each point of space a unique tensor.
Technically, that’s a tensor field which is a section of the tensor bundle. Similarly, a vector field is a section of the tangent bundle (the collection of all the tangent spaces of the points on the manifold). A vector field is just a choice of a tangent vector for each point from that point’s tangent space.
> the author describes tensors as things behaving according to the tensor transformation formula
In grade school it drove me nuts when the homework required us to describe a word without using the word (or its Latinate siblings). And yet as an adult there are few enough weeks that go by where some grownup doesn’t try to pull that same trick.
If you think developers are guilty of circular logic, check out some of the math pages on Wikipedia. You can get lost in moments.
Speaking of math pages on Wikipedia ... and math text more generally
Is it just me or are we horrible at teaching advanced math? Where are the examples (with actual numbers)? Where is the motivation? Where are the pictures?
Randall Munroe has a comic about how most people need enough math to be able to handle a birthday dinner where the guests split the bill for the birthday boy/girl evenly and pay for their meals and tip separately.
That’s a pretty good bar and I wonder if we could just cut to that chase earlier. But I also believe that people need enough math to see when they’re being cheated, and I feel like you could just tell middle schoolers that and they would pay attention. Maybe even primary school.
You told Billy he could have three apples, and now there are two left. Did Billy take more apples than he should have?
It’s always how do you share your cookies fairly with your friends and if they’re my cookies why do I have to share them at all? Screw “fairly” I’m keeping the extras at least. That sort of sharing is a socially advanced concept they don’t entirely get just yet.
Except that humanity desperately needs a better understanding of probabilities and non-linear relationships. We don't use more than division because we haven't succeeded in teaching more, not because nobody needs it.
Do we actually need to know how it works, or do we just need to really deeply understand that common sense is not scientific evidence and everything we actually care about can't be predicted just by "Being smart and thinking about it"?
Actually being able to do stuff with Bayes' law by hand is going to be not only hard to teach, but probably impossible to remember for those of us who don't actually do math in real life. People forget stuff after a few months or years.
I highly doubt the average person is interested in checking the math on a science paper, so if you want the general public to understand statistics you... have to show us all a reason to, and also teach us all of the related skills needed to make it useful. Or else.... we will all just forget, even with the best teacher in the world.
Most of us aren't doing random game engines as a hobby project or testing things on bacteria cultures.
Maybe they should teach it in the context of how to understand a scientific paper, since that's one of the more relevant things for non-pros. If you just teach statistics alone, people will say:
"Ok, now I know that it's easy to lie to yourself if you don't use any numbers but I don't have collections of large numbers of data points in my life to actually analyze"
"Thinking, Fast and Slow" makes a quite extended argument that our brains have a faulty intuition about probabilities, and I nodded along thinking of all the bad decisions I've watched my teams make over the years (either noticed retrospectively, or presaged by myself or some other old hand).
If you have a way to fix this, you would be set for life, going around playing a sort of corduroy jacketed Robin Hood, keeping the rich from stealing from the poor.
Realistically, it is (somewhat) fixable in small contexts. I've worked (and continue to work) on teams that are somewhat decent at risks and probabilities, but it's definitely an exceptional experience.
I don't know how to widely teach that. But I'm not yet ready to give up and say "it can't be taught", because those folks on my team are the counterexamples.
In upper-level undergraduate math, I made a game of seeing how many pages I would go before seeing 7 printed anywhere. It was usually 10 pages, if I included the page numbers.
What are we calling advanced math? There comes a point where I personally find it much easier to avoid examples until I'm problem-solving, since otherwise I'll get stuck in a loop of wondering if the thing I noticed generalizes. Could just be that my working memory is poor, but when I see a real honest number I know I'm in for a grueling day.
Wikipedia is a terrible place to learn advanced mathematics, for the reasons you raise (and more). There are lots of terrific short books, and many terrific lectures online.
This is definitely a problem! Having a large set of interests and problems to draw examples and intuition from is how I deal with it.
I suspect this is why so many mathematicians are also into physics.
100%! For those of us who need to learn from practical examples through to generalized intuition maths can be really really hard to learn depending on the source. Wish I was one of those people who finds it easier to learn from abstract first through to implementations second.
It's not just you, we are horrible at teaching advanced math. However, the reason for it is that advanced math is, as far as we can tell, just really, really, really hard. It's not that mathematicians don't care about teaching others (they very much do, and they try their best to get their understanding across to others), or that Wikipedia authors are particularly bad at clear exposition (they are, if anything, above average). Quite simply, we know of no royal road to understanding mathematics; you have to put in many hours to bite it off in very small pieces.
It has motivation, examples, and even actual numbers (though they're really just 0 and 1 most of the time). In my opinion, it's a very good and clear exposition for an encyclopedic article. However, I strongly suspect that people without enough mathematical knowledge (and "enough" in this case is something in the neighborhood of "enough to obtain an undergraduate degree in Mathematics") will simply not get anything out of it beyond "it's about the number of holes" (and that's not even remotely close to the whole picture: homology theories are important and useful in the context of things with no "holes" to speak of). If you think otherwise, but do not know what a quotient group is, you're just fooling yourself.
This is something I observe on HN a lot: people don't understand advanced mathematics, and are dumbfounded by the fact, trying to blame weird notation mathematicians insist on, or lack of motivation/examples/pictures etc. I never see people here do the same with advanced physics ("if the Standard Model is so standard, why can't they briefly and clearly describe what it is" is not something I ever see), molecular biology, or material science. People seem to know their limits and understand that really grokking these fields requires many years of deep study.
I think it's because many people on HN have good experience learning mathematics at school: it was something they always grasped really easily, and they were easily able to figure out how to calculate derivatives and integrals, get matrices into normal forms, etc. I don't want to rain on anyone's parade, because these things are still relatively difficult, and they do require a level of intellectual ability and effort that probably three quarters of the population aren't capable of. However, relative to advanced mathematics, undergraduate calculus is really rather trivial stuff.
Point is, if you don't understand modern advanced mathematics, you shouldn't get any more disappointed than you are about not being able to play the violin. These things just don't come easy.
I'm working on a language MathLingua (www.mathlingua.org) whose goal is to precisely describe mathematics using a format that is easy to read and understand to help address ambiguity in mathematical texts written using natural language.
It is still a work in progress, but does it help address some of the problems you see in learning mathematics? Any feedback is greatly appreciated. Thanks.
It has nothing to do with tensor fields; uniform/constant tensors still obey the proper coordinate transformations, and that's the defining property of any tensor. (With non-uniform tensor fields, covariant derivatives also pick up a correction, but that's a separate thing.)
A TensorFlow "tensor" (like most other uses of "tensor" in programmer jargon) is not a tensor at all, it's just a multidimensional array.
Mathematicians would disagree with you there. There are no coordinates to transform in an ordinary tensor space and therefore no way for a tensor to be affected by such a transformation.
Matrices (or linear transformations in general) are important examples of tensors. There's a nice adjunction between tensor spaces A(x)B and the space of linear transformations B=>C given by:
Hom(A(x)B, C) = Hom(A, B=>C)
In the case of TensorFlow I think they do actually still talk about linear transformations of some kind, so it's perfectly fine to call them tensors.
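The adjunction Hom(A(x)B, C) = Hom(A, B=>C) is essentially currying, and it can be sketched with arrays. This is only an illustration under assumed finite dimensions; the array `T` and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
dim_a, dim_b, dim_c = 2, 3, 4

# T[c, a, b] holds the coefficients of one bilinear map A x B -> C.
T = rng.standard_normal((dim_c, dim_a, dim_b))

def uncurried(t_ab):
    """Element of Hom(A (x) B, C): a linear map on the 6-dimensional tensor product."""
    return T.reshape(dim_c, dim_a * dim_b) @ t_ab

def curried(a):
    """Element of Hom(A, B => C): sends a to a (dim_c x dim_b) matrix, i.e. a linear map B -> C."""
    return np.einsum("cab,a->cb", T, a)

a = rng.standard_normal(dim_a)
b = rng.standard_normal(dim_b)

via_tensor_product = uncurried(np.kron(a, b))   # feed a (x) b to the uncurried map
via_currying = curried(a) @ b                   # feed a first, then b
print(np.allclose(via_tensor_product, via_currying))
```

Both routes read off the same coefficients `T`; the adjunction says that choosing one description determines the other.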
Tensors are introduced by physicists to ensure various physical quantities (which involve coordinates and their derivatives) do not depend on the arbitrarily chosen coordinate system. This is ensured through the transformation properties of tensors.
The name tensor itself comes from the theory of elasticity, Cauchy stress tensor, which BTW is uniform in many practical cases, and obeys the following tensor transformation rule:
Matrices are not examples of tensors. Matrices can be used for representation of tensors, in which case tensor product becomes Kronecker product, but matrices in general don't have to represent tensors. You can put anything, including your favorite colors or a list of random numbers, in a matrix, and it won't be a tensor in general, not unless it must transform like a tensor under coordinate system changes.
Similarly, TensorFlow "tensor" is just a multidimensional data array, with no transformation rules enforced on it, and therefore is not a tensor.
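For concreteness: under a rotation Q of the coordinate system, the standard rule for a rank-2 tensor such as a stress tensor is sigma' = Q sigma Q^T, while a vector picks up a single factor of Q. The sketch below uses made-up stress values and a hypothetical `rotation_z` helper; it only illustrates why the rule matters for a quantity that should not depend on coordinates.

```python
import numpy as np

def rotation_z(theta):
    """Rotation about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# Hypothetical uniform stress state (symmetric), arbitrary units.
sigma = np.array([[10.0, 2.0, 0.0],
                  [2.0,  5.0, 1.0],
                  [0.0,  1.0, 3.0]])
n = np.array([1.0, 0.0, 0.0])      # unit normal of some surface

Q = rotation_z(0.3)                # switch to a rotated coordinate system

sigma_new = Q @ sigma @ Q.T        # rank-2 rule: one factor of Q per index
n_new = Q @ n                      # rank-1 rule

# Normal stress on the surface: a physical number, independent of coordinates.
normal_stress_old = n @ sigma @ n
normal_stress_new = n_new @ sigma_new @ n_new

# If the 3x3 array is treated as plain data and left untouched,
# the computed normal stress changes with the coordinate system.
normal_stress_untransformed = n_new @ sigma @ n_new

print(np.isclose(normal_stress_old, normal_stress_new))
print(np.isclose(normal_stress_old, normal_stress_untransformed))
```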
i did a whole phd-level course (a long time ago) in deformable materials, which was entirely based on tensors, and i still don’t know how to differentiate one from vectors/matrices. even the idea that tensors must obey coordinate transforms doesn’t really do it, since the practical applications of vectors/matrices do so as well.
it’s like some people invented a new word and won’t tell you what it actually means in sufficient detail to differentiate it from all the other words you know. so you keep using it with others in the hopes that contextual information will finally make it clear. one day.
I just taught a module on Tensors in one of my physics courses. The mistake lots of people make is not showing examples of matrices that are not tensors. But this is really difficult to do in physics courses because all physics matrix-like objects must be tensors. Any theory that has non-tensor-like objects in it will necessarily fail as soon as you change your coordinate system.
Thankfully, there is a great historical example of this. The electric field vector \vec{E} = (E_x, E_y, E_z) is not a tensor under Lorentz transformations: it doesn't obey the tensor transformation law when you change to a moving frame. Similarly, the magnetic field vector is not a tensor. These are matrices, but not tensors.
As you know, the Electromagnetic tensor [1] is the tensor that correctly transforms under coordinate transformation, and hence allows different observers to agree with each other.
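A small numerical sketch of this, assuming units with c = 1, metric signature (+, -, -, -), and one common sign convention for the contravariant field tensor (other conventions differ by signs). The helper names are hypothetical. A pure electric field in one frame acquires a magnetic part after a boost, so E alone is not a self-contained object, while the full tensor transforms as F' = L F L^T.

```python
import numpy as np

def field_tensor(E, B):
    """Contravariant F^{mu nu}, assuming c = 1 and signature (+, -, -, -)."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return np.array([[0.0, -Ex, -Ey, -Ez],
                     [Ex,  0.0, -Bz,  By],
                     [Ey,  Bz,  0.0, -Bx],
                     [Ez, -By,  Bx,  0.0]])

def boost_x(v):
    """Lorentz boost along x with speed v (|v| < 1)."""
    g = 1.0 / np.sqrt(1.0 - v**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = g
    L[0, 1] = L[1, 0] = -g * v
    return L

def split_fields(F):
    """Read E and B back out of F, using the same convention as field_tensor."""
    E = np.array([F[1, 0], F[2, 0], F[3, 0]])
    B = np.array([F[3, 2], F[1, 3], F[2, 1]])
    return E, B

E = np.array([0.0, 1.0, 0.0])      # pure electric field in the original frame
B = np.zeros(3)

L = boost_x(0.6)
F_new = L @ field_tensor(E, B) @ L.T   # rank-2 rule: one factor of L per index
E_new, B_new = split_fields(F_new)

print(E_new, B_new)                    # the boosted observer also sees a magnetic field
print(E @ E - B @ B, E_new @ E_new - B_new @ B_new)  # Lorentz-invariant combination
```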
thanks for the examples. i vaguely recognize the term 'electromagnetic tensor', but i have to admit i didn't know that it was special in that way. i can barely spell 'tensor' at this point.
ps - one thing that always annoyed me was the limitation of linearity in so many of these models (which i totally understand why, but still). all the interesting real-world stuff happens non-linearly...
Indeed it is annoying. But the models are linear not because physicists mistakenly believe that the world is linear, but because in most cases linear models are the only ones that one can solve to get qualitative predictions out. Non-linear models can be constructed as well, and then numerically solved on computers to get exact answers. But a physicist is one who understands the essential qualitative features of the world, rather than one who can compute understanding-free numerical answers.
In the words of Dirac, "I consider that I understand an equation when I can predict the properties of its solutions, without actually solving it." This usually only works if your equations are linear.
yah, as an engineer[0], i totally get the solvability angle, and even the physicist's core desire to be able to test (and predict via) the math rather than the physical manifestations (which may be impossible to test directly), but i'm eager to see us advance deeper into the non-linear, since that's where it gets really interesting. like, how do proteins really work? or multi-body energy fields? we're in the infancy of really understanding all this stuff. the future is stochastic and non-linear. in a thousand years, people might look back with amusement on how ignorant we were with our puny little linear models and deterministic computers. =)
[0]: but at this point, not really. even in grad school, i only did linear modeling, and relatively rudimentary ones, at that.
Interesting, you use the word matrix differently than me. The way I use it, a 2d array isn't necessarily a matrix. It's only a matrix if it represents a linear map between vector spaces with respect to chosen bases. Then again, I'm not much of a programmer, but I've taught linear algebra a few times. My head is just in a different place I guess.
I'm not sure I buy this after you just used an example of a 2d array of your favorite colors just a few minutes ago. Maybe I'm missing something. What kind of linear transformation does that represent, and between what vector spaces?
You have a list of labels you've conveniently decided to conflate with numbers. It's fine to use an enum for your data, but no, this data structure is not meant to convey a transformation.
You bring up a good point though, if this were meant to be a transformation, then we're talking about modules (Z/2^8Z being the underlying ring) and not vector spaces, which is fine. I was needlessly narrow when I said "vector spaces" earlier.
You can have matrices of anything including rings, quaternions, octonions, dual numbers, matrices, vectors, bras, kets, etc etc, as long as you can multiply and add those objects. If you prefer real numbers, use the corresponding wavelength of the light, or voltage values in a photosensor, it doesn't matter. It still is a matrix.
Why do you mention things like data structures, enums and labels? This is math, not C++.
Anyway, the point is, any arbitrary 2D arrangement of numbers can be a valid matrix, whatever those numbers may represent.
Sure, I can get rid of the word enum. The symbols in the example are numerals for sure, but you explicitly say they're representing colors. I don't know how to multiply colors. What I'm saying is you merely have labels there, the numbers those numerals usually represent have nothing to do with the explicitly stated meaning. If I saw that array without context, yeah sure, I'd assume it's a matrix. But there is context that tells me interpreting this as a transformation makes no sense.
I'm genuinely claiming they aren't numbers due to the explicit context. The entries are colors with numerals as labels.
Now, you've said something slightly different where I think we can agree. If the entries are from a set closed under a notion of multiplication and addition, then it represents a matrix even if it isn't being used in that way. But I still say that if you're merely using the array as a place to keep some data then I won't be using the word matrix, I'll just call it an array.
Anyway. I started this by saying it was interesting that we use the word matrix slightly differently. I still think that's interesting and think it's totally fine if you want to call an array a matrix even if the entries don't come from something where a transformation makes sense. I wouldn't "correct" someone's usage. I just think it's interesting how we use it differently. Anyway, I feel like you just think I'm dumb, so I'm ending this here. You might not, but it's hard to read the vibe via text. If you have something to add you think will get me to change my word choices, feel free to respond and I'll read it, but I'm not replying on this thread anymore.
> FYI, matrices transform between vectors within a vector space.
Square ones do, but m x n ones represent linear maps from an n-dimensional to an m-dimensional vector space (over the/a field containing the elements of the matrix).
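A tiny illustration of the m x n case, with made-up entries: a 2 x 3 real matrix takes vectors in R^3 to vectors in R^2, and does so linearly.

```python
import numpy as np

# m = 2 rows, n = 3 columns: a linear map from R^3 to R^2.
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])

x = np.array([1.0, 1.0, 1.0])    # vector in R^3
z = np.array([0.0, 2.0, -1.0])   # another vector in R^3

y = A @ x                        # vector in R^2
print(y.shape)

# Linearity: A(2x + z) equals 2 A(x) + A(z).
print(np.allclose(A @ (2 * x + z), 2 * (A @ x) + A @ z))
```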
> There are no coordinates to transform in an ordinary tensor space and therefore no way for a tensor to be affected by such a transformation.
Sure there are: Any basis of the underlying vector space(s) induces a basis of the tensor space. Components with respect to some basis are coordinates. You can then investigate what happens to the induced basis (or rather, the respective components) under a basis transformation of the underlying vector space(s), which is where the "physicist's" definition of tensors originates.
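A purely algebraic sketch of that statement, with no manifold involved. The change-of-basis matrix `P` and the random components are made up; the point is that the rule each object follows depends on its type, and that basis-independent quantities come out the same in both descriptions.

```python
import numpy as np

# Columns of P are the new basis vectors written in the old basis.
# This particular P is a made-up invertible matrix (det = 3).
P = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
P_inv = np.linalg.inv(P)

rng = np.random.default_rng(2)
v = rng.standard_normal(3)         # vector, type (1, 0)
f = rng.standard_normal(3)         # covector (linear functional), type (0, 1)
A = rng.standard_normal((3, 3))    # linear map, type (1, 1)
M = rng.standard_normal((3, 3))    # bilinear form, type (0, 2)

# Components with respect to the new basis.
v_new = P_inv @ v                  # contravariant: opposite to the basis vectors
f_new = P.T @ f                    # covariant: same as the basis vectors
A_new = P_inv @ A @ P              # one index of each kind
M_new = P.T @ M @ P                # two covariant indices

# Basis-independent quantities agree in both descriptions.
print(np.isclose(f @ v, f_new @ v_new))
print(np.isclose(v @ M @ v, v_new @ M_new @ v_new))
print(np.allclose(A_new @ v_new, P_inv @ (A @ v)))
```

`A` and `M` are both 3 x 3 arrays, yet they follow different rules; the array alone does not say which kind of tensor it represents.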
The components of a vector aren't the same as the coordinates physicists talk about when they're dealing with tensors. The components would be something like the value of the magnetic potential, or the local wind speed. The coordinates would be the location where that particular vector is 'anchored'.
A change of coordinates does indeed induce a change of basis, but a change of basis isn't really a change of coordinates. And strictly speaking some vector spaces don't really have an obvious basis (without invoking choice), so having a basis be a prerequisite for the definition is not ideal.
The whole requirement that a tensor is 'something that transforms like [...] under a coordinate transformation' is just how physicists have chosen to phrase that a vector bundle is only well defined if its definition isn't dependent on some arbitrary choice of coordinates. In my opinion this requirement is more easily apparent in the mathematical definition, where there is no choice of coordinates in the first place, than in the physicists' way of working with some choice of coordinates and checking how things transform.
I'm aware. Though if we want to be more precise, that's about tensor fields, where the basis transformations of the underlying vector bundles (the tangent and cotangent bundle) are in turn induced by coordinate transformations of the base manifold.
However, physicists get introduced to tensors far earlier than any excursions into differential geometry when discussing rigid bodies.
The terms co- and contravariant make sense on a purely algebraic basis, with components of tensors transforming 'the same as' or 'opposite to' the basis vectors. That the basis transformation is induced by transformations of some base manifold is incidental.
Exactly. The fact that the bases are related to coordinates on the manifold is a property of differential geometry but the laws for transformation between bases are more general.
The post was also a poor explanation for someone doing modern physics. [edit: not true actually I should have read the rest of the post - it’s a good post]
Wald's approach in General Relativity is much better - he treats Tensors as a multilinear map from vectors and dual vectors to scalars.
He then derives the underlying coordinate transformation rules, for the vector spaces used in differential geometry. But
That’s the approach I used as well in the second half of the article - I just mentioned the transformation law in the beginning since that’s what most physics students encounter first.
Most of the article tries to provide some intuition behind why multilinear maps, which sound like a fairly abstract concept, might be relevant in physics. The key link being the importance of coordinate invariance.
I didn’t go into deriving the coordinate transforms from the multilinear map definition as I didn’t feel that it’d provide much better intuition, but I did mention the equivalence near the end.
Yeah sorry you’re right - I should have read the rest of your post, which is excellent and describes precisely why the coordinates/transformations focused definition is bad for one’s intuition.
> the author describes tensors as things behaving according to the tensor transformation formula
Yeah, the idea that there are pre-existing things that we're trying to describe is somewhat weird to me when we're trying to come up with a definition of a tensor. The whole point of mathematics is that you come up with the definitions and theorems fall out.
In particular, this comment is funny and speaks to some difference in how I and the author view what we're doing when defining a tensor:
> But why that specific transformation law - why must tensors transform in that way in order to preserve whatever object the tensor represents?
Because we defined it like that! When you make the definition "a tensor is a thing that follows X laws", you don't get to ask why, you just defined it!
Just a funny bit of phrasing, I get what is meant :)
It's presented that way in textbooks because it's way easier to learn it that way (or it's Stockholm syndrome, which I won't deny is possible). The motivated way would require way, way, way more background knowledge.
> [the explanation in the OP] will make sense to a physicist, but no sense to most everyone else, including mathematicians
And then went on to describe tensors in a way that is unfriendly to non-mathematicians by saying
> tensors are a sort of universal object representing multilinear maps: bilinear maps V x W -> R correspond canonically one-to-one to regular linear maps V (x) W -> R, where V (x) W is a vector space called the tensor product of V and W, and tensors are simply vectors in this tensor product space.
Engineering usage seems to match the physics usage. In classic engineering fashion, however, we were always taught just to 'plug them in' without learning all the minutiae that go with them.
For example, the stress and strain calculations used for calculating deformation (say, if you were rolling a sheet of steel in a mill) make use of tensors and also something called an "invariant". I assume this also comes from the physics/mathematics world.
Even as a physicist I found it highly confusing when I got told in physics classes that a tensor is "just a thing (or object) that behaves like so under coordinate transformation". Like, what do you mean by "thing"? I have no intuition for this yet, I need concise definitions! Fortunately I took a differential geometry class at the same time, which was really helpful.
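As a rough illustration of what "invariant" means here: the three principal invariants of a stress tensor are combinations of its components that come out the same in every rotated coordinate system, even though the individual components change. The stress values and the `rotation_x` helper below are made up.

```python
import numpy as np

def rotation_x(theta):
    """Rotation about the x-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s,  c]])

def stress_invariants(sigma):
    """Principal invariants I1, I2, I3 of a 3x3 stress tensor."""
    i1 = np.trace(sigma)
    i2 = 0.5 * (i1**2 - np.trace(sigma @ sigma))
    i3 = np.linalg.det(sigma)
    return np.array([i1, i2, i3])

# Hypothetical symmetric stress state, arbitrary units.
sigma = np.array([[10.0, 2.0, 0.0],
                  [2.0,  5.0, 1.0],
                  [0.0,  1.0, 3.0]])

Q = rotation_x(0.7)
sigma_rotated = Q @ sigma @ Q.T        # components in the rotated coordinates

before = stress_invariants(sigma)
after = stress_invariants(sigma_rotated)

print(np.allclose(sigma, sigma_rotated))   # components differ between the two systems
print(np.allclose(before, after))          # the invariants agree
```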