These are just issues that someone unfamiliar with a field would face. None of them are problems for those of us in the field.
First, the expectation thing. He's using a special case, E(X), and complaining that the general case doesn't follow from the special case. It's like saying "Well the plural of mouse is mice but the plural of house isn't hice!". The general definition of expectation (for a discrete probability distribution) is
E(f(X)) = \sum_i f(x_i) p(x_i)
If you start with this general definition, both E(X) and E(X^2) are perfectly natural. The author's error of starting with the special case in no way implies an issue with notation.
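To make that concrete, here is a minimal Python sketch assuming a made-up pmf (the numbers are purely illustrative); E(X) and E(X^2) both fall out of the same general definition by choosing a different f:

    # Minimal sketch of E[f(X)] = sum_i f(x_i) * p(x_i) for a discrete distribution.
    # The pmf below is a made-up example, not taken from the article.
    pmf = {1: 0.2, 2: 0.5, 3: 0.3}   # value x_i -> probability p(x_i)

    def expectation(f, pmf):
        """E[f(X)] for a discrete random variable given by its pmf."""
        return sum(f(x) * p for x, p in pmf.items())

    e_x  = expectation(lambda x: x,    pmf)  # E[X]: the special case f(x) = x
    e_x2 = expectation(lambda x: x**2, pmf)  # E[X^2]: same definition, different f
    variance = e_x2 - e_x**2                 # Var(X) = E[X^2] - (E[X])^2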
And how is the fact that Wikipedia is inconsistent between E(X) and E[X] in any way mathematical notation's fault? If you read a novel that starts using ' for quotes and switches to ", that's an issue with the novel (assuming it's not stylistic) and not an issue with typography in general.
> These are just issues that someone unfamiliar with a field would face. None of them are problems for those of us in the field.
True, but it raises a "barrier to entry" (on purpose, or by mistake), because it is almost impossible to enter the field without a supervisor/colleagues who provide the "semi-supervision" needed to learn the notation.
I can understand how those in the field think that's a good thing, but for the rest of humanity it probably isn't... Look e.g. what has happened in academic operating system research: innovation has moved from Berkeley and Bell Labs to the Linux kernel mailing list. Are academic OS researchers better off because of it? Probably not. Is the world better off? You bet!
Unfamiliar is a weasel word. It's a No-True-Scotsman. And I don't mean that it is entirely wrong; I am saying you are unjustly putting a limit on whom you deem worthy of the field. There is no need for the imprecision, other than speed, breaking things as you go.
> "Well the plural of mouse is mice but the plural of house isn't hice!"
Are irregular word forms necessary or essential? Probably, but I doubt you could explain why. Hence you are not qualified to ridicule anyone. It's a perfectly valid complaint, IMHO, but probably only loosely related to the math example, which I can't be bothered to follow at the moment.
I have taken ten years of math in school and then calculus, discrete math, linear algebra. That's all useless if I want to follow math in a simple research paper because they're using notation common in THAT field and it has nothing to do with the notation in another math field.
And it's all V hat superscript pi subscript h. It's not like code, where I get descriptive variable names. And you thought pi meant pi? No, it means policy in THIS context.
Reading a "simple" research paper requires a different approach compared to reading a newspaper or a blog. I hope the following explanation helps.
The notation actually helps to keep things simple. I think of it as a kind of metalinguistic programming [1] where a notation is introduced which then makes the important parts easier to understand.
I am not mathematically inclined but I have to read papers containing maths quite a lot of the time. I tend to read them 3 times.
The first time, I tend to skip the equations altogether and just get a feeling for the paper - what is it about, is it useful for me to read?
The second time I have a pen and a highlighter where I actually label the mathematical symbols with arrows and words (using the textual descriptions). I also highlight important sentences. I think of this stage as trying to make the paper as clear as possible for later reading.
In the third stage I am trying to understand the paper as a whole - something it seems you try to do on the first read, I am familiar with the frustration because this is what I used to do.
I quite enjoy reading papers now and I have more respect for the notation.
> The second time I have a pen and a highlighter where I actually label the mathematical symbols with arrows and words (using the textual descriptions).
Gee, it's almost as if we should be writing the words out instead of writing one letter variable names.
Funny how programmers found this to be good practice and mathematicians still write with a notation that's purposefully unreadable.
I've taken years of English Studies in school, then German, Greek, Russian. That's all useless if I want to read a French book, because they are using a language common in THAT country, and it has nothing to do with the language in another country.
That's not a direct equivalent, but it is close; somewhat equivalent but distinct notations arise in Math and Physics because authors weren't working together (much like natural languages). But otherwise it is "turtles all the way down" - it can only be "simple" or "needlessly complicated" if you assume something about the reader's knowledge.
All of
\int f(x) \, dx
\sum_{i=1}^{n} f(x_i)
f(x_1) + f(x_2) + ... + f(x_n)
are essentially equivalent to a mathematician, often with a preference for the first, whereas someone unfamiliar may claim that only the 3rd is clear and the others are unreadable. A friend of mine was in a classroom where the lecturer started with the first form, went to the 2nd and 3rd over class objections, and finally switched to something like "our function result at the first data point, added to our function result at the second data point, ...". This was an OR class for students pursuing an MBA.
This sounds so much like something the greybeards used to say in the 80s: "If it was hard to write, it should be hard to understand"
Basically, it's not the fault of our systems; it's the user's fault. Once he learns the arcane incantations, he'll understand why our way is the better way.
Computer UX has finally progressed beyond this arrogance. Why not math?
The reason is that the difficulty of mathematics lies not in the notation but in the ideas and techniques. There is no doubt that a good modern UX complete with graphics, animation, and audio would facilitate understanding of the ideas, but, just as in programming, the need for textual notation cannot be overcome. Category Theory is an interesting example: while a lot of reasoning in it is done by "diagram chasing", if you look at a book on this theory, you will find more textual proofs and formulas than diagrams. Even more strikingly, the same is true for topology, differential geometry, etc.
Years of studying what, though? Calculus, most of linear algebra, discrete math are all calculation based. Math papers are proof based. If you spend years studying proof based math (analysis, algebra, topology, and so on) then you'll be familiar enough with proof based math to understand it.
It's like asking "why is spending years studying spelling not sufficient to understand Ulysses?"
We spend years learning "spelling" because the mathematical equivalent (arithmetic, up through calculus, linear algebra, differential equations, discrete math, etc.) is incredibly valuable on its own, and that value doesn't really require familiarity with the proof-based foundations underpinning it. Learning real analysis is valuable if you're going to be a mathematician, but as an engineer it won't help you get more out of calculus (at least, not compared to the cost of learning it).
I believe you have adopted your opinion of what should go into the first set of brackets of yours.
An engineer is expected to allot too much time to math to turn out unfamiliar with it. I think we should spend less effort on practicing cleaning fish and more on learning to forage for fish.
> I believe you have adopted your opinion of what should go into the first set of brackets of yours.
I'm not really sure what you mean by that.
Certainly I'm not of the opinion that proof-based math is useless; I studied math in grad school, and stayed as far away from anything applied as I could. But I don't think it offers much value to engineers or scientists. Let mathematicians establish the foundations, and let scientists and engineers build on top of that. They're different skill sets, with different goals.
"These are just issues that someone unfamiliar with a field would face. None of them are problems for those of us in the field."
How is this argument different from defending code with bad variable naming by stating that it doesn't cause issues to anyone familiar with the code base?
In other words, E(...) has a hidden lambda there, and the fully consistent usage would be E(X -> X^2) (instead of E(X^2)) and so on. The covariance thing would be E((X,Y) -> X*Y), with an implied domain being the Cartesian product of those of X and Y. Of course we humans can easily infer the domains, and writing explicit domains every time is not efficient.
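A rough sketch of that "hidden lambda" reading, in Python (the helper names and toy distributions are my own illustration, not anything standard): the first argument to E is a function of the random variable(s), not a number.

    # Expectation with the lambda made explicit: E(X -> X^2) and E((X, Y) -> X*Y)
    # become ordinary calls where the first argument is a function.
    def E(g, pmf):
        """E[g(X)] over a discrete pmf {value: probability}."""
        return sum(g(x) * p for x, p in pmf.items())

    def E_joint(g, joint_pmf):
        """E[g(X, Y)] over a joint pmf {(x, y): probability}."""
        return sum(g(x, y) * p for (x, y), p in joint_pmf.items())

    pmf = {-1: 0.5, 1: 0.5}                              # toy distribution for X
    second_moment = E(lambda x: x**2, pmf)               # the E(X -> X^2) form
    joint = {(x, y): 0.25 for x in (-1, 1) for y in (-1, 1)}
    cross_moment = E_joint(lambda x, y: x * y, joint)    # the E((X,Y) -> X*Y) form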
Well, formally a random variable is already a function which assigns values to elements of the probability space (outcomes). The expected value is just another name for an integral over this space. When the probability space is discrete, "the integral over a probability space" is just another name for a weighted sum. The domain is always the same: it's the probability space.
The real bad notation (which is employed here, actually) is f² meaning (x ↦ f(x)²) while at the same time f⁻¹ means the inverse function of f instead of (x ↦ f(x)⁻¹).
I think the fundamental source of confusion might be due to not understanding the concept of a random variable. In his example, X is a random variable; the expectation E[X] is a functional applied to its probability mass function. Given this, it should not be surprising if it seems to behave differently to "any other function, in math".
If you understand this, I think the notation is natural:
We have a random variable X, which takes value x_i with probability p(x_i).
Thus, the random variable X^2 will take value (x_i)^2 with probability p(x_i).
Given that the expectation for the random variable X with p.m.f. p(x_i) is defined as E[X] = \sum x_i p(x_i), it should be clear that to obtain the expectation of any random variable we must sum over the product of (value) and (probability of that value).
It should also be clear that this gives E[X^2] = \sum (x_i)^2 p(x_i)
I'm confused by his comment that:
p(x_i) isn't, because it doesn't make any sense in the first place. It should really be just P_{x_i} or something, because it's a discrete value, not a function!
The probability mass function is a function: for a given value, it gives the probability that the discrete random variable takes that value. To calculate the expectation we use the values obtained by evaluating the function at discrete points, but what else could we do?
You are right, and the deeper source of the author's beef seems to be that he wants mathematics to be about textual rules. Now that's a fine thing in a computer language -- and there are philosophers who say mathematical truth is no more than computation. But mathematical notation is for humans to talk about things that have meanings that humans can understand.
The fact is that E[X^2] is a natural way of writing an important concept, whereas "\sum_i (x_i)^2 p(x_i^2)" need mean nothing at all (especially if $p$ is not defined over all the $x_i^2$).
> Now that's a fine thing in a computer language -- and there are philosophers who say mathematical truth is no more than computation. But mathematical notation is for humans to talk about things that have meanings that humans can understand.
Then how come an unfamiliar piece of source code is easier to follow than a mathematics paper using unfamiliar notation?
You can look up the definition of each function and learn precisely what it does, without ambiguities. You can open up a debugger and trace every step of the algorithm. Source code tends to be formatted to emphasise its division into units that can be analysed separately. Even if the code uses a framework you don't recognise, you can probably understand some of the underlying logic just fine, thanks to a consistent syntax and (often) descriptive function names.
In a mathematical paper, often there's some notation that isn't defined and there isn't even any pointer to where to look for definitions. Important objects are given one-letter names. Theorems are often only numbered; good luck remembering what lemma 2.17.3b was about. Proofs are written in prose (instead of something like Leslie Lamport proposed[0]). You have to fill in conceptual gaps and remind yourself of unstated assumptions. Notation and terminology are often ambiguous — what does "exponential time" mean? Is it DTIME(2^(c*n)) or DTIME(2^polynomial(n))? Does "increasing" mean "strictly increasing" or "non-decreasing"? Is zero a natural number? And so on.
A mathematical formalism that could be interpreted in a purely mechanical way would be a huge improvement.
Indeed, looks like scientific literature in general and mathematical literature in particular are stuck in the past century. As much as I enjoy reading printed materials, I often think that using a modern computer-based approach to presenting them could prove a lot superior. And no, PDF would not cut it, even with the possibility of hyperlinking. In my view, PDF is like the idea of the horseless carriage, or, rather, of a carriage pulled by a mechanical horse - a new technology stuck in the past. Looks like the modern web browser platform, on the other hand, could provide a reasonable base on which one could build an extremely powerful "IDE" for creating and interacting with mathematical expositions.
> Then how come an unfamiliar piece of source code is easier to follow than a mathematics paper using unfamiliar notation?
This is clearly subjective. I read lots of mathematics research that I find more accessible than even pretty mundane source code (and yes, I write software for a living and have done so for over ten years).
I feel obliged to point out that HoTT is… really rather abstract. I'm just starting to learn about it now in my spare time, and I hold a master's degree in mathematics specialising in logic.
The answer to all of your questions: computer programs are written for computers to understand, and computers aren't very smart. Mathematical papers are written for mathematicians to understand, and they tend to be a lot smarter than computers.
Machine language is also written for computers to understand, but it's not easy to read. Computer-formalized math is usually even harder to read than regular notation. I don't think computers have much to do with this.
True, I should have said it better: programming languages are written for compilers to read, and compilers are not very smart. Machine language is written for the computers themselves. Computer-formalized math isn't really written for anyone to read, I think.
In a way he is also kind of right - although I don't agree with his conclusion. E[X^2] is not the most precise notation. I have occasionally seen notations like E_X[X^2] to avoid confusion and clearly state "the expected value of X^2 with respect to the distribution of X" or something like that. I remember multiple times where imprecise notation has kept me from understanding statistical methods, even though I would say that I am reasonably familiar with the field.
Another problem: Different "schools" use different notations, which are then mixed arbitrarily. This is a major problem for example on Wikipedia, where articles use different notation and sometimes switch notations within the article.
The old joke that "differential geometry is the study of properties that are invariant under change of notation" is funny primarily because it is alarmingly close to the truth. Every geometer has his or her favorite system of notation, and while the systems are all in some sense formally isomorphic, the transformations required to get from one to another are often not at all obvious to the student. —John M. Lee, "Introduction to Smooth Manifolds"
I started doing a lot better in calculus when I started using longer notation (e.g. f = x -> x³ instead of f(x) = x³) and making sure that things "type checked". For instance, the tendency to use f(x) to refer to a function, rather than just f, was very confusing to me, because f(x) is an element of the co-domain while f is a function (typically from real to real in my undergrad classes). I had to figure this out by myself because the textbook I was using and the prof both went with the notation that doesn't type check. When I finally realized that dy/dx should instead be (d/dx)(f), things started being a lot clearer to me: differentiation takes a function and returns a function, and f is a function, so everything checks out.
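A quick Python sketch of that type check (the step size is an arbitrary choice, and d_dx is only a numerical stand-in for the real operator): differentiation takes a function and returns a function.

    # Differentiation as a higher-order function: it takes a function f and returns
    # a new function approximating f', so (d/dx)(f) "type checks" as a function.
    def d_dx(f, h=1e-6):
        return lambda x: (f(x + h) - f(x - h)) / (2 * h)   # central difference

    f = lambda x: x**3     # f : R -> R
    f_prime = d_dx(f)      # (d/dx)(f) : R -> R, again a function
    print(f_prime(2.0))    # roughly 12.0, since f'(x) = 3x^2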
It's good to think of dy/dx as (d/dx)y. In addition, it is also possible to make some sense of dy/dx. Here's one very hand-wavy way of looking at it.
Let ε be something very small, and define the difference operator d so that (df)(x) = f(x + ε) - f(x). Usually we don't want to handle the functions dx and dy by themselves, because they are so small, and their exact values depend on ε. But when we divide dy by dx we get something that is no longer ε-sized, and doesn't (in a limit sense) depend on the value of ε.
And why think this way? When I learned the chain rule dy/dx = dy/du * du/dx, I was told that even though the du's appear to cancel out, this is just abuse of notation and basically a meaningless coincidence. I understand that the teachers just wanted students to be careful; they don't want people "simplifying" dx/dy to x/y. However, I was never really satisfied with this explanation. I finally realized that, by thinking about it using the difference operator above, it is not a meaningless coincidence: the du's actually do, in a sense, cancel out.
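Here is a small numerical sketch of that picture in Python (the functions, the point and ε are arbitrary choices): with a fixed small ε, dy, du and dx are ordinary finite differences, so (dy/du)*(du/dx) equals dy/dx exactly, and both approach the true derivative as ε shrinks.

    import math

    eps = 1e-6

    def d(f):
        """Finite-difference operator: (df)(x) = f(x + eps) - f(x)."""
        return lambda x: f(x + eps) - f(x)

    u = lambda x: x**2             # inner function u(x)
    y = lambda x: math.sin(x**2)   # y as a function of x, i.e. y = sin(u(x))

    x0 = 1.3
    dy, du, dx = d(y)(x0), d(u)(x0), eps

    print(dy / dx)                   # the finite-difference version of dy/dx
    print((dy / du) * (du / dx))     # identical: the du's cancel exactly
    print(math.cos(x0**2) * 2 * x0)  # the limit both approach as eps -> 0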
This a thousand times. I boycott math notation. My most memorable math moments are:
- giving a talk on notation in high school (I was not tasked to do this, I decided to do it on my own) because I was FREAKED OUT by how we were using tons of symbols nobody had ever explained or defined
- converting all Math I encountered in University to Common Lisp programs to get rid of the shit notation
- bursting into crazy laughter when, after five algebra lectures, the prof noticed that the students parsed his notation differently than he did
I had a similar experience when I started to learn math, even the Common Lisp part. I knew how to program from a young age and I thought about math syntactically. From that mindset it was obviously painful.
But once you really start to understand math, you realize that mathematical notation is not very formal or rigorous. It's a visual shorthand that helps keep track of the actual mathematical objects behind the notation. S-expressions are perfect for formal definitions and programming. They are not so good for actually thinking mathematics. Mathematical notation can be dense and contain lots of information.
Notation is problematic for students because the underlying concepts and the notation itself are almost never explained well. For example, I don't remember anyone explaining what functionals or implicit functions are before using them heavily. I had to figure them out myself.
Out of interest, what subject did you study at University? I studied maths, and everything was defined rigorously at the start of every pure course; and you could always stop the lecturer and ask them what a given piece of notation meant.
Mathematical notation was designed to calculate by hand. If you actually tried to calculate anything by hand, you'd dread long variable names and not being able to use infix symbols with sensible precedence rules. But, of course, a programmer would rather die than calculate anything by hand.
I had an exam in probability theory yesterday, so these topics are still quite fresh in my mind. The confusion already starts when he uses the equation
E[X] = \sum_{i=1}^{\infty} x_i p(x_i)
for the expectation. In my class it was introduced as
E[X] = \sum_{\omega \in \Omega} X(\omega) P(\omega)
= \sum_{ x \in X(\Omega)} x p_X(x)
which totally makes sense if you know that X is a function that assigns a value to each possible outcome. In most cases we don't actually care about the outcomes, so there is the second description using p_X.
The subscript X is important to highlight that p is not just some arbitrary function, it is p_X, the probability mass function of X.
Now when you want to compute the expectation of X^2, you use
E[X^2] = \sum_{y \in X^2(\Omega)} y p_{X^2}(y)
i.e. the substitution he wanted to do actually works when you make the dependency of p_X on X explicit.
Now p_{f(X)} is not that easy to compute from p_X in general, because you have to account for multiple possible ways to reach the same value, e.g. x^2 = (-x)^2. For f(x) = x^2 we have
p_{X^2}(x^2) = p_X(x) for x = 0, and
p_{X^2}(x^2) = p_X(x) + p_X(-x) otherwise.
If f is more complex, there is a third way using
E[f(X)] = \sum_{x \in X(\Omega)} f(x) p_X(x)
which amounts to the same thing, but is usually easier to calculate.
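A small Python sketch of those three routes on a made-up finite probability space (the outcomes, probabilities and values are purely illustrative):

    from collections import defaultdict

    # Made-up probability space: outcomes with probabilities P(omega), and a random
    # variable X assigning a value to each outcome. Note X("a") = -1 and X("b") = 1
    # collide after squaring, which is exactly the p_{X^2} subtlety above.
    P = {"a": 0.25, "b": 0.25, "c": 0.5}
    X = {"a": -1, "b": 1, "c": 2}

    # Route 1: sum over the probability space, E[X^2] = sum_omega X(omega)^2 P(omega).
    e1 = sum(X[w]**2 * P[w] for w in P)

    # Route 2: build p_{X^2}, the pmf of the random variable X^2, merging outcomes
    # that land on the same value, then apply the plain definition to X^2 itself.
    p_X2 = defaultdict(float)
    for w in P:
        p_X2[X[w]**2] += P[w]
    e2 = sum(y * p for y, p in p_X2.items())

    # Route 3: E[f(X)] = sum_x f(x) p_X(x), usually the easiest to calculate.
    p_X = defaultdict(float)
    for w in P:
        p_X[X[w]] += P[w]
    e3 = sum(x**2 * p for x, p in p_X.items())

    print(e1, e2, e3)   # all three give the same number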
One interesting observation that popped into my mind when I read the OP: mathematicians don't write mathematical notation in papers/books by hand anymore; they use a far more verbose language called LaTeX.
Wouldn't it be great if every time you saw a mathematical formula there was a little widget to push that would show you the "source code" in LaTeX++, where LaTeX++ was like LaTeX but made up of stringently defined mathematical operations (like '\element_wise_multiplication' instead of '\plus_sign_with_circle_around_it')? :D
Now we're getting close to the heart of the matter: In LaTeX it's "good practice" to define the semantics of operators, in programming languages it's an absolute requirement.
Yeah, it's a bummer that LaTeX math notation is really just markup and has no indication of what belongs together and what the purpose of various symbols is.
Well, when I write TeX (and I stick to the "good practice" of defining "semantic" macros for common symbols), I often notice myself finding the TeX source code more readable than the PDF it produces.
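For what it's worth, a rough sketch of what such "semantic" macros can look like (the macro names are my own invention, not a standard; \mathbb assumes amssymb is loaded):

    % Hypothetical "semantic" macros: the source names the concept rather than the
    % glyph that happens to render it. Assumes \usepackage{amsmath, amssymb}.
    \newcommand{\expect}[1]{\mathbb{E}\left[#1\right]}   % expectation
    \newcommand{\pmf}[2]{p_{#1}(#2)}                     % probability mass function
    \newcommand{\elementwise}{\odot}                     % element-wise product

    % Usage: $\expect{X^2} - \expect{X}^2$, $\pmf{X}{x}$, $A \elementwise B$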
I remember being so outraged when I was first introduced to derivatives in high school... Seeing that my physics class and my maths class didn't use the same notation, nor the exact same definition, in the same year made me absolutely furious...
I have similar feelings about music notation too. We never applied our "user experience" standards to mathematical notation or music notation. If these things were invented today, we might come up with better ideas.
One painful experience I have reading math is telling which variables are "vectors" and which are "scalars". Another is the similar looks of certain Greek and English characters, such as alpha and a.
I was just going to mention music. I learned some piano as an adult. I was never able to accept that the same note positions on the treble staff had a different value on the bass staff. Yes, I understand why it's so, it's just terribly user-unfriendly.
Also, in keys other than C, I felt that it wouldn't kill them to mark every instance of the sharp and flat notes, instead of doing it once at the beginning and implying it everywhere else.
If I could fix one thing in maths, it would be to introduce an explicit import statement. Right now it's very hard to work out what the symbols mean in a specific context unless you're familiar with the field.
from url/to/geometric-algebra.pdf import X;
I don't mind the overloading too much, and it would always be possible to alias symbols in case two or more fields are used together.
I think this would work both at the logical level (e.g., the concept of azimuthal angle in an xyz-coordinate system) and at the stylesheet level (theta for physicists, and phi for mathematicians).
Such annotations would really help with browsing math content—you can see what prerequisite concepts are used in a paper from its import statements, without the need to read the whole thing. Also, you could browse from the other end (reverse-imports / uses), looking for docs that make use of a given concept.
Math notation is a language that evolved over centuries.
The English language isn't consistent but we all seem to be able to use it to communicate here.
Same thing with maths. Some notations are holdovers from earlier eras, but we still introduce them to students in case they run into them in an older book (dx/dy for example).
And maths isn't just about computation. It's also about expressing ideas, and sometimes that is easier when the notation isn't rigidly 'executable' as some posters here would have it.
The post is a pretty childish rant. One of the great facets of mathematical writing (and not learning, unfortunately) is that you can explicitly define your own notation, and then use said notation whenever you want. The author even notes this in his final paragraph, but doesn't seem to see it as an advantage of mathematical writing.
Prior to modern notation, mathematics was written out in English in full. What we have now is significantly better than what existed before modern mathematical shorthand.
Also, the expectation is an operator and not a 'function' (in the sense that its argument is not a value in one of the canonical scalar fields, e.g. R or C). The notation makes perfect sense in this setting. For example, the expression E[x^2] should be interpreted as E acting on the function x -> x^2 and not on a number x.
As a developer I'm not quite happy with single-letter names. Of course it's OK for minor local things to be named 'x', but for more important values and functions there is no problem these days with having readable names, just like in software. Then you could actually read a paper without guessing at and searching for those epsilons, lambdas, kappas and cryptic symbols. Use expectation(x) instead of E(x), use mean(y) instead of \hat y, and so on.
It really depends. Within a given field certain concepts are so common that you really do want a way to express them in the most terse possible way, just as programmers use i for index, err for error and n for counts. This allows you to put more stuff in a smaller space which means you don't have to jump back and forth to understand something -- sort of like how it's nice to put different parts of an app into different files and even to split them up into separate libraries, but go too far and the flow of the application becomes opaque.
Have you ever had a chance to read some of those really old mathematical proofs that didn't use any mathematical symbols at all? They're a nightmare to try and understand. There's a tradeoff. No disagreement, though, that mathematicians tend not to be very great at finding the sweet spot where the trade-off balances :-)
(Also, the sample mean of y would be `\bar y`, hats are for estimators in general.)
In my experience people that are very verbose, that talk or write at great length and with ease, tend to have a dislike for terse formulations. Sometimes to the point of being offended by it. Math is just about the ultimate in terse notation and perhaps the author is one of these verbose people.
The same thing can be seen with programmers: The verbose programmer writes a lot of lines of code and is proud of it, while the terse programmer is proud to remove or simplify code.
This is just really weakly argued. Firstly, the notation does use capital X and lowercase x. You need to realize these are different. It's not substitution to go from X^2 to x^2. Capital letters are not real numbers, and thus it's a type error too. Secondly, if you are familiar with differentiation and integration, you are familiar with straight substitution not always being correct.
You should learn what the concepts mean instead of judging the names associated to them. We call something a Ring to avoid saying "a set with two operations which behave in a certain way [...]" every time we refer to it.
From wiki:
"""[edit]
The term "Zahlring" (number ring) was coined by David Hilbert in 1892 and published in 1897.[9] In 19th century German, the word "Ring" could mean "association", which is still used today in English in a limited sense (e.g., spy ring),[10] so if that were the etymology then it would be similar to the way "group" entered mathematics by being a non-technical word for "collection of related things". According to Harvey Cohn, Hilbert used the term for a ring that had the property of "circling directly back" to an element of itself.[11"""
Funny, I said the same about Andrew Ng's ML course.
Ultimately, if what you're teaching is going to end up in software, why use math at all? Use code or pseudo code. I don't think it's bad to just give the working algorithm without having to prove the math.
Really, how many students will end up being computer scientists anyway, researching and writing about new methods of doing AI and doing the actual math? So few. I guess that's a simple criticism of academics.
It's just easier to work with code than mathematical notation most of the time, in my view. You can't replace math, of course, but when things are simple enough, it could be avoided. It's a matter of making math accessible to the most people.
Code is amazing because a computer can check to see if it works. A computer doesn't understand math.
Understanding underlying math is what allows people to create improved algorithms. If it's just "implement scikit" then you barely even need a developer.
Understanding why things work is still important and why software companies still routinely test on algorithm design.
Domain-specific languages, notation, and vocabulary exist for many disciplines. They are for the benefit of experienced practitioners and allow them to express common concepts and ideas in a precise and concise fashion to those who have also been trained in the art.
Sometimes jargon, abbreviations, and cryptic notation exist to artificially make it harder for outsiders to understand what's going on, but the examples in this post don't convince me. The true issue here is that the appropriate Wikipedia article has no link to the "Simple English" version.
Most humans are capable of using context to distinguish usage. Operator overloading in programming languages and in mathematics can be useful sometimes.
> Most humans are capable of using context to distinguish usage. Operator overloading in programming languages and in mathematics can be useful sometimes.
No doubt about it! But the difference as I see it, and what the OP is complaining about, is that in domain specific programming languages it's always possible to find a definition for the notation (an overloaded operator is defined somewhere, and so on); whereas in mathematics the only real definition is in the head of a mathematician.
The problem is that while it's easy for an expert to distinguish usage using context, it can make things needlessly hard on students who are learning new concepts, because whenever they're faced with something they don't understand, they have to worry about whether the problem is the concept or the terminology. For example, sin^2(x) is often used as shorthand for sin(x)^2 when really it should mean sin(sin(x)). Or in basic combinatorics, n and k and s can mean pretty much whatever the author wants them to mean, which is a pity when you could just standardize on something.
If you're doing math, you need to be able to manipulate the symbols easily. These notations develop because they're easy to work with: consistency is less important than brevity. Lambda calculus terms may be easier to read, but they're much slower to write, and they take up more space on the page or blackboard.
That said, once you've arrived at some kind of result, you could switch to a more consistent notation for explaining whatever you've found. It doesn't necessarily have to be textual, either (see http://worrydream.com/KillMath/).
>Two, wikipedia refers to E(X) and E(Y) as the means, not the expected value. This gets even more confusing because, at the beginning of the Wikipedia article, it used brackets (E[X]), and now it's using parenthesis (E(X)). Is that the same value? Is it something completely different? Calling it the mean would be confusing because the average of a given data set isn't necessarily the same as finding what the average expected value of a probability distribution is, which is why we call it the expected value. But naturally, I quickly discovered that yes, the mean and the average and the expected value are all exactly the same thing! Also, I still don't know why Wikipedia suddenly switched to E(X) instead of E[X] because it stills means the exact same goddamn thing.
IIRC, the mean is the particular case of the expected value where the transformation function is the identity function.
The general expression of the expected value, where f(x) is the probability density function and g(x) is the transformation function, is:
E[g(X)] = \int g(x) f(x) \, dx
Let g be the identity function (g(x) = x), then E[g(X)] is the mean.
For example, let's say that you bet on the throw of a die such that you win $2 if the result is odd, lose $3 if the result is 2, and win or lose nothing otherwise. Here, X is the result of the throw: there are 6 possible outcomes. Your transformation function is the gain you get for each outcome, therefore g(x) is defined by:
1 -> +2
2 -> -3
3 -> +2
4 -> 0
5 -> +2
6 -> 0
Then, the expected value E[g(X)] can be calculated by summing all the g(x) multiplied by the probability p(x) for all x (all outcomes).
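Working that through as a quick Python sketch (assuming a fair die, so every outcome has probability 1/6):

    # The dice bet above: win 2 on an odd result, lose 3 on a 2, otherwise nothing.
    def g(x):
        if x % 2 == 1:
            return 2
        if x == 2:
            return -3
        return 0

    expected_gain = sum(g(x) * (1 / 6) for x in range(1, 7))
    print(expected_gain)   # (2 - 3 + 2 + 0 + 2 + 0) / 6 = 0.5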
The first point of the article, in my opinion, is more about statistics than math in general, and I believe people find it weird because of the confusion between X and x, the former being a random variable associated with a distribution function, the latter being a value in the domain of X.
The second point of the article is just ease of notation. The derivative of a function doesn't exist on its own; a derivative is always taken with respect to a variable (df/dx), but if a function has only one variable, then the derivative of the function is defined as the derivative with respect to that one and only variable, and we don't need to write the variable out (f'(x)).
You could always write out the variables of a function (df(x, y, z)/dy instead of df/dy), but that's just a waste of time, especially when you have to write it on every line of the derivation.
The problem here is not the notation. Mathematical notation is not perfect, and can sometimes be confusing. Let me say this in a minimally offensive manner, without being obfuscatory: the author of this piece does not understand the mathematics underpinning the notation he is using. The root cause seems to be the use of probability theory in a cookbook manner. It can hardly be surprising that confusion results.
The first example is a formula. An instance of magic, in the sense that you use it to compute, without knowing what it does. The $x_i$'s are not quantified. What are they? Are they real numbers? Matrices? Elements of some semi-group? How can you expect to understand the "formula" if the summand is not explained? At best, I can say that it is a formal sum of something. We can forget discussing convergence or it being well-defined. You can cook up arbitrarily 'nice' notation. It won't help. This notation is absolutely fine for someone who can infer that the support of the distribution of X is some denumerable set {x_i}, equipped with p.m.f. p.
A suitable definition of the expected value (as an operator) would have cleared up all the confusion with the variance and E[X^2] vs (E[X])^2. This confusion is not the notation's fault. It is the user's fault for not knowing what E[f(X)] means (for some appropriate meaning of the symbol f).
>> Only the first x_i is squared. p(x_i) isn't, because it doesn't make any sense in the first place. It should really be just P_{x_i} or something, because it's a discrete value, not a function!
Functions are not merely algebraic expressions by which we associate one real number with another. In fact, we call p(x_i) the probability mass function. It seems to be a common flaw in many undergrad programs: formulas and functions are never made distinct. The vast majority of functions f : R -> R do not admit an expression as a formula.
The example with the different notation for "derivatives" is a good non-example. The so-called Leibniz notation is used because it allows people to make statements with differential forms, without needing to invoke exterior algebra. If this is done correctly, statements such as "dy = f'(x)dx" can be made fully rigorous, if need be. Students are told that dy/dx is not a fraction, and yet it is used exactly as though it were. This confuses people - because they don't know what is going on. The dot-notation for derivatives is extremely useful in classical mechanics.
Notation is a crutch for succinct and meaningful writing amongst the initiated. One cannot expect to be able to use these tools without knowing what is going on, or while suspending a great deal of questions.
>> There must be other ways we can explain math without having to explain the extraordinarily dense, outdated notation that we use.
My final gripe with this post. We typically use clean and modern notation. It could be so much worse! Also, if we didn't re-use symbols, then we would run out, very quickly. Mathematics exists independently of the symbols we use to communicate it.
Seems like you are talking more about statistical notation than mathematical notation. Please correct me if I'm wrong, I just think the topic is a tad off.
Real maths is done either in books or papers, where the authors explain every bit of notation they will use. This is not the case in online resources, which is indeed a bad practice... But hey, don't blame maths for that.
And regarding the "understanding maths in terms of computer programs", sure that's possible to do with _some_ topics, but you just can't expect a computer to represent _every_ concept you'd like.
Show me a paper that explains "every bit of notation they will use"! ;P
(I understand there's a difference between "explaining" and "stringently defining", but I can't help thinking of the only book I know that actually tried to stringently define all notation: Principia Mathematica. If I remember correctly, it took Bertrand Russell a few hundred pages to define addition and prove that 1 + 1 = 2. :P)