I think one of the problems is that calculus teaches us the idea that there's a single construction called a derivative, but in reality there are lots of different kinds of derivatives with different semantics and types. For the vast majority of applied math and engineering work, the two we need are the total (Fréchet) derivative:

https://en.wikipedia.org/wiki/Total_derivative#The_total_der...

By the way, most of that article is terrible, but the linear map definition is the one we want. There's also the directional (Gâteaux) derivative:

https://en.wikipedia.org/wiki/Directional_derivative

Candidly, unless you really know your problem, you virtually always want the total derivative since it gives rise to things like gradients and Hessians, which are useful objects that we can store in memory.

Now, the reason that I bring these two up is that their spaces, or really their types, are different. Given a function f:X->Y, the total derivative is a linear operator from X to Y:

(total) f'(x) \in L(X,Y)

The directional derivative is an element in the space Y:

(dir) f'(x;dx) \in Y

Now, at this point, the notation is screwed up since we used Lagrange notation for both. The reason that we can get away with this is that under certain assumptions, which are mostly satisfied in the problems we care about (in particular, whenever f is Fréchet differentiable at x), we have:

f'(x)dx = f'(x;dx)
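As a rough numerical illustration of that identity, here is a minimal Python sketch. The map f: R^2 -> R^2 and all of the helper names are made up for this example and are not from anything above. The total derivative is stored as a Jacobian matrix, an element of L(R^2,R^2), and the directional derivative is approximated by a central difference, which yields an element of R^2.

```python
import math

def f(x):
    # Hypothetical example map f: R^2 -> R^2 (an assumption for illustration).
    x1, x2 = x
    return [math.sin(x1) * x2, x1 * x1 + x2]

def total_derivative(x):
    # f'(x) in L(R^2, R^2), represented by the hand-derived Jacobian matrix.
    x1, x2 = x
    return [[math.cos(x1) * x2, math.sin(x1)],
            [2.0 * x1, 1.0]]

def apply(A, dx):
    # Apply the linear operator A to the direction dx (matrix-vector product).
    return [sum(a * d for a, d in zip(row, dx)) for row in A]

def directional_derivative(x, dx, h=1e-6):
    # Central-difference approximation of f'(x;dx), an element of Y = R^2.
    xp = [xi + h * di for xi, di in zip(x, dx)]
    xm = [xi - h * di for xi, di in zip(x, dx)]
    return [(p - m) / (2.0 * h) for p, m in zip(f(xp), f(xm))]

x = [0.7, -1.3]
dx = [0.4, 2.0]

lhs = apply(total_derivative(x), dx)   # f'(x)dx
rhs = directional_derivative(x, dx)    # f'(x;dx), approximately
```

The two results agree up to finite-difference error. Note the difference in types: total_derivative returns an object we can store and reuse for any direction, while directional_derivative needs the direction up front.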

Alright, so why should we care? Leibniz and Newton notation do a terrible job at capturing this information. Lagrange and Euler notation do a good job at this. For your example:

f(x0) = d/dx (sin(x)cos(x)+x^2) | x=x0

The types don't line up because sin(x)cos(x)+x^2 is a value, literally a real number, not a function. Using the above, I would write this as:

(x \in R |-> sin(x)cos(x)+x^2)'(x0)

In LaTeX |-> would be \mapsto. This types correctly in the definitions above since

x \in R |-> sin(x)cos(x)+x^2 \in [R -> R]

and

(x \in R |-> sin(x)cos(x)+x^2)'(x0) \in L(R,R)

Of course, you probably wanted the value and not the function, which explains why we cheat in 1-D. So, we really should write:

(x \in R |-> sin(x)cos(x)+x^2)'(x0) 1 \in R
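To make the typing concrete, here is a small Python sketch of the same expression. The names derivative and linear_map are hypothetical, and the slope is derived by hand (sin(x)cos(x) differentiates to cos(2x)) rather than computed automatically. The point is only that the derivative at x0 comes back as a linear map, and we get a real number once we feed it the direction 1.

```python
import math

def f(x):
    # The function x in R |-> sin(x)cos(x) + x^2, an element of [R -> R].
    return math.sin(x) * math.cos(x) + x ** 2

def derivative(x0):
    # Returns f'(x0) as a linear map in L(R, R), not as a number.
    # Hand-derived slope: d/dx [sin(x)cos(x) + x^2] = cos(2x) + 2x.
    slope = math.cos(2.0 * x0) + 2.0 * x0

    def linear_map(dx):
        return slope * dx

    return linear_map

x0 = 0.5
fprime_x0 = derivative(x0)   # an element of L(R, R)
value = fprime_x0(1.0)       # an element of R: feed it the direction 1
```

In 1-D the linear map is just multiplication by a scalar, which is why we usually get away with identifying the two.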

where we feed it the direction 1. And, yes, this is slightly more cumbersome than we may want, which is why there's a huge number of different notations. However, I do assert that the above generalizes properly all the way into infinite dimensions (Hilbert spaces) and provides a good foundation for typing out mathematical codes.

By the way, if anyone is looking for a book that does this right in my opinion, Rudin's "Principles of Mathematical Analysis" is amazing and his notation is good. For infinite dimensions, I prefer Zeidler's "Nonlinear Functional Analysis and Its Applications." Personally, what I look for is what I call properly typed notation that gives us easy access to useful tools like gradients, Taylor series, chain rule, and implicit and inverse function theorems. Again, most engineering and applied math work requires these theorems everywhere, so I find it best to keep them clean.