There are many evaluation strategies for tensor contractions; naively translating into nested for loops and then doing direct random-access arithmetic is just one approach. If you like numpy's einsum, a drop-in replacement is https://optimized-einsum.readthedocs.io/en/stable/. In general, how best to "compile" a tensor contraction expression into something executable is an open question, for several reasons: there are a lot of possible orderings for the contractions, and if one wishes to rely on BLAS routines for efficient primitives one finds that they don't fit the problem of tensor contraction like a glove, and so on.
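A minimal sketch of what the drop-in usage looks like (shapes and the contraction string here are made up for illustration; only contract and contract_path from the opt_einsum API are used):

    import numpy as np
    import opt_einsum as oe

    a = np.random.rand(16, 32)
    b = np.random.rand(32, 48)
    c = np.random.rand(48, 8)

    # Same subscript string as np.einsum; opt_einsum picks a pairwise
    # contraction order and dispatches to BLAS-backed primitives where it can.
    out = oe.contract('ij,jk,kl->il', a, b, c)

    # Inspect the chosen contraction path, estimated cost, and intermediates.
    path, info = oe.contract_path('ij,jk,kl->il', a, b, c)
    print(info)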
For example, the optimization has been passed upstream to np.einsum, and most of the features found in this repository can be enabled with numpy.einsum(..., optimize=True). So numpy should provide the same performance in most cases.
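For completeness, a small illustration of that flag (shapes arbitrary; only the optimize keyword of np.einsum is assumed here):

    import numpy as np

    a, b, c = (np.random.rand(64, 64) for _ in range(3))

    # Default einsum evaluates the full expression directly; optimize=True
    # first searches for a cheaper pairwise contraction order.
    naive = np.einsum('ij,jk,kl->il', a, b, c)
    fast  = np.einsum('ij,jk,kl->il', a, b, c, optimize=True)
    print(np.allclose(naive, fast))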
This is, strictly speaking, not true. Talking about an "orthonormal" basis implies that you have in mind some Hilbert space; but in any such instance there will be interesting functions that are not in this Hilbert space. So consider for example the standard space L^2(R) of square-integrable functions on the real line. This does not contain the function f(x) = 1, as a really dumb example.
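Spelled out, it's just the norm computation:

    \|f\|_{L^2(\mathbb{R})}^2 = \int_{-\infty}^{\infty} |1|^2 \, dx = \infty,
    \quad \text{so } f(x) = 1 \notin L^2(\mathbb{R}).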
You are nitpicking, and completely missing my point. Let me rephrase it.
The original article is about neural networks being able to represent any function (for some definition of any).
I was just pointing out that there exist much cheaper ways of representing any function. Therefore the article seems very unexciting to me.
Btw, you chose L^2(R) yourself and then used it to show that there are interesting functions not in the space you chose; quite a circular argument.
Since the article on neural networks never mentions functions defined on an infinite domain, one can easily take L^2([0,1]) or L^2([0,Lambda]) up to some cutoff Lambda. I would say that all non-pathological functions you can think of are there!
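(With a cutoff, the constant-function example above is no longer a problem; just computing the norm:)

    \int_0^{\Lambda} |1|^2 \, dx = \Lambda < \infty,
    \quad \text{so } f(x) = 1 \in L^2([0, \Lambda]).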
> You are nitpicking, and completely missing my point.
> I was just pointing out that there exist much cheaper ways of representing any function. Therefore the article seems very unexciting to me.
To be fair, your original comment didn't really make a point. What kind of cheaper representations do you have in mind? What makes an orthogonal basis of functions too "expensive" a representation for your taste?
> I would say that all non-pathological functions you can think of are there!
I'd argue that most "real functions" that we care to learn (e.g. mappings between high-dimensional data and labels) are pathological. In this sense, we should really care about the completeness of these spaces, perhaps even more than about the well-behaved functions in them.
All physics assumes you are dealing with non-pathological functions, except for some really particular cases. You can do nearly everything in Electromagnetism and nearly all of Quantum Mechanics with non-pathological functions.
Maybe we have different definitions of pathological; I am using it the way a physicist would (i.e. continuous, with continuous derivatives, and so on).
The physics I've done has only dealt with n-particle systems, where n is a number I can count on one hand. Likewise, most of the examples I'd learn in a pure math class are either (a) relatively well-behaved objects with nice properties or (b) "pathological" objects constructed specifically to serve as counterexamples.
The kind of pathological function that I'm referring to is neither of these. For example, what does the manifold of all 1-second clips of the word "the" look like? If the clip is sampled at 60 Hz, each clip already lives in 60-dimensional space. I'm inclined to think that it's some unimaginably complicated manifold that would likely fall into the category of "edge cases" that previous commenters have mentioned, and which it sounds like you're discounting as nitpicking.
I don't know if this aligns with what you mean by a pathological function, and I'm happy to continue having this discussion with a more concrete example of what you mean. :)
No, neural networks cannot "represent any arbitrary function". Find a theorem that you think states otherwise, and then read what the theorem actually states.
You're correct; the statement of the theorem refers to continuous functions.
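For reference, the statement is roughly the Cybenko/Hornik form (from memory, so check the original papers for the exact hypotheses):

    \text{For any continuous } f : K \to \mathbb{R} \text{ on a compact } K \subset \mathbb{R}^n
    \text{ and any } \varepsilon > 0, \text{ there is a one-hidden-layer network } g
    \text{ (with a suitable non-polynomial activation) such that }
    \sup_{x \in K} |f(x) - g(x)| < \varepsilon.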
In practice, however, the input to neural networks is represented by floating-point values, which form a discrete set. So pick whatever arbitrary function you like: there is some continuous function that is equal to it on every floating-point value, and that continuous function can be approximated arbitrarily closely by a neural network.
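To make that concrete, here is a minimal sketch (my own construction, not from the article): a one-hidden-layer ReLU network whose weights are chosen in closed form so that it passes exactly through every sampled input, which is all you need once the inputs form a discrete set.

    import numpy as np

    def relu_interpolant(xs, ys):
        """f(x) = y_0 + sum_i c_i * relu(x - x_i): a one-hidden-layer ReLU
        network that passes exactly through every (x_i, y_i)."""
        xs, ys = np.asarray(xs, float), np.asarray(ys, float)
        slopes = np.diff(ys) / np.diff(xs)        # slope on each interval
        coeffs = np.diff(slopes, prepend=0.0)     # change of slope at each knot
        def f(x):
            hidden = np.maximum(0.0, np.asarray(x, float)[..., None] - xs[:-1])
            return ys[0] + hidden @ coeffs
        return f

    # "Arbitrary" (even discontinuous) target, sampled on a discrete set of inputs:
    xs = np.linspace(-1.0, 1.0, 201)
    ys = np.sign(xs)
    f = relu_interpolant(xs, ys)
    print(np.max(np.abs(f(xs) - ys)))   # ~0: exact on every sampled input

This is of course interpolation on a grid rather than learning, but it shows why "continuous functions on a compact set" is not a real restriction once the inputs are floats in a bounded range.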