Definitely not an expert but there's a fundamental hurdle to this.
One way to think of this is to realize how easy it is to multiply numbers but how much more work it takes to divide numbers.
For something like automatic differentiation, you're essentially applying the chain rule for partial derivatives repeatedly. This is analytically pretty straightforward to do for most applications. All you need is an analytical derivative for the simple functions your more-complex function is composed of (e.g. a neural network).
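For a concrete picture of that chain-rule bookkeeping, here's a minimal sketch using PyTorch; the function sin(x^2) is just an arbitrary illustration, not anything from the post:

    import torch

    # Differentiate f(x) = sin(x^2), a composition of sin and squaring,
    # by letting autograd apply the chain rule through each step.
    x = torch.tensor(0.5, requires_grad=True)
    y = torch.sin(x ** 2)
    y.backward()
    # Chain rule by hand: f'(x) = cos(x^2) * 2x
    print(x.grad)                                   # ~0.9689
    print(2 * 0.5 * torch.cos(torch.tensor(0.25)))  # same value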
For integration, the analogue of the chain rule is integration by substitution [1]. The toolbox for solving integration problems is more limited than for differentiation. You run into issues where the answer cannot even be expressed using standard mathematical notation [2]. Sometimes you get lucky and the answer can be expressed via an alternating Taylor series so you can estimate the answer within some margin of error [3].
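As a small illustration of that last point, here's a hedged sketch in plain Python that estimates the classic integral of exp(-x^2) from 0 to 1 (which has no elementary antiderivative) via its alternating Taylor series; the stopping tolerance is arbitrary:

    import math

    # Integrate exp(-x^2) from 0 to 1 via its alternating Taylor series:
    # sum over n of (-1)^n / (n! * (2n + 1)).  For an alternating series
    # with decreasing terms, the error is bounded by the first omitted term.
    total, n = 0.0, 0
    while 1.0 / (math.factorial(n) * (2 * n + 1)) > 1e-10:
        total += (-1) ** n / (math.factorial(n) * (2 * n + 1))
        n += 1
    print(total)  # ~0.746824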
Stan is a piece of software that runs state-of-the-art MCMC methods to basically just compute fancy integrals. A Stan model will take an order of magnitude more time to run than a simple neural network via something like PyTorch on the same dataset. But they answer different questions.
Slightly tangential fact I heard recently on a podcast [0]: historically the problem of integration preceded that of differentiation. But because differentiation is so much easier, we teach that to kids first. It's one of the many ways that schools obscure the intuition of calculus and turn it into so many formulae to be memorised.
You're assuming that you want to express the integral as an elementary function, or as a member of another narrow class of functions. It's only in that case that the lack of a chain rule is a problem. What you actually want is the ability to evaluate the integral to within epsilon numerically, which is a much more flexible problem.
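For instance, a rough sketch of "to within epsilon" using scipy (assuming that's the kind of tooling at hand): quad returns the estimate together with an error bound:

    import numpy as np
    from scipy import integrate

    # Numerically evaluate the integral of exp(-x^2) from 0 to 1; quad
    # returns the estimate and a bound on the absolute error, so
    # "within epsilon" is explicit.
    value, abs_err = integrate.quad(lambda t: np.exp(-t ** 2), 0.0, 1.0)
    print(value, abs_err)  # ~0.746824, error around 1e-14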
But then again, it's not clear what the OP meant by "automatic integration" anyway.
Can you please elaborate on "you never get the expression"?
When you plug variable values into a symbolic derivative you just get a number at the end: d/dx x^2 = 2x, so at x = 0.5 the derivative evaluates to 1. The same is true for symbolic integrals. For most practical applications, we don't really care about the full symbolic expression. We just want the answer, or at least a good approximation. This post uses a specific example of the difference between two Beta distributions. We want to get that 0.71. It is very hard to "automatically" make that happen.
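A rough sketch of how that 0.71-style number actually gets computed in practice, by Monte Carlo (the Beta parameters below are stand-ins, not the post's, so don't expect exactly 0.71):

    import numpy as np

    rng = np.random.default_rng(0)
    # Stand-in Beta posteriors for two rates (not the post's exact
    # parameters).  P(theta_b > theta_a) is an integral with no
    # convenient closed form, so we estimate it by sampling.
    theta_a = rng.beta(20, 80, size=1_000_000)
    theta_b = rng.beta(26, 74, size=1_000_000)
    print((theta_b > theta_a).mean())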
Just to clarify: This result follows directly from the definition of the derivative:
f'(x) = lim_{e->0} (f(x+e) - f(x))/e
If you're able to express f(x+e) in the form f(x) + y·e, with terms of order e^2 and higher treated as zero, then it follows that y is the derivative.
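A minimal dual-number sketch of that idea (illustrative only; it handles just addition and multiplication):

    # A toy dual number: carry (value, derivative) so that f(x + e) is
    # represented as f(x) + f'(x)*e, with e^2 treated as zero.
    class Dual:
        def __init__(self, val, der=0.0):
            self.val, self.der = val, der
        def _wrap(self, other):
            return other if isinstance(other, Dual) else Dual(other)
        def __add__(self, other):
            other = self._wrap(other)
            return Dual(self.val + other.val, self.der + other.der)
        __radd__ = __add__
        def __mul__(self, other):
            other = self._wrap(other)
            # product rule: (a + a'e)(b + b'e) = ab + (a b' + a' b) e
            return Dual(self.val * other.val,
                        self.val * other.der + self.der * other.val)
        __rmul__ = __mul__

    x = Dual(0.5, 1.0)   # seed: dx/dx = 1
    y = x * x + 3 * x    # f(x) = x^2 + 3x
    print(y.val, y.der)  # 1.75 and f'(0.5) = 2*0.5 + 3 = 4.0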
It should also be noted that auto-diff doesn't let you skip the rules of differentiation: you're using the same calculation as you would to show that e.g. d/dx x^n = n*x^(n-1).
Closed-form integration is fundamentally different.
What's the integral of exp( sqrt( 1 + (tan(x)^(3/2))^2 ) )?
We only know a handful of forms that can be integrated in closed form, and it's down to our creativity to discover new ones (the same goes for solving differential equations, and for the same reasons).
The forms that we do know how to integrate can be handled by a computer; CAS tools such as Mathematica will do that for you.
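For example, a quick sketch with sympy (assuming it's available): one integrand falls inside the known forms, while x^x usually comes back unevaluated:

    import sympy as sp

    x = sp.symbols('x')
    # A form that falls to the standard toolbox (substitution u = x^2):
    print(sp.integrate(x * sp.exp(x ** 2), x))  # exp(x**2)/2
    # x^x has no elementary antiderivative; sympy typically hands the
    # expression back as an unevaluated Integral object:
    print(sp.integrate(x ** x, x))              # Integral(x**x, x)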
Auto-diff is exact and amazing because (1) the derivative operation is "decomposable" through the chain rule and (2) we know the symbolic derivatives of the basic functions. There's no recently discovered mathematical trick in auto-diff; it's mainly a reformulation of the properties of the derivative that turns out to be useful for computers.
Finding an "auto-integrate method" would probably involve finding a way of calculating the integral in a decomposable way, and that indeed would be amazing, but I don't really see that happening any time soon.
[1] https://math.stackexchange.com/questions/1635949/is-there-a-...
[2] https://math.stackexchange.com/questions/1397132/why-cant-so...
[3] https://math.stackexchange.com/questions/145087/how-to-calcu...