Thanks, I can vouch for the Bertsekas book. I'd also recommend:
Russell, Stuart, and Peter Norvig. "Artificial Intelligence: A Modern Approach." (2002).
Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018. (As another commenter just pointed out.) They have a great example with a recycling robot.
If you buy "Artificial Intelligence: A Modern Approach," I suggest coughing up the extra money for the latest edition. It's several times more expensive than older copies, but they really updated the content.
I might have mentioned that for your example we can have good confidence in the Markov assumption -- we just need to trust the random number generator.
In real-world practice, i.e., where we don't get to drive the stochastic part of the problem with our own random number generator, the Markov assumption (past and future conditionally independent given the present) can be, and commonly is, difficult to justify.
But then, in just the theory, every stochastic process is Markov -- just let the state be the history. I intend this as a joke, but actually, technically in the math, it is true.
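A toy discrete time illustration of that trick (everything here is made up): the raw value below depends on the last two values, so by itself it is not Markov, but the pair of the last two values is.

    import random

    def step(x_prev, x_curr):
        # the next value depends on TWO past values plus independent
        # noise, so x_t alone is not a Markov state
        return 0.5 * x_curr + 0.3 * x_prev + random.gauss(0.0, 1.0)

    # augment the state with (a piece of) the history -- now it is Markov
    state = (0.0, 0.0)
    for t in range(5):
        x_next = step(*state)
        state = (state[1], x_next)  # slide the two-value history window
        print(t, state)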
If we DO make the state the history and have a problem in continuous time, then, at least in theory, we are conditioning on uncountably infinitely many random variables, and just defining how to do that is a bit interesting.
Then for our decisions, i.e., the policy, we can encounter the problem of measurable selection, also a bit interesting and surprising.
I.e., dynamic programming in continuous time gets us into measure theory, the Radon-Nikodym theorem, etc.
Stochastic dynamic programming, when we find a good application and can build a good solution, i.e., keep the computational demands within supercomputer range, can seem really smart; of course it is not prescient, but first cut, intuitively, it can seem to be. Your solution likely also seems prescient.
More applications would be nice.
Here is a potential, and potentially large, collection of new applications: Develop a spreadsheet, for example, for, say, budgeting for the next 5 years. To have a simple description, have one column for each month -- an initial column and then one more for the end of each of the 60 months -- and have one row for each variable. Some of the spreadsheet cells might have random quantities (have them all independent, to keep the Markov assumption easy to justify), and some of the cells might be empty and available for decisions.
Now, presto, bingo, whether we knew or intended it or not, we now have formulated a problem in discrete time stochastic dynamic programming, and since in practice problem formulation is a common obstacle, we have maybe made some progress.
In principle at this point the software for a solution is well defined. And since long ago people wrote general purpose stochastic dynamic programming software, the software we would need is in a sense actually simple (if we don't care about execution time).
Now just feed this problem into a suitably large computer and get the resulting decisions and the expected earnings, which we wanted to maximize, at the end of the 5 years.
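To make that concrete, here is a minimal sketch of the backward induction such software would do. The state (a discretized cash balance), the actions, the transition, and the shock distribution are all illustrative assumptions, not from any real budget:

    import numpy as np

    # Minimal backward induction for a spreadsheet-style problem:
    # one state variable (discretized cash balance), one decision per
    # month, independent shocks so the Markov assumption is easy.
    T = 60                                      # months in the 5-year horizon
    states = np.arange(101)                     # cash balance 0..100
    actions = np.arange(11)                     # monthly investment 0..10
    shocks = [(-3, 0.25), (0, 0.5), (3, 0.25)]  # independent random cells

    V = states.astype(float)                    # terminal value: ending cash
    policy = np.zeros((T, len(states)), dtype=int)

    for t in reversed(range(T)):
        V_new = np.full(len(states), -np.inf)
        for i, s in enumerate(states):
            for a in actions[actions <= s]:     # can't invest cash we don't have
                ev = 0.0
                for shock, p in shocks:
                    # investing a adds an expected 20% return; the balance
                    # also takes an independent shock
                    s_next = int(np.clip(s + 0.2 * a + shock, 0, 100))
                    ev += p * V[s_next]
                if ev > V_new[i]:
                    V_new[i], policy[t, i] = ev, a
        V = V_new

    print("first-month decision from balance 50:", policy[0, 50])
    print("expected earnings at 5 years from balance 50:", V[50])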
If we make the software more complicated, then we can put in some ideas to make the computations faster by some factors of 10.
Nemhauser, and Dreyfus and Law, are intended as intros and are quite well written. Bellman, who likely should be named the father of dynamic programming, wrote the oldest book on my list, and it is not intended as an intro. But if you can get Bellman for less than $10, maybe for the price of a Big Mac, then eat a lighter lunch and get the book.
Bellman's book likely covers the Hamilton-Jacobi-Bellman equation, which is not so easy to understand via the other intros.
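For reference, a generic textbook statement of that equation, for deterministic dynamics dx/dt = f(x, u), running cost l(x, u), and terminal cost phi(x) -- so not necessarily Bellman's own notation:

    -\frac{\partial V}{\partial t}(x, t)
        = \min_{u} \Big[ \ell(x, u) + \nabla_x V(x, t) \cdot f(x, u) \Big],
    \qquad V(x, T) = \phi(x).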
Not really a biggie but worth knowing: a big part of this subject in computation is the value function, and at least Bertsekas used neural nets as a means to an efficient approximation. So more recent treatments may include this idea.
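A minimal sketch of that approximation idea, with a cubic polynomial standing in for the neural net; the one-dimensional state, rewards, and shocks here are all made up, and with function approximation this scheme is a heuristic that need not converge in general:

    import numpy as np

    # Fitted value iteration: fit a model of the value function from
    # sampled states instead of tabulating every state.
    rng = np.random.default_rng(0)
    gamma = 0.95
    sample_states = rng.uniform(0.0, 100.0, 200)  # sampled, not enumerated

    def reward(s, a):
        return -abs(s - 50.0) - 0.1 * abs(a)      # made-up running reward

    def next_state(s, a, shock):
        return np.clip(s + a + shock, 0.0, 100.0)

    coeffs = np.zeros(4)                          # the cubic's parameters

    for sweep in range(50):
        targets = []
        for s in sample_states:
            best = -np.inf
            for a in (-5.0, 0.0, 5.0):
                # expectation over a small independent shock distribution
                ev = np.mean([np.polyval(coeffs, next_state(s, a, w))
                              for w in (-2.0, 0.0, 2.0)])
                best = max(best, reward(s, a) + gamma * ev)
            targets.append(best)
        coeffs = np.polyfit(sample_states, targets, 3)  # "train" the model

    print("approximate value at s = 50:", np.polyval(coeffs, 50.0))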
Dynamic programming is one of the leading examples of "the curse of dimensionality".
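For a rough illustration: with a grid of 100 points per state variable, 1 variable needs a table of 100 entries, 3 variables need 100^3 = 10^6, and 10 variables need 100^10 = 10^20 -- past any computer's table, and the spreadsheet above can easily have tens of variables.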
The field has not been still; e.g., multivariate splines have been an idea for handling the value function in computation.
Of course we can have software go through the motions of stochastic dynamic programming when the Markov assumption is not satisfied; the software won't object. But when the assumption does hold, and the rest of the effort has good quality, we can argue that the work provides an optimal solution, the best possible, of course, in expected value, for any means of handling the data. In short, we have what is called "the principle of optimality".
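In symbols, the discrete time recursion behind that principle (the usual textbook statement, not tied to any one of the books here):

    V_t(s) = \max_{a} \, \mathbb{E}\big[\, r_t(s, a)
               + V_{t+1}(S_{t+1}) \mid S_t = s,\, A_t = a \,\big],
    \qquad V_T(s) = r_T(s),

and the Markov assumption is exactly what lets the conditioning stop at the present state s rather than the whole history.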
The continuous time case, with all the math details filled in, is more difficult; so for an intro, stay with discrete time.
There has been a lot of attention to this subject in the ORFE (Operations Research and Financial Engineering) department at Princeton. For a long time the chair was E. Çinlar. The best course I took in school was from a Çinlar student.
Of course, if we expect artificial intelligence to be really smart, then as a special case it should be able to be as good as stochastic optimal control. And apparently now some of reinforcement learning is using the core idea of stochastic dynamic programming. But without the Markov assumption, we might count that application as a heuristic with no more than weak claims about optimality.
I spent a lot of time in the field. Eventually I concluded that generally in the US economy a lot of knowledge of stochastic optimal control and a dime wouldn't cover a 10 cent cup of coffee. One would have a lot better chance buying a house and supporting a family getting paid to develop Web sites, e.g., to sell, say, used math books, including Bellman's. Maybe there have been and are some niche applications (maybe somewhere in US national security), but that niche is likely much sharper than any razor.
There is something of an organizational problem for a math guy getting hired to apply stochastic optimal control: the person hiring you, your hiring manager, will likely know much less about the math than the math guy -- likely know nothing about the math. So this manager will be very reluctant to allocate much of his (her) budget and risk his job to support work he doesn't understand and that, indeed, might justify the C-level suits promoting the math guy over his manager.
Due to such issues, the math guy with an application potentially valuable in the economy could be better off doing a corresponding startup. Else, to avoid scaring hiring managers, he might be better off omitting such math from his resume. So, for getting hired as an employee, a lot of math background on a resume can range from useless down to, say, a felony conviction.
A recipe for rabbit stew starts out "First catch a rabbit." A recipe for applied math might start "First find an application." One can't argue that some topic in math will be useless forever outside of math and in the economy, but forever is a long time. As it is, it can appear that some math papers and books, including what looks like applied math, are written before seeing any rabbits.
Yet the best of pure/applied math is in some senses super terrific stuff, way up there with the best of Bach and Beethoven, etc., and at times there are good applications. E.g., there is some pure and applied math at the core of my startup; the pure math provides some special support, a version of optimality, for the whole effort. What that pure math says is terrific -- it gives a solid guarantee where intuitively we have only confusion -- so good it's tough to believe, but still it's true. The applied math is original with me, powerful for my startup but no biggie as research.
Thank you, I picked it up today. There's a chapter on the calculus of variations saying it has a new viewpoint to offer (with the equation you mentioned, iirc). Going to be a challenge with my background.
It does sound like a startup suits you best. Best of luck with it!
Calculus of variations is, yes, a part of optimization, now regarded as an old part. But subsequent to Bellman's book was the Pontryagin maximum principle; there is an entry at Wikipedia. There is also a treatment in:

David G. Luenberger, Optimization by Vector Space Methods,

which I call "fun and profit from the Hahn-Banach theorem".
There may be more in:

L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko, The Mathematical Theory of Optimal Processes.

E. B. Lee and L. Markus, Foundations of Optimal Control Theory.

Michael Athans and Peter L. Falb, Optimal Control: An Introduction to the Theory and Its Applications.
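For orientation, the maximum principle's generic textbook statement, up to sign conventions, for dynamics dx/dt = f(x, u) and running cost l(x, u): along an optimal trajectory x*(t) there is a costate p(t) with

    H(x, u, p) = p \cdot f(x, u) - \ell(x, u), \qquad
    \dot p = -\frac{\partial H}{\partial x}, \qquad
    u^*(t) \in \arg\max_{u} H(x^*(t), u, p(t)),

i.e., the optimal control maximizes the Hamiltonian pointwise in time.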
At one time I was interested in such math as a way to say how best to climb, cruise, and descend an airplane. So I chatted with Athans in his MIT office and got a copy of his class notes. And I got accepted to grad school at the Brown University Division of Applied Mathematics, where Falb was. Then I applied a little skeptical judgment, didn't go back to Athans, and went elsewhere for my Ph.D.
I also visited Brown and Cornell. At Cornell I met with Nemhauser, and he gave me three words of advice that became the real direction for my Ph.D. dissertation. Flying back on the airplane, I figured out what Nemhauser had told me.
When I got back, I called a meeting to outline what I'd discovered from Nemhauser, Athans, etc. My manager ordered me not to hold the meeting, but I did anyway. All the C-level people and the founder (COB, CEO) showed up. I got a promotion and a big office next to the founder's. What I presented was the start of two Ph.D. dissertations.
The Fleming reference may be enough. Uh, while I visited Brown, I had lunch with Fleming -- a very bright mathematician.
A good source for prerequisites is the first (real) half of W. Rudin, Real and Complex Analysis.
Broadly, that optimization -- deterministic optimal control -- works with curves, where each curve is regarded as one point, a vector, in some vector space. The space is short on good assumptions, so it may be only a Banach space. Actually we can still do a lot in a Banach space, e.g., Luenberger's book -- one of my favorites. Gee, there you can also get a really short treatment of Kalman filtering!
The fraction of mathematicians who know all that math well is tiny. The fraction of people who make good money from applications has to be smaller -- I would believe so small there are none. From all I can see, good applications are so rare there almost aren't any.
With the rapidly growing power of computing and the rapidly growing complexity of computer applications, discrete time stochastic optimal control (which can attack problems that first cut intuitively seem hopelessly difficult, yet is comparatively simple mathematically) has a chance of valuable applications -- so far not a very good chance, but a chance, probability small but still greater than zero.
That spreadsheet connection I outlined may be a seed for some applications.
Microbiology research gets some motivation from applications: saving the lives of people dying of cancer, heart disease, new viruses, etc. Maybe research in mechanical engineering works on how to put up buildings and bridges that won't fall down. Some years ago I concluded that math is being throttled by too much isolation from important, motivating applications. So, if my startup works, that will be one case, n = 1, of evidence that math can do well, at least make money, with some new applications outside of math.
Dimitri P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models.
George L. Nemhauser, Dynamic Programming.
Stuart E. Dreyfus and Averill M. Law, The Art and Theory of Dynamic Programming.
E. B. Dynkin and A. A. Yushkevich, Controlled Markov Processes.
Wendell H. Fleming and Raymond W. Rishel, Deterministic and Stochastic Optimal Control.
There is more from R. T. Rockafellar and R. J.-B. Wets.
And of course we should mention R. Bellman.
IIRC, Dreyfus was a Bellman student, and Dynkin's dissertation advisors were Gelfand and Kolmogorov.