Just going to throw my 2-cents in - I should note that my knowledge of PID is very limited.
The first time I encountered an explanation of how PID worked that made sense to me, was via the Udacity CS373 course that I took in 2012 (Thrun also had a great explanation on how a Kalman filter worked as well). This explanation of PID was repeated when I took the Udacity Self-Driving Car Engineer Nanodegree (starting in 2016).
In the CS373 course, Thrun detailed a few different ways to tune a PID controller - but one way he covered was rather curious in how it worked. It wasn't perfect, but it got you "close enough". I'm not sure if it would work well for a "real system" but for the purpose of learning it seemed to work well enough.
He called it "twiddle" - I don't know if he was the original author of it, or what - but here's a video that describes how it is implemented and how it works:
Twiddle, AKA coordinate ascent, definitely works in practice. It's basically the same process a human uses to tune PID parameters, cyclically tune one parameter at a time until you're performance is good enough. It's closely related to gradient ascent and suffers from the same problems, namely that it can get trapped in a local minimum, but it's easy to implement and does a good job if you start with an ok guess at the parameters.
The first time I encountered an explanation of how PID worked that made sense to me, was via the Udacity CS373 course that I took in 2012 (Thrun also had a great explanation on how a Kalman filter worked as well). This explanation of PID was repeated when I took the Udacity Self-Driving Car Engineer Nanodegree (starting in 2016).
In the CS373 course, Thrun detailed a few different ways to tune a PID controller - but one way he covered was rather curious in how it worked. It wasn't perfect, but it got you "close enough". I'm not sure if it would work well for a "real system" but for the purpose of learning it seemed to work well enough.
He called it "twiddle" - I don't know if he was the original author of it, or what - but here's a video that describes how it is implemented and how it works:
https://www.youtube.com/watch?v=2uQ2BSzDvXs
It's actually pretty neat - and the concept can be applied to virtually any algorithm in which you are trying to minimize an error amount.