Hacker News
Kalman filter from the ground up (kalmanfilter.net)
349 points by ColinWright on Oct 14, 2023 | 150 comments



I always get excited when I see these "tutorials for dummies" (like me). Like "finally, I get to take an evening to understand this concept that's eluded me for years." Generally, I get let down. This time is no exception.

They always start off well, then inevitably there's a concept or key terminology that gets glossed over without sufficient explanation.

"The random variable is described by the probability density function. The probability density function is characterized by moments.

The moments of the random value are expected values of powers of the random variable. We are interested in two types of moments"

This is where my journey ended this time. OK, so is that the expected value of the exponent of the random value, or the expected value of the random value to some power... and what makes the power of a random value so special versus just operating on the random value itself?

It's like these authors get tired of "dumbing" things down at random points in their chain of thought and just decide to skip over stuff. Or, perhaps they don't understand the underlying concepts themselves and simply can't explain them to others.

Who knows. Just frustrating. At least on Udemy you can write to instructors and ask questions, unlike book authors, who aren't paid to respond.


One of the biggest challenges in learning, especially in learning something new, is that it’s a chain. If you don’t have something’s prerequisite knowledge, you can’t understand it (by definition). This means that only one insufficient explanation by the author blows it. As a general rule, it’s really hard to not mess up once!

You make a good distinction between someone being paid to respond and not. A feedback loop helps out a lot here.

Failing that, you need to take this into your own hands. Honestly, you’re probably going to have to tell yourself a new story to get there. Maybe having empathy for the difficulty of teaching. Maybe it’s finding some inner drive. I don’t know. But you need to look at that paragraph, accept it’s insufficiently explained, and take responsibility for understanding it.

I’m not saying to read it over and over until you “get it”. (I don’t know why people try that, it’s kinda foolish). A simple strategy works most of the time. Read it until you find a word or phrase you don’t sufficiently understand. Maybe that’s “random variable”. Maybe it’s “probability density function”. Find an explanation for that (Wikipedia, ChatGPT, textbooks, videos). The fun thing is this algorithm is recursive. So you’ll likely run into something you don’t know again. That’s okay, just keep going. If it’s really tough, a lot of the value of a tutor is steering this depth first search.

Get each concept to the point you understand it very well for the problem at hand. You don't have to know everything about PDFs, but don't hand-wave it either. After this process, you'll be able to understand this paragraph and continue.

This may take a while! If something is in a new domain, it’s usual for you to actually spend more time backfilling knowledge than on the main content itself. That may make it not worth it for you, but it’s not inherently bad. And the next time for something similar should be faster.


The pdf is just the pullback measure.

A random variable is a function X(w) taking (eg) real values. In your probability space you already have an ambient measure space and an ambient probability measure P which takes sets in the measure space to [0,1]. The pdf is then the function defined on sets P(invX(q)). invX is a set valued inverse.

Ok, consider coin flips. Then X takes each element of the sample space to either 1 or -1. The set-valued inverse of 1 is the set of outcomes that map to 1. Then we take the ambient probability measure of that set.

You don’t really have to cope with measure theory in full to take this tiny step.


I don’t think the set of people who couldn’t understand the quoted paragraph but could understand your comment is very large.


I'd be shocked if somebody knows what an ambient measure space is but doesn't understand the nth moment of a random variable.


1/ I think you are referring to the pushforward measure (https://en.wikipedia.org/wiki/Pushforward_measure): the random variable "pushes" the probability measure to its codomain. 2/ A pdf requires a stronger condition: the pushforward measure needs to be absolutely continuous with respect to the sigma-finite measure (usually the Lebesgue measure) on the codomain.


If anyone was frustrated like me by this concept of "moments", I found the following insightful and fascinating:

https://gregorygundersen.com/blog/2020/04/11/moments/

I have no idea how I got through undergrad and graduate school without internalizing this concept that seems so foundational. Whacky


I thought I had a pretty good grasp on this, but the idea that an infinite sum of higher order moments uniquely defines a distribution in a way analogous to a Taylor series, was new and super interesting! It gives credence to the shorthand that the lower order moments (mean, variance, etc) are the most important properties of a distribution to capture, and is how you should approximate an unknown distribution given limited parameters.


The hardest part of teaching is that it's impossible to remember what it was like to not already know something. You can't un-know it to get back your previous perspective. So you forget all the things you take as known.

This is the problem I find with the 3blue1brown videos. They're pretty, but I never get any understanding from them. People who already know the material nod along and see the concepts they're already familiar with shown in a neat way, but for some (like me) they don't generally lead to understanding. Too many prereqs or something.


> This is the problem I find with the 3blue1brown videos. They're pretty, but I never get any understanding from them.

So relieved to learn that I am not the only one!


> Generally, I get let down. This time is no exception.

This is why I don't even bother to read through such tutorials. To understand the Kalman filter one first needs to understand the basics of probability and then the importance of the Gaussian distribution (the Kalman filter's mathematical derivation assumes that all the probability distributions involved are Gaussian). Then one notices that a Gaussian distribution is uniquely defined by its first and second moments (yes, you cannot dance around introducing moments at some point). And then pretty nasty math follows :-) The Kalman filter is not an easy thing. Rudolf Kalman claimed in one of his interviews that without his filter the American landing on the Moon would not have been possible.


There are quite a few similarities between the Elo rating system and Kalman filters. I’ve always thought this would be a good way to teach them, because you can start with a simplified univariate case, then modify and generalize from there.


I would love to know more about the similarities. Can you share any resources that utilize Kalman filters in rating systems?


I feel this frustration in my bones, and I felt the same way up until the proliferation of LLMs. Just paste your comment to Claude (or ChatGPT maybe?) and it will explain everything you want to know in realtime.


Sure, if you don't care if it's actually correct or not.


There are so many explanations of moments in the training set, it should be correct.


Ha, no joke. I definitely find myself doing that more and more. Good reminder.


> They always start off well, then inevitably there's a concept or key terminology that gets glossed over without sufficient explanation.

Obligatory meme: how to draw an owl: https://www.memedroid.com/memes/detail/265779.



Yep. When I write tutorials I try very, very hard to avoid having a single moment like this. Takes more effort, but I find that it makes for a better explanation.


> and what makes the power of a random value so special versus just operating on the random value itself.

Literally four sentences later you would have found:

* The first raw moment E(X) – the mean of the sequence of measurements.

* The second central moment E((X−μX)²) – the variance of the sequence of measurements.

And if you had gotten as far as the part you quoted, you would have seen an extended example of why one is interested in means and variances.
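To make those two moments concrete, here's a quick sketch in plain Python (the measurement data is made up):

```python
# Sample estimates of the two moments in question: the k-th raw moment
# is E(X^k); the k-th central moment is E((X - mu)^k), with mu = E(X).

def raw_moment(samples, k):
    """Sample estimate of E(X^k)."""
    return sum(x ** k for x in samples) / len(samples)

def central_moment(samples, k):
    """Sample estimate of E((X - mu)^k)."""
    mu = raw_moment(samples, 1)
    return sum((x - mu) ** k for x in samples) / len(samples)

measurements = [49.8, 50.2, 50.1, 49.9, 50.0]   # made-up noisy readings
mean = raw_moment(measurements, 1)              # first raw moment
variance = central_moment(measurements, 2)      # second central moment
```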


I did of course read that next section, and beyond, but respectfully disagree that the expressions provided need no further explanation. For example, the author neglects to define 'X' as the set of support values for some probability distribution. That's left to the reader to figure out for some reason. Further, nowhere is 'E(X)' defined as the integral of x*f(x) dx, or that the exponent 'k' applies only to the first 'x' term in that expression (i.e. if k=3, then E(X^3) = integral of (x^3)f(x) dx). How is the reader supposed to know all that?

That was left up to me to hunt down... which is fine I guess, but I certainly wouldn't say this is "from the ground up". At the very least, link to some external content that provides the necessary definitions.


I attended a short series of lectures by Kalman years ago. As I recall it, he very strongly emphasised the virtue of working directly with observed data, avoiding the biases resulting from positing a model and trying to adapt it to the data. He was quite insistent on this point, citing Newton’s _Principia_ as a good example of this principle at work: Newton, he said, did not cast around for models that might explain Kepler’s laws. Instead, he _derived_ the inverse square law of gravitation from Kepler’s laws, largely using geometric arguments. He was an excellent speaker, and highly opinionated. And of course he did explain the idea behind the Kalman filter. However, since I never needed them myself in my work, I have long since forgotten the details.


That’s a cool story. I feel like this is so overlooked. I’ve worked with scientists and engineers who’ve actually deployed systems that are incredibly complicated and pretty well known. Those folks are usually skeptical of trendy algorithms and usually start with first principles. 90% of the time, a boring old linear Kalman with simple (or no) motion model will do the trick.


No motion model? If your model doesn't have dynamics, it's not really doing Kalman filtering.


I've been asked by a friend recently to implement a Kalman filter for a side project. Among many others, I've read this website and still have no idea how to implement it. They all seem to be in the style of 'draw the rest of the owl'.

Can anyone point to anything that explains it in a manner a programmer would explain (eg array iteration instead of sigma notation) ?

As far as I understand, Kalman filters seem to be a moving average of position, velocity, and (perhaps) acceleration, using those values to estimate the 'real value' as opposed to what the sensor is telling you.


As far as resources, I’m not sure. It really only takes a few lines of code if you have a linear algebra library like NumPy or similar. However you need to write down the math describing your dynamics first.

The Kalman filter is really just Bayes Theorem applied to a particular set of system dynamics. Really it’s two applications of Bayes rule: given my current state estimate, how do I update my estimate when I receive a measurement? Then, after receiving that measurement, how will the system dynamics impact my uncertainty before the next measurement? You iterate applying these two every time you receive a measurement.

Do you have a dynamics model? I.e., a set of equations that tell you how whatever you are trying to track will update in time? That's the first thing you need, and if you don't have that, I could see you being quite confused about the implementation.

It might help you to either read about particle filters or just Bayes filters generally. The Kalman filter is a Bayes filter for a specific set of assumptions that leads to a lot of linear algebra. If you understand the Bayes filter, the Kalman filter will make more sense, and you can learn about the Bayes filter without all the matrices. The particle filter is another algorithm worth reading about. Like the Kalman filter, it is a Bayes filter but it doesn’t require the linear algebra.
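To make the two Bayes-rule applications concrete, here's a minimal 1D sketch in plain Python. The dynamics (F=1) and the noise values are made up; it's illustrative, not a production filter:

```python
# Minimal 1D Kalman filter as two alternating Bayes steps:
# predict (dynamics grow uncertainty) and update (measurement shrinks it).

def predict(x, p, f=1.0, q=0.1):
    """Propagate estimate x and variance p through the dynamics f."""
    return f * x, f * p * f + q

def update(x, p, z, r=1.0):
    """Fuse a noisy measurement z with variance r into the estimate."""
    k = p / (p + r)               # Kalman gain: how much to trust z
    return x + k * (z - x), (1 - k) * p

x, p = 0.0, 1000.0                # vague prior: mean 0, huge variance
for z in [5.1, 4.9, 5.0, 5.2]:    # noisy measurements of a constant
    x, p = predict(x, p)
    x, p = update(x, p, z)
```

Each measurement pulls the mean toward the data and shrinks the variance; each predict step inflates the variance by the process noise.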

Aside: I always get frustrated when I see people saying that you don’t need to know math to be a software developer. Maybe that’s true if you want to make web pages, but the more math you know, the better your ability to model problems and either solve them yourself or find solutions others have created. The scope of problems you can solve via programming becomes so much bigger the more math you know.


> As far as i understand, Kalman Filters seem to be a moving average of position, velocity, and (perhaps) acceleration, and use those 3 values to estimate the 'real value' as opposed to what the sensor is telling you.

Sort of. The actual variables (v, r, a in your case) vary depending on your use case. What matters is that you make an estimate based on that state, then you take a measurement, and there's an iterative way of adjusting your next state based on those two. It's actually pretty easy to code once you understand the formulas.

Maybe this could help http://bilgin.esme.org/BitsAndBytes/KalmanFilterforDummies


Thanks for the link! That's the best I've read so far :)


> They all seem to be in the style of 'draw the rest of the owl'.

You need a linear discrete-time dynamical system model of your process. With the possible additions of external control action, and process noise. Your dynamical system has an internal state, described as a vector of values. Then you also need a linear model, of how the internal state translates to your observations, plus observation noise.

x – model state

F – linear state transition model: x(t+1) = F x(t)

Q – covariance matrix for process noise, so actually x(t+1) = F x(t) + N(0,Q)

H – linear observation model

R – covariance matrix for observation noise

u, B – external control vector u, and B for how u acts on the model state. Optional.

N(0,Q) is a normal distribution with 0 mean and covariance Q.

For example, in the case of a moving physical object, the model state is

    [ x ]
    [ v ]
where x is location and v is velocity. (Earlier I used x for the entire model state. Here I use x just for the location.) Then with radar or whatever, you usually only observe the location, and current velocity is unobservable. So your observation model is

    [ 1 0 ]
    [ 0 0 ]
so that

    [ 1 0 ] [ x ] = [ x ]
    [ 0 0 ] [ v ]   [ 0 ].
This is the owl you need to have, before doing anything with a Kalman filter. You really need to come up with these things on your own, based on what kind of a process you have.

Then the Kalman filter tells you how to best estimate the real state vector at each time step, when both your state transition model and your observations contain noise.
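As a sketch of the above in NumPy (I use a 1×2 observation matrix rather than the padded 2×2 form, and the noise levels and measurements are made up):

```python
import numpy as np

# Constant-velocity model: state is [location, velocity], we observe
# location only. Q and R are illustrative noise covariances.
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition: x += v*dt
H = np.array([[1.0, 0.0]])              # observe location, not velocity
Q = np.eye(2) * 0.01                    # process noise covariance
R = np.array([[1.0]])                   # observation noise covariance

x = np.zeros((2, 1))                    # initial state guess
P = np.eye(2) * 500.0                   # large initial uncertainty

for z in [1.0, 2.1, 2.9, 4.2, 5.0]:     # noisy positions of a mover
    # Predict: push state and covariance through the dynamics.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: fuse the measurement via the Kalman gain.
    y = np.array([[z]]) - H @ x         # innovation
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
```

Note that the filter recovers a velocity estimate near 1 even though velocity is never measured directly; that's the process model doing its job.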

You are correct: many Kalman filter tutorials are a mess, where the explanation of the parts of the chosen process model and the parts of the Kalman filter are blended together, and the reader has a hard time telling them apart. And quite often they start with the simplest possible example, where the model state is 1-dimensional, i.e. just one number, and where the process model is a constant (no change): x(t+1) = x(t), so F = 1. This can lead to even further confusion about what is the dynamical model and what are the parts of the Kalman filter, when the model is so small that it sort of disappears in the equations.


> x – model state F – linear state transition model: x(t+1) = F x(t) Q – covariance matrix for process noise, so actually x(t+1) = F x(t) + N(0,Q) H – linear observation model R – covariance matrix for observation noise u, B – external control vector u, and B for how u acts on the model state. Optional.

Better formatting for this part:

x – model state

F – linear state transition model: x(t+1) = F x(t)

Q – covariance matrix for process noise, so actually x(t+1) = F x(t) + N(0,Q)

H – linear observation model

R – covariance matrix for observation noise

u, B – external control vector u, and B for how u acts on the model state. Optional.


I can recommend a textbook called Probabilistic Robotics by Dieter Fox, Sebastian Thrun, and Wolfram Burgard. There's good coverage of Kalman and information filters in the first few chapters.


Hoping I can help you grok the problem a bit more. Typing on phone, so please excuse the lack of decent formatting.

A moving average is a good place to start thinking about it. Think about a case where you would use a moving average, and why. You are probably using it because you have some measurement (sensor) that you know is noisy, and so by averaging out the noise (taking the mean of a number of samples) you try to get a better estimate of the true value. If you know how noisy the sensor is, you can get an idea of how many samples you should average over to get a good measurement. You can also take the standard deviation and report both the mean and variance of your measurement over multiple samples if you wanted.

For purposes of this example moving forward -- we are going to estimate all of our sensed or inferred values as a gaussian, parameterized by mean and variance. It's a simple way to give a measurement with some uncertainty around it.

This can be a good start if you know little about the system you are measuring other than that the sensor / measurement is noisy. However, for many systems you may have multiple quantities you are interested in estimating, and you have some idea of how they relate to each other.

Take your example of a physical system with acceleration, velocity and position. Basic physics will tell you that if at time t you are at position p, moving at velocity v, then at time t+dt your position should be roughly p+(v*dt). Similarly, you can update your velocity estimate using your estimate of acceleration. If this is a system under your control, then you can also use things like a force you commanded to update your acceleration model. This is great: by using physics, without any measuring after time 0, we can just figure everything out forward in time forever, simply using our process model! However, because your initial estimates had some uncertainty, what you will find is that if you just keep doing this, the uncertainty grows larger and larger with each time step, and eventually becomes so large as to be useless.
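That unbounded growth is easy to see in a toy predict-only loop (the numbers are illustrative):

```python
# Predict-only dead reckoning: process noise q is added on every step
# and nothing ever shrinks the variance, so uncertainty grows forever.
var = 1.0      # initial position variance
q = 0.5        # process noise variance added per prediction step
history = []
for _ in range(10):
    var = var + q      # predict step with no measurement update
    history.append(var)
```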

Enter the Kalman filter. What the Kalman filter does is combine the information given by your sensors with your process model, to give you a better estimate of the quantities you are interested in than you could get from either technique alone.

Every time step, the filter will make a state estimate using the process model based on your previous state estimate, and then use your current sensor measurements to update that state estimate, both in terms of the mean and the uncertainty. In the basic Kalman filter you assume your process model is linear, all of your estimates are simple Gaussians, and you decide how much to weigh your model vs. your sensors via a simple multiplying factor, "the Kalman gain".

Sorry again that I couldn't write this out as a program, and for the likely horrible run-on sentences that come from typing on a phone, but I hope that a quick overview of what the technique is trying to do will make it a little easier to fill in the owl!


Wow. Thanks. That was a wonderful explanation. I’ve been wondering if a Kalman filter could be used in a distributed system to estimate latencies between pairs of processes for the purpose of computing a near optimal dissemination tree.


Thanks a lot! That does explain some of the ideas behind it nicely. Sometimes when looking at equations it's easy to miss the forest for the trees so to speak, so this helps.


Here is another tutorial on Kalman Filters, step-by-step video playlist -- https://www.youtube.com/watch?v=CaCcOwJPytQ&list=PLX2gX-ftPV...

Once you get the intuition, Kalman filters are really interesting. As are particle filters -- those are fun to work with and visualize.


I have a copy of this book and have used it in anger quite successfully. There’s a couple of spots in it where it’s a little bit awkward to follow even after re-reading multiple times, but it’s quite good overall!


> used it in anger quite successfully

Care to elaborate?


The phrase "used Y in anger" means "used seriously" or "used for its intended purpose". The origin phrase/trope-namer is "never fired its guns in anger", applied to a naval warship in peacetime to describe the case where it might have fired its guns in practice, but not in combat.

Entering "never fired its guns in anger" in your favorite search engine should give several examples.

Here's one in an article from last week. https://owlcation.com/humanities/What-If-The-USS-Arizona-Nev... - "The USS Arizona never fired her guns in anger."


It's a British-ism for "used in actual practice, rather than academically/theoretically"

https://dictionary.cambridge.org/dictionary/english/in-anger


Ah thank you -- as an American I've never come across that expression my whole life!


I wasn't aware of this as a British/American thing. As a programmer, it is just a phrase that I picked up because other programmers used it and I liked it.

And yes. I'm an American.


Yeah I'm pretty sure I picked it up on HN :) Am Canadian.


Possibly used to dash bugs into oblivion, or held up at arms length at a protest rally to condemn book banning? My guesses in case OP doesn't come back. I'm sure if we all make enough guesses, then take the average, we'll come up with the correct answer.


Do not forget to weight the guesses by their uncertainty, considering the correlation between the guesses.


The sibling posts figured it out. I have gone from picking up this book to using the material in it in production, using the contents of the book as my primary reference to get from a vague understanding to an implementation I don't think much about because it Just Works.


Hah, yes, I see everyone else has filled you in already. I have several Kalman Filters modelling both linear (conventional KF) and non-linear (Extended) systems in production that were derived primarily from the material in this book. I did have to do some application-specific research for a couple of things, for example 3D attitude estimation is a deep dark hole with many different formulations with different pros and cons, but the book gave me a very solid foundation for being able to take attitude estimation papers from NASA and others and work through the approach I wanted to use.


Another good article on the subject:

Is the Kalman filter a low-pass filter? Sometimes!

https://jbconsulting.substack.com/p/is-the-kalman-filter-jus...

PS: I've used it to remove jitter in virtual camera movement while cropping video around faces in real time, streaming detected face locations to a Kalman filter worker and getting back a stream of stable camera locations.


Kalman filters are great when there is limited compute available, but I’m a bigger fan of the newer and more advanced models such as “particle filters”.

The advantage of particle filters is that they can handle complex scenarios with nonlinear physics and non-Gaussian distributions.

For example, a vehicle GPS unit could use street maps to eliminate impossible locations based on the recent history of turns. Gaussian filters can’t do that and just result in blurry blobs that cover several blocks.
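As a toy 1D illustration of that map-constraint idea (all numbers invented): particles landing in an "impossible" region get zero weight and die out at resampling, something a Gaussian blob can't express.

```python
import math
import random

# Toy 1D particle filter step: weight particles by a Gaussian
# measurement likelihood, but zero out an "off-road" region 4..6
# that the map says is impossible, then resample.
random.seed(0)
particles = [random.uniform(0.0, 10.0) for _ in range(500)]

def pf_step(particles, meas, meas_std=1.0):
    weights = []
    for p in particles:
        if 4.0 <= p <= 6.0:    # impossible location per the "map"
            weights.append(0.0)
        else:                  # Gaussian measurement likelihood
            weights.append(math.exp(-0.5 * ((p - meas) / meas_std) ** 2))
    # Resample with replacement, proportionally to weight.
    return random.choices(particles, weights=weights, k=len(particles))

particles = pf_step(particles, meas=3.0)
```

After one step, no surviving particle lies in the forbidden region, and the cloud concentrates around the measurement.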


Do you have any recommended resources for learning about particle filters?

Also, what other types of newer or more advanced models besides the particle filter would you recommend for offline processing?


Look up the positioning and tracking literature; e.g., Probabilistic Robotics by Thrun et al. The method is also called "sequential Monte Carlo". I found this article useful in learning the method:

Bayesian Filtering for Location Estimation (https://rse-lab.cs.washington.edu/postscripts/bayes-filter-p...)

It might help to first review importance sampling. https://en.wikipedia.org/wiki/Importance_sampling

A comprehensive survey (for 2003) is Zhe Chen's "Bayesian Filtering: From Kalman Filters to Particle Filters, and Beyond". A contemporary survey would surely incorporate machine learning.

https://people.bordeaux.inria.fr/pierre.delmoral/chen_bayesi...

For extensions see "An Overview of Existing Methods and Recent Advances in Sequential Monte Carlo"

https://people.eecs.berkeley.edu/~jordan/sail/readings/cappe...


Unfortunately I don't have anything at hand better than what Google would provide.


I was at an adtech company in 2007 where the CEO and research teams became obsessed with Kalman filters for the purposes of optimizing ad campaigns on google and msn networks. It kind of worked from what I remember but I can’t find the patent filings any longer and Zeta & Walmart bought the tech.


Almost every explainer I have ever seen on Kalman filters starts with something like this:

"Let's begin with an intuitive example: think of a thermostat as it adjusts to the temperature. Got that example? Good! Next here is some advanced linear algebra to help make it more intuitive"

Has anyone come across an explanation of Kalman filters that doesn't immediately dive into the math?


It is a weighted average of the propagated uncertainty of a state and the uncertainty of a measurement of that state. If you walk towards a wall with your eyes closed, then as time goes on you will be less and less certain where you are (prediction steps); then when your fingers touch the wall you will suddenly be much more certain where you are (measurement update). The linear algebra is just how you calculate the weights for a linear dynamical system with Gaussian noise.
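In code, that weighted average is just a couple of lines (the means and variances here are invented):

```python
# Fuse a vague prediction with a sharp measurement. The weight on the
# measurement is exactly the Kalman gain: a more certain source pulls
# the fused estimate harder.

def fuse(pred_mean, pred_var, meas_mean, meas_var):
    gain = pred_var / (pred_var + meas_var)
    return pred_mean + gain * (meas_mean - pred_mean), (1 - gain) * pred_var

# Eyes closed near the wall: uncertain prediction, precise touch.
m, v = fuse(pred_mean=2.0, pred_var=4.0, meas_mean=3.0, meas_var=0.01)
```

The fused mean lands almost on the precise touch measurement, and the fused variance collapses to nearly the measurement's.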


I have no answer for your question, but the "draw the rest of the owl" meme is so pervasive in maths it's not even a cliche.



Depends what level you are comfortable with. Given that the kalman filter makes certain statistical assumptions (normally distributed) you need to be familiar with at least statistics, and know what mean/covariance matrices are. If so you should be able to follow the derivation in https://sites.ualberta.ca/~dwiens/stat679/meinhold&singpurwa... up until section 4. (Beyond section 4 it's really just a matter of grinding out the math, and while necessary for implementing it in a practical fashion doesn't really give you any additional understanding).


For me the intuition came from Sebastian Thrun in his Ai for robotics online course. Running through everything in 1D definitely helped get the concepts before extending to multidimensional problems.

This looks to be the course playlist: https://youtube.com/playlist?list=PLAwxTw4SYaPkCSYXw6-a_aAoX...

The kalman filter stuff starts at video "Tracking Intro - Artificial Intelligence for Robotics"

Also the free course appears to be available here, although a login is required to access:

https://www.udacity.com/course/intro-to-artificial-intellige...


For my money -- as a machine learning engineer -- the simplest explanation is Christopher Bishop's:

https://www.youtube.com/watch?v=QJSEQeH40hM

http://mlss.tuebingen.mpg.de/2013/2013/bishop_slides.pdf#pag...


Unfortunately there is no halfway with maths. Math is the language of logic; the purpose of mathematical formalism is to allow easy communication, not just navel-gazing.


I strongly disagree. While rigorous mathematical proof (logic) is a big part of modern mathematical research output, it is an impenetrable barrier to learning for most students, including maths students. I doubt any mathematician ever learned their subject by mainly reading proofs. And I say that as someone who secretly enjoys reading and working through proofs of theorems. Understanding mathematics demands developing strong intuition about ideas before attempting their logical justification. That rigour and intuition in mathematics education are largely contradictory goals is widely acknowledged by maths teachers; 3Blue1Brown, for example, has often mentioned this duality in his videos. It was only in the latter part of the twentieth century that the idea of abandoning intuition in teaching mathematics was seriously attempted, by the Bourbakists I believe, and was embraced for a while, before fading away.


> I doubt any mathematician ever learned their subject by mainly reading proofs. And I say that as someone who secretly enjoys reading and working through proofs of theorems. Understanding mathematics demands developing strong intuition of ideas before attempting their logical justification.

This is half right: you don't learn math by reading proofs, you learn it by writing proofs. Developing strong intuition without proofs is possible (albeit difficult) for applied topics, but not viable at all for pure math.


That's not really true, in several senses. Mathematics is not the language of logic. I'm not even sure what that was supposed to mean. Logic is the language of logic. Also, engineers and physicists can have rather intuitive understandings of various filters with almost no mathematical calculations being done, on paper or their head.


Hoped someone here would come out and say this.

It's only supposed to be somewhat intuitive for those who already possess some intermediate level college math background.

And no, rejecting the notion that every advanced topic can be accurately ELI5-ed with a cringy Redditesque tone means neither that there's any gatekeeping taking place nor that there's a lack of understanding on the potential explainer's side.


To characterize a mode of communication the learner is struggling to understand as "easy communication" is badly missing the point.


It is only easy compared to the alternative.

But I've seen both it and the alternative. It is literally easier to learn the math, and then use that to facilitate communication, than to attempt to directly communicate key concepts while avoiding the math.


I think it's a bit like trying to convey to someone in English the beauty of a symphony. Better to take them to see the orchestra play it.


This is not exactly what you are asking for, but it's a very nice explanation and it tries to work without math for as long as possible: https://www.bzarg.com/p/how-a-kalman-filter-works-in-picture...


Yeah, sounds about right. Real-world problems are rarely 1D, constant (or no) velocity, etc. There are good examples out there; the one by rlabbe on GitHub is pretty dang good. It still didn't quite get me as far as I needed to go, but it got me a lot further than I was when I started.


I wasn't familiar with Kalman, and I tried to understand it by reading the primers on this website and it wasn't clicking for me. Last night I was thinking about the sample graphs, and it occurred to me that it just looked like an Exponential Moving Average, which is a much simpler concept to grasp.

I just searched Google for a comparison and found that EMA is as good as Kalman for "random walk plus noise" on a Stats Stack Exchange, and that a 2003 paper from Brown University (Joseph J. LaViola) showed that a Double Exponential Smoothing algorithm is of equal quality to Kalman (and extended Kalman) but 135 times faster, and a simpler approach.

I find Double Exponential Smoothing to be much easier to understand than Kalman, and assuming the LaViola paper is correct, I'm not going to put additional effort into understanding Kalman.
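For reference, here's a sketch of Holt-style double exponential smoothing (one smoothed level plus one smoothed trend). Using a single factor alpha for both terms is a simplification, and this isn't necessarily LaViola's exact formulation:

```python
# Double exponential smoothing: track a smoothed level and a smoothed
# trend, and emit level + trend as the one-step-ahead estimate.

def double_exp_smooth(values, alpha=0.5):
    level, trend = values[0], values[1] - values[0]
    estimates = []
    for v in values[1:]:
        last_level = level
        level = alpha * v + (1 - alpha) * (level + trend)
        trend = alpha * (level - last_level) + (1 - alpha) * trend
        estimates.append(level + trend)   # one-step-ahead estimate
    return estimates
```

On a noiseless straight line it predicts the next point exactly; on noisy data, alpha trades smoothness against lag.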


As someone who has never heard of Kalman filters before, I found this page[0] useful to understand and build a mental model for it.

[0]: https://www.mathworks.com/help/vision/ug/using-kalman-filter...


Does anyone remember a tutorial website posted here that taught PID control loops with interactive JavaScript animations? It stepped through the reasoning for and effect of each element of the PID. I have been trying to dig it up for a while but can't find it.




I'm looking for something similar for PID controllers, or control systems in general. Specifically for a teenager. Most sources I can find are either college textbooks or overly simple summaries for FIRST Lego League.


George Gillard's PID document is circulated widely for teaching high school students about PID for the VEX Robotics Competition: https://smithcsrobot.weebly.com/uploads/6/0/9/5/60954939/pid... I'd also add a +1 to the suggestion of the Controls Engineering in FRC book, but the math in that book is significantly more complex than Gillard's guide.


Elliott Williams (editor of Hackaday) held a very interesting talk this year at CCCamp that in very simple terms explained the components of a PID loop. Highly recommended! https://media.ccc.de/v/camp2023-57111-pid_loops_control_all_...


https://file.tavsys.net/control/controls-engineering-in-frc....

This is pretty complex but its been super helpful for me in terms of understanding PID controllers.


That is pretty much exactly what I was looking for. Maybe a touch more complicated than what my 11 year old is ready for; but something he can stretch a bit with. Thank you!


o/t but … I was attempting to understand how one might apply controls systems to a teenager until one of my internal processing threads got around to the correct reading of your comment.


Humans are just complicated plants!

(A plant is a generic term used in controls for “the system you are controlling”)


In my graduation days I used this[1] and found it very useful.

[1] https://www.researchgate.net/profile/Mohamed-Mourad-Lafifi/p...


I found this very insightful for a hobby project I worked on: https://controlguru.com/table-of-contents/



Terry Davis has a nice video of him demoing a physics program he wrote, where he is coding a PID for a rocket.


PSA: If/when using these tools, know the difference between Kalman filter and Kalman smoother. And be especially careful when using to make forecasts, forward estimates, predictions, etc.

Beyond the scope of this comment and post, but mind two-sided filters (as the Kalman smoother/HP filter uses) as you could be incorporating future/unknown data into your model.


I've actually been looking for a good description of the Kalman smoother, to process recorded data without phase shift, but I haven't found any. I have textbooks on the subject from school, but I can't really translate it to real code in this case. I understand using and updating the filter sequentially, but how do I use information from both previous and future data for each point?


Max Welling's Kalman Filter tutorial [1] derives the smoother equations using pretty clear and easy to follow notation (and is a great resource generally).

Briefly: you first run the filter equations "forwards", processing each datapoint sequentially from start to end. Then you run the smoother "backwards" in time on the same data going from end to start.

1. http://www.stat.columbia.edu/~liam/teaching/neurostat-spr12/...
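To make the forward/backward recipe concrete, here is a minimal 1-D sketch in the Rauch-Tung-Striebel form, under an assumed random-walk state model (this is my own toy example, not code from Welling's notes; the Q and R variances are illustrative):

```python
# Minimal 1-D sketch of the forward-filter / backward-smoother recipe
# described above, for a random-walk state model x_k = x_{k-1} + noise.
# Q (process) and R (measurement) variances are illustrative assumptions.

def kalman_filter(zs, Q=0.01, R=1.0, x0=0.0, P0=1.0):
    """Forward pass: returns filtered and predicted means/variances."""
    xs, Ps, x_preds, P_preds = [], [], [], []
    x, P = x0, P0
    for z in zs:
        # Predict: random walk, so the mean carries over and variance grows.
        x_pred, P_pred = x, P + Q
        # Update: blend prediction and measurement via the Kalman gain.
        K = P_pred / (P_pred + R)
        x = x_pred + K * (z - x_pred)
        P = (1 - K) * P_pred
        xs.append(x); Ps.append(P)
        x_preds.append(x_pred); P_preds.append(P_pred)
    return xs, Ps, x_preds, P_preds

def rts_smoother(xs, Ps, x_preds, P_preds):
    """Backward (Rauch-Tung-Striebel) pass over the stored filter states."""
    n = len(xs)
    xs_s, Ps_s = list(xs), list(Ps)
    for k in range(n - 2, -1, -1):
        C = Ps[k] / P_preds[k + 1]  # smoother gain
        xs_s[k] = xs[k] + C * (xs_s[k + 1] - x_preds[k + 1])
        Ps_s[k] = Ps[k] + C**2 * (Ps_s[k + 1] - P_preds[k + 1])
    return xs_s, Ps_s
```

Note that the smoother needs the predicted means/variances stored during the forward pass. The last estimate is unchanged by smoothing, while earlier estimates get pulled toward what later data revealed, which is what removes the phase lag.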


Thank you, I'll have a look at it when I can find the time and frame of mind.


A Kalman filter will give you the "best guess" for some state (x) at the current timestep (k). This estimate often has some lag in it, likely because you have some incomplete information that you couldn't model. Sometimes we care about the previous states (e.g., x_k-1). But if we just save these states and refer to them, we're not getting the most out of our data.

The Kalman Smoother can be used to go back and update these past values with all the samples up to the current time. To update your previous measurements, you need to save the state of your filter at every timestep (x_k) and its associated covariance matrix (P_k). You can then apply Kalman Smoothing to reprocess previous data and update it with all current information. This will often remove the phase delay that you would otherwise observe in your estimate.


This is my question as well, I've read multiple sources that imply the Kalman smoother would work for my problem, but not exactly _how_ in a way that I could make sense of programmatically.


None of the Kalman filter tutorials I have seen (including this one) gives a good explanation for the equations of the measurement update - including the covariance update. If anyone is curious, here is an explanation: https://postbits.de/kalman-measurement-update.html

Tl;dr: It's mathematically equivalent to the Bayes filter equations (no big surprise, the KF is a BF), and those equations are easier to understand intuitively, but less efficient to compute.


Your TLDR is slightly wrong. The Kalman filter is exactly Bayes theorem for linear dynamics and additive Gaussian noise. Bayes theorem is more general, but these assumptions yield a simple set of linear equations that can be solved very quickly.

In general, using Bayes theorem for filtering is often called the Bayes filter. Outside of a limited set of circumstances, it is not possible to implement. The Kalman filter is one set of circumstances because linear transforms of Gaussian distributions remain Gaussian, and we can specify a Gaussian process just by its mean and covariance. Another set of circumstances is when you have a discrete system with a small enough number of states.

In general however, the Bayes filter results in an infinite dimensional dynamical system since it yields a typically continuous function representing the distribution of your state given your measurements. The Kalman filter, particle filter, and other methods approximate this problem via a more tractable finite dimensional representation.


You are correct, but I don't get where I am wrong. Is it just that I didn't write "equivalent to the Bayes filter equations for linear dynamics and additive Gaussian noise", or do I miss something else?


Student Dave's "Quailman" Kalman youtube tutorials are incredible.

https://www.youtube.com/watch?v=FkCT_LV9Syk&list=PLzCl0zqNaI...


Hello mojomark,

My name is Alex Becker, and I am the author of the "Kalman Filter from the Ground Up" book. First, I would like to thank you for your feedback. As an author, it is important for me to receive feedback from readers. In the book's second edition, I made many changes based on reader feedback: I've expanded the explanations of topics some readers failed to understand and added several appendixes.

You are right that book authors don't get paid to respond. However, KalmanFilter.Net is a project, not just a book. I've worked on the project for 6 years, and you will see more in the future. The goal of this project is to help people understand and implement the Kalman Filter. The website includes a contact page: https://www.kalmanfilter.net/contact.aspx, and book purchasers receive my email. I do my best to answer emails and help my readers if they fail to understand specific points.

Regarding the issues that you and other responders raised: scientists use terminology that looks weird to non-scientists, but I can't write an engineering book without addressing scientific terms. I agree that "random variable" sounds unintuitive, but this is the term. Almost any physical quantity is a random variable. For example, the temperature of your body is a random variable. You can measure the temperature using a thermometer, but that doesn't mean its reading is your actual temperature. Your temperature is a random variable that can be described by two parameters – the thermometer reading and the thermometer precision. Understanding the "random variable" concept is essential for understanding the Kalman Filter. If you make multiple measurements, the temperature is described by a mean and a variance. You can read again the https://www.kalmanfilter.net/background.html page, where this concept is explained. Mathematicians call the mean and the variance statistical moments. "Moments" also sounds weird. I mention this term in the introduction, but I don't use it again in the book. If the term "moments" causes confusion, I can consider removing it from the introduction.

Regarding the linear algebra: you don't need any prior linear algebra knowledge to understand the concept of the Kalman Filter or to implement it in one dimension. However, if you want to tackle the multi-dimensional Kalman Filter, you need some basic linear algebra, at least matrix multiplication.

Thanks again for your post,
Alex Becker


There is no explanation of how Kalman filters work in the article; it is just an advertisement for a book. And the book preview doesn't even cover how to apply Kalman filters to conventional state space models.


You can navigate on the left side of the page. Seems fairly well put together from a quick glance. https://www.kalmanfilter.net/kalman1d.html


Thanks for pointing it out, it was not obvious on my phone.

I think that this website needs to improve the experience for mobile users. First, the book index on the landing page didn't contain hyperlinks to the free book chapters, and those chapters don't appear in the preview, so users may assume it is a shady marketing tactic (trial, requires retweet, subscription to a book platform, ...). Second, towards the end of the page, there could be a one-sentence description of the next section with a hyperlink. Right now there is only a "next" button that is easy to miss, and it can be confused with the generic "next post" buttons used by many blog platforms, which users learn to ignore because they tend to be useless.


This reminds me of the great "the missile knows" clip: https://www.youtube.com/watch?v=bZe5J8SVCYQ



> The Kalman Filter algorithm is a powerful tool for estimating and predicting system states in the presence of uncertainty...

How is this different than any of the ML models?


Kalman Filters were developed outside the context of “Machine Learning” and found practical applications much earlier. If you define ML as an algorithm that is “trained by input data”, then any statistical model would be ML. Such is the ambiguity around the terms ML and AI in general.

Where a KF is really going to kick the pants off a multi-layer perceptron/neural network is how computationally efficient it is. A KF only takes a couple of matrices of size N^2, where N is the number of variables you’re trying to predict. Compare this to a NN with hundreds/thousands of nodes. Also, the KF does “online learning” in that it “is trained as you go” rather than requiring upfront training like some other ML models, so a KF is very useful for live update-and-predict use cases. And, again, it’s extremely computationally efficient and can run easily on embedded systems. The book ad here suggests tracking: so tracking an airplane with an air traffic control radar would be an effective use for a KF. (NNets have found many other uses I’m sure you’re aware of.)

Another huge benefit of a KF is that, unlike NNets, KFs are “explainable”, and in fact extremely well understood by many professionals. This means that a KF can be better tuned to suit a purpose with less fear of unexpected results that may be more common in other ML models. Like, KF(S, x) will always return an explainable new state, where NN(x) may result in a surprise state and no amount of analysis can reveal why (and require training a new model, the “retrain and pray” solution).

There’s a couple of differences for you.


Anything with an input and an output is replaceable by NNs nowadays.


Because of "resume driven engineering". There are simple problems that you can solve with PID controllers or Kalman filters, but everyone wants to throw ML at them instead, so they can put "ML experience" on their LinkedIn, because that's what's hot right now and recruiters have probably never heard of Kalman filters or PID controllers.

A lot of technical decisions aren't based on "what's the quickest, cheapest and easiest solution to the problem?" but "what solution is most likely to get me hired at a pay bump when I jump ship?"


The thing is, to train a NN to estimate your output from your input, you need input-output pairs. KFs are a way of measuring that output in the first place. So they are not even the same class of solutions.


Unless you need guarantees about the output


ML models are usually used in the context of prediction, Y = F(X,θ), where X = inputs, θ = weights, F = model. There's typically no explicit feedback loop (only historical data), no time variation (though you can add it using lags), and no existing model structure in most cases (most are black boxes; some, like linear regression, have a linear model, which is a fairly loose structure).

Kalman Filters are used in the context of a very specific model-type for dynamic systems (a state-space model, see below) to update states (xₖ) using feedback data from sensors (yₖ). These state-space models can either be derived by fitting data, or they can be derived from first principles through physics equations.

  xₖ₊₁ = f(xₖ) + g(uₖ)

  yₖ = h(xₖ)

The feedback loop is modeled explicitly, including any control actions (uₖ) that you took to affect the environment.

For instance, when driving a car, examples of states (x) are position/velocity/acceleration (which might not be directly measured with a sensor! But can be backed out from a mathematical model from quantities that are measured), sensor measurements (y) might be speedometer, accelerometer readings, and control actions (u) might be throttle position, brake pressure, steering angle. The Kalman filter has a model relating all this in time, and based on that model and sensor readings, it reconstructs/infers the likeliest states in the presence of even noisy measurements. This is why Kalman filters are known as "state estimation" algorithms.

ML models typically do not do this -- they only predict. Kalman filters predict and update.
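As a concrete sketch of that predict/update cycle, here is a linear Kalman filter for the car example, with state x = [position, velocity] and a sensor that measures position only. The matrices and noise levels here are illustrative assumptions, not values from any real system:

```python
import numpy as np

# Sketch of the predict/update cycle for the state-space model above:
# state x = [position, velocity], measurement y = position only.
# F, H, Q, R are illustrative choices for a constant-velocity model.

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition: x_{k+1} = F x_k
H = np.array([[1.0, 0.0]])             # measurement model: y_k = H x_k
Q = 0.01 * np.eye(2)                   # process noise covariance
R = np.array([[1.0]])                  # measurement noise covariance

def kf_step(x, P, z):
    # Predict: propagate the state and its uncertainty through the model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: blend the prediction with the measurement via the Kalman gain.
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P
```

Even though only position is measured, the filter infers velocity through the model, which is exactly the "state estimation" point above: states that aren't directly sensed get backed out of the mathematical model.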


If you can make certain assumptions about the system (mainly that sources of noise follow gaussian distributions and are independent), then the Kalman filter gives the best possible estimate of the system state. And it can be computed cheaply, like on the Apollo guidance computer.

You basically need to know some kind of a model for the system to run KF. Whereas ML is all about working out the model automatically.

As for similarities, KF is a really efficient implementation of Bayesian inference. I think that any ML model that isn't fundamentally using Bayesian inference, is fundamentally flawed.
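A quick way to see the "efficient Bayesian inference" point: for Gaussians, the Bayes update (multiplying prior by likelihood) and the Kalman gain form give identical answers. The numbers below are arbitrary illustrations:

```python
# Numerical check: the product-of-Gaussians posterior (Bayes) matches
# the Kalman update exactly. All numbers are arbitrary illustrations.

prior_mean, prior_var = 2.0, 4.0  # predicted state and its variance
z, meas_var = 6.0, 1.0            # measurement and its variance

# Bayes: product of two Gaussians (a precision-weighted average).
post_var = 1.0 / (1.0 / prior_var + 1.0 / meas_var)
post_mean = post_var * (prior_mean / prior_var + z / meas_var)

# Kalman update form: same result, computed via the gain K.
K = prior_var / (prior_var + meas_var)
kf_mean = prior_mean + K * (z - prior_mean)
kf_var = (1 - K) * prior_var
```

Both routes land on the same posterior; the Kalman form just avoids ever representing the full distribution, which is what makes it so cheap.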


It is chalk and cheese.

ML requires training, significant amounts of compute power, and large datasets.

The Apollo program used Kalman filters with limited compute resources.

Kalman filters are for predicting system states in the presence of uncertainty; ML is really searching for and matching patterns. Under uncertainty not covered by its training set, it tends to find the glitch in the matrix.


You could probably call a Kalman filter an ML model if you wish. One major difference is that you usually have to prescribe the prediction model. It doesn't learn to predict the next value like an LLM, instead it learns to find an optimal weighted sum of its prediction and the measured values. So it's a prediction-correction loop that requires an interpretable model, that has the side effect of allowing you to estimate system states. This is quite different than having an arbitrary hidden state with learned structure.

In other words it performs well for certain applications specifically because it allows you to bring in domain knowledge in the form of the process model and known uncertainties. Whereas deep learning models try to generalize the model and learn implicit structure from data.


Kalman filters are the *optimal* state estimator for a linear system driven by additive white Gaussian noise; combined with a quadratic cost function, they form the LQG controller.

The model is updated sequentially (online learning).

https://en.wikipedia.org/wiki/Linear%E2%80%93quadratic%E2%80...


It's Bayesian, and you specify what the model is. If you don't have a lot of data, and the model is a good match for reality that can be a good thing. Otherwise if you have data, ML is going to be more accurate because it can make a better model than your hand crafted model that you used in your kalman filter.


An ML model in theory can model anything with an input and an output. That's almost anything in the universe. You can replace a huge portion of engineering with it in theory.

Actually in theory you can replace everything with it. So what's the point of asking this question here? Ask it for everything.


Ask the question: "can we replace humans with ML?" That question is closer to being answered right now than ever before. We are on the cusp of an impending future where that answer could be: "yes."


Honestly, modern ML models are extraordinarily messy. They are inefficient, unreliable (when the goal is perfect reliability), often misused, mostly unexplainable, and very much a "throw the spaghetti at the wall to see what sticks" type of problem solving.

Kalman filters, and other similar digital filtering and prediction algorithms, are like scalpels compared to the broadsword of NNs and such. There are plenty of things that you can't or shouldn't use a kalman filter for, but for the tasks that it is suited for, you cannot do better with another solution. ML is mostly hand wavy bullshit, and DSP algorithms are like... doing real math, real engineering.


tomato tomato


Often in textbooks and on here on HN Kalman Filters are praised, but folks from the industry often ridicule Kalman Filters.

All the people I have interacted with have never used/seen Kalman filters outside academia. Does anyone here have industry experience where Kalman filters are actually used in prod?


They are widely used in narrow circles :)

Two of (arguably the best) open source RC aircraft flight controllers (ArduPilot and PX4) are using extended Kalman filters in their state estimators (essentially sensor fusion that provides attitude/position estimate):

https://github.com/ArduPilot/ardupilot/tree/master/libraries...

https://github.com/PX4/PX4-Autopilot/blob/main/src/modules/e...

I'm not that familiar with cleanflight/betaflight/inav scene to know what the FPV racer flight controllers use.


Yes. Worked in a robotics startup and a tier 1 automotive supplier. In both, variants of the Kalman filter were used for some tasks.

They are a simple tool, and simple often works. You just need to know the limitations and know which variant (if any) is best applicable to the problem at hand.


Done plenty of automotive/robotics by this point and yep. A KF is usually my first (and often only) step between straight up not filtering and finding a controls expert to analyze the system properly. It's cheap, easy, and works well enough that I don't have to think too hard about it.


I recently saw "experience with Kalman filters" listed in a job ad from a Robotics company.


edit: Extensively!! In control theory it seems :)

It is just that I keep seeing KF be mentioned on Twitter and blogs etc. for using it in finance.


I think this is really wild and interesting. In my experience, they (more like specializations of the Kalman filter) are used almost universally in radar tracking. I'm curious where outside of academia you see ridicule for Kalman Filters?


KFs, and more generally state observer systems, are widely used in the field of control systems.

I know someone with a PhD in control systems, works for an eVTOL startup, and their title is something like "estimation expert" or "estimation specialist". That is, 100% of their job is designing relatively complex KF-like algorithms that are used to estimate the aircraft state (position, pose, velocity, wind conditions, etc) in real time based on a bunch of sensors.


I built a variant to very successfully estimate state of charge for a large battery pack in a production hybrid-electric vehicle.

Often some tweaks from the standard formula are necessary to account for real-world non-linearities, and some creative design work is required to define states in such a way that the Gaussian noise assumption can hold well enough.


>> Do anyone here have industry experience where Kalman Filters is actually used in prod?

Yes, ship/vessel navigation software heavily uses Kalman filters, especially on the inputs received from the various position reference sensors.


We develop navigation instrumentation for directional drilling (oil/gas, geothermal, etc.). In these applications you have to estimate the path of the drilling string based on directional measurements and the length of drill pipe you've put into the ground. To take good measurements you have to stop so that rotation and vibration don't corrupt your measurement.

We use a type of Kalman Filter to estimate direction and instantaneous dynamics of the drill string, this way the drilling operation no longer needs to stop to get a directional measurement.


GPS chipsets use Kalman filters; it's standard practice. I think you have an incorrect perception here.


Yes. I was director of Research and Innovation in a company that produces equipment for people to do VTS and VTMIS, which is the maritime equivalent of Air Traffic Control.

We track targets (mostly ships, mostly maritime, but not always) and we use Kalman Filters (and variants) extensively.

https://en.wikipedia.org/wiki/Kalman_filter#Nonlinear_filter...

Contrary to your experience, there was a time when we were ridiculed for not using Kalman Filters, but in the limited niche we inhabited then, our internally developed algorithms out-performed Kalman.

But mostly, these days, yes, we use Kalman Filters of various types.


>> our internally developed algorithms out-performed Kalman

Could you tell me more about this? What other algorithms are used for position tracking and motion estimation. I have seen various ML models... RNN/DNN used. I'm guessing with VTMIS you are doing time-series predictions?


Regrets, but no. Developed internally and commercial secrets.

8-(


Astrodynamics and satellite navigation use kalman filters extensively for orbit determination


> folks from the industry often ridicule Kalman Filters.

[citation needed]

Siemens (SIMATIC PCS 7)

https://support.industry.siemens.com/cs/document/109748837/s...

Bosch Sensor Fusion

https://www.bosch-sensortec.com/products/smart-sensors/bha26...

Kalman filters are not state-of-the-art, but they are fairly simple to implement and are still used.


Apple uses EKF (Extended Kalman Filters) for the iPhone's GNSS system:

https://www.patentlyapple.com/2020/11/apple-reveals-adding-t...

I'm 100% sure there's some form of EKF on some loop on AirTags too.


In what industry are Kalman filters ridiculed?


At least in HFT, as far as I am aware. However, I see that KF actually IS extensively used elsewhere.


KF has been standard stuff in aerospace guidance navigation and control for quite some time. Stock markets are probably too nonlinear to use linear control theory on.


Positional tracking on most VR headsets has the fast, low latency updates from the drifty IMU and slower ground truth updates from vision fused together with a Kalman filter. Consumer drones, military drones, and weapons guidance systems use it extensively too.


I am still confused by this too:

Lots of typical "industry applications" given for Kalman Filters involve moving objects, like air planes, rockets, and so on.

But all these objects generally rotate in some way, rotation is non-linear, and Kalman Filters cannot deal well with nonlinearities. Thus Kalman Filters struggle with rotation, and I haven't really found any good resources that handle this.

Do all these applications just skip over the fact that real-world objects can have a rotation/spin, or do they all use more sophisticated filters as suggested in e.g. [1]?

[1] https://math.stackexchange.com/questions/2621677/extended-ka...


The "Integrating Generic Sensor Fusion Algorithms with Sound State Representations through Encapsulation of Manifolds" paper linked from your [1] is indeed the current state of the art. It shows how to use the EKF or UKF for rotation.

The naive Kalman filter is only suited for linear problems; extended and unscented Kalman filters (EKF/UKF) are necessary for anything non-linear (including rotation). In any case, they build on the basic KF, so you have to understand that first.
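For a flavor of the "linearize and reuse the KF machinery" idea (the EKF, not the manifold approach from the paper), here is a toy scalar EKF update with a nonlinear range measurement. The measurement model and all numbers are invented for illustration:

```python
import math

# Toy scalar EKF: same predict/update shape as the basic KF, but the
# nonlinear measurement model h(x) is linearized with its Jacobian at
# the current estimate. The range-from-a-fixed-height model is made up.

def h(x):
    """Nonlinear measurement: range to a target at height 10."""
    return math.sqrt(x**2 + 100.0)

def H_jac(x):
    """Derivative (1x1 Jacobian) of h, evaluated at the estimate."""
    return x / math.sqrt(x**2 + 100.0)

def ekf_update(x, P, z, R=0.1):
    Hx = H_jac(x)                 # linearize around the current estimate
    S = Hx * P * Hx + R           # innovation covariance
    K = P * Hx / S                # Kalman gain
    x = x + K * (z - h(x))        # correct with the nonlinear residual
    P = (1 - K * Hx) * P
    return x, P
```

Because the Jacobian is only a local approximation, the EKF can behave badly when the estimate starts far from the truth or the nonlinearity is strong, which is part of why UKF and manifold methods exist.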


That paper [1] is from 2010. What did "industry" use before that for physically moving objects?

If this is the current state of the art, are there generally-available/open-source libraries existing that implement this and practitioners use for this?

The only one I could find is https://github.com/kartikmohta/manifold_cdkf, which currently has 8 Github stars.

I also found an approach mentioned in [2] that is to just treat a single rotation angle as linear, and then wrap it around at 180 degrees in between state updates with additional conditional logic. Is this what people did in practice before? I cannot find substantial info on this.

How did people use KF on physical objects before 2010?

[2]: https://old.reddit.com/r/ControlTheory/comments/d2yrjq/kalma...
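The wrap-around trick from [2] can be sketched like this for a single heading angle (the function names and noise values are my own illustrations, not from that thread): treat the angle as a linear state, but wrap the innovation so 179° and -179° are treated as 2° apart, not 358°.

```python
# Sketch of the angle-wrapping trick from [2]: a scalar Kalman update on
# a heading state, with the innovation wrapped into (-180, 180] degrees.
# Noise values and names are illustrative assumptions.

def wrap_deg(angle):
    """Wrap an angle in degrees into the interval (-180, 180]."""
    return -((-angle + 180.0) % 360.0 - 180.0)

def heading_update(pred_deg, P, z_deg, R):
    # Wrap the measurement-minus-prediction residual before updating,
    # so the filter corrects along the short way around the circle.
    innovation = wrap_deg(z_deg - pred_deg)
    K = P / (P + R)
    est = wrap_deg(pred_deg + K * innovation)
    P = (1 - K) * P
    return est, P
```

This works tolerably for a single slowly-changing angle; it falls apart for full 3D attitude, which is where the MEKF/manifold machinery discussed above comes in.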


> If this is the current state of the art, are there generally-available/open-source libraries existing that implement this and practitioners use for this?

The authors of [1] do discuss and link to their own library in section 5.

However, in my experience, most people implement the math themselves rather than use any libraries (beyond e.g. Eigen).

> How did people use KF on physical objects before 2010?

The Multiplicative EKF (MEKF) was used since 1969 according to [3]. It's a hacky approximation of [1]. [1] is really just a generalization/unification of lots of application-specific hacks that were used before, including the MEKF.

[3]: https://ntrs.nasa.gov/api/citations/20040037784/downloads/20...


I’d suggest looking at the related work in that paper as they list prior methods.


I'm asking what people actually use in production. It is part of a thread that started with

> All People I have interacted with have never seen Kalman Filters outside Academica. Do anyone here have industry experience where Kalman Filters is actually used in prod?

Looking at prior academic work in an academic paper won't help answer this question.


This is where the extended Kalman filter comes in (basically just a KF with linearization). I imagine you could also do a KF to include manifold constraints (e.g., rotations) more explicitly but I have never had the need.

If you model your rotation as a quaternion, there is a way to linearize the update process of the quaternion and use a KF. This can work very well and is what most quadrotors I’ve worked with do. However, care is needed to ensure the result gives a valid updated rotation and that you implement the disgustingly messy equations correctly.


I used them extensively when I worked on Radar contact tracking. They are an effective tool when used with a good understanding of the limitations.


I love ELI5 articles. I work in fields that use Kalman Filters. I have an MS in EE. I’ve been in industry for more than ten years and tried my hand with more than one KF. I know the terms and mostly grok the math. I’m a Senior trending to Principal at a faang, working in robotics. I STILL STRUGGLE and have never deployed a KF to production.

Side story - a friend of mine tried to recruit me to a well known robotics startup in the commercial autonomous air vehicle space (DM me if you’re curious who). The technical portion was basically, “implement a state estimator for radar target tracking, in one day, here’s some data”. Meanwhile, I have a day job. Lol, I told ‘em, “I’m not a good fit if this is what you’re looking for”. Oh yeah, my recruiter/friend also said they were looking for rock stars. I feel like I really dodged a bullet there.



