I should have added that there's a very good reason for making these assumptions: they let you represent the system with extremely little data! In general, for a system with a continuous state space (like position), representing an arbitrary Bayesian prior would require storing one value for each of the infinitely many possible states. Obviously this is impossible, so you might instead discretize the space into a grid and store the probability that the system is at each point in the grid. (Think of an image where each pixel's brightness tells you how likely you are to be there: you can store any shape of probability distribution this way, but only at limited resolution.) The same goes for the likelihood and posterior distributions. This uses a ton of memory, is slow, and limits your resolution.
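To make the grid idea concrete, here is a minimal sketch of a one-dimensional grid-based Bayesian update. The names (`make_grid`, `grid_update`), the 0 to 10 range, the 101 cells, and the Gaussian-shaped measurement likelihood are all assumptions picked for illustration, not anything from a particular library.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used here as a measurement likelihood."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def make_grid(lo, hi, n):
    """Return n evenly spaced positions covering [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def normalize(weights):
    """Scale the weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def grid_update(prior, likelihood):
    """Bayes' rule on a grid: multiply cell by cell, then renormalize."""
    return normalize([p * l for p, l in zip(prior, likelihood)])


# Hypothetical setup: position somewhere in [0, 10], stored in 101 cells.
grid = make_grid(0.0, 10.0, 101)
prior = normalize([1.0] * len(grid))                     # flat prior: no idea where we are
likelihood = [gaussian_pdf(x, 3.0, 0.5) for x in grid]   # a sensor reading near 3.0
posterior = grid_update(prior, likelihood)

# The most likely cell after the update.
best = grid[max(range(len(grid)), key=posterior.__getitem__)]
```

Note the cost: every update touches all 101 cells, and a 2D version at the same resolution would need 101 × 101 of them, which is the memory and speed problem in a nutshell.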
Instead, if you assume that the priors are Gaussian, then you can store that information as just two numbers: the mean and the variance (or a mean vector and a covariance matrix for higher-dimensional state spaces). And Gaussians have the remarkable property that if you start with a Gaussian prior and a Gaussian likelihood and perform a Bayesian update, you end up with a Gaussian posterior that can be represented with this same amount of data. Assuming that the system is linear means that you can also represent its update from one time to the next as a matrix. Moreover, both of these approximations are pretty good for a large class of real-world problems.
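For comparison, here is a rough sketch of the same kind of update when everything is Gaussian, in the scalar case. It assumes the sensor measures the state directly, and the names `a` (the linear dynamics factor), `q` (process noise variance), `z` (the measurement) and `r` (measurement noise variance) are my own labels for illustration.

```python
def predict(mean, var, a, q):
    """Linear time step x_next = a * x + noise, where the noise has variance q.

    A Gaussian pushed through a linear map stays Gaussian, so we only
    need to move the mean and rescale the variance.
    """
    return a * mean, a * a * var + q


def update(mean, var, z, r):
    """Bayesian update of a Gaussian prior with a Gaussian measurement z.

    r is the measurement noise variance. The posterior is again Gaussian,
    so the result is still just two numbers.
    """
    k = var / (var + r)               # how much to trust the measurement
    new_mean = mean + k * (z - mean)
    new_var = (1.0 - k) * var
    return new_mean, new_var


# Prior: mean 0, variance 1. Measurement: 2.0 with variance 1.
mean, var = update(0.0, 1.0, 2.0, 1.0)

# Then step forward in time with trivial dynamics and a little process noise.
mean, var = predict(mean, var, 1.0, 0.1)
```

With equal prior and measurement variances, the update lands halfway between them at mean 1.0 and halves the variance to 0.5. The whole belief state is two floats, versus 101 cells in a grid.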
There are other sophisticated ways to get around the downsides of the general approach. Namely, there are particle filters, which discretize your distributions with a sample of points. Unlike the grid discretization above, the points aren't at fixed locations: they're allowed to move around and are constantly being resampled from the distribution you have. This allows lots of the points to cluster very close together and accurately represent the most interesting (most likely) parts of the state space without wasting tons of memory on extremely unlikely points. It's very clever and fun to watch in practice!
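Here is a bare-bones sketch of one particle filter step in one dimension, to show the move, weight, resample loop. It is the simplest "bootstrap" variant as I understand it, with plain multinomial resampling; the function name `particle_filter_step`, the random-walk motion model, and all the noise values are assumptions made up for the example.

```python
import math
import random


def particle_filter_step(particles, z, meas_std, process_std, rng):
    """One move / weight / resample cycle for a 1D random-walk model."""
    # Move: every particle drifts by some random process noise.
    moved = [p + rng.gauss(0.0, process_std) for p in particles]

    # Weight: particles that explain the measurement z well get large weights.
    weights = [math.exp(-0.5 * ((z - p) / meas_std) ** 2) for p in moved]

    # Resample: draw a new population in proportion to the weights, so likely
    # regions get many particles and unlikely ones die out.
    # (random.choices raises ValueError if every weight underflows to zero.)
    return rng.choices(moved, weights=weights, k=len(moved))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Start with no idea where we are: particles spread uniformly over [0, 10].
particles = [rng.uniform(0.0, 10.0) for _ in range(1000)]

# Feed in a few hypothetical sensor readings near 3.0.
for z in [3.1, 2.9, 3.0]:
    particles = particle_filter_step(particles, z, 0.5, 0.1, rng)

estimate = sum(particles) / len(particles)
```

After a few readings almost all of the 1000 particles sit near 3, so the resolution there is far finer than a 1000-cell grid over the whole range would give you.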