There's no "remembering" the last value or Nothing; it immediately short-circuits, and not because Haskell is lazy, but because that's how monads are defined.
If you don't understand why "monads are programmable semi-colons", take that second look. See also the list monad; the key is not the "storing in a list" but the way it does "nondeterministic computations".
No, I'm talking about the definition of monads themselves. They don't work the way most people think they do, especially most of the critics. Monads have that function application step, and at every point that function can be applied zero, one, or many times. There is no "remembering" of a Nothing; the Monad instance for Maybe is defined such that it calls the next function zero times and immediately returns Nothing, which short-circuits the remainder of the monadic computation. That is, it doesn't "keep running" the way you might expect an imperative language would (or might, anyhow).
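To make the "zero times" point concrete, here's a minimal sketch of the Maybe case. The names bindMaybe, safeDiv, and chain are made up for illustration; bindMaybe is a hand-written stand-in that mirrors how the standard (>>=) for Maybe behaves, not a copy of the library source.

```haskell
-- bindMaybe is a hypothetical, hand-rolled stand-in for Maybe's (>>=),
-- renamed so it does not clash with the Prelude.
bindMaybe :: Maybe a -> (a -> Maybe b) -> Maybe b
bindMaybe Nothing  _ = Nothing  -- the next function is applied zero times
bindMaybe (Just x) f = f x      -- the next function is applied exactly once

-- Illustrative helper: division that fails on a zero divisor.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Three chained steps. If the first safeDiv yields Nothing, the lambdas
-- after it are never called; nothing is "remembered" and carried along.
chain :: Int -> Int -> Int -> Maybe Int
chain a b c =
  safeDiv a b `bindMaybe` \q ->
  safeDiv q c `bindMaybe` \r ->
  Just (r + 1)
```

With these definitions, chain 100 5 2 is Just 11, while chain 100 0 2 and chain 100 5 0 are both Nothing. The short-circuit falls out of the first equation of bindMaybe, which never touches its function argument, so a strict language with the same definition would skip the remaining steps too.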
This is why I also point out the list monad; fully understanding that is necessary to be sure you understand monads. I have seen numerous "monad" implementations in $YOUR_FAVORITE_LANGUAGE that can't do the list monad correctly: they can do 0 and 1 applications of a function, but can't correctly do arbitrarily many, because the people implementing the Monad interface didn't actually understand the interface. (In fact, when I see such an implementation, the first thing I look for now is the list monad, and so far, of the four or five I've seen, only one has gotten it right.)
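For comparison, a sketch of the list case, where the next function runs once per element. Again bindList, pairs, pairsDo, and noPairs are illustrative names; bindList is a hand-written equivalent of the list monad's (>>=), written out so the "many applications" are visible.

```haskell
-- bindList is a hypothetical stand-in for the list monad's (>>=).
-- The next function is applied once per element: zero, one, or many times.
bindList :: [a] -> (a -> [b]) -> [b]
bindList xs f = concatMap f xs

-- Every combination of a number and a letter. The inner function runs
-- three times, and the innermost one six times.
pairs :: [(Int, Char)]
pairs =
  [1, 2, 3] `bindList` \n ->
  "ab"      `bindList` \c ->
  [(n, c)]

-- The same computation using the Prelude's real list monad and do-notation.
pairsDo :: [(Int, Char)]
pairsDo = do
  n <- [1, 2, 3]
  c <- "ab"
  return (n, c)

-- An empty list plays the role Nothing plays for Maybe: zero applications.
noPairs :: [(Int, Char)]
noPairs = [] `bindList` \n -> "ab" `bindList` \c -> [(n, c)]
```

Here pairs and pairsDo both come out as [(1,'a'),(1,'b'),(2,'a'),(2,'b'),(3,'a'),(3,'b')], and noPairs is []. An implementation whose bind can only call its function argument at most once has no way to express pairs, which is the failure mode described above.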