A simpler explanation of "a monad is just a monoid in the category of endofunctors":
Endofunctor is a Greek-derived word for a map that produces the same type of data that it consumes, e.g. (foo : bar->bar).
Monoid is another Greek-derived word for things that act like strings: if you have two elements of a monoid, you can compose them and get another element of the same monoid. Furthermore, this composition behaves like concatenation: given one element, there are only two ways to compose it with another, pre-composition and post-composition (just like a string can be prefixed or suffixed by another string to yield a string). As with strings, the grouping doesn't matter, and there is an "empty" element that changes nothing.
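To make the string analogy concrete, here is a rough Python sketch. The names compose, identity, inc, dbl and sq are made up for illustration, and plain int->int functions stand in for the (foo : bar->bar) maps; it only spot-checks a few values rather than proving anything.

```python
def compose(f, g):
    """Return the map 'f then g': apply f first, then g."""
    return lambda x: g(f(x))

def identity(x):
    """The 'empty' map: composing with it changes nothing."""
    return x

# Strings: concatenation is associative and "" is the neutral element.
assert ("ab" + "cd") + "ef" == "ab" + ("cd" + "ef")
assert "" + "ab" == "ab" + "" == "ab"

# Maps int -> int behave the same way under composition.
inc = lambda n: n + 1
dbl = lambda n: n * 2
sq = lambda n: n * n

left = compose(compose(inc, dbl), sq)   # (inc then dbl) then sq
right = compose(inc, compose(dbl, sq))  # inc then (dbl then sq)
assert left(3) == right(3) == 64

# Pre- or post-composing with identity leaves a map unchanged.
assert compose(identity, inc)(3) == compose(inc, identity)(3) == inc(3)
```

Swapping the order (dbl then inc versus inc then dbl) generally gives a different map, just as "ab" + "cd" differs from "cd" + "ab".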
From that simple observation all the "monad laws", and their implications, arise.
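For what it's worth, here is one way to see the "monad laws" as exactly those string-like rules, sketched in Python with lists as the monad. The helper names (unit, kleisli, neighbours, halves, twice) are invented for this example, and it checks the laws on single inputs only.

```python
def unit(x):
    """Wrap a plain value: the 'empty string' of this composition."""
    return [x]

def kleisli(f, g):
    """Compose two list-returning maps: run f, then run g on each result."""
    return lambda x: [z for y in f(x) for z in g(y)]

# Three hypothetical list-returning maps on ints.
neighbours = lambda n: [n - 1, n + 1]
halves = lambda n: [n // 2] if n % 2 == 0 else []
twice = lambda n: [n, n]

# Identity laws: snapping unit on either end changes nothing.
assert kleisli(unit, neighbours)(4) == neighbours(4)
assert kleisli(neighbours, unit)(4) == neighbours(4)

# Associativity law: the grouping of the snaps does not matter.
lhs = kleisli(kleisli(neighbours, halves), twice)
rhs = kleisli(neighbours, kleisli(halves, twice))
assert lhs(5) == rhs(5) == [2, 2, 3, 3]
```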
tl;dr: Monads are things that snap together like 1x1 Lego bricks: you can snap on the top, or on the bottom, and the result is something that's still snappable either on the top or on the bottom.
[Edit: just realised I have glossed over composing maps. It should be evident that composing (foo : bar->bar) with (bletch : bar->bar) will yield some (quux : bar->bar), but there is only convention to suggest which way unspecified composition defaults to.
Most mathematicians and some programmers write (g∘f)(x) where (g∘f)(x) == g(f(x))
Some algebraists and most programmers write (f;g)(x) where (f;g)(x) == g(f(x)) ]
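A small Python sketch of the two conventions, in case it helps. compose_backward and compose_forward are hypothetical names for (g∘f) and (f;g) respectively; both build the same map and differ only in the order the arguments are written.

```python
def compose_backward(g, f):
    """(g ∘ f): written right to left, but f is applied first, then g."""
    return lambda x: g(f(x))

def compose_forward(f, g):
    """(f ; g): written left to right, f is applied first, then g."""
    return lambda x: g(f(x))

add_one = lambda n: n + 1
double = lambda n: n * 2

# Both spell "add one, then double".
assert compose_backward(double, add_one)(5) == 12
assert compose_forward(add_one, double)(5) == 12

# Mixing up the convention silently gives "double, then add one".
assert compose_forward(double, add_one)(5) == 11
```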