The distributivity argument is a bit sketchy, though. It defines g.p := p▫q, meaning that g implicitly depends on q, but then writes f(y▫z) = f(g.y) and fy▫fz = g.f(y), where in the first instance g.x = x▫y and in the second g.x = x▫fz.
That said, I do see the point. The distributive rule let's us commute the order in which we compute addition and multiplication. Generally, I guess if we have some family of arrows A and B, where all commutative diagrams Bx∘Ay = Az∘Bw hold, then we can say "A and B commute". This can probably be expressed as some literal commutativity using natural transformations, but I'll need to think on it some more.
Yes: the requirement of commutativity in the "addition" for distributivity to work with a noncommutative "multiplication" is exactly why my current model for code* is based on an (anathema to practicing programmers) unordered choice.
* and also data; with suitable choices of definitions there are very few differences between the left and right hand arguments to application.