Because the engine analogy, though intuitively appealing, doesn't accurately map onto the way organisms work. The 'nerves' which send control and sensory information to and from our organs are neurons. The 'extended brain' includes the central and peripheral nervous systems, and takes part in local and distributed cognition. So it's not a case of an engine running a machine, or a CPU controlling the information flowing from a bunch of subsystems. The neurocognitive architecture is spread across the body. Naturally, more cerebral cortex is needed to perform higher-order processing and control the signalling throughout the whale's enormous nervous system.
I think a simpler way to put it is that the coordination of more body mass requires more analog input. It's not the case that a large muscle group in a body is sufficiently abstracted into something that only requires a small amount of input to do work. It makes intuitive sense to me that it requires more signal bandwidth to control more muscle cells at once, and in turn more computation to produce the signals that fill that bandwidth.
A better analogy, then, would be to treat cell groups of a certain mass roughly as one software application with a certain level of computational resource requirements. The more such applications you have in your system, the more CPU and RAM you need to run them all.
Another way to explain it is that an engine is an autonomous unit with a specific function, and all you have to do is instruct it to run, say, at higher or lower power. But most of the body has to be centrally controlled by the brain.