Specifically, you need a reaction time shorter than sqrt(l/g), where l is the distance between the engine and the center of mass, and g is gravity. But l=30 cm is still within the range of commodity hobby servos to vector the engine.
The same effect makes walking robots harder at small scale, unless they cheat by having large feet and stiff ankles.
If you take an inverted pendulum of length l in gravity g and perturb it slightly from vertical, the error grows like e^(t/sqrt(l/g)). So if you're off by 1 degree, you'll be off by 2.718 degrees sqrt(l/g) seconds later. (The real function involves hyperbolic cosines, but they grow like e^t.)
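To make the growth rate concrete, here is a minimal sketch. It assumes a point-mass pendulum, the small-angle linearization, and a release at rest from a small initial angle; the function names are just for illustration. Under those assumptions the angle follows theta0 * cosh(t / sqrt(l/g)), which approaches half of the pure exponential as t grows.

```python
import math

def time_constant(l, g=9.81):
    """Characteristic time sqrt(l/g) in seconds (l in metres, g in m/s^2)."""
    return math.sqrt(l / g)

def angle_after(theta0, t, l, g=9.81):
    """Linearized angle at time t for a pendulum released at rest from theta0.

    Solves theta'' = (g/l) * theta with theta(0) = theta0, theta'(0) = 0,
    which gives theta0 * cosh(t / tau). Only valid while the angle is small.
    """
    return theta0 * math.cosh(t / time_constant(l, g))

if __name__ == "__main__":
    l = 1.3  # metres, assumed length for illustration
    tau = time_constant(l)
    for n in (1, 2, 3, 4):
        exact = angle_after(1.0, n * tau, l)   # cosh growth
        approx = math.exp(n)                   # pure e^(t/tau) growth
        print(f"t = {n} tau: cosh gives {exact:.2f} deg, e^(t/tau) gives {approx:.2f} deg")
```

With these starting conditions the cosh solution lags the pure exponential by a factor that tends to 2, so the e-folding time is the same even though the exact numbers differ.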
If you can react twice as fast as that (in about half of sqrt(l/g)), the control problem is easy. If your reaction time is about equal to sqrt(l/g), the control problem is feasible but requires accurate tuning.
For average-height humans on Earth, the height of the center of mass is about 1.3 m, so sqrt(l/g) is about 350 ms. Human response time, from the inner ear to the ankle muscles, is about half of that. That gives some intuition for how hard it is to balance with 2x faster response. Balancing a yardstick on your finger is closer to 1x faster response.
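A quick check of the numbers in this thread, as a rough sketch. It assumes g = 9.81 m/s^2 and treats each case as a point mass at distance l from the pivot; the helper name is made up for illustration.

```python
import math

G = 9.81  # m/s^2, standard Earth gravity (assumed)

def time_constant_ms(l_m, g=G):
    """Return sqrt(l/g) in milliseconds for a length l_m given in metres."""
    return 1000.0 * math.sqrt(l_m / g)

if __name__ == "__main__":
    cases = [
        ("small rocket, l = 0.30 m", 0.30),
        ("standing human, l = 1.3 m", 1.3),
    ]
    for label, l_m in cases:
        print(f"{label}: sqrt(l/g) = {time_constant_ms(l_m):.0f} ms")
```

This gives roughly 175 ms for the 30 cm case and roughly 364 ms for the 1.3 m case, in line with the approximate 350 ms figure quoted for a human.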
The period of a pendulum is 2pi sqrt(l/g) for small amplitudes. He's most probably referring to the order of magnitude you need, not precisely that number.