Do you have a source for this? I'm only asking because I got the impression RDP was a superior protocol to VNC, NX, etc. because of its complex handling of graphical primitives. But I know next to nothing about the real technical details.
I've not looked at RDP, but I've implemented the X11 protocol and the VNC protocol.
The problem is that the more complex the UI, the sooner you reach a threshold where "just transmitting the bitmap" is faster, takes less data, or both.
E.g. consider rendering a simple button with X11: you'd "just" send a request to draw a rectangle, maybe one to fill it, and one with the string for the label. That's 2-3 small requests. But the moment you have a UI with a gradient, a drop shadow for the text, a differently shaped border, and a shadow for the border, you add enough requests that the comparison can easily tip towards the bitmap, especially because compressing these bitmaps reasonably well tends to be easy.
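To put rough numbers on that, here's a back-of-the-envelope sketch in Python. The per-request sizes are my approximations of the X11 core protocol encodings (12-byte header plus 8 bytes per rectangle, 16 bytes plus the padded string for ImageText8, 12 bytes plus 4 per value for ChangeGC). The helper names, the 100x30 button, and the "one colour change plus one fill per scanline" gradient are all made up for illustration, and the bitmap holds only the gradient, no glyphs, which flatters the bitmap side somewhat.

```python
import zlib


def pad4(n: int) -> int:
    """Round n up to a multiple of 4 (X11 requests are 4-byte aligned)."""
    return (n + 3) & ~3


# Approximate request sizes in bytes. These are assumptions for the
# estimate, not a wire-accurate encoder.
def poly_fill_rectangle(count: int = 1) -> int:
    return 12 + 8 * count


def poly_rectangle(count: int = 1) -> int:
    return 12 + 8 * count


def image_text8(text: str) -> int:
    return 16 + pad4(len(text))


def change_gc(values: int = 1) -> int:
    return 12 + 4 * values


def simple_button_bytes(label: str) -> int:
    """Filled rectangle, outline, label: three small requests."""
    return poly_fill_rectangle() + poly_rectangle() + image_text8(label)


def fancy_button_bytes(label: str, height: int) -> int:
    """Hypothetical styled button: a per-scanline gradient, plus a border
    and a label that are each drawn twice (shadow, then foreground)."""
    gradient = height * (change_gc() + poly_fill_rectangle())
    border = 2 * (change_gc() + poly_rectangle())
    text = 2 * (change_gc() + image_text8(label))
    return gradient + border + text


def gradient_bitmap(width: int, height: int) -> bytes:
    """32-bit pixels, one grey shade per row, as a stand-in for the
    client-side rendering of the same gradient."""
    rows = []
    for y in range(height):
        shade = 255 - (y * 128) // max(height - 1, 1)
        rows.append(bytes([shade, shade, shade, 255]) * width)
    return b"".join(rows)


if __name__ == "__main__":
    raw = gradient_bitmap(100, 30)
    print("simple button, requests:", simple_button_bytes("OK"))
    print("fancy button, requests: ", fancy_button_bytes("OK", 30))
    print("gradient bitmap, raw:   ", len(raw))
    print("gradient bitmap, zlib:  ", len(zlib.compress(raw, 9)))
```

With these assumptions the simple button is a few dozen bytes of requests, while the styled one needs more request bytes than the zlib-compressed gradient bitmap takes. The exact crossover obviously depends on how the gradient is drawn and on what is actually in the pixels.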
Modern X11 clients increasingly render into client-side buffers already, because even under X that's often better on a local machine, and too few of us use X over the network for that case to be optimised for.
Having the option in the display subsystem of picking whichever of the two will perform best is a good place to be, but the more complex the UI, the less often the simple primitives will be worth it.
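As a minimal sketch of what "picking either" could look like, assuming the display subsystem has both the serialized drawing requests and the rendered pixels for an update to hand. The `Update` type and `encode` function are hypothetical, and a real implementation would weigh CPU time and latency as well, not just byte counts.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Update:
    requests: bytes  # serialized drawing primitives for a region
    pixels: bytes    # the same region, already rendered


def encode(update: Update) -> tuple[str, bytes]:
    """Send whichever representation is smaller on the wire."""
    compressed = zlib.compress(update.pixels)
    if len(update.requests) <= len(compressed):
        return ("primitives", update.requests)
    return ("bitmap", compressed)
```

For a handful of requests the primitives win; once a region takes kilobytes of requests to describe, the compressed bitmap usually does.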