To someone that’s reasonably computer savvy (professional developer but Windows and macOS only), what does X11 do? What are the benefits of this approach? What’s the ‘equivalent’ in Windows or macOS (if there is one)?
A user on any X11 desktop can open a GUI from any X11 app, even one running in a datacenter. It's like a JavaScript app in a browser today, but every Unix GUI app knew how to do it on late 1980s hardware, so you didn't even have to buy and admin a complete desktop computer for each user.
Windows sort of has this with RDP, but it's tied in with the app's GDI desktop, and I don't know whether it works without buying a bunch of video cards for the headless app server. NeXTSTEP had Display PostScript (remote rendering worked like printing!) but macOS lost support for it.
Somewhat misleading, because these days none of the network-transparent primitives are used anymore. All the rendering happens server side, and bitmaps are sent over the wire. It's basically a crappy VNC, and at that point, just use RDP or VNC.
That greatly depends on the type of applications you run and the toolkits and font rendering they use. My most commonly used applications (terminal emulator, Emacs) do their font rendering using the glyph compositing functionality of the RENDER extension. Server-side glyphs are created when a font is loaded, and all the compositing is done on the server side based on the client's CompositeGlyphs requests. The same goes for images (in Emacs), using CreatePixmap and CopyArea.
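To make the bandwidth argument concrete, here is a toy Python model of why server-side glyphs are cheap over a network. It is my own simplification, not the RENDER wire format: each glyph image is uploaded once (in the spirit of RENDER's AddGlyphs), and later text only sends small glyph IDs (in the spirit of CompositeGlyphs8). The 128-byte glyph size (an 8x16 glyph at one byte per pixel), the one-byte-per-ID cost, and ignoring request headers are all assumptions.

```python
from typing import List


class GlyphCacheModel:
    """Toy byte-count model of server-side glyph caching (assumed costs, no headers)."""

    def __init__(self, glyph_bytes: int) -> None:
        self.glyph_bytes = glyph_bytes    # assumed size of one uploaded glyph image
        self.on_server: set = set()       # glyphs the server already holds
        self.bytes_sent = 0

    def draw_text(self, text: str) -> None:
        for ch in text:
            if ch not in self.on_server:
                # First use: ship the rasterized glyph to the server once.
                self.on_server.add(ch)
                self.bytes_sent += self.glyph_bytes
        # Every draw: send only one (assumed) 1-byte glyph ID per character.
        self.bytes_sent += len(text)


def client_side_bytes(texts: List[str], glyph_bytes: int) -> int:
    """Alternative: rasterize on the client and resend every glyph's pixels each time."""
    return sum(len(t) * glyph_bytes for t in texts)


if __name__ == "__main__":
    lines = ["hello world"] * 3
    model = GlyphCacheModel(glyph_bytes=128)
    for line in lines:
        model.draw_text(line)
    print(model.bytes_sent, client_side_bytes(lines, 128))
```

Under these assumptions, drawing "hello world" three times costs 1057 bytes with the glyph cache versus 4224 bytes when every glyph's pixels are resent, and the gap grows with every redraw.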
Indeed. There are also plenty of terminals that use the old server-side font rendering calls. For terminals, using client-side buffers is the exception: mostly the handful that use OpenGL.
That's really not true. Qt as of Qt 6 still supports using native X11 drawing commands and that covers a lot of apps. Tkinter too (and this covers many technical apps which are exactly the ones likely to be used over the wire).
Just last week I was remotely debugging an art installation that uses my software, https://ossia.io, running on a Pi 5. I compared X11 and VNC, and X11 was much more usable, even over the internet.
XPRA using H.264 or H.265 does a decent job, in my experience, of improving performance over ssh -X.
On Wayland, I am also getting good results with waypipe, opening individual apps from VMs to make a poor man's Qubes OS without the complexity and without having to open the whole remote desktop.
> All the rendering happens server side, and bitmaps are sent over the wire. It's basically a crappy VNC.
Even if this were true (which it isn't), there's a lot more to a GUI than the G. A lot of nice interoperability is provided too, like clipboard integration, dragging and dropping, mixed windows on the same taskbar, etc. Far more pleasant to use than awkwardly going to a full screen thing to get a window out.
Yeah, I have nothing constructive to say about apps and toolkits that choose only local rendering with no hardware acceleration, but it's pretty funny to see JavaScript apps beating them on performance.
AFAIK, Windows Remote Desktop always renders through virtual display adapter devices, though it can optionally make use of GPU resources to accelerate rendering to these devices.
I just tested in an ESXi VM[1], and Windows 10 will happily boot and accept an RDP connection without any (virtual) display hardware at all.
Whether a particular computer will boot Windows without display hardware depends on the firmware. In my experience, some workstations will and some won't, and I honestly haven't seen a modern x86 server without some sort of integrated graphics to support out-of-band management, often with a VGA port as the only physical video output.
[1] By hand-editing the VM's .vmx configuration file, setting
svga.present = "FALSE"
and adding the undocumented option
hwm.svga.notAlwaysPresent = "TRUE"
Interesting aside: I figured this out by running "strings" on the hypervisor executable, which turned up this suspiciously Scheme-like bit of configuration code:
X11 is a distributed systems clustering protocol for asynchronous message passing that outputs graphics as a side effect. It's like Erlang's dist, but with a side channel to a display.
A more realistic answer: the X server manages access to the input and output devices. Roughly speaking, it lets a client define a region (a rectangle, i.e. a window), get clicks, mouse motion, and sometimes keyboard input (maybe too often), and send things to be displayed. In modern times that's mostly images, but it used to be lines, curves, letters, and similar primitives, or things like OpenGL display lists. The server can tell the client when part of its region is exposed and needs to be redrawn, or it can use a backing store to keep obscured parts of the region on the display side. Additionally, clients can adjust other clients' resources (this is what a window manager does), and clients can communicate with each other through the server (not a full mesh like in Erlang dist)... that part is a bit confusing.
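As an illustration of the exposure mechanism, here is a Python sketch that decodes a core Expose event from the 32-byte event layout in the X11 protocol specification, then batches the damaged rectangles until the event's count field says no more Expose events follow, which is the usual point at which to repaint. It assumes a little-endian connection and a single window, and it ignores every other event type; toolkits normally do all of this for you.

```python
import struct
from typing import Iterable, Iterator, List, NamedTuple, Tuple, Union

EXPOSE = 12  # core protocol event code for Expose


class Expose(NamedTuple):
    window: int
    x: int
    y: int
    width: int
    height: int
    count: int  # number of further Expose events that follow for this window


def decode_event(raw: bytes) -> Union[Expose, int]:
    """Decode one 32-byte core event from a little-endian connection.

    Returns an Expose tuple, or the bare event code for types this sketch ignores.
    """
    if len(raw) != 32:
        raise ValueError("core events are always 32 bytes")
    code = raw[0] & 0x7F  # the high bit marks events generated by SendEvent
    if code != EXPOSE:
        return code
    # Layout: code, unused, sequence (2 bytes), window (4), x, y, width, height, count (2 each)
    window, x, y, width, height, count = struct.unpack_from("<IHHHHH", raw, 4)
    return Expose(window, x, y, width, height, count)


def repaint_batches(
    events: Iterable[Union[Expose, int]],
) -> Iterator[List[Tuple[int, int, int, int]]]:
    """Collect Expose rectangles (single window assumed); yield a batch when count hits 0."""
    pending: List[Tuple[int, int, int, int]] = []
    for event in events:
        if isinstance(event, Expose):
            pending.append((event.x, event.y, event.width, event.height))
            if event.count == 0:
                yield pending  # a real client would repaint these areas now
                pending = []
```

The window ID you would see in practice is whatever the client allocated; any ID used when trying this out is made up.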
On Windows and macOS there's no network involved; similar things happen, but mostly through system calls, I think. On the other hand, macOS has all those Mach ports; does the UI go through them? But there are X11 servers for most platforms that integrate reasonably well, so it's not like you can't use X concepts there; it's just a bit more setup to get started. Windows also has RDP, which can be used to run a program on one computer and display it on another.
The benefit of this approach is that you can take better advantage of asynchrony. Many GUI libraries and toolkits use a synchronous model where you make a request and can't continue until you get the answer. That's fine when everything is fast, but when there's a network between the client and server, it's better to send requests as soon as you have them and only wait for a response when you need it. (See also XCB vs. Xlib.)
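A back-of-the-envelope Python model of that difference. It assumes each blocking wait for replies costs exactly one network round trip and ignores bandwidth and server processing time; the names are hypothetical. In XCB terms, the pipelined version corresponds to issuing several requests (each returns a cookie) before asking for any reply.

```python
from dataclasses import dataclass


@dataclass
class SimulatedLink:
    """Toy client/server link: every blocking wait for replies costs one RTT."""
    rtt_ms: float
    elapsed_ms: float = 0.0
    pending: int = 0

    def send(self) -> None:
        # Sending is assumed free (no bandwidth or server-side cost modelled).
        self.pending += 1

    def wait_for_replies(self) -> None:
        # One round trip delivers all outstanding replies, however many there are.
        if self.pending:
            self.elapsed_ms += self.rtt_ms
            self.pending = 0


def synchronous(n_requests: int, rtt_ms: float) -> float:
    """Blocking style: send a request, then wait for its reply before the next one."""
    link = SimulatedLink(rtt_ms)
    for _ in range(n_requests):
        link.send()
        link.wait_for_replies()
    return link.elapsed_ms


def pipelined(n_requests: int, rtt_ms: float) -> float:
    """Pipelined style: send everything first, collect replies only when needed."""
    link = SimulatedLink(rtt_ms)
    for _ in range(n_requests):
        link.send()
    link.wait_for_replies()
    return link.elapsed_ms


if __name__ == "__main__":
    print(synchronous(20, 40.0), pipelined(20, 40.0))
```

For 20 requests over a 40 ms link, the model gives 800 ms for the blocking style and 40 ms for the pipelined one; on a local socket with a near-zero RTT the difference almost disappears, which is why the synchronous style was tolerable for so long.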
Yeah, on macOS the communication between your app and the window server (which is conveniently called WindowServer) happens via Mach ports. Most of it is undocumented; in fact, anything lower-level than AppKit is undocumented, although IIRC it is in principle possible to use undocumented CG* APIs to create and manipulate windows yourself without going through the AppKit layers. I think each CG* API is basically a thin shim that communicates with the window server, which has a corresponding CGX* implementation that does the actual logic. This article has some details: https://keenlab.tencent.com/en/2016/07/22/WindowServer-The-p...
AFAIK, Windows RDP used to do more complex remoting of drawing primitives, so that the host computer would send the client instructions for drawing individual elements.
But for many years now, they have instead rendered all the graphics on the host side and sent a video stream to the client using standard video compression. I think this compression-based approach scales better for the common use cases, especially office apps where the display is mostly static. I guess the approach has gotten almost good enough that you can play games remotely that way.
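One reason mostly static screens are cheap to stream is that an encoder only has to deal with the parts that changed. Below is a minimal Python sketch of generic dirty-tile detection, a common building block in remote-display systems; it is not RDP's actual codec, and a real encoder would go on to compress the dirty tiles (for example with a video codec). Frames are modelled as lists of rows of integer pixel values, and the 16-pixel tile size is an arbitrary assumption.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of equal length, one int per pixel


def changed_tiles(prev: Frame, curr: Frame, tile: int = 16) -> List[Tuple[int, int]]:
    """Return (tile_x, tile_y) for every tile whose pixels differ between frames."""
    height, width = len(curr), len(curr[0])
    dirty = []
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            rows = range(ty, min(ty + tile, height))
            cols = slice(tx, min(tx + tile, width))  # edge tiles may be smaller
            if any(prev[y][cols] != curr[y][cols] for y in rows):
                dirty.append((tx // tile, ty // tile))
    return dirty


if __name__ == "__main__":
    before = [[0] * 64 for _ in range(64)]
    after = [row[:] for row in before]
    after[5][20] = 1  # one pixel changes, e.g. a blinking cursor
    print(changed_tiles(before, after))
```

Changing a single pixel in a 64x64 frame marks only one of the 16 tiles as dirty, so an idle office desktop produces almost nothing to send.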
Do you have a source for this? I'm only asking because I got the impression RDP was superior to VNC, NX, etc. because of its complex handling of graphical primitives. But I know next to nothing about the real technical details.
I've not looked at RDP, but I've implemented the X11 protocol and the VNC protocol.
The problem is that the more complex the UI, the sooner you reach a threshold where "just transmitting the bitmap" is faster and/or less data.
E.g. consider rendering a simple button with X11: You'd "just" send a request to render a rectangle, maybe fill it, and send the string for the label. 2-3 small requests. But then the moment you have a UI with a gradient, a drop shadow for the text, a differently shaped border, a shadow for the border, you suddenly add on enough requests that it's very easy for the numbers to look different. Especially because compressing these bitmaps reasonably well tends to be easy.
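A rough Python sketch of that arithmetic, hand-encoding a few core X11 requests (opcodes and layouts as I understand them from the X11 protocol specification, little-endian byte order assumed) and counting bytes. The resource IDs, button geometry, label, and the strategy of drawing a gradient as one ChangeGC plus one 1-pixel-high PolyFillRectangle per row are illustrative assumptions; a real toolkit would more likely render the gradient client-side or via RENDER.

```python
import struct
import zlib

# Core protocol opcodes and the GC value-mask bit for the foreground color.
CHANGE_GC = 56
POLY_RECTANGLE = 67
POLY_FILL_RECTANGLE = 70
IMAGE_TEXT_8 = 76
GC_FOREGROUND = 1 << 2

# Hypothetical resource IDs; a real client allocates these from its ID range.
WINDOW, GC = 0x00400001, 0x00400002


def pad4(n: int) -> int:
    """Bytes needed to pad n up to a multiple of 4 (requests are 4-byte aligned)."""
    return (4 - n % 4) % 4


def change_gc_foreground(gc: int, pixel: int) -> bytes:
    # opcode, unused, length (4 units = 16 bytes), gc, value-mask, one value
    return struct.pack("<BxHIII", CHANGE_GC, 4, gc, GC_FOREGROUND, pixel)


def poly_rect(opcode: int, drawable: int, gc: int, rects) -> bytes:
    """PolyRectangle / PolyFillRectangle; each rect is (x, y, width, height)."""
    body = b"".join(struct.pack("<hhHH", *r) for r in rects)
    return struct.pack("<BxHII", opcode, 3 + 2 * len(rects), drawable, gc) + body


def image_text_8(drawable: int, gc: int, x: int, y: int, text: bytes) -> bytes:
    n = len(text)
    header = struct.pack("<BBHIIhh", IMAGE_TEXT_8, n, 4 + (n + pad4(n)) // 4,
                         drawable, gc, x, y)
    return header + text + b"\x00" * pad4(n)


W, H = 100, 30  # button size in pixels

# Flat button: fill, outline, label -> three small requests.
flat = (poly_rect(POLY_FILL_RECTANGLE, WINDOW, GC, [(10, 10, W, H)])
        + poly_rect(POLY_RECTANGLE, WINDOW, GC, [(10, 10, W, H)])
        + image_text_8(WINDOW, GC, 40, 30, b"Cancel"))

# Gradient button: a color change plus a 1-pixel-high fill for every row.
gradient = b"".join(
    change_gc_foreground(GC, 0x010101 * (8 * row))
    + poly_rect(POLY_FILL_RECTANGLE, WINDOW, GC, [(10, 10 + row, W, 1)])
    for row in range(H))

# The same gradient as 32-bit pixels, which is what a bitmap-based protocol ships.
bitmap = b"".join(struct.pack("<I", 0x010101 * (8 * row)) * W for row in range(H))

if __name__ == "__main__":
    print("flat button requests:    ", len(flat))
    print("gradient requests:       ", len(gradient))
    print("raw gradient bitmap:     ", len(bitmap))
    print("compressed bitmap (zlib):", len(zlib.compress(bitmap)))
```

The flat button comes to 64 bytes of requests and the row-by-row gradient to 1080 bytes, against 12000 bytes of raw pixels. Because every row of that bitmap is a single repeated color, it deflates to a small fraction of its raw size, so comparing the last two printed numbers shows how quickly "just send the bitmap" becomes competitive.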
Modern X11 clients increasingly render into client-side buffers already because even when using X, that's often better when on a local machine and too few of us use it over the network often enough for that to be optimised for.
Having the option in the display subsystem of picking either based on what will perform best is a good place to be, but the more complex the UI the less often the simple primitives will be worth it.
Real life scenario: you have a headless server that has GUI software installed at a remote site. You connect to the network from home with a VPN. Then SSH in to the server with "ssh -X user@server" and run the GUI program in that terminal. The GUI appears on your local display. SSH sends the X11 protocol traffic through an encrypted tunnel from the remote server to the local X11 display server, because of the -X.
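An X client finds its display server through the DISPLAY environment variable. The Python sketch below handles only the simple forms: ":N" or "unix:N" means a local Unix-domain socket (conventionally /tmp/.X11-unix/XN), and "host:N" means TCP port 6000 + N on that host, with an optional ".S" screen suffix. With ssh -X, sshd typically sets DISPLAY on the server to something like localhost:10.0 (10 being sshd's default X11DisplayOffset) and tunnels whatever connects to that port back to your local display. Protocol prefixes such as "tcp/", IPv6 literals, and Linux abstract sockets are deliberately ignored; the type and function names are hypothetical.

```python
from typing import NamedTuple, Optional

X_TCP_PORT_BASE = 6000               # display N listens on TCP port 6000 + N
X_UNIX_SOCKET_DIR = "/tmp/.X11-unix"  # conventional location of local sockets


class DisplayAddress(NamedTuple):
    host: Optional[str]          # None means a local Unix-domain socket
    port: Optional[int]
    socket_path: Optional[str]
    screen: int


def parse_display(display: str) -> DisplayAddress:
    """Parse simple DISPLAY values such as ':0', 'unix:0', or 'localhost:10.0'."""
    host, sep, rest = display.rpartition(":")
    if not sep:
        raise ValueError(f"no ':' in DISPLAY value {display!r}")
    number, _, screen = rest.partition(".")
    display_no = int(number)
    screen_no = int(screen) if screen else 0
    if host in ("", "unix"):
        path = f"{X_UNIX_SOCKET_DIR}/X{display_no}"
        return DisplayAddress(None, None, path, screen_no)
    return DisplayAddress(host, X_TCP_PORT_BASE + display_no, None, screen_no)


if __name__ == "__main__":
    print(parse_display("localhost:10.0"))  # what an ssh -X session often sees
    print(parse_display(":0"))              # a local desktop session
```

For the ssh -X case, parse_display("localhost:10.0") resolves to TCP port 6010 on the server, which is the port sshd listens on for the forwarded connection.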
In Windows, it's kinda split between the Windows Display Driver Model (WDDM) and the Desktop Window Manager (DWM). That's not a 1:1 match, though, as those two combined cover more components of a functioning whole than X11/XOrg itself does. X11 just split the components needed to draw everything you'd need for graphical environment into a different choice of layers.
X11 got network transparency out of the box (a sibling comment touches this), and the capability of switching out the components more easily, while Windows had less work to do to smooth out the overall desktop experience.
Ultimately X11 is the main component that you use to draw and interact with a GUI on a traditional Unix or on a Linux system (Wayland is a more recent alternative, and there have been other attempts in the past, but X11 is still the most commonly used).
There is no 1:1 correspondence on Windows. Some of what X11 does is part of Win32, other parts are in explorer.exe, others are built in at deeper layers; there are multiple alternative systems available on Windows too.
Ultimately what X11 gives you as an app is a way to draw something on the screen, and to get input events from users. X11 also coordinates this between multiple separate apps, so that you get movable windows and focus and other such behaviors without having to have each app coordinate manually with every other GUI app. Copy-paste is another similar cross-app functionality that X11 offers.
The way X11 does this is through a well defined protocol. Instead of relying on system calls, you open a socket to some known port and send drawing commands there, and receive input events from it (of course, many libraries abstract this for you). Because of this, it can work transparently regardless of whether you are drawing on the local machine or on a remote one. So, X11 itself can work as a Remote Desktop solution as well without the need of a separate program or protocol (though there are significant differences with pros and cons).
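A minimal Python sketch of the first exchange on that socket, following the connection-setup layout in the X11 protocol specification: a byte-order byte, protocol version 11.0, then the lengths and contents of the authorization name and data, each padded to a multiple of 4 bytes. The server's reply starts with a status byte (0 Failed, 1 Success, 2 Authenticate); the sketch stops there and does not parse the rest of the reply (vendor string, screens, pixmap formats). The function names are hypothetical.

```python
import socket
import struct

SETUP_STATUS = {0: "Failed", 1: "Success", 2: "Authenticate"}


def pad4(n: int) -> int:
    """Bytes needed to pad n up to a multiple of 4."""
    return (4 - n % 4) % 4


def build_setup_request(auth_name: bytes = b"", auth_data: bytes = b"") -> bytes:
    """Connection setup: byte order, protocol 11.0, then auth name and data."""
    header = struct.pack(
        "<BxHHHHxx",
        ord("l"),            # 'l' = little-endian client; 'B' would mean big-endian
        11, 0,               # protocol major / minor version
        len(auth_name), len(auth_data),
    )
    return (header
            + auth_name + b"\x00" * pad4(len(auth_name))
            + auth_data + b"\x00" * pad4(len(auth_data)))


def handshake(sock: socket.socket, auth_name: bytes = b"", auth_data: bytes = b"") -> str:
    """Send the setup request and report the status byte of the server's reply."""
    sock.sendall(build_setup_request(auth_name, auth_data))
    first = sock.recv(1)
    if not first:
        return "closed"
    return SETUP_STATUS.get(first[0], "unknown")
```

Against a real local server you would first connect a socket.socket(socket.AF_UNIX) to something like /tmp/.X11-unix/X0, and the server may answer Failed unless you pass the MIT-MAGIC-COOKIE-1 cookie from your Xauthority file. For an offline check, socket.socketpair() can stand in for the server; the same bytes would work over TCP, which is the whole point of defining a protocol rather than a set of system calls.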
I am not very familiar with any of them at a low level, but I think you can sum it up as: Apple and Microsoft defined their display server as an API, a programming interface. MIT defined theirs as a protocol, a communication interface, with the intention that any API built on top of it would get a very flexible transport.
Same thing as Cocoa or the windowing API in User32.dll, but as a network-transparent client/server architecture (usually behind a library though, so doing things like opening a window is quite similar, except that Xlib isn't as ergonomic as Cocoa or even Win32).