Opening Windows in Linux with sockets, bare hands and 200 lines of C (hereket.com)
265 points by libcheet 6 months ago | 121 comments



One of the neat things we'll be losing with X11 is the fact that you can do graphics -- fast -- entirely with the wire protocol. It was really a protocol for a sort of smart graphical terminal, so it's like having an Amiga blitter at the other end of the network connection. From a "zero to something on the screen" standpoint, it's fast and convenient, as you don't have to manage your own framebuffer, shaders, dirty rectangle list, or any of that.
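For flavor, "zero to something on the screen" looks roughly like this with Xlib (a minimal sketch; each call below turns into one or two small requests on that wire):

  /* cc hello.c -lX11 */
  #include <X11/Xlib.h>

  int main(void) {
      Display *dpy = XOpenDisplay(NULL);   /* connect to the X server (a socket) */
      if (!dpy) return 1;
      int scr = DefaultScreen(dpy);
      Window win = XCreateSimpleWindow(dpy, RootWindow(dpy, scr), 0, 0, 320, 240,
                                       1, BlackPixel(dpy, scr), WhitePixel(dpy, scr));
      XSelectInput(dpy, win, ExposureMask);
      XMapWindow(dpy, win);
      for (;;) {
          XEvent ev;
          XNextEvent(dpy, &ev);
          if (ev.type == Expose) {
              /* no framebuffer, no dirty rects: the server does the drawing */
              XFillRectangle(dpy, win, DefaultGC(dpy, scr), 40, 60, 240, 140);
              XDrawString(dpy, win, DefaultGC(dpy, scr), 40, 40, "hello, X11", 10);
          }
      }
  }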

Of course, as any Gen-Z graphics hacker will tell you, That's Just Not How Things Work Anymore, and rendering is and should be client-side, leaving the compositor to only present the final display. But it was fun while it lasted.


> One of the neat things we'll be losing with X11 is the fact that you can do graphics -- fast -- entirely with the wire protocol.

As long as you only want to do very basic graphics -- no antialiasing, no color blending, no subpixel coordinates. And it isn't even that fast; everything ends up rendered on the CPU, possibly even all on a single thread, using code paths that haven't seen much optimization in the last 20+ years.


This is why the XRender extension was introduced. There you have antialiasing, all the blending modes you could wish for, subpixel coordinates, advanced drawing operations like gradients, and it is fast because it is fully hardware accelerated. All working over a very efficient wire protocol. E.g. Cairo uses XRender as a backend.
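A taste of it (a minimal sketch with libXrender; link with -lX11 -lXrender):

  #include <X11/Xlib.h>
  #include <X11/extensions/Xrender.h>

  /* Composite a translucent red square onto a window, server-side. */
  void blend_red(Display *dpy, Window win) {
      XWindowAttributes wa;
      XGetWindowAttributes(dpy, win, &wa);
      XRenderPictFormat *fmt = XRenderFindVisualFormat(dpy, wa.visual);
      Picture pic = XRenderCreatePicture(dpy, win, fmt, 0, NULL);
      XRenderColor red = { 0xffff, 0x0000, 0x0000, 0x8000 };   /* 50% alpha */
      XRenderFillRectangle(dpy, PictOpOver, pic, &red, 10, 10, 100, 100);
      XRenderFreePicture(dpy, pic);
  }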


It's not all hardware accelerated though, is it? Both the X server and Cairo depend on the pixman library, which is a CPU/SIMD-optimized pixel manipulation library.

Even xf86-video-intel, the intel X11 driver package, on my system depends on pixman.


Nothing on your feature list is incompatible with a client/server approach though. The only downside would be when large data blobs need to be sent over a network each frame.

In the end, modern 3D APIs are also just a 'wire protocol': rendering commands are recorded by the CPU into local command buffers and played back by the GPU (which is often connected to the CPU by a comparatively slow bus), the only limiting factor is the amount of data that's communicated between the CPU and GPU - and of course any additional latency that would be added by a network connection.


I wonder if there could be a X server implementation that ran fully on the GPU.


Pure-CPU rendering is a side effect of XFree86 being essentially the "lowest common denominator" platform (and even then some stuff was accelerated) - it comes from X.Org X11 releases being "base code" for vendors to build their own on.

For comparison, SGI used a wildly different architecture underneath, which heavily impacted performance - it's also why "xsgi" would default to indexed colour and you'd use 24bit/32bit visuals only for some windows - the server would use indexed colour visuals on the actual framebuffer to reduce memory bandwidth and composite everything together in hardware.

It's also why there were "X11 Overlays" for GL - it meant you could easily implement parts of the GUI in X11 and still render on-top of direct-rendered visuals that bypassed X11.


Interesting! I wonder if there are any writeups around with more details on the SGI X architecture?

But I was really thinking about going further, with all or nearly all of the X server executing on the GPU and only CPU shims for input peripherals, networking, etc.


The closest to this idea was less X11 and more the (to my knowledge never actually fulfilled) promise of the NeXTDimension color board for the NeXT Cube - which was supposed to implement the entire graphics stack on the embedded CPU.

X11 could be implemented similarly - essentially sticking a complete standalone "X11 terminal" on an add-in card. The current GPU architectures aren't necessarily fitting for implementing it directly, but combine that with essentially an embedded OS running on an extra CPU handling the protocol interactions and I/O, then use a possibly more tailored interface to talk to the GPU (compare with AMD's promise of HSA, or Xbox post-360 or PS4/PS5 architectures where GPU and CPU are on common memory).

NeWS-style engine (possibly with something simpler than full postscript) would work great with that, especially if you had properly shared interface libs on it.


We could run an OS on current GPUs, I think, hardware-wise. And we know from history that compilers can make up for various hardware shortcomings on the OS support side. E.g. multi-task scheduling and concurrency can be done at the compiler level.


The Gen Z web hacker will also tell you that this isn't how things are done anymore and use a wire protocol consisting of HTTP, HTML, CSS and JS to make a bloated X server substitute called a web browser do pretty much the same things.

I exaggerated, but only a little bit. Even the widget library zoo has somehow found an equivalent on the web.


It's like I say, Wayland is the replacement for X, but browsers and Electron are the replacement for NeWS.


Actual hardware "smart graphical terminals" that used that wire protocol were built for a while, too - I had some Labtam / Tektronix XP400 X11 thin terminals.


Why would we "lose" it?

XWayland is here to stay for a long time.

And if there's a need for this it's trivial to get the same DX via a library that's optimized for small on-the-wire serialized format of scene descriptions. (For whatever backend over whatever wire, local shared memory message passing, unix socket, TCP/UDP, WebSocket, etc.)


The X Window System is a hacker's delight.

If you have a remote server with a UI, you can set up an X Window server on your Windows/macOS machine and forward X messages via SSH, so you can use GUI apps on your server but view the result locally. The responsiveness of the UI depends on your network capabilities.


This is fun and all, but if you lose your connection, your windows will go away and your program will usually exit.

It’s almost always more usable to run a desktop session in Xvnc on the server and connect to it with a VNC client, because if you get disconnected, you can just reconnect.


VNC is also much faster in most cases, because the X protocol requires a lot of round trips that waste a lot of time. So instead the VNC server "runs these round-trips locally" (so to speak) and you only interact with the pixels remotely. X was developed when bandwidth was as limiting as latency but nowadays only latency is limiting so a different protocol makes more sense.


> X protocol requires a lot of round trips that waste a lot of time.

This isn't very true. The X protocol is very async and lets you batch plenty of things when a response is required.
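E.g. with XCB the pattern is explicit: queue requests now, collect replies later, so the whole batch costs one round trip (a minimal sketch; atom interning is just a convenient reply-bearing request):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <xcb/xcb.h>

  int main(void) {
      xcb_connection_t *c = xcb_connect(NULL, NULL);
      const char *names[] = { "WM_PROTOCOLS", "WM_DELETE_WINDOW", "_NET_WM_NAME" };
      xcb_intern_atom_cookie_t cookies[3];

      for (int i = 0; i < 3; i++)      /* requests are only queued here */
          cookies[i] = xcb_intern_atom(c, 0, strlen(names[i]), names[i]);

      for (int i = 0; i < 3; i++) {    /* replies are drained here, in order */
          xcb_intern_atom_reply_t *r = xcb_intern_atom_reply(c, cookies[i], NULL);
          if (r) { printf("%s = %u\n", names[i], r->atom); free(r); }
      }
      xcb_disconnect(c);
      return 0;
  }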


You can't pipeline X11 operations in the presence of anything but a perfect network because

1. TCP streams require stalling when packets are dropped to keep the stream in order

2. X _requires_ by design that commands are performed in order

Which means that using something that can do UDP and manages its own sequence ordering can do significantly better. This is why things like RDP, PCoIP, etc. could do full frame rate HD video 15 years ago and you still can't with the X protocol over the network.

Breaking up the screen into small 16x16 chunks or so, encoding on the GPU, and shipping that turns out to be significantly faster.

Especially when you take into account that virtually _nothing_ draws with X using X drawing primitives. It's almost all using Xshm for anything non-trivial.


Most modern Ethernet LANs are effectively lossless. You're not going to see a single dropped packet unless you saturate the link.

> Which means that using something that can do UDP and manages it's own sequence ordering can do significantly better. This is why things like RDP, PCoIP, etc could do full frame rate HD video 15 years

No it isn't. It's because those things actually compress the video, and X-forwarding generally doesn't. The transport protocol is completely irrelevant, it's just a bandwidth problem.

I've X-forwarded Firefox between two desktops on 10G Ethernet. I can watch 4K video, and I genuinely can't tell the difference between local and remote.


If you're including TCP ACKs as part of the "chatty"/"required round trips" of a higher level protocol, that's bad news for a lot of things. (Which, granted, is why they made those QUIC protocols etc., but still, it seems unreasonable to single out X's protocol for this, especially since RDP and VNC are commonly used over TCP as well).

But:

> This is why things like RDP, PCoIP, etc could do full frame rate HD video 15 years ago and you still can't with X protocol over the network.

Compression is going to have a much bigger impact on large amounts of motion than almost anything else; you can stream video over HTTP 1.1 / TCP thanks to video codecs, but X (sadly, I think; it seems like such an easy thing that should have been in an extension, but even PNG or JPEG never made it in) doesn't support any of that.

> It's almost all using Xshm for anything non-trivial.

Xshm is not available over a network link and it is common for client applications to detect this and gracefully degrade.


VNC hasn’t always been that good though. There was a time when I’d routinely use X forwarding via SSH because my (then) girlfriend's internet connection wasn’t fast enough to run VNC smoothly.


Different implementations of VNC support different compression algorithms, and for some reason VNC always defaults to the worst (most compatible?) ones. You can achieve a lot by tweaking the client and server settings. The results were good enough for me to support customers with ISDN internet between 2001 and 2006.


Or you are like me and remote-desktop from a thin client (or from home) to a Windows VDI, run VNC to an X server, and ssh -X to the dev box to run emacs (edit: and firefox). I do this every day and I'm amazed that it actually works smoothly.


You can use Xpra, it's like tmux but for X. You can have your window follow you across clients.


Same deal with a remote shell. Usually it's not a problem, but if you anticipate it will be, you can use additional tools (screen, tmux, Xpra).

Xvnc is fine if you want a desktop session, but usually I just want one program, so X11 is a better fit for me. Once you have the infrastructure set up, it's really easy to go from a normal shell to running a graphical program. In my case, I use a Microsoft Windows desktop environment and have X11 forwarding enabled on my usual shell targets, so I just start the X server and run the program. I could have the X server autostart (it's unobtrusive if there are no clients), but I haven't yet. On a Mac, the X server starts automagically (once installed), so it's just a matter of enabling X forwarding in your ssh config for hosts where it's likely. Even less work if you run an X desktop.


> your windows will go away and your program will usually exit.

This is a huge improvement over most "web applications", whereby your client (web browser) sits there looking like it's doing something when your network connection goes away. Worse still, you don't know whether the last (trans)action you sent to the server was processed or not.


> This is fun and all, but if you lose your connection, your windows will go away and your program will usually exit.

Interestingly, that is Xlib's behavior more so than something inherent in the protocol. Xlib assumes a lost connection is a fatal event, but if you're doing your own socket, you can choose to handle this differently. (Or, even with Xlib, you can throw an exception from the connection-lost callback and regain control before letting it abort.)
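The escape hatch looks something like this (a sketch; handler names are mine, and Xlib would exit() if the I/O error handler ever returned, so you longjmp out instead):

  #include <setjmp.h>
  #include <stdio.h>
  #include <X11/Xlib.h>

  static jmp_buf lost;

  static int on_io_error(Display *dpy) {
      (void)dpy;
      longjmp(lost, 1);   /* must not return: Xlib would call exit() */
  }

  int main(void) {
      XSetIOErrorHandler(on_io_error);
      if (setjmp(lost))
          fprintf(stderr, "connection lost; old Display* is dead, reconnecting\n");
      Display *dpy = XOpenDisplay(NULL);
      if (!dpy) return 1;
      /* ... recreate windows from client-side state, run the event loop ... */
      return 0;
  }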

Some server state is lost, but it is possible, with some care, to recreate that from your client upon reconnecting. You can even connect to a different server and somewhat seamlessly migrate windows over.

But yeah it isn't commonly done.


You can survive disconnects even with xlib; I think the abort is just the default behavior.

Emacs is the one program I know of that can actually do this. It can pop up frames (Emacs lingo for windows) on multiple displays at the same time and even mix tty and X11 frames. The Emacs session survives connection loss fine, frames on other displays continue working and you can reconnect if desired.

The one caveat is that Gtk used to have a bug that caused it to uncontrollably abort on connection loss but I build my Emacs with --without-x-toolkit (so it uses raw xlib, no toolkit) and that configuration has always been robust and performant. If I remember correctly the Gtk bug might be fixed now too.


Gtk also has an annoying misfeature here. It literally calls abort when the connection is lost.


You can use xpra: https://xpra.org/index.html


This also works for running a Linux GUI on your phone in Termux. Termux+localhost VNC is the way for this.


xpra or x2go (can't remember) supported that.


Sure, although originally you'd have an X Terminal (an actual dedicated terminal device running an X server, of varying levels of performance), rather than an X server program running on your computer (i.e. an X Terminal emulator).

The terminology is of course a bit counter-intuitive, since it's program-centric rather than computer-centric - the local terminal is the server and the remote computer (or rather program) is the client, utilizing the server as a display device.

It's a bit like a text terminal (VT-100 etc.) except of course the X terminal has a network connection. With the VT-100, if you wanted to connect to a remote system you'd have to use a modem (acoustic coupler) to dial into a terminal server on the remote system. I don't think anyone ever made a text terminal with built-in telnet and ethernet.


> I don't think anyone ever made a text terminal with built-in telnet and ethernet.

Maybe not, but there were terminal servers with 10 serial ports for the glass terminals and an Ethernet connector.


For some reason I always found it fun messing with serial cables, and always had a desk drawer full of various types (cross-overs, gender-benders, DB9-DB25, etc.), as well as occasionally needing to make custom ones.

I still have a USB terminal server for when I want to play with old stuff like my DECTalk (VCR-sized speech synth) which has two DB-25 serial ports on it.


I did the same thing! I actually got my first soldering iron (and learned how to use it) so that I could wire up my own connectors. Building a home-made graph-link cable for my TI-83 in particular was a fun experience, as well as a good education in what can go wrong when you mess up soldering.


I think many diskless X terminals could do just plain telnet too if wanted.


Not telnet/ethernet, but later VTs had built-in modem controls and some utilities in firmware, so you could have an AT-capable modem connected over serial port and just select what connection to dial into.


I used to do this with VirtualBox running locally. I found the performance of the Windows X client superior to trying to use X directly in the virtual machine. I wish the protocol supported audio.


Hacker's Delight is also the title of a handy reference for numeric algorithm recipes.

https://en.wikipedia.org/wiki/Hacker%27s_Delight


This is how I run graphical Emacs in WSL. And how I was developing SDL games as well.

I hear this might become a native built-in feature in Windows, but for now I just run an X server.


It's already here, for more than a year, both on Windows 10 and 11. I use this for emacs (with a GTK build) every day.


WSL has been able to open graphical Linux apps for a while. They use Wayland (with FreeRDP to the Windows Host) and XWayland for X11 apps.

https://devblogs.microsoft.com/commandline/wslg-architecture...


Meh, we used to have to install Oracle databases this way, and the performance was always shit, even over direct connections w/ crossover cables.

I don’t think X over SSH is useful for much; can you imagine a browser this way?


I used to run everything over 100Mbps ethernet. It worked just fine. The two biggest issues: SSH without tuned settings, and modern X applications increasingly doing client-side rendering into bitmaps, making them far more bandwidth-hungry and untested over the network. For the latter, VNC or Xpra are much better suited if what the app ends up doing is streaming bitmaps anyway, as they're actually optimised for that.


It’s usually SSH that’s the bottleneck though. If you just expose your local X11 TCP server port and connect to it from the server (X11 client) by setting the DISPLAY env properly, it will be much more responsive. Often indistinguishable from local apps. Secure the port with Wireguard, if you want to be a bit more responsible and still have good performance. Checked with Firefox, which works like a charm this way, but is unusable over SSH.


> If you just expose your local X11 TCP server port and connect to it from the server (X11 client) by setting the DISPLAY env properly, it will be much more responsive.

That's fine on a LAN with low or sub ms latency. If you need to connect to a server on the other side of the country, you'll want a less chatty protocol, e.g. X2GO (which gives additional benefits, e.g. of restartable sessions).


> can you imagine a browser this way?

It in fact works perfectly fine.

The catch is latency. In my case both hosts are on the same colo.

I remember, 20+ years ago, running quake3 over remote X to my roommate's machine, and I was surprised that glx actually worked over the network with no issues.


It depends on how the client was written. If you build around asynchronous operations, you can get good throughput, and if you always stop and wait for a reply, you'll get bad throughput. If you buffer your output and don't actually send it for a while, that can also make the experience poor. Xlib pushed you in both those directions, but you can make terrible clients with any library.

My personal experience with Oracle products has always involved terrible UX, so I'm not surprised they built a terrible X client.

I've run browsers over X, sometimes actually remotely, and it's OK, but not great. I wouldn't recommend it for video, audio is out of scope, and start up can be very slow sometimes. It's something that used to work better, IMHO. Sometimes you need to try a couple browsers and see which one works better. As a sibling notes, it all feels untested, and I'm sure if developers were testing, they'd fix some of the things that are easier to fix.


X over SSH works really well for me as long as I enable SSH compression. At least for my usage, I can't tell it apart from a native window but then again I'm not running sophisticated programs which update frequently.


Until fairly recently I used to use Quartus on a headless box with ssh -X

It worked really well up until a certain version (18, I think) - after which something changed in the UI toolkit and it was suddenly much more laggy. I'm guessing the UI toolkit's newer incarnation required a lot more round trips.


x2go is a bit smarter than pure X forwarding and also allows you to resume a session at a different time (or on a different machine), similar to something like tmux. Much, much better performance, and it does pretty well even on cellular.

A bit rough around the edges but brilliant for some usecases.


The use case was a whole bunch of cheap display server machines in a computer lab at a university with a very powerful machine running the computationally expensive client programs. You didn't need SSH for that, and the only real problem was contention on the token ring (ah, the old days...)


Back in about 1994, there was a server, at MIT I think, that would stream a live TV signal to your X Window server.

Yeah, with no compression, that's right.


Related, a talk about replacing Xlib with their own abstraction with zig: https://www.youtube.com/watch?v=aPWFLkHRIAQ


Was just about to mention this. Also, just to add to this, even though it's a zig talk, it's not really the main focus of the talk. I enjoyed it and I don't even know zig.


I share the author's opinion that xlib is harder to grok than the X11 protocol, because it's the X11 protocol plus a queuing system where you will sometimes receive out-of-band messages, plus a large number of seldom-used utilities.


Working with X11 client libraries written in other languages is a revelation in how much nicer an X11 client can be. CLX (Common Lisp) or xgb (Go) are some good examples.

(Talking about native implementations here not xlib bindings).


Same here. My window manager, terminal, (very basic) file manager all use pure Ruby X11 bindings, and the more I looked at where Xlib deviates (wraps, obscures) the protocol, the more places it felt easier to just use the X requests directly. I haven't looked at XCB much, but my understanding is that it's a much more direct binding to the X protocol, so if I were to write an X client in C using a client library, I'd almost certainly look at XCB rather than Xlib.


I remember seeing this on HN a few months ago! Looks pretty badass. I plan to play around with it when I get some time. I love Ruby, and have long loved the idea of a WM/desktop in Ruby. A friend of mine is diehard Xmonad user and the power of being able to write code to extend the WM is fascinating.

Also just wanted to let you know that I sent you an email. Don't feel obligated to respond or anything, just wanted to let you know because I sometimes get emails from people on HN but they go to an inbox I don't often check so I often don't see them until long after unless they tell me in a comment that they emailed me.


I’d love to check out your Ruby code. Link?


I'm not GP, but I'm pretty sure this is it: https://github.com/vidarh/rubywm

I remember seeing this on HN a few months ago and thought it was super cool.

Edit: https://news.ycombinator.com/item?id=39087609


Yeah, I've not been a fan of xlib. xcb is a much better library, IMHO, although it's not much more than talking X11 yourself. With Xlib, it seems like they tried to make things 'easier' by having high level concepts and a lot of synchronous apis, but they don't really fit on top of X, so you're fighting the library. Unfortunately, xcb came later and lots of things were already built on xlib, so lots of software has unnecessary synchronicity that makes things slow when you use the network stuff.


Using XCB is often a much better experience than xlib.


> I was very surprised to learn that it is actually just a “regular” network protocol for two parties to communicate like HTTP, FTP, IMAP, SMPT and etc.

With the important distinction that all those protocols very intentionally use human-readable ASCII requests/responses. Of course, for performance's sake, ease of debuggability was sacrificed, which increases the value of a standard library to hide the protocol.


You can run X apps in x11trace which translates commands and responses to ASCII. I assume Wireshark also has a parser.


> I was very surprised to learn that it is actually just a “regular” network protocol for two parties to communicate like HTTP, FTP, IMAP, SMPT and etc.

Typo: Simple Mail Transfer Protocol

And I suspect many of us tend to think of those as magical abstractions best dealt with through libraries as well:)


For years I checked usenet by just telnetting on port 119


I did some work on an MTA a while back and it was genuinely pretty neat to send legit emails to my real mail accounts by just connecting to a port and typing the raw SMTP commands into it. E.g. https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol#...

Simpler times, when we thought we could trust everyone on the net :)


When I first started using UNIX (Ultrix on a DECStation), I was floored by the wizardry of "hackers" sending me email from Santa Claus. Mind blown!


Usenet gives full help over telnet, it's nice.


X11 Windows, not the obvious Windows


Yes, I don't know why they changed the title to make it confusing.


Can you do the same but in Wayland next?



I'd like to see exactly that. Hello World in Wayland is so much more complicated.


I think principally, it's not actually that much worse. However, there are some aspects of Wayland that are much better and much worse for people who prefer to go their own way.

With Wayland, the protocols are specified in machine-readable XML files, not unlike how XCB works, and typically a program called a "scanner" would read those XML files and perform code generation. So if you wanted to generate proper Wayland bindings, you'd need to write the scanner. The scanner is not terribly complicated. I did it in Go with around 750 SLOC, not counting the supporting code. It's not released, but here's just the code generation part, if you are curious. Last I used it, it could generate all of the Wayland protocols, and I was successfully using dozens of them from my own Go bindings.

https://gist.github.com/jchv/adcb6de1c9dc3d0112dea704d753803...

Of course, if you just want to implement Hello World, you don't need to use code generation, and the resulting amount of work to interface with X11 is ultimately similar to that of Wayland. But there are two huge caveats:

- Right now, the EGL interface for Wayland requires that you pass libwayland wl_surface structures. That means to get on-screen hardware accelerated contexts in Wayland, you have to link to libwayland (maybe making compatible structures would work if you're into that kind of thing, but definitely no easy options.)

- X11 has many server-side features that Wayland doesn't. Wayland's core protocols don't specify even window decorations or cursor shape, so a Wayland client has to implement drawing window decorations and loading cursor themes to properly support all Wayland compositors... even if they would prefer to use xdg-decoration and cursor-shape-v1. This is on top of things that no Wayland compositor has support for, like an equivalent to XRender or indirect rendering with OpenGL; those things just don't exist.

So it's far from perfect. Though, it's also not like X11 was free of cruft or design issues, either. I'm pretty sure with X you also have to implement cursor themes yourself, by virtue of linking to XCURSOR. And while you can use XRender and other server side rendering features, which will be way more efficient for remote usage than simply opening a GL or Vulkan context, most software (GTK, Electron) doesn't do this anymore, and either ships buffers over shared memory or opens a direct hardware accelerated context.

Personally, the problem I'd really like to see fixed is the EGL ABI one, but I'm pretty disillusioned by the bullshit in anything RedHat indirectly touches (e.g. Freedesktop.org), so even if there's a chance I could help, at this point, I'd rather spend my time working on other things.


I think Wayland works similarly, in a sense that you open a Unix domain socket and send and receive bytes, see https://wayland.freedesktop.org/docs/html/ch04.html.

But I'm pretty sure it's more complicated than I think. :-)
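The first step really is just bytes on a Unix socket, though. A minimal sketch (per wayland.xml, wl_display is always object 1 and get_registry is opcode 1; short reads and cleanup are ignored for brevity):

  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/socket.h>
  #include <sys/un.h>

  int main(void) {
      const char *dir = getenv("XDG_RUNTIME_DIR");
      const char *dpy = getenv("WAYLAND_DISPLAY");
      if (!dir) return 1;
      if (!dpy) dpy = "wayland-0";

      struct sockaddr_un sa = { .sun_family = AF_UNIX };
      snprintf(sa.sun_path, sizeof sa.sun_path, "%s/%s", dir, dpy);
      int fd = socket(AF_UNIX, SOCK_STREAM, 0);
      if (connect(fd, (struct sockaddr *)&sa, sizeof sa) < 0) return 1;

      /* Wire format: word 0 = object id, word 1 = (size << 16) | opcode.
         Send wl_display.get_registry(new_id = 2) on object 1. */
      uint32_t req[3] = { 1, (12u << 16) | 1, 2 };
      write(fd, req, sizeof req);

      /* wl_registry.global events: name (uint), interface (string), version. */
      uint32_t hdr[2];
      while (read(fd, hdr, 8) == 8) {
          uint16_t size = hdr[1] >> 16;
          char body[4096];
          if (size < 8 || (size_t)(size - 8) > sizeof body) break;
          read(fd, body, size - 8);
          if (hdr[0] == 2)   /* strings are length-prefixed and NUL-terminated */
              printf("global: %s\n", body + 8);
      }
      return 0;
  }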


There are some things that are more complicated, e.g. you need to bring your own libraries for font support and input methods and rasterization and so on, and you need to handle hotplugged input devices and such. But protocol-wise it’s not really more complicated, no.


To someone that’s reasonably computer savvy (professional developer but Windows and macOS only), what does X11 do? What are the benefits of this approach? What’s the ‘equivalent’ in Windows or macOS (if there is one)?


A user on any X11 desktop can open a GUI from any X11 app, even one running in a datacenter. It's like a Javascript app in a browser today, but every Unix GUI app knew how to do it on late 1980s hardware, so you didn't even have to buy and admin a complete desktop computer for each user.

Windows sort of has this with RDP, but it's tied in with the app's GDI desktop, and I don't know whether it works without buying a bunch of video cards for the headless app server. NeXTSTEP had Display PostScript (remote rendering worked like printing!) but macOS lost support for it.


Somewhat misleading, because these days none of the network-transparent primitives are used anymore. All the rendering happens in the client application, and bitmaps are sent over the wire. It's basically a crappy VNC. And at that point, just use RDP or VNC.


That greatly depends on the type of applications you run and the toolkits and font rendering they use. My most commonly used applications (terminal emulator, Emacs) do their font rendering using the glyph compositing functionality of the RENDER extension. Server-side glyphs are created when a font is loaded, and all the compositing is done on the server side based on the client's CompositeGlyphs requests. Same for images (in Emacs) using CreatePixmap and CopyArea.


Indeed. There are also plenty of terminals that use the old server-side font rendering calls. For terminals, using client-side buffers is the exception - mostly the handful that use OpenGL.


That's really not true. Qt, as of Qt 6, still supports using native X11 drawing commands, and that covers a lot of apps. Tkinter too (and this covers many technical apps, which are exactly the ones likely to be used over the wire).

Just last week I was debugging remotely an art installation which uses my software, https://ossia.io and was running on a Pi 5, I compared X11 and VNC and X11 was really much more useable even over the internet.


Ah, I guess I was mistaken


XPRA using h264 or h265 does a decent job, in my experience, in terms of performance increase over ssh -X.

On Wayland I am also getting good results with waypipe, opening individual apps from VMs to make a poor man's QubesOS without the complexity and without having to open the whole remote desktop.


Adding a video codec sounds expensive, how many concurrent users can an app server support that way?


> All the rendering happens server side, and bitmaps are sent over the wire. It's basically a crappy VNC.

Even if this were true (which it isn't), there's a lot more to a GUI than the G. A lot of nice interoperability is provided too, like clipboard integration, dragging and dropping, mixed windows on the same taskbar, etc. Far more pleasant to use than awkwardly going to a full screen thing to get a window out.


Yeah, I have nothing constructive to say about apps and toolkits that choose only local rendering with no hardware, but it's pretty funny to see Javascript apps beating them on performance.


AFAIK, Windows Remote Desktop always renders through virtual display adapter devices, though it can optionally make use of GPU resources to accelerate rendering to these devices.

I just tested in an ESXi VM[1], and Windows 10 will happily boot and accept an RDP connection without any (virtual) display hardware at all:

https://jasomill.at/HeadlessRDP.png

Whether a particular computer will boot Windows without display hardware depends on the firmware — in my experience, some workstations will, some won't, and I honestly haven't seen a modern x86 server without some sort of integrated graphics to support out-of-band management, often with a VGA port as the only physical video output.

[1] By hand-editing the VM's .vmx configuration file, setting

  svga.present = "FALSE"
and adding the undocumented option

  hwm.svga.notAlwaysPresent = "TRUE"
Interesting aside: I figured this out by running "strings" on the hypervisor executable, which turned up this suspiciously Scheme-like bit of configuration code:

https://jasomill.at/vmx.ss.html#L-808

In fact, two definitions and a couple trivial syntax fixes are enough to get Chez Scheme to evaluate it without warnings:

  $ awk -f - vmx.ss <<EOF | scheme -q
    BEGIN {
        print "(define / fxdiv)"
        print "(define (log . args) (for-each display args))"
    }
    {
        gsub(/\(\)/, "'()")
        gsub(/ 0x/, " #x")
        print
    }  
  EOF
  $
So it appears the VMware hypervisor contains an interpreter for a very Scheme-like dialect of Lisp.


Can a backend app use a single GPU (integrated or not) to support many concurrent users, so that software rendering doesn't become a bottleneck?


X11 is a distributed systems clustering protocol for asynchronous message passing that outputs graphics as a side effect. It's like Erlang's dist, but with a side channel to a display.

A more real answer is that the X server manages access to the input and output devices... Roughly speaking, it lets a client define a region (rectangle) and get clicks and mouse motion and keyboard input sometimes (maybe too often), and lets the client send things to be displayed. In modern times, that's mostly images, but it used to be lines and curves and letters and things, or like OpenGL display lists. The server can tell the client when it is exposed and needs to redraw, or it can use a backing store to keep obscured parts of the region local to the display. Additionally, clients can adjust other clients' resources (this is what a window manager does) and clients can communicate with each other through the server (no full mesh like in Erlang dist) ... that part is a bit confusing.

On Windows and macOS (either one), there's no sense of a network involved; similar things happen, but mostly with system calls, I think. Otoh macOS has all those Mach ports? Does UI go through that? But there are X11 servers for most platforms that integrate reasonably well, so it's not like you can't use X concepts there, it's just a bit more setup to get started. Windows also has RDP, which can be used to run a program on one computer and display it on another.

The benefit of this approach is you can take better advantage of asynchrony... Many GUI libraries and toolkits run in a synchronous model where you do a request and can't continue until you get the answer. That's fine when everything is fast, but when there's a network between the client and server, it's better to send requests when you have them and only wait for the response when you need it. (See also xcb vs xlib)


Yeah, on macOS the communication between your app and the window server (which is conveniently called WindowServer) happens via Mach ports. Most of it is undocumented; in fact anything more "low-level" than using AppKit is undocumented, although IIRC it is in principle possible to use undocumented CG* APIs to create and manipulate windows yourself without going through the AppKit layers. I think each CG* API is basically a thin shim that communicates with the window server, which has a corresponding CGX* implementation that does the actual logic. This article has some details: https://keenlab.tencent.com/en/2016/07/22/WindowServer-The-p...


I might be wrong, but I think the closest thing in Windows land is RDP, with regard to low-level rendering of the display over the network.


AFAIK, Windows RDP used to do more complex remoting of drawing primitives, so that the host computer would send the client instructions for drawing individual elements.

But for many years now, they switched to an approach of rendering all the graphics on the host side and then sending a video stream to the client using standard video compression. I think this compression based approach scales better for the common use cases, especially office apps where the display is mostly static. I guess the approach has almost gotten good enough that you can play games remotely that way.


You got any source for this? I am only asking since I got the impression RDP was a superior protocol to VNC, NX, etc. because of its complex handling of graphical primitives. But I know next to nothing about the real technical details.


I've not looked at RDP, but I've implemented the X11 protocol and the VNC protocol.

The problem is that the more complex the UI, the sooner you reach a threshold where "just transmitting the bitmap" is faster and/or less data.

E.g. consider rendering a simple button with X11: You'd "just" send a request to render a rectangle, maybe fill it, and send the string for the label. 2-3 small requests. But then the moment you have a UI with a gradient, a drop shadow for the text, a differently shaped border, a shadow for the border, you suddenly add on enough requests that it's very easy for the numbers to look different. Especially because compressing these bitmaps reasonably well tends to be easy.
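The cheap path is literally this (sketched with XCB rather than raw bytes; it assumes a GC that already has a font set):

  #include <string.h>
  #include <xcb/xcb.h>

  /* A flat button: one filled rect + one text run, both drawn by the server. */
  void draw_button(xcb_connection_t *c, xcb_window_t win, xcb_gcontext_t gc) {
      xcb_rectangle_t r = { 20, 20, 96, 28 };
      xcb_poly_fill_rectangle(c, win, gc, 1, &r);                 /* request 1 */
      xcb_image_text_8(c, strlen("OK"), win, gc, 58, 38, "OK");   /* request 2 */
      xcb_flush(c);   /* both leave in one write; neither needs a reply */
  }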

Modern X11 clients increasingly render into client-side buffers already because even when using X, that's often better when on a local machine and too few of us use it over the network often enough for that to be optimised for.

Having the option in the display subsystem of picking either based on what will perform best is a good place to be, but the more complex the UI the less often the simple primitives will be worth it.


It's also a non-trivial part of why modern UIs often appear to have higher latency than old ones...


It's a network transparent display protocol.

Real life scenario: you have a headless server that has GUI software installed at a remote site. You connect to the network from home with a VPN. Then SSH in to the server with "ssh -X user@server" and run the GUI program in that terminal. The GUI appears on your local display. SSH sends the X11 protocol traffic through an encrypted tunnel from the remote server to the local X11 display server, because of the -X.


In Windows, it's kinda split between the Windows Display Driver Model (WDDM) and the Desktop Window Manager (DWM). That's not a 1:1 match, though, as those two combined cover more components of a functioning whole than X11/XOrg itself does. X11 just split the components needed to draw everything you'd need for graphical environment into a different choice of layers.

X11 got network transparency out of the box (a sibling comment touches this), and the capability of switching out the components more easily, while Windows had less work to do to smooth out the overall desktop experience.


Ultimately X11 is the main component that you use to draw and interact with a GUI on a traditional Unix or on a Linux system (Wayland is a more recent alternative, and there have been other attempts in the past, but X11 is still the most commonly used).

There is no 1:1 correspondence on Windows. Some of what X11 does is part of Win32, other parts are in explorer.exe, others are built in at deeper layers; there are multiple alternative systems available on Windows too.

Ultimately what X11 gives you as an app is a way to draw something on the screen, and to get input events from users. X11 also coordinates this between multiple separate apps, so that you get movable windows and focus and other such behaviors without having to have each app coordinate manually with every other GUI app. Copy-paste is another similar cross-app functionality that X11 offers.

The way X11 does this is through a well defined protocol. Instead of relying on system calls, you open a socket to some known port and send drawing commands there, and receive input events from it (of course, many libraries abstract this for you). Because of this, it can work transparently regardless of whether you are drawing on the local machine or on a remote one. So, X11 itself can work as a Remote Desktop solution as well without the need of a separate program or protocol (though there are significant differences with pros and cons).


I am not very familiar with any of them at a low level, but I think you can sum it up like this: Apple and Microsoft defined their display server as an API, a programming interface; MIT defined theirs as a protocol, a communication interface, with the intention that any API supplied on top could have a very flexible transport.


Same thing as Cocoa or the windowing API in User32.dll, but as a network-transparent client/server architecture (usually behind a library though, so doing things like opening a window is quite similar, except that Xlib isn't as ergonomic as Cocoa or even Win32).


Does anyone know of an example like this that shows a bitmap, without external libraries?

As far as I know, you have to ask the server for the active bit depth and send bitmap data in a matching way, meaning every client needs to have code to convert to every pixel format under the sun.

Maybe that's not true though, I'm not sure.
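It is true, but the negotiation is small: the server's connection setup block lists one pixmap format per depth, and you match your image to the bits-per-pixel and scanline pad it advertises. A sketch (using XCB to read the same bytes you'd parse off the raw socket):

  #include <stdio.h>
  #include <xcb/xcb.h>

  int main(void) {
      xcb_connection_t *c = xcb_connect(NULL, NULL);
      const xcb_setup_t *setup = xcb_get_setup(c);
      xcb_screen_t *screen = xcb_setup_roots_iterator(setup).data;

      /* Find how the server wants pixels for the root window's depth;
         on most modern servers this is depth 24 at 32 bits per pixel. */
      xcb_format_iterator_t it = xcb_setup_pixmap_formats_iterator(setup);
      for (; it.rem; xcb_format_next(&it))
          if (it.data->depth == screen->root_depth)
              printf("depth %u: %u bpp, scanline pad %u\n", it.data->depth,
                     it.data->bits_per_pixel, it.data->scanline_pad);
      xcb_disconnect(c);
      return 0;
  }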


C also has structs; there is no need to assemble data structures in byte arrays.


With the caveat that structs are padded by default so that members are aligned on architecture-friendly boundaries.

For example, if a struct contains a char a followed by an int b, typically b would be at offset 4, even though sizeof(a) is 1.

To get tight packing, you need to explicitly tell the compiler to disable padding.
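You can watch it happen (a small sketch; the struct names are made up, and the attribute is a GCC/Clang extension, which is part of why protocol code often packs bytes by hand):

  #include <stddef.h>
  #include <stdio.h>

  struct padded { char a; int b; };                          /* b lands at offset 4 */
  struct __attribute__((packed)) tight { char a; int b; };   /* b lands at offset 1 */

  int main(void) {
      printf("padded: offsetof(b)=%zu sizeof=%zu\n",
             offsetof(struct padded, b), sizeof(struct padded));   /* 4, 8 */
      printf("tight:  offsetof(b)=%zu sizeof=%zu\n",
             offsetof(struct tight, b), sizeof(struct tight));     /* 1, 5 */
      return 0;
  }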


That's why there is __attribute__((packed))


There is no portable way to define structure alignment and padding in C. Your options are to make assumptions about how structs are aligned for a specific ABI, but which could be different on other ABIs or architectures resulting in broken code. Or to use compiler extensions to force packing which will make the alignment deterministic, but which could still blow up on other architectures if the alignment you used isn't valid on that architecture.

In some cases you could manually pad your structs to satisfy lowest common denominator architecture requirements, and use compiler extensions mark them as packed (which should be a noop on the architectures you've considered), if the protocol padding/alignment is compatible with those requirements. But the code still theoretically could break on ABIs you didn't consider.

Manually packing and unpacking bytes is the safest way to do things in production.
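E.g. something along these lines, where every offset is pinned by hand (a sketch with illustrative helper names; X11 actually lets the client announce its own byte order in the connection setup, but the layout still has to be exact):

  #include <stddef.h>
  #include <stdint.h>

  static void put16le(uint8_t *p, uint16_t v) { p[0] = v & 0xff; p[1] = v >> 8; }

  /* Generic 4-byte X11-style request header: opcode, one data byte,
     then the total request length in 4-byte words, little-endian here. */
  static size_t encode_header(uint8_t *buf, uint8_t opcode, uint8_t data,
                              uint16_t len_words) {
      buf[0] = opcode;
      buf[1] = data;
      put16le(buf + 2, len_words);
      return 4;   /* bytes written, independent of compiler or ABI */
  }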


> There is no portable way to define structure alignment and padding in C

Of course there is __attribute__((packed)) and __attribute__((aligned(8)))


From a production C code perspective, this code is of course bad. But here the idea was to show data manipulation closer to how it is shown in the documentation.

Also, a lot of requests have dynamic sizes and cannot be as easily packed and serialized to be sent over the "wire". For example, you could put a pointer to a string into a struct, malloc the required space, then assign the pointer. Afterwards, when you want to send the data, you would have to write serialization methods to put everything in the correct order. That is all fine and should be done when you build a bulletproof system to manage X11. But here these techniques would just draw too much attention away from X11.


Found the fresh grad


While under-researched statements aren't good, personal attacks don't help either. Prefer to educate instead.

C structs have packing, alignment, and endianness gotchas that can be portably addressed with byte-level de/ser.

Example: https://gist.github.com/cd089675a6088b0b8482c005a0e3897a


I think the “Windows” in the title should be written with small “w”


Yes, and the first sentence of the post should have "a window" rather than "a Windows", because as is it adds to the title's confusion between X windows and MS Windows.


Even if this is a technical article, the title is very clickbaity.


Definitely, the actual article doesn't have a capital.


Dude, you lost me at "just use an array and fill it in".

Learn about struct packing and let the compiler fill in the "array" for you.

Anyway, a good way to learn stuff. Hard mode, FTW!


C structure packing isn't portable and doesn't account for host/network endianness. Buffer byte-oriented de/ser is the primary way of assembling data in C.


Yes. For production, structs are the obvious choice. But here the arrays were similar to how the documentation specifies data will be on the wire. So for educational purposes (to explain the protocol) it seemed like a better choice. But I might be wrong, and maybe structs would have been a more approachable way to understand.





