Hacker News
How to Achieve Polished UI (chrislord.net)
149 points by robin_reala on May 28, 2017 | 54 comments



As careful as you may be, it’s practically impossible to write software that will remain perfectly fluid when the UI can be blocked by arbitrary processing.

Well, that's what they did! All UI events and layout on the original iPhone were handled on the main thread. I doubt asynchronous layout or event handling would have improved the experience on its single-core CPU.

The key technical advantage the original iPhone had was Core Animation, which composited the laid-out views and applied animations to them in a separate process. It ensured that all views would appear at the correct position in their animations each frame with no jitter, and kept most of the per-frame work in one place. But the animations were all initiated on the same main application thread that handled events, performed layout, and so on.


Also no Java, so no arbitrary garbage collection sweeps causing hitches, and a significantly smaller memory footprint overall.


Android executes native ARM code. Additionally, average GC pause times are now 0.4ms in Android O. To clarify, that's not 0.4ms for every frame - that's every time the GC needs to do a stop-the-world pause. How many milliseconds does ARC lose during its reclamation of memory on iOS? As for memory usage, Android N apps did use a bit more memory than iOS apps, but that's also been reduced in Android O.

https://www.youtube.com/watch?v=iFE2Utbv1Oo&index=110&list=P...

As for your claim that iOS apps have a significantly smaller memory footprint overall, well, you would also be wrong about that.

https://youtu.be/lCFpgknkqRE?t=6m27s


I've heard "Java can be as fast as C" for 20 years now and "Android is fast and smooth" for 10 years, and my experience has always been the opposite. At this point I wouldn't believe them even if it were true; there is no credibility left.


ART isn't a JVM. And Objective-C isn't as fast as C. There's a reason all of the games and apps that require high performance are written in C/C++ on both platforms.


Errrr Android ART is a VM and no you can't write games in C/C++ on Android. While technically it can be done it's pretty much impossible to get it off the ground and running because of the VM in place.

There is a website that lists the games people have created to try it; they give links to a couple of them which made it into the Android store, and basically you can tell by all the feedback that the game won't load. You can try to run the games too and see if you're lucky - I wasn't...

Objective-C is quite fast, several notches faster than Java apps


That's completely wrong I'm afraid. ART is a standards-compliant JVM, and Android has supported the Native Development Kit (NDK) for C/C++ apps for years. You can't do everything you can with the Java APIs, but you can open windows and get a GL context with it, which is all most games care about.

The speed of Java is very dependent on the sophistication of the compilers used. Java will walk all over Objective-C if run on a fast server class machine with an advanced JVM like HotSpot.


I didn't say ART was not a VM. I said ART is not a JVM. There is a difference.

>and no you can't write games in C/C++ on Android

Of course you can. All of the high-end games are written in C/C++. Java, Kotlin, and C/C++ are the official languages of Android.

>While technically it can be done it's pretty much impossible to get it off the ground and running because of the VM in place.

Apps that use NDK do not use ART.

>Objective-C is quite fast, several notches faster than Java apps

No it's not. Please provide proof that Objective-C is several notches faster than Java apps.


The Core Animation animation/render/compositor thread has realtime priority.


> All UI events and layout on the original iPhone were handled on the main thread.

But what else happens on the main thread? The way I understand the article, UI and input are delegated to a dedicated compositor thread to prevent the heavy processing on the main thread from interfering with responsiveness.

I would assume that iOS also separates the timing-sensitive UI handling from anything that might take longer than a single frame to process. Either way, you end up with one thread doing the heavy lifting and another keeping the UI responsive.


I don't understand. Which GUI toolkit works differently than the way described in the article? AFAIK all UI toolkits have a single GUI thread.



I am not familiar with this kind of framework. Where can I learn more about it?


I found the article a little confusing to be honest. I wonder if the author has written a traditional widget toolkit that isn't Firefox oriented.

In old widget toolkits, going back to the 90s here, there was a single UI thread per app that did all drawing and sending of commands to the graphics hardware. Keeping the UI responsive on such toolkits simply meant doing things as much as possible in the background. Touching the UI data structures from other threads was forbidden.
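
In rough C++-flavoured pseudocode (postToUiThread, loadBigFile and updateWidgets are invented names standing in for whatever a real toolkit provides, e.g. posting to its event queue), the pattern looks something like this:

    // Do the slow work on a worker thread; only ever touch the UI from the UI
    // thread, by posting a closure onto a queue that its event loop drains.
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>

    std::mutex uiQueueMutex;
    std::queue<std::function<void()>> uiQueue;       // drained by the UI thread's event loop

    void postToUiThread(std::function<void()> fn) {  // hypothetical helper
        std::lock_guard<std::mutex> lock(uiQueueMutex);
        uiQueue.push(std::move(fn));
    }

    std::string loadBigFile() { return "..."; }      // stand-in for the slow work
    void updateWidgets(const std::string&) {}        // stand-in for UI mutation

    void onButtonClicked() {
        std::thread([] {
            std::string data = loadBigFile();                   // off the UI thread
            postToUiThread([data] { updateWidgets(data); });    // back on the UI thread
        }).detach();
    }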

This architecture was adopted due to painful experiences with attempts to build thread-safe toolkits in the 80s and early 90s, such as Motif and the original Win32 widget library. None of it worked very well. Motif apps tended to be deadlock-prone, and Win32 was just a total API nightmare because it tried to hide the thread affinity of the underlying widgets but didn't do a good job of it.

Some systems in the 90s like NeXT and BeOS started experimenting with moving the rendering into a separate process, the window server. Note that X Windows, despite having a window server, did not use "retained mode" rendering and still required the app to handle every repaint, such as when an occluded window was moved to the top. Systems with this sort of retained mode rendering pushed "draw lists" into the window server so the OS could draw the window from memory without having to wait for the app to respond. This used more memory, but it meant the overall window UI stayed responsive and fluid even if apps were under heavy load. However, anything that could change the UI, like responding to user input, of course stayed in the app and on the UI thread.

Mac OS X introduced a variant of the design, which I know less about, but I believe it basically just stored fully rendered copies of the window contents. Very RAM-intensive, and one reason Mac OS X was considered very slow and heavy in the early days, but it made it possible to do things like the genie effect and Exposé later on, where the window server could animate the contents of windows without the app needing to respond.

All that is OS level compositing. The app itself did not do any asynchronous compositing. So dragging windows around was fast, but animations inside the app didn't benefit.

So the next level of asynchronicity is toolkits that push app-level rendering into a separate thread too. iOS, JavaFX, modern versions of Qt and modern versions of Android work this way. In these toolkits, the app's GUI is still constructed and manipulated on the primary/UI thread, but when the main thread "renders" the UI, it doesn't directly draw it; it constructs a set of draw lists for the app's own use. Again, these draw lists look a bit like this:

1. Clear this area of the window to this colour.

2. Draw a gradient fill from here to there.

3. Draw texture id 1234 with that shader at these coordinates, at 50% opacity.

4. Invoke remembered draw list 111.

5. Remember this set of instructions as draw list 222.

Once these lists are created they're handed off to a dedicated render thread which starts processing them and turning them into commands to the GPU via an API like OpenGL or Direct3D. Note that these APIs are, in turn, simply creating buffers of commands, which eventually get dispatched to the GPU hardware for actual rendering. Because the render thread doesn't run any callbacks into app code, and because it's cooperating with the GPU hardware to remember and cache things, it doesn't have that much actual work to do and can process simple animations very fast and reliably.
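
A hand-wavy sketch of that handoff, with invented command types rather than any real toolkit's internals: the UI thread records commands into a list, swaps it over to the render thread, and the render thread walks it and issues GPU calls.

    #include <cstdint>
    #include <mutex>
    #include <variant>
    #include <vector>

    // Invented command types, just to show the shape of a draw list.
    struct Clear    { float r, g, b, a; };
    struct Gradient { float x0, y0, x1, y1; };
    struct Texture  { uint32_t id; float x, y, opacity; };
    using DrawCmd  = std::variant<Clear, Gradient, Texture>;
    using DrawList = std::vector<DrawCmd>;

    std::mutex listMutex;
    DrawList   pendingList;   // written by the UI thread, read by the render thread

    // UI thread: "rendering" only records commands; it never touches the GPU.
    void buildFrame() {
        DrawList list;
        list.push_back(Clear{1, 1, 1, 1});
        list.push_back(Gradient{0, 0, 0, 300});
        list.push_back(Texture{1234, 20, 20, 0.5f});
        std::lock_guard<std::mutex> lock(listMutex);
        pendingList = std::move(list);              // hand the finished list off
    }

    template <class Cmd>
    void issueGpuCommand(const Cmd&) {}             // stand-in for OpenGL/Direct3D calls

    // Render thread: walks the list and turns it into GPU command buffers,
    // without ever calling back into app code.
    void renderFrame() {
        DrawList list;
        { std::lock_guard<std::mutex> lock(listMutex); list = pendingList; }
        for (const DrawCmd& cmd : list)
            std::visit([](const auto& c) { issueGpuCommand(c); }, cmd);
    }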

However, responding to user input is still done on the main thread. If you block the main thread, your UI will continue to repaint and may exhibit simple behaviours like hover animations, but actually clicking buttons won't work. That's because the most common thing to do in response to user input is change the UI itself in some way, and that must still be done on the main thread.

I hope that helps.


Thanks for this very informative post! However, I think you disparage good old win32 just a tiny wee bit. It was quite well designed for its time - perhaps a little too ahead of its time. Everything was async, to communicate between windows you needed to post events to queues, you could customize the window classes any way you wished - it was perhaps an extraordinarily sophisticated and flexible framework and lots of talented folks made it dance. It had a really good and long 20+ year run. (and still running strong in some desktop software)


To me, Win32 was already falling away from the more intriguing model, which was Win16.

Effectively, a Win16 system is/was basically exactly equivalent to an Erlang node, but one where your "processes" just happened to be paired, component-wise†, with handles to structs of GUI properties held in window-manager memory.

Like an Erlang node, Win16 is/was:

• green-threaded — i.e. they both have very low-overhead in-memory structures containing a tiny heap/arena and a reference to an in-memory delegate code module, through which execution would pass in turn. In Erlang, these are "processes"; in Win16, these are windows—i.e. actual windows, but also "controls" in the controls library.

• cooperatively-scheduled (yes, Erlang is cooperatively scheduled—when you're writing C NIFs. When you're executing HLL bytecode in the Erlang VM, this fact is papered over by the call/ret instructions implicitly checking reduction-count and yielding; but if you're writing native code—like in Win16—you do that yourself.)

• Message-passing, with every process having a message inbox holding dynamically-typed message-structs that must be matched on and decoded, or discarded.

• Offering facilities to register and hold system-wide handles to large static data-blobs (Erlang large-binaries, Win16 rsrc handles);

• Capable of doing IPC only by having processes post messages to another process's queue, and then putting themselves into a mode that waits for a response;

• Based on a supervision hierarchy: in Erlang, processes spawn children (themselves processes) and then manage them using IPC; in Win16, root-level windows spawn controls (themselves windows) and then manage them using IPC.

And, crucially, in both of these systems, real machine-threads are irrelevant. Both Win16 and Erlang were written in an era when "concurrency" was a desired goal but multicore didn't yet exist—and so they don't really have any concept of thread affinity for processes/windows. Both systems are designed as if there is only one thread, belonging to a single, global scheduler—and then their multicore variants (SMP Erlang and Win32) attempt to transparently replicate the semantics of this older system (though in different ways: Erlang allows processes to be re-scheduled between scheduler threads, while Win32 pins windows to whatever scheduler-thread they're spawned on.)
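
For concreteness, the cooperative "scheduler" being described is just the message pump every Win16 app ran; here it is with the Win32 spellings of the same calls (error handling omitted):

    #include <windows.h>

    int WINAPI WinMain(HINSTANCE, HINSTANCE, LPSTR, int) {
        MSG msg;
        // GetMessage yields to the system until a message arrives for one of
        // this app's windows; DispatchMessage then runs that window's window
        // procedure -- execution passing through the "delegate code module".
        while (GetMessage(&msg, NULL, 0, 0) > 0) {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
        return (int)msg.wParam;
    }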

Win32 later introduced an alternative model to take better advantage of multithreading: COM "multi-threaded apartments", allowing windows (or, as a generalization, COM servers, which could now execute and participate in IPC on a thread without spawning any window-instances) to interoperate across thread boundaries without requiring a message be serialized and passed through the scheduler/window-manager process.

† http://cowboyprogramming.com/2007/01/05/evolve-your-heirachy...


Well, I hope we're not getting sentimental about Win16. I've written more than my fair share of Windows API code and even though some of the concepts are now coming back into fashion again, largely due to the limits of browser engines, it really isn't an era I'd return to at all. Those models were all abandoned for solid reasons and Apple's failure to do so in time nearly killed the company.

I'm not sure your description of COM is quite right. The way I remember it, windows (HWNDs) were and still are objects with thread affinity. COM had the notion of a "single-threaded apartment", which basically meant the COM server received RPCs using regular window messages, and the MTA that you mention simply meant no inter-thread marshalling was done at all, i.e. the object was inherently thread-safe, using locks or whatever. But Windows never changed to a model where the controls library was thread-safe: changing the contents of an edit box from another thread, for instance, always required a context switch.

COM's usage and abusage of the window message system for fast inter-thread switching was only ever an ugly hack, which caused all kinds of weird problems and glitches. Most obviously, it caused Windows' reliance on actually having a GUI layer to deepen considerably, because inter-thread/inter-process RPC - which on Linux and macOS was well modularised into things like Mach IPC, SunRPC, DBUS etc - became totally tied to the windowing system.

IIRC the entire apartment concept was also stupidly designed, so there were constant problems with Microsoft using COM internally to implement some APIs, which would by default enter an STA and require the _caller_ of the API to pump the message queue, otherwise the API they'd just used would silently fail to work. In the era I was working with it, that fact wasn't always properly documented, I think.


> COM's usage and abusage of the window message system for fast inter-thread switching was only ever an ugly hack

This is effectively the entire point I wanted to dispute in my original post above; I guess I didn't get it across clearly enough.

As can be seen from how pre-COM (e.g. DDE, OLE) IPC was achieved on Windows, Microsoft truly believed that sending messages through the window-manager was a good way to do IPC. Their designs just kept doing it, over and over. STA COM messaging wasn't a hack; it was more of the same, a doubling-down on a long-standing design paradigm. MTA COM messaging was the hack—a way to make everything continue to look like HWND messaging (with an abstraction layer added), but have it transparently optimize to SHM IPC in cases where that was beneficial [and where the developer had ensured their ADTs were compatible with it.]

> Most obviously, it caused Windows' reliance on actually having a GUI layer to deepen considerably, because inter-thread/inter-process RPC - which on Linux and macOS was well modularised into things like Mach IPC, SunRPC, DBUS etc - became totally tied to the windowing system.

And, driven by the "evidence" of this repeated doubling-down above, I would conclude that this was the point: Microsoft considered Windows to be about, well, windows.

As I was saying above, a "window" in the Win16 sense was effectively the same thing as an Erlang process, but with some extra (optional-to-use!) GUI data stuck to it. The "correct" way to achieve async parallelism in Win16 was literally to create a "background window" that would register a kernel timer to send it tick events, and then do work when it received one. Which is the same thing you do if you want to write an Erlang process to wake up and poll some data source every so often.
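
A sketch of that background-window pattern (just the window procedure, shown with the Win32 spellings of essentially the same calls; window registration and creation omitted):

    #include <windows.h>

    LRESULT CALLBACK BgWndProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp) {
        switch (msg) {
        case WM_CREATE:
            SetTimer(hwnd, 1, 100, NULL);   // ask for WM_TIMER ticks every 100ms
            return 0;
        case WM_TIMER:
            // Do one small, bounded chunk of work here, then return quickly so
            // the cooperative scheduler can service other "window-processes".
            return 0;
        case WM_DESTROY:
            KillTimer(hwnd, 1);
            return 0;
        }
        return DefWindowProc(hwnd, msg, wp, lp);
    }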

My point isn't just that there are parallels here; my point is that Microsoft expected you to use the "window" primitive in exactly the ways that Erlang expects you to use the "process" primitive. Windows are the "process" primitive of Win16—they're tiny, green-threaded processes, and the window-manager is their scheduler.

That statement should make Microsoft's views on IPC clearer. Of course Windows IPC is achieved by putting messages through the window-manager. The window-manager is the scheduler†; knowing about other window-processes and routing messages to them is its job. It is DBUS—and it is also, given DDE, the equivalent of macOS's LaunchServices daemon.

---

† ...or rather, the window-manager is the scheduler for anything that's not a DOS VM. Windows, from 2.x through to 9x, was effectively a two-layer system: a bare-metal hypervisor "kernel" (KERNEL.EXE/KRNL386.EXE) with one Windows dom0 and N DOS domUs; and then an OS "kernel" running in that Windows dom0. That dom0 OS kernel is GDI.EXE, and cooperative message-passing is its scheduling algorithm. It also happens to do graphics. (It's a paravirtualized kernel that relies heavily on the hypervisor kernel above it, yes, but it's still the kernel of the Windows domain.)


The part about what iOS does isn't quite correct—apps pass a high-level layer tree to the render server with properties like corner radius and shadow offset, not a list of low-level drawing commands involving textures and shaders. That lets you animate these high-level properties without sending over new commands. I think the basic history is right, though!
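
Roughly speaking (invented C++ types here, not Core Animation's actual API), the app ships something shaped like this, and the render server interpolates the properties itself every frame:

    #include <memory>
    #include <vector>

    // A retained layer tree: the app describes *what* each layer looks like.
    struct Layer {
        float x = 0, y = 0, width = 0, height = 0;
        float cornerRadius = 0;
        float shadowOffsetX = 0, shadowOffsetY = 0;
        float opacity = 1;
        std::vector<std::shared_ptr<Layer>> children;
    };

    // A declarative animation: "take this property from A to B over d seconds".
    // The render server evaluates it each frame, so it keeps running even if
    // the app's main thread is blocked.
    struct PropertyAnimation {
        float Layer::*property;    // e.g. &Layer::cornerRadius
        float from, to, duration;
    };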


Thanks for the correction - that's neat. They really pushed a lot of high level detail into the draw lists!


> Note that X Windows, despite having a window server, did not use "retained mode" rendering and still required the app to handle every repaint, such as when an occluded window was moved to the top.

My memory is quite fuzzy, but I think this is incorrect: what you're describing was only the default behaviour. You could program your application to tell the X server to use a 'backing store' for your window, and it would remember your window's contents and re-render it by itself.


There is/was an extension that did that at some point, but it was never used (on Linux) due to poor implementation and a desire to target low-RAM machines. Mac's backing store implementation was quite heavily optimised - macOS can do on-the-fly memory compression, and the window server uses shared memory to obtain the window bitmaps.


Do you really mean single thread, or do you mean single core?

It seems doubtful it was on a single thread with those response characteristics.


It's quite useless without examples. I know some frameworks do all rendering in a separate thread, but you almost completely forfeit all UIView-based rendering. With the native UI toolkit you always go back to the main thread, so I only put long-running work like network requests, image rendering or database access on a separate thread.


Reminds me of common practices in game engines.


They should really get game developers to write the underlying system and interface.

No one understands the issues associated with smooth fast rendering better than serious game devs. Instead most game devs learn to jump through fiery hoops to get around OS issues.


Yea. It's not very hard - if you want something responsive, you have a limited amount of time between when you read input and when you present something on screen.

This delta can be greater than the interval at which you present images. For example, most game engines take 30ms to read input and simulate on one thread, then take 30ms to render on another thread while the next frame is simulated. This means that they present something every 30ms, but the delay between input and rendering is 60ms. This delay can be up to 100ms and not be noticed by players.
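
A minimal sketch of that pipelining, with readInput/simulate/render/present as stand-ins for a real engine's work: frame N is drawn on a second thread while frame N+1 is already being simulated.

    #include <future>

    struct Input {};  struct GameState {};  struct FrameData {};

    Input     readInput()                        { return {}; }  // stand-ins
    FrameData simulate(GameState&, const Input&) { return {}; }  // ~30ms of work
    void      render(const FrameData&)           {}              // ~30ms, render thread
    void      present()                          {}              // show last frame

    void gameLoop(GameState& state) {
        std::future<void> renderJob;
        while (true) {
            Input in = readInput();
            FrameData frame = simulate(state, in);    // simulate frame N
            if (renderJob.valid()) renderJob.wait();  // wait for frame N-1 to finish drawing
            present();                                // present frame N-1
            renderJob = std::async(std::launch::async,
                                   [frame] { render(frame); });  // draw frame N
        }
    }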


A 100ms delay is extremely noticeable, especially for faster-paced games. That's 6 frames at 60FPS, or 3 at 30. While you may get away with that in 4X games, no FPS will ever ship with more than 50ms of input lag, because it becomes seriously unpleasant to play.


I might be misunderstanding what you're saying here, but from what you've said it sounds like the game would run at ~30fps, whereas many games are designed ideally to run at 60fps, which would allow only ~16ms to take input and simulate?


In landscape mode on mobile there is a bar that takes up 20% (at a guess) of the screen just to contain a single icon.

Does everyone browse mobile in portrait? My eyes are horizontally arranged.


There's a reason for reading text in columns – it's just easier!

Everything from newspapers to books does it.

And it's quite interesting to try laying text out the full width of the screen. Just paste some long text into an HTML page or text editor and then go full screen width. I find it almost impossible to keep track of where I'm looking. I'm much slower reading like that.


I wrote a script a long time ago that turned a chunk of text into pseudo-Babylonian text. At the end of each line the letters of each word were reversed, so your eyes could scan back and forth, reducing eye strain and the chance of losing your place.

I thought it was quite readable and beneficial but others didn't.


This needs further explanation, ideally a picture showing the effect. I don't really think I've understood what your script did.


the technical term is "boustrophedon" (etymologically "in the manner of an ox ploughing a field"), google it up for some nice examples


I believe he means it would rewrite

Line one

Line two

As:

Line one

owt eniL

To save your eyes the travel distance to the line origin in order to read the next line.
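
In code, that first mode is just reversing the characters of every second line, something like:

    #include <algorithm>
    #include <iostream>
    #include <sstream>
    #include <string>

    // Reverse every second line so the eye can sweep back and forth.
    std::string boustrophedon(const std::string& text) {
        std::istringstream in(text);
        std::ostringstream out;
        std::string line;
        for (int n = 0; std::getline(in, line); ++n) {
            if (n % 2 == 1) std::reverse(line.begin(), line.end());
            out << line << '\n';
        }
        return out.str();
    }

    int main() {
        std::cout << boustrophedon("Line one\nLine two\n");
        // prints:
        //   Line one
        //   owt eniL
    }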


Yes, mostly like this. Not sure why I thought it had anything to do with Babylon... My script had two modes: the one above, and a different type which I preferred.

The first line was as normal

reversed was second the then

but only the order of the

the of letters the not words,

words.


Thanks, now I get it. Guess for me it was easier seeing it than reading a description!


Do you have a demo / image somewhere? Sounds interesting...


See above in this thread if you didn't already.


How do you hold your book, in landscape or portrait mode? Pages in almost all scripts are traditionally in portrait orientation.


Pages are also traditionally larger than a cellphone. On a 4"/5" smartphone, text in landscape orientation is much closer in font size and line length to a normal book than in portrait orientation.


A book can only be oriented portrait, I don't have a choice.


Yes, everyone browses in portrait


I mostly browse in portrait, unless the page has a weird layout; and I hardly ever see people browsing in landscape mode, but this is of course all anecdotal.


No such problem on Opera Mini (which doesn't do complex JS or position: fixed :)


It is ironic how the author is writing about polished iPhone UI while his blog has a big stupid "to the top" button^ fixed at the bottom, covering two lines of the text. Every iPhone user knows they can go to the top by tapping above the address field; no buttons required.

^ not actually a button, but a "polished" invisible plane that randomly cuts off text.

Can we please, please return to the era when UI was humane?


Not every iPhone user. Magic UX like that is not very discoverable.


I'm pretty sure it was described in the small introduction booklet in the iPhone 4 package. "Shake to undo" was also there.

Do you guys ever read instructions? :)


I don't recall paper instructions in the iPhone 6 package. Maybe they put up an ill-timed tutorial on screen at first boot; I needed to skip that to finish setting up the phone in the store. Sometimes I think getting a user to read instructions is the world's hardest problem.


I didn't find out about the address bar thing for years. That said, I am not a fan of reproducing natively available functionality in the browser.


>and his blog has a big stupid "to the top" button^ fixed at the bottom, covering two lines of the text.

I don't see what you are talking about. There is no such button for me.


I meant to say that it appears only on mobile, but now there is no such button! Seems the author changed the blog theme, because the floating "+" button also turned into a normal "menu" button at the top.

His blog's appearance is now ideal on mobile. Pretty quick turnaround!


I posted the same, thanks for the vindication.


Would anyone here have resources on writing similar highly-responsive GUI frameworks?

I'm considering writing a Rust-based one with Linux, BSD, macOS, Windows and Android backends.



