Security and fault-tolerance are cost centers, so unless they are explicit features of a deliverable they will be ignored. Planes, cars, and banks all have anomaly detection and auditing because people demand it. They understand the risks of failure. Conversely, very few people are demanding remote loopback facilities for IP. That's not a criticism of fault-tolerance, but if it's important then we should communicate more effectively. Why is fault-tolerance important? What goes wrong without it? What catastrophes could have been averted if we had considered it?
This is a problem that goes way back. Look at the Burroughs B5000, System/38, KeyKOS, VMS clustering, NonStop... all architectures that prevented, or easily recovered from, all sorts of problems. The market almost always chose against them; only two are still marketable. There are currently inexpensive CPUs, especially embedded PPC, that support lock-step execution, along with high-reliability OSes. Yet they're avoided in most deployments, even for important systems. The old security-engineering techniques of specifying all good/bad states, simplified implementation, non-bypassable TCBs, covert-channel analysis, use of guards, and so on have been largely ignored by the security industry despite empirical evidence of their benefit. As the author says, only a tiny niche in defense and safety-critical work still knows some of this stuff. Example [1].
Everything mainstream, proprietary or FOSS, just seems thrown together for a variety of reasons, with few exceptions. Even the stuff that needs to be better doesn't get that way. Further, I'm not sure most developers have even heard of the approaches that work: you can't get a good start without a good foundation to build on. I think the only solution will be a killer app that needs resilience, does its thing right, uses all the right techniques, is affordable, is easily extended, and creates awareness of good engineering practices when people try to imitate it.
I can't see anything else working. Btw HN readers, regarding Byzantine fault tolerance etc., the linked document shows Boeing's Survivable Spread leveraged a trusted component to reduce fault-tolerance cost to f+1 replicas for f failures, minus a few use cases. Do any readers up to date on FT research know of advances in the past few years toward similarly minimal-cost schemes for FT or BFT?
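For readers unfamiliar with the replica-count arithmetic behind that question, here's a small sketch of the commonly cited textbook figures (these are not Boeing's specific numbers): classic BFT protocols like PBFT need 3f+1 replicas to tolerate f Byzantine faults, hybrid protocols with a small trusted component (e.g., A2M, MinBFT) get down to 2f+1, and plain crash-fault tolerance in primary/backup style needs only f+1, which is the floor Survivable Spread reportedly approaches.

```python
# Commonly cited minimum replica counts to tolerate f simultaneous faults
# under different fault models. Textbook figures only.
def replicas_needed(f: int, model: str) -> int:
    return {
        "crash": f + 1,          # fail-stop faults, primary/backup style
        "byzantine": 3 * f + 1,  # arbitrary faults, no trusted hardware
        "trusted": 2 * f + 1,    # arbitrary faults + small trusted component
    }[model]

# Tolerating f = 2 failures:
print(replicas_needed(2, "crash"))      # 3
print(replicas_needed(2, "trusted"))    # 5
print(replicas_needed(2, "byzantine"))  # 7
```

The gap between those rows is the whole game: every replica you shave off is hardware, power, and synchronization traffic saved, which is why pushing Byzantine tolerance toward the crash-fault floor matters.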
It seems like the only real survivors of that era are IBM and their big iron, particularly the POWER architecture. Pretty much everything virtualization-related can be traced back to IBM and the containers that people are swooning over today can be traced to LPARs in OS/400, though earlier examples can be found if you stretch the definition.
The Burroughs B5000 blew my mind when I read about it after Alan Kay mentioned it. To think that it used ALGOL as its machine language (among other things) in 1961 makes you want to weep when you see things like MMX and SSE instructions being the hot thing of the present.
In Intel's defense, there is a reason they resort to gradual improvements. In 1981, after a couple of successful iterations of their CPU line, they did make a clean break with the past: the iAPX 432 [1], an entirely new 32-bit CPU designed to be programmed entirely in high-level languages. Hardly anyone seems to remember it, and for a reason: it failed, hard. Intel seems to have learned the lesson and didn't do this again.
A good point. I didn't cite it here because its implementation problems were severe and the reason it failed. Yet I do give it credit in more thorough discussions of INFOSEC history, because the architecture was pretty awesome for Intel. Then they watered it down with BiiN's i960MX, and then the even less radical (in security) Itanium. They lost hundreds of man-years on the first, around a billion on the second with Siemens, and at least $100 million sunk into Itanium with HP.
Intel certainly tried to make up for their x86 monstrosity several times. They got smarter and smarter about how they did so in terms of market acceptance. A combination of engineering mistakes and foolish market choices (imho) made them pay. As you said, they learned to give customers the insecure garbage they wanted or risk extremely large losses. Links below for those interested in the specifics of their better work.
Interesting, though this shouldn't be an indictment of HLAs in general, but rather of the iAPX 432 in particular. The same way the Mach server was only the tip of the iceberg in the sphere of microkernels.
True: the LISP machines, Wirth's Lilith with its M-code processor, the ASOS embedded Ada system, the JOP embedded Java processor, and Azul Systems' Vega processors show HLAs can work just fine. Even better than competing offerings in some ways. :)
Outside the mainframes, the survivors are IBM i (the System/38 descendant), HP NonStop, the Boeing SNS Server, BAE's STOP OS on the XTS-400/500, and maybe Aesec's GEMSOS. That's not many... And the B5000 blew my mind as well: so far ahead of its time then, and even somewhat now, that I can't see how they came up with it given the designs of the day. They must have had a time machine that gave them brief glimpses into the future of computer science, or at least one real wizard on the team. Occam's Razor is clearly no fun here. ;) The System/38 architecture was also brilliantly designed in that it chose the right tradeoffs: much robustness while remaining very practical. Both are described in detail here [1], along with others.
There are new projects copying some of the lessons learned, such as the Sandia Secure Processor (SSP/Score), SAFE (crash-safe.org), CHERI (Cambridge), and quite a number of academic/proprietary works. I suggested on Schneier's blog we could do what Geer thought was impossible by straight-up copying the old NonStop architecture (published in detail) while swapping legacy CPUs for security-enhanced variants like the above with extra I/O security. Five 9's, linear scaling, immunity to most attacks, and support for higher-level languages. I'll take 10! In theory, it might get down to a few grand a unit for each logical processor with careful management of development costs and sacrificing multicore for the first generation. It would be a nice root of trust for other systems' security, administration, and recovery needs.
Btw, look at the Flex article and its links. It was some wild stuff. I'd love to see efforts to do a modern take on it with top security-engineering principles applied. All fitting on a SoC and a cheap board, for instance.
That's pretty badass. It would be a pain to work with now, given much of it is strange to a modern user or admin, but it's cool that they built it. My takeaway from Burroughs is to do much of the interface protection at compile time, protect pointers and code in memory at runtime, bounds-check arrays, simplify the assembler, dedicate processors/programs to I/O, and recover failed components well. That, plus what we know from other systems, would make for quite a resilient system combined with a RISC processor such as Gaisler's Leon SPARC IP or the Rocket RISC-V CPU.
In the future, almost every action a user might take on an information system ought to be cleanly and easily rolled back once a higher-priority key is presented. That will make e.g. account takeover not worth the hassle.
But right now all the systems we use are frighteningly brittle.
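The rollback idea above can be sketched as an undo log: record an inverse operation for every mutation, so a higher-priority key can unwind a compromised session in reverse order. This is my own toy illustration, not a description of any real system:

```python
# Toy undo log: each mutation registers a closure that reverses it,
# and rollback() replays those closures most-recent-first.
class UndoLog:
    def __init__(self):
        self._undos = []

    def record(self, undo_fn):
        self._undos.append(undo_fn)

    def rollback(self):
        # Undo in reverse order, like popping a transaction log.
        while self._undos:
            self._undos.pop()()

account = {"balance": 100}
log = UndoLog()

# An attacker drains the account, but the system logs the inverse operation.
old = account["balance"]
account["balance"] = 0
log.record(lambda: account.__setitem__("balance", old))

log.rollback()
print(account["balance"])  # 100
```

Of course, this only works for effects the system itself controls: an external side effect, like goods already shipped, has no inverse function to record.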
Even if everything were one system, how do you handle side effects? An order I make on Seamless causes physical inventory to be consumed. That's just the nature of the system; there's no way to undo it. So there's still a reason for account takeover there, if someone else could consume on my dime by hacking my account temporarily.
> In the future, almost every action a user might take on an information system ought to be cleanly and easily rolled back once a higher-priority key is presented
This won't work well unless the world contains just the one information system. That scenario has other problems.
Regarding Bitcoin: you can't take over the blockchain while thousands of clients around the world have the "common sense" not to accept invalid blocks.
This means a rogue clique might roll back a few transactions, but it can't steal money from static accounts or create coins out of thin air.
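That "common sense" is just independent validation: every client checks each block against the consensus rules itself before accepting it, so an invalid chain is rejected no matter how many peers relay it. A minimal sketch, using a simplified header layout of my own invention (real Bitcoin headers and target encoding differ):

```python
# Simplified block validation: a client accepts a block only if it links
# to the known previous block AND carries sufficient proof-of-work.
# Header layout here is illustrative: first 32 bytes = previous block hash.
import hashlib

def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as Bitcoin uses for block hashing."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def block_valid(header: bytes, target: int, prev_hash: bytes) -> bool:
    # Rule 1: the header must commit to the previous block.
    if header[:32] != prev_hash:
        return False
    # Rule 2: proof-of-work -- the header's hash must fall below the target.
    return int.from_bytes(sha256d(header), "big") < target
```

No majority of miners can make a client skip these checks; at worst they can reorganize recent history with valid blocks, which is exactly the "roll back a few transactions" limit described above.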