Wait, isn't "don't let code and data sit in the same memory" exactly the point of the no-execute bit, which AFAIK is hardware-enforced on any remotely modern AMD or Intel CPU? Granted, it takes OS support too.
Strict separation of code and data contradicts the definition and purpose of "computers." A single-tape Turing machine does not separate code and data (it's a single tape and a single "memory space"). Computers are (supposed to be) Turing complete within finite resource limits, meaning they can be used to simulate a single-tape Turing machine.
I don't really want another single-purpose appliance, so it's kind of a bummer to see the TSA-style "security" arguments (in the article, not from you) working to break general-purpose computing... (I think the best we can do for now is a trust system, and, for me at least, I still need to see the code to have faith in that.)
As for the "principal of least privilege", well, that was the point of microkernels. We're all using Linux and Windows and Mac OS X (none of which are microkernels), so even if microkernels are "better" they may not be very "practical" for the moment (the old Tanenbaum–Torvalds debate).
> Strict separation of code and data contradicts the definition and purpose of "computers."
I don't believe that your argument here is correct. Code and data sharing the same memory space is not a necessary condition of being Turing complete. Just because a Turing machine works that way doesn't mean every Turing-complete computer has to.
On the other hand, the distinction between 'code' and 'data' when running a simulator is a little bit arbitrary.
> As for the "principal of least privilege", well, that was the point of microkernels.
"Code and data sharing the same memory space is not a necessary condition of being Turing complete."
See: the Church–Turing thesis. If you can simulate a Turing machine (which can clearly be said to hold code and data in one memory space), then the hosting machine can also be said to share code and data, in the precise sense given by the mapping of the simulated Turing machine onto the host machine (its implementation). A computation in the simulated Turing machine is just a computation on the host, with some overhead. Conversely, a simulated CPU can't compute faster than its host's CPU; if a Turing machine was simulated, the host was Turing complete.
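To make that mapping concrete, here's a minimal sketch in Python (the machine is just the 2-state busy beaver, picked for illustration): the transition table (the simulated machine's "code") and the tape (its "data") end up as two ordinary objects in the host's single memory space.

```python
# Minimal Turing machine simulator. "Code" (the transition table) and
# "data" (the tape) both live in the host's one address space, which is
# exactly the mapping the argument above relies on.
from collections import defaultdict

# (state, symbol) -> (write, move, next_state); the 2-state busy beaver
program = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (1, +1, "HALT"),
}

tape = defaultdict(int)  # unbounded tape of zeros
state, head = "A", 0
while state != "HALT":
    write, move, state = program[(state, tape[head])]
    tape[head] = write
    head += move

print(sorted(tape.items()))  # [(-2, 1), (-1, 1), (0, 1), (1, 1)]
```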
"Principle of least privilege" implies a sandbox and IPC for every process, all so things don't run within the kernel, or any other process when possible. That sounds like a microkernel to me. You might say another "point" of microkernels (I don't have an exhaustive list) was to make design and debugging easy too, but those are just the same ideals inflicted on the programmer: keep it simple and homogeneous; the less you do, the less you can do wrong.
As you said, where to draw the line of separation is arbitrary. One could certainly make code and data (and processes generally) "more separate," but it's all at the expense of programmability (the general in general purpose) and speed (IMHO, why microkernels aren't everywhere), which brings us to the current compromises: monolithic kernels, OSs that warn you before installing, NX flags, patch Tuesday, and antivirus programs.
We build up an environment of temporarily-appropriate limitations from a blank (general) slate. If I ever feel too restricted, I put on my black fedora and perform a "jailbreak" (I reboot). I'm still better off with a malleable computer than with many individual limited tools. (Not that you suggested otherwise, just my 2 cents.)
Technically NX is the same memory, just different permissions applied to portions. It is possible to have an architecture where code and data are completely separate - http://en.wikipedia.org/wiki/Harvard_architecture
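And you can watch that permission flip happen in place. A minimal sketch, assuming Linux/glibc and CPython on x86-64 (the ctypes plumbing is just for illustration):

```python
# Same page of memory, two "identities": first plain writable data,
# then read+execute "code". The bytes themselves never move.
import ctypes
import mmap

libc = ctypes.CDLL("libc.so.6", use_errno=True)
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]

page = mmap.mmap(-1, mmap.PAGESIZE,
                 prot=mmap.PROT_READ | mmap.PROT_WRITE)
page.write(b"\xc3")  # x86-64 'ret' instruction -- still just "data" here

addr = ctypes.addressof(ctypes.c_char.from_buffer(page))
# Flip the very same page from RW (data) to RX (code):
if libc.mprotect(addr, mmap.PAGESIZE, mmap.PROT_READ | mmap.PROT_EXEC):
    raise OSError(ctypes.get_errno(), "mprotect failed")
```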
Data - noun - Data you haven't labelled as code—yet.
A Python program is data: it's just a text file. So. You know. There's that. Other things that are "data" but aren't really: PDFs, JavaScript, and CSS.
The idea that you're going to be able to wave a magic Harvard-architecture wand and fix bad inputs making programs do things other than what was intended is a misunderstanding of the problem.
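The Python case in miniature (the string stands in for any untrusted input): the CPU only ever executes the interpreter, so the NX bit never even gets consulted.

```python
# "Just a text file" until something interprets it.
untrusted_input = 'print("I was plain data a moment ago")'
exec(untrusted_input)  # data becomes behavior; no page permissions involved
```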
You sacrifice an enormous amount of flexibility and extensibility by enforcing such distinctions. Much of the power and elegance of languages like lisp comes from blurring the boundary between code and data. To the C etc. mentality, it's unthinkable, but in lisp you can maintain (modify/extend/fix) a running application, without having to unload and reload everything.
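A crude Python analogue of that live maintenance, just to make it concrete (Common Lisp does this natively: you'd simply re-evaluate the defun in the running image; the names here are invented):

```python
# Patching a "running application" without a restart: the new definition
# arrives as data (a string) and replaces the live binding.
def handler(x):
    return x + 1          # v1: has an off-by-one bug

print(handler(41))        # 42 -- oops, should have been 41

fix_from_the_wire = "def handler(x):\n    return x  # v2: fixed"
scope = {}
exec(fix_from_the_wire, scope)   # code arrived as data
handler = scope["handler"]       # swap in the fix; the app keeps running

print(handler(41))        # 41
```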
Not really. The problem is that what you see as data can often be used to control code without modifying it. It doesn't have to run directly on the processor to end up being as powerful as code. You often find you don't have to modify code at all: you can use the code that's already there to do what you please. This is true of a surprisingly large portion of modern exploits. A strict Harvard architecture or the NX bit only helps with what is becoming an increasingly narrow portion of the attack surface.
Again, see also: Python, PDF, JavaScript, CSS for a few different types of "data" which end up being as powerful as code.
A Harvard architecture can help in the same way NX does (as well as boost performance for fixed-purpose applications that rarely need to be re-programmed), but for it to still be Turing complete it's going to need some way to modify the executable code, including the potential for exploits, albeit Harvard architecture-specific ones.
The Harvard architecture could be used as the basis for a more rigorous trust model (iff the owner of the system controlled the root of trust). Democracy is out of fashion, though, so we would undoubtedly get something similar to what we have today, with Red Hat (for all practical purposes) "having to" pay for the right to boot Linux on a system "certified for Windows 8"...
Re: the second question, the problem with NX is that it only protects you from overflows where the attacker jumps into the buffer.
Overflows are still exploitable with NX. The attacker instead jumps to a series of fragments of library code[1]. Since libraries will always be executable, there's no problem (aside from the difficulty of finding the right chain of "gadgets").
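For intuition, a toy model in Python (the addresses and gadgets are obviously fake): the attacker supplies no code at all, only data, and everything executed already lived in the "library".

```python
# Toy return-oriented programming: the exploit payload is pure data --
# a chain of "return addresses" -- and every operation performed already
# existed in the library. Nothing here would trip an NX check.
library_gadgets = {
    0x1000: lambda st: st.append(7),                       # push a constant
    0x2000: lambda st: st.append(st.pop() * 6),            # multiply top
    0x3000: lambda st: print("gadget 'syscall':", st.pop()),
}

overflowed_stack = [0x1000, 0x2000, 0x3000]  # attacker-controlled data

values = []
for ret_addr in overflowed_stack:      # each "ret" pops the next address...
    library_gadgets[ret_addr](values)  # ...and jumps to an existing gadget
```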
ASLR goes some way toward preventing return-oriented programming (ROP) attacks, but it isn't bulletproof.
Well, it can be argued that any security feature can be circumvented in theory, which is why super-secure networks are fond of air gaps. The NX bit isn't really an air gap, and neither are ASLR, DEP, and so forth.
The features of a processor designed to protect itself from memory are really just stopgaps on the way to the next paradigm that supplants von Neumann, is I think what Watson is saying.
1) We think (but have not proved) that factoring large numbers is hard. We use this for cryptography. In theory the crypto could be brute-forced, or someone might find some new method for factoring. In practice, brute-forcing would take longer than the Universe will exist (see the back-of-the-envelope sketch after this list), and a breakthrough in factoring large numbers is unlikely.
2) We think that a single overwrite of a hard disc platter is enough to destroy the information. No software exists that claims to be able to recover information that has been overwritten once. No companies exist that claim to be able to recover data that has been overwritten once. No university research exists showing recovery of data that has had a single overwrite. No criminals have been prosecuted or convicted with evidence recovered from a disc that's had a single overwrite. Everything we know suggests that a single overwrite is fine. But, because a well-funded government might be able to recover that data, we suggest that people do 3 (or 8, or 30-something if you're being silly) overwrites, or, if the data is really important, that people destroy the platters. In theory the data might be recovered, and so people have decided that in practice they will destroy the drive or overwrite more than once.
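The back-of-the-envelope for point 1, sketched in Python (the ops-per-second figure is invented and generous; working in log space avoids overflowing floats):

```python
# Why "brute force the crypto" isn't a plan: trial division against a
# 2048-bit RSA modulus means testing divisors up to sqrt(n) ~ 2**1024.
import math

bits = 2048
log10_divisors = (bits // 2) * math.log10(2)  # ~308
log10_ops_per_sec = 12                        # assume 10**12 trials/sec
log10_seconds = log10_divisors - log10_ops_per_sec

print(f"~10**{log10_seconds:.0f} seconds of work")  # ~10**296
print("age of the Universe: ~10**17.6 seconds")     # not even close
```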
When talking about security it's a good idea to assume that someone can break whatever you're doing, and then ask if you need to do more, or need to do things differently.
Ironically, the one-time pad is breakable in practice, due to mistakes made, shortcuts taken[0], and side-channel attacks[1].
Besides, it relies on securely distributing the pad itself before information exchange can take place, which in turn is prone to the usual array of physical insecurity, design errors (e.g. using publicly available randomness), or, if distributed over a digital channel, to failures of the encryption used.
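The canonical "shortcut" is reusing the pad (that's how Venona got its foothold). A small Python illustration of why that single mistake is fatal:

```python
# One-time pad used once is information-theoretically secure; reuse the
# pad and it drops out of the algebra, leaving p1 XOR p2 for analysis.
import os

p1 = b"ATTACK AT DAWN"
p2 = b"HOLD POSITIONS"          # same length as p1
pad = os.urandom(len(p1))       # genuinely random, but used twice (the sin)

c1 = bytes(a ^ b for a, b in zip(p1, pad))
c2 = bytes(a ^ b for a, b in zip(p2, pad))

leaked = bytes(a ^ b for a, b in zip(c1, c2))
assert leaked == bytes(a ^ b for a, b in zip(p1, p2))  # pad cancelled out
```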
http://en.wikipedia.org/wiki/NX_bit
If that isn't working well enough, why not? Too much legacy code?