As I understand it, this mode gains performance by sticking with 32-bit pointers in user space instead of 64-bit pointers, while still taking advantage of all the new registers etc. available in x86-64 ("amd64") CPUs. (Many programs don't really need more than 4GB of virtual address space, so using full 64-bit pointers can cause quite a bit of overhead - imagine doubly-linked lists with small data structures, for example.)
x32 is clearly a good technical solution. If you're running on a 64 bit kernel, x32 is superior to i686 in every measurable way.
But it's also yet another architecture. It's not binary compatible with either i686 or x86_64. You need your whole userspace compiled to use it. Middleware with embedded assembly (there's a surprising amount of this in glibc, and of course things like ffmpeg, libjpeg, etc...) needs to be ported. You can't run 32 bit binaries from proprietary sources.
And frankly the benefit over straight x86_64 is quite modest. I don't see x32 taking off. It's just not worth the hassle.
I thought that the work that goes into multiarch support would allow you to run a single kernel and mix and match x32 and x86_64 binaries on the same system, but I might be wrong. (Of course, that would require a separate set of every library/dependency.)
Some numbers mentioned on the x32 ABI page hint at anywhere between 4% and 40% performance gains; if that is true, then I'd think the benefits would outweigh the hassle of another architecture.
(Edit: Most middleware also ships straight-C versions of the routines; whether or not an x32 C compiler can measure up to handcrafted x86 or x86_64 assembly I don't know - but I'm guessing the much higher register count would help a lot. Regarding proprietary software: there are a great many server configurations that don't need anything beyond the standard open source packages available in Debian.)
A standalone x32 binary will run fine on an x64 machine. But if you want to link to any libraries, the library will also have to be x32. So an x64 system, which probably has 32-bit legacy libs as well as normal 64-bit ones, will also need a complete set of x32 libs for x32 to be practical.
Sure. But after the porting work has been done by the distribution vendor, it's done. The package manager software should be able to do whatever is necessary to almost transparently ensure any necessary dependencies are installed for the required sub-architecture. So I would imagine that in most cases, end-users won't notice any hassle except having the option to choose between x32 and x86_64 per package during installation. I think that sounds kind of neat :)
OK, modulo taking up extra space on already-cramped CD distros, taking more time to download updates, taking up more space on production hard disks, and having to download a new version of the software if your dataset grows over 4GB, it sounds good. :)
And extra memory taken up at runtime by having to load the other versions of the libraries. And the I/O costs of reading them in. I'd think that'd outweigh the performance benefits many times over in nearly all cases.
Note that there is no need to port every single program. You can still run a 64 bit userspace (or a 32 bit one for that matter), but for the apps where there is a major benefit use x32. This also means you don't have to port every library etc.
I wonder if there could be an alternative in hardware.
Could the cpu have a mode/flag where all pointers in cpu registers were treated as 32 bits (high 32 bits ignored)?
Alignment issues might make this impossible (loading a 32 bit address in a compatible way from memory/cache...) but it would be neat if it could work. For instance, if memory layout in 32 bit chunks were A... B..., there would have to be a way to load both A and B into the low 32 bits of a register.
I am not sure this is true generally: the JVM has been able to use "Compressed OOPs", aka 32-bit pointers to objects managed by the VM, on 64-bit platforms for a few years, even without any support from the OS.
Of course it's better when language/VM implementers take this issue into account, but for those who don't, just building the VM/interpreter for the x32 architecture can be the only workaround.
Flat real mode was a hack mode on old 32-bit Intel systems to enable flat memory addressing with 16-bit code. You'd jump into protected mode, map the full 4GB address space flat, then jump back into 16-bit real mode without remapping that memory space.
Afterwards you could access all your memory in a flat address space rather than using 64K segment:offsets. You had to prefix your memory access instructions with a 0x67 (address-size override) byte though. Also, you had to write your own runtime library.
This is the exact opposite: giving you all the registers, but not the memory address space.
I'm curious about what you're thinking about when you say "write your own runtime library" - wasn't the whole purpose of this "unreal mode" and staying in 16bit such that you would still have access to DOS and its API? (compared to switching to full 32bit mode, where DOS went out the window unless you implemented or embedded a "DOS-extender"?).
Yes, you're correct about the purpose, but many of the runtime functions would then be limited to the old 64K limit.
I rewrote all the string handling, memory management, etc. functions in the Borland Pascal and C++ runtimes to let me use all the memory without worrying about segments/offsets.
So, you got the DOS functions, but you also got printf, port io, memory functions, a bunch of stuff I can't remember, etc.
Basically you got the primary benefits (for the needs of the time) of pmode - flat memory, without most of the pain. Seems like the same situation here - you get all the new registers, but without the pointer overhead.
EDIT: One point to note is that your code, data and heap segments all stayed 64K. This bit me in the ass once when I was presenting. Two bits of code that had been tested separately went over the 64K code and heap limits when brought together. That was embarrassing.
Yup, the x32 ABI is pretty neat. For most architectures the transition from 32 bit to 64 bit pointers resulted in reduced performance from the extra cache pressure the bigger pointers caused. In x86-64, though, the extra registers and guaranteed SSE2 meant that you actually saw a speed increase.
Of course, there are performance advantages to bigger pointers too, sometimes. For instance, it can be easier for a garbage collector to identify pointers if the ratio of memory addresses in use to total addresses is small.
This reminds me of some of the fun and games we used to have with 8-bit computers.
8-bit computers running chips such as the Z80 and 6502 were limited to 16-bit addressing and a total of 64k of addressable RAM. For most computers earlier on this wasn't a problem, but in the later 80s when machines with 128k started to become more common this meant that there were lots of strange quirks to be considered.
For example, the Sinclair ZX Spectrum (Timex TS2000 in the US) originally came with 16 or 48k of RAM and 16k of ROM (64k of total addressable memory). When the 128k model came out in 1985, Sinclair used a fairly interesting way of getting around the 64k limit. The memory was divided into 16k banks, which were then manipulated via a port. A bit more info on how this worked is available here[1].
Incidentally someone built a 4Mb RAM upgrade for the Spectrum![2]
Also, swapping memory in and out of the address space is a rather costly operation. And it becomes infeasible if the process's working set grows larger than the available address space [thrashing].
I think that on Linux, the 4G/4G split never made it, so in practice the limit is around 64GB due to the limited amount of lowmem available to the kernel in the regular 3G/1G split.
I dual boot Ubuntu 12.04 32-bit and Windows 7 32-bit on my HP laptop, which has 8GB of RAM. When I am running Ubuntu I see 7.8GB (I'm assuming this is a 1000 vs 1024 issue and doesn't concern me) in my system settings and resource monitor. When I am running Windows I only see 3.8GB of RAM in resource monitor, but I see the full 8GB if I look in device manager. Can somebody explain this to me?
I'm not sure this is true for discrete cards. Onboard GPUs will utilize good chunks of system memory (and this is configurable in the system BIOS generally), but discrete cards don't use any afaik.
It has been said here before. Linux fully supports PAE whereas Windows only uses it for execution restriction. Therefore you are not able to see anything beyond the 4 GB limit on Windows.
> Linux fully supports PAE whereas Windows only uses it for execution restriction.
That is not correct. Windows also supports full PAE, but adds licensing-based restrictions. The Datacenter and Enterprise editions of Windows Server 2008 will provide access to all 64GB RAM PAE can provide.
You might be interested to know that Windows keys don't discriminate between 32-bit/64-bit. If you can find installation media for the same type of x64 distribution (Home, Pro, etc) as your 32-bit, you can install x64 and use your current key. May be worth the reinstall to get access to your RAM.
Of course, you should make sure that appropriate drivers for your laptop exist for x64 first. I would hope any laptop that can accommodate 8GB would have x64 drivers.
Most likely your Ubuntu uses PAE and your Windows does not. The NT kernel certainly supports PAE, but MS has likely disabled it in client systems for some reason.
I'm not an expert on these matters, but as I understand it, Windows is not ignorant of the fact that you have 8GB of RAM, it is simply unable to address it. So it makes logical sense that it would show you having 8GB, but only 3.8GB of usable RAM.
That is not correct, Windows is actually fully aware of the physical RAM on the machine and is technically able to use it, but has a license-based restriction scheme only letting a subset of the RAM be used.
Am I missing something, or does the top answer not address the question? Memory tricks aside, it seems the questioner is asking whether the physical RAM itself will ever be used. Short of PAE, unless my understanding is wrong, I was under the impression that the answer to this is no.
Even with PAE, many Windows versions hard limit the available RAM to 4GB (or less). So his answer isn't even correct given the reality of many operating systems & hardware.
> Short of PAE, unless my understanding is wrong, I was under the impression that the answer to this is no.
PAE is used everywhere and has been around since the Pentium Pro in 1995. So the answer is yes, a 32-bit OS/CPU can use 8 gigabytes (and more) of physical memory, plus an effectively unlimited amount of swap space. It all depends on the CPU's physical address size, not its virtual one.
My "64 bit" CPU has 36 bit physical addresses and 48 bit virtual addresses.
> My "64 bit" CPU has 36 bit physical addresses and 48 bit virtual addresses.
Just to clarify for anyone who doesn't catch what you mean and thinks PAE is the devil:
x86-64 CPUs must support PAE when operating in long mode (64-bit mode). Further, the version of PAE used is 'classic PAE' with an additional translation layer added to get some more physical address space. In other words, even 64-bit CPUs don't have a real 64-bit physical address space; they still use PAE to access large amounts of RAM.
Holy Baader-Meinhof phenomenon. I ordered 8 GB of RAM for my HP laptop last night and spent well over two hours reading about this exact subject to see if I needed to install Windows 64-bit when it arrives. I think I even stumbled upon this same page.
That being said, I still don't fully understand; Once I install the new RAM, will I only receive the benefits if I perform a new 64-bit install of Windows 7, or will upgrading just amplify the benefits? (I currently have 3 GB in 1 GB + 2 GB as the default factory configuration.)
Sorry if this question is dense of me, but my field is economics and the more technical aspects of HN are often lost on me
> Once I install the new RAM, will I only receive the benefits if I perform a new 64-bit install of Windows 7
That is correct, the alternative is to use a ($1000+) Server edition, some of which do give access to >4GB in 32b.
If you put 8GB RAM in a laptop running a 32b non-server Windows, you'll simply see 4GB.
You may see very small improvements nonetheless due to dual-channel activating (when pairing identical sticks of RAM, the CPU is able to "talk" to both at the same time rather than one at a time, doubling bandwidth), but I rather doubt it. The gains are usually insignificant. And you won't get any gains from actually increasing the amount of RAM in the machine.
This is not true. Did you read the answers? 32-bit OSs with PAE support can use all the RAM. Most processes will only be able to use 4GB, but that's per process, not for the whole system.
So AD1066 doesn't need a 64-bit Windows, just any 32-bit Windows newer than XP.
> So AD1066 doesn't need a 64-bit Windows, just any 32-bit Windows newer than XP.
No. Non-server 32b Windows will never give access to more than 4GB RAM to the user unless you patch the kernel. XP 32b, Vista 32b and 7 32b — including Ultimate for the latter 2 — all restrict "available physical memory" to 4GB in all situations. So do many server editions as well, for Windows Server 2008 x86 for instance, only the Enterprise and Datacenter editions will give access to more than 4GB RAM (up to 64).
A 2008 Enterprise license is ~$1500. I doubt ad1066 has any desire to pay such a price.
This seems like an appropriate place to ask this question. At work I have an i5 with 8 GB of RAM. Unfortunately we're stuck with XP 32. So is there a way for me to use more memory overall? Specifically, I have to have Outlook and my VirtualBox Linux machine running at the same time. I'd like to use bank switching or PAE to give my VM 4 GB, while all of the other applications just fight over the remaining 4.
Technically, all modern Windows versions have PAE enabled (the NX bit requires enabling PAE for instance), but Windows adds license-based memory restrictions on top of that.
That's how the 2GB limit on Starter, the 4GB limit on non-server and the various limits on server editions are implemented.
>> a process can have more memory than address space
How does that work? The best I can come up with is that the author is referring to mmap? In which case the address space is still limited to 4GB, although I concede that with the OS pagefile it would be possible to have more than 4GB in core at a time via judicious use of mmap.
You need to ask the OS to remap your address space to different area of physical memory. Windows has the Address Windowing Extensions[1], which allow processes to reserve multiple blocks of memory and switch between them.
How does one swap memory in and out of address space?
I understand this can be done with files via mmap but not sure about RAM, since in that case would you not need to know the physical memory addresses of what you wanted to swap in or out? Isn't this information deliberately hidden from userspace applications?
Most of the responses to the question seem to be "sort of" correct but miss important parts of the overall picture. Further, a complete answer to the question is going to depend on the operating system the given application is running under, as the specifics are heavily dependent on OS implementation.
I can't speak for Unix systems, but on the Windows side, it's complicated and messy:
PAE (Physical Address Extensions)
Client editions of Windows do not support PAE while server editions do. If you want to physically address more than 4GB of RAM you need PAE. Realistically, you're unlikely to be able to address even 4GB under non-PAE x86 due to reserved memory ranges from hardware taking a chunk.
Funfact 1: As for why PAE isn't supported on client editions, this isn't a licensing problem (you can buy x64 editions for the same price that have massively increased RAM limits), but rather, Microsoft playing it safe with driver compatibility. Apparently, during testing a non-trivial number of drivers were found that made assumptions that don't hold under PAE and thus broke horribly. Things like using the high bit of a 32-bit pointer for data storage. Rather than risk a tsunami of bluescreens caused by poorly written drivers (which people would blame MS for out of naivety) the decision was made to simply not support PAE on client editions. Server editions do as there's the reasonable expectation that servers should be running higher quality drivers.
Funfact 2: Technically, PAE is supported on client editions, as it's required for hardware DEP support, so it's more a hard limit on addressable physical memory than it is a lack of support for PAE itself.
AWE (Address Windowing Extensions)
This is a Windows-specific capability (unlike PAE, which is a hardware capability provided by modern CPUs) which allows a process to address more memory than it has virtual address space. Further, it works hand-in-hand with PAE as you by definition need PAE support in order for the OS to address memory higher than 4GB in the first place. However, in order to use AWE, the application has to be explicitly written to utilise it as the usage of AWE requires the application itself to take a far more active role in managing its memory. AWE support is indicated via a flag in the PE header of the application binary. Needless to say, a tiny minority of applications have this support. The only one I could name right now would be Microsoft's SQL Server.
4GT (4-Gigabyte Tuning)
This is one other Windows-specific capability that isn't really related to a discussion of 4GB limitations but inevitably gets raised regardless. While PAE provides for the OS to address more than 4GB of physical memory, and AWE provides for an application to address more memory than fits in its virtual address space, neither of these features increases the amount of addressable memory at any given time in an application (i.e. AWE allows you to change your view of memory, but not to address more than 2GB of memory simultaneously). A 32-bit Windows process has a 2GB virtual address space, with the remaining 2GB mapped to the kernel address space. 4GT, in contrast to PAE and AWE, does allow you to modify the address space available to a user-mode application. By providing the appropriate bootloader parameter you can skew the usermode/kernelmode split up to a maximum of 3GB/1GB user/kernel-mode respectively. This used to be done quite a lot on large Terminal Servers, for example, or to optimise database servers; however, the ramifications of doing so need to be carefully understood. You are reducing the amount of memory available to the kernel, and so memory-hungry drivers may now fail to load, or worse, outright crash the system if they haven't been coded to properly handle low-memory conditions.
EDIT: I forgot to mention that, like AWE, an application won't even use the increased address space 4GT provides unless it is marked as capable in its PE header. Merely enabling the setting won't get you anything unless your application supports this capability. Firefox, as an example, is large address aware and, being 32-bit, can thus address up to 3GB instead of 2GB. While on 32-bit systems this capability is not likely used (how many 32-bit Windows users have modified their bootloader parameters to support 4GT?), it is fully used on 64-bit Windows, where a large-address-aware 32-bit process gets a full 4GB address space out of the box. And god knows Firefox needs it...
In summary, in the age of x64, I tend to view all of the above as far more trouble than it's worth. The virtual address space of a Windows 64-bit process is currently 8TB. Windows 7 Professional and up give you 192GB of physical memory, while Windows Server 2008 R2 Enterprise and up give you up to 2TB. So if at all possible, avoid 32-bit, 64-bit is the solution; PAE, AWE & 4GT were only ever workarounds.
If any of the above is incorrect, please correct me, as it's been a while since I studied much of the above, and if anyone can shed more light on the Unix situation, I'd be interested (in a very nerdy way).
> Client editions of Windows do not support PAE while server editions do.
That is not correct, client editions fully support PAE.
> As for why PAE isn't supported on client editions
It is supported (and enabled), the limits are implemented separately and independently.
> this isn't a licensing problem (you can buy x64 editions for the same price that have massively increased RAM limits)
Windows 7 Home Basic x64 will only make 8GB RAM available, Premium will restrict to 16GB, professional and up will allow up to 192GB. Meanwhile Server 2008 R2 Enterprise and Datacenter allow 2TB. These are very much licensing restrictions, as is Windows 7 Starter (x86)'s 2GB limit.
> but rather, Microsoft playing it safe with driver compatibility.
That is the excuse they give for it, yes. It just happens to make no sense when applied to the x64 limits I quoted above, to Starter's 2GB limit (and Vista Starter's even lower 1GB) or to e.g. Windows Server 2008 R2 Foundation's 8GB limit (2008 R2 only runs on x64 and Itanium so 32b isn't even remotely a factor)
> so it's more a hard limit on addressable physical memory
It's not "hard", since it's merely based on the licensing mode of the kernel. It can even be hacked out.