Since the Dolphin team puts playability above raw accuracy, the emulator can keep moving closer to perfect emulation over time, while still remembering that an unplayable emulator does nothing to preserve the source material.
Compare this to the MAME situation, where most early 3D arcade games are completely unplayable and will remain so for the foreseeable future. The way computers are going, there is no way in hell we'll be able to emulate those old, custom graphics cards with full accuracy and full speed using only general-purpose CPUs.
Last time I messed around with MAME, pretty much everything that was playable ran at full speed, even stuff like STV, Namco System 22, and the Seattle-based games. This was on my 2600K running at 4 GHz, which isn't exactly the latest and greatest.
From what I understand, the most demanding thing to emulate isn't the 3D hardware anyway; it's the high-clock-rate CPUs. Most of the latest 3D hardware supported by MAME is implemented with ASICs which have a relatively high-level interface. This allows the emulation to be highly optimized, since the low-level details aren't visible to the game software anyway. CPUs are a whole different story, though: there you are stuck emulating individual instructions, which gets pretty hairy when the CPU you're emulating can do 200+ MIPS.
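To make that concrete, here's a toy fetch/decode/dispatch loop in C++ for a made-up ISA (purely illustrative, nothing like MAME's actual cores): every guest instruction costs a fetch, a decode and a dispatch on the host, which easily adds up to dozens of host instructions each, and at 200+ MIPS that's billions of host operations per emulated second.

    // Toy interpreter for a hypothetical 32-bit ISA.
    // Purely illustrative; real emulator cores are far more involved.
    #include <cstdint>
    #include <vector>

    struct ToyCpu {
        uint32_t pc = 0;
        uint32_t regs[16] = {};
        std::vector<uint32_t> mem = std::vector<uint32_t>(64 * 1024);

        void Step() {
            const uint32_t instr = mem[pc++];         // fetch
            const uint32_t op  = instr >> 24;         // decode
            const uint32_t rd  = (instr >> 20) & 0xF;
            const uint32_t rs  = (instr >> 16) & 0xF;
            const uint32_t imm = instr & 0xFFFF;
            switch (op) {                             // dispatch
                case 0x01: regs[rd] = regs[rs] + imm;      break;  // ADDI
                case 0x02: regs[rd] = mem[regs[rs] + imm]; break;  // LOAD
                case 0x03: mem[regs[rs] + imm] = regs[rd]; break;  // STORE
                default: break;                       // unhandled opcode
            }
        }

        void Run(uint64_t n) { while (n--) Step(); }
    };

Dynamic recompilation is the usual way around this (it's what Dolphin does for the GameCube/Wii PowerPC CPU), but it's far more work and harder to keep accurate.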
It's interesting that this has ramifications for the emulated console(s) within the console. On one hand, it's an amazing technological achievement to emulate a system well enough that it can emulate another. On the other hand, it's staggeringly inefficient. Emulating a system is inefficient, but it makes sense because it prevents the need to keep all kinds of hardware on hand. Emulating a system that emulates a system compounds the inefficiency and is unnecessary, but it is a really cool achievement.
Fun fact: Dolphin emulates some N64 games better than current PC N64 emulators do. For example, Mario Tennis (N64) is considered very difficult to emulate, but the official Nintendo emulator running in Dolphin has almost no problem with it!
I'd have to imagine this is because the game executed by the virtual console has been simplified and improved by Nintendo since they have direct access to the source.
Or, possibly, that each Virtual Console ROM ships with a set of shims or plugins to the emulator, to add extra logic and workarounds specific to each game.
Which is pretty much exactly how NES and SNES cartridges worked, come to think of it—except that the console they were patching was hardware, so they had to add new physical chips to do it. (Speaking of, I've always wondered why no console just ships with some FPGAs inside that are free for each game to program on startup.)
It'd be doable to have an FPGA there, but it'd have to be an SRAM-backed FPGA, which from what I've seen makes it more expensive; otherwise they're usually flash-based, which wears out after a while. It'd also likely suffer a fate similar to the odd coprocessors in the PlayStation systems, which barely got any serious use in games since they were only on one system.
An FPGA that would be useful is a relatively expensive piece of hardware, even at scale. Additionally, finding skilled professionals who are able to work with them is much, much tougher than hiring a comparable graphics programmer.
There's also the issue that developing for an FPGA is more difficult, and thus more time-consuming, than writing tight C.
It's also really lucky that the GameCube GPU, released well over 10 years ago, maps extremely well onto the modern GPU pipeline.
Compare this with the PS2, which has a bunch of crazy programmable coprocessors to build an extremely flexible but hard-to-emulate system.
It's probably no coincidence, given that it was developed by ex-SGI engineers who founded their own company, ArtX, which was bought out by ATI and went on to develop the R300, one of the first modern programmable GPUs.
The PS2 had unprecedented graphics VRAM bandwidth (48 GB/s), and that's why it's hard to emulate on the PS3, and maybe even the PS4 (early PS3 models simply included the PS2 hardware inside).
As far as I know, all backward compatibility at initial release that any console has ever had came about one of two ways: either the new console was "just" the old console with some more hardware bolted on and higher clock speeds, so the new hardware could become the old hardware just by turning the new features off and slowing the clock, or the new console simply physically included the entire previous console. I don't know of anything that ever did it by emulation; maybe something back in the first couple of generations. I post this partially so someone will correct me, because I'm interested in the correction.
The Xbox 360 is a different architecture (tri-core PowerPC vs. single-core x86), but it emulates the majority of original Xbox games. There are bugs, but Microsoft fixed most of the issues with popular games.
While the Nintendo DS contained the GBA's CPU, the Nintendo 3DS does not contain the DS's CPU. I assume that's either because the instruction set is backwards-compatible (they're both ARM chips), or because there's actually emulation happening.
The 3DS downclocks the ARM CPU and disables its second core when playing a NDS game. As a side effect, battery life is better when playing DS games on a 3DS than when playing native 3DS games on the same 3DS.
The 3DS processor is capable of emulating 8-bit systems, as it has virtual console NES and GB/GBC games. SEGA also ported some Genesis games to run on the 3DS, but I think they had to re-write parts in native ARM code as straight emulation wasn't fast enough (Google "Sega M2 GigaDrive" for a great series of interviews about this).
Wow, I didn't know this. I was so annoyed when the slim got rid of backwards compatibility but it makes more sense now. It's easy to underestimate the complexity of emulating older systems I guess.
I bloody love all the content on this blog: the hex math fail, the mobile drivers, it's always great. Off to bed to read this on my tablet. Thanks for the link.
As a matter of fact, AMD GPUs use the floating-point ALUs to perform integer math (note: this might have changed with their GCN architecture). However, given that the mantissa of IEEE floats is just 24 bits long, the ALU can likewise only handle 24 bits, which is not sufficient for full 32-bit math. Hence, for "true" integer arithmetic, the ALUs need to be double-pumped or the operations emulated via floats (i.e. the dirty tricks which we avoided need to be done in the driver instead - but at that stage it can actually be done reliably, even if it's still ugly).
I assume the situation is similar for Nvidia GPUs. Either way, both vendors said their GPUs aren't designed for optimal integer performance, so we expect the performance drawbacks of integer usage to become less and less of an issue in the future.
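To make that 24-bit limit concrete, here's a tiny C++ sketch (host-side code, not shader code) of where exact integer representation in single-precision floats stops:

    // IEEE-754 single precision has 24 significant bits, so integers are
    // only exact up to 2^24; anything wider has to be split or emulated.
    #include <cstdio>

    int main() {
        float ok  = 16777215.0f;         // 2^24 - 1, exactly representable
        float bad = 16777216.0f + 1.0f;  // 2^24 + 1 rounds back down to 2^24

        std::printf("%.1f\n", ok + 1.0f);  // prints 16777216.0 (exact)
        std::printf("%.1f\n", bad);        // prints 16777216.0 (the +1 is lost)
        return 0;
    }

Full 32-bit arithmetic therefore needs double-pumping or float-based emulation, exactly as described above.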
I'd like to see a source for that claim. Regardless, pre-GCN GPUs had double-precision support that wasn't purposefully knee-capped; 52 bits of mantissa would have been plenty.
As for Nvidia knee-capping integer arithmetic on their GPUs, I very much doubt that is the case: pointer arithmetic (and hence memory access) requires integer operations, and I've seen very little evidence to suggest that there are any artificial issues with it.
There's no public source for the claim, it's what an AMD engineer told me via private e-mail (and I don't want to publish private mails for obvious reasons).
That said, there's a "fast" path on AMD GPUs for shader code which only requires 24 bits of integer precision. That's exactly enough for GameCube/Wii GPU emulation; however, I'm not sure if their shader compiler properly optimizes our code to use that path.
Hah! I hadn't played Twilight Princess for many years before playing it on Dolphin, so I actually thought Midna was supposed to have those lava-arms!
Being on a Mac, I'm tied to OpenGL, so I'm hoping this doesn't hurt me too much.
A lot of that is because OS X OpenGL drivers are just not good. If you look at https://developer.apple.com/graphicsimaging/opengl/capabilit... the support is currently stuck at OpenGL 4.1, with virtually no recent extensions supported (things like ARB_buffer_storage, which gives a great boost to applications like Dolphin, are unsupported, for example).
I've heard that before, but oddly enough, performance in a Parallels VM is often close to native Windows for me, despite ultimately going through those same OS X drivers. Someone on HN hypothesized that this might be due to Parallels' shader optimization.
That's what I meant: I'm pretty sure digitalriver.com is Microsoft's own CDN, used for App Store and MSDN downloads. You can even verify the hashes on an actual Microsoft subsite — possibly here? http://msdn.microsoft.com/en-us/subscriptions/downloads/
Wine won't run most DX10/DX11 applications. The article mentions that Dolphin's D3D9 backend was deleted, so I wouldn't expect to get anywhere with Wine.
In the first equation (which the article admits is wrong), I plug in 1 and get 0.00390625.
In the second equation, I plug in 1 and get -0.992188.
I think the answer is supposed to still be 1, because there should be no overflow until it is 256.
I thought maybe I was misunderstanding, and the equation isn't just meant to handle overflows but is supposed to add 1 and then handle overflows. So I plugged 0 into both equations and they both output 0. So they aren't trying to add 1.
Wouldn't the correct equation be frac(value / 256.0) * 256.0 ?
That was actually a mistake in the article. I had exchanged 255 and 256 accidentally. It's fixed now (and should hopefully make sense), thanks for catching it :)
EDIT: So uh, as a quick example of where things go REALLY wrong with the first equation - try value=-0.0000001 ;)
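For what it's worth, the frac-based form suggested above (frac(value / 256.0) * 256.0) trips over such values too; here's a minimal C++ sketch, taking frac(x) = x - floor(x) as in GLSL:

    // Reproduces the failure mode for values just below zero, assuming the
    // wrap is frac(value / 256.0) * 256.0 with frac(x) = x - floor(x).
    #include <cmath>
    #include <cstdio>

    static float WrapFrac(float value) {
        float x = value / 256.0f;
        float frac = x - std::floor(x);   // mathematically in [0, 1)
        return frac * 256.0f;
    }

    int main() {
        std::printf("%.9f\n", WrapFrac(1.0f));    // 1.000000000   (fine)
        std::printf("%.9f\n", WrapFrac(255.0f));  // 255.000000000 (fine)

        // For value = -0.0000001 the exact result would be ~255.9999999,
        // but 1.0 - 3.9e-10 rounds to exactly 1.0f in single precision,
        // so the "wrapped" value comes out as 256.0, outside [0, 256).
        std::printf("%.9f\n", WrapFrac(-0.0000001f));
        return 0;
    }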
The value is encoded as a fraction out of 256. So 1 is 1/256, up to 255 being 255/256.
It's still not completely correct: some values passed through that function come back as +/- 1e-16, and 255/256 becomes 0. Showing us again how bloody hard floats are for the average programmer to work with.
Edit: saw neobrain's comment, please disregard my post. Still, shouldn't there be a round in there too?
I don't think so, since that particular code was just meant to emulate integer overflows (in contrast to the limited decimal precision of integers). If you were to emulate the precision as well, it would likely need an additional round around everything indeed, i.e. something like round(value - 2.0 * round(0.5 * value * (255.0/256.0)) * 256.0).
If anything, this discussion shows that it's getting annoyingly complex to find the correct formula though, especially if all corner cases are supposed to be handled correctly. Oh right, and the real fun begins when you try to emulate 24 bit integers, for which the proposed method doesn't work at all because floats only have 23 bits of mantissa :)
Chances are there are simpler ways to emulate this stuff; I really don't know. It would be interesting to hear from GPU driver developers how integers are emulated via floats within the driver when the hardware doesn't support integers :)
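For completeness, here's a minimal C++ sketch (not Dolphin's actual shader code) of the overflow-only case for 8-bit values, assuming the inputs are already exact integers stored in floats:

    // Emulating 8-bit unsigned overflow in pure float math. Inputs are
    // assumed to be exact integers in [0, 255]; all intermediates stay far
    // below 2^24, so every operation here is exact.
    #include <cmath>
    #include <cstdio>

    static float AddWrap8(float a, float b) {
        float sum = a + b;  // at most 510, still exact in a float
        return sum - 256.0f * std::floor(sum * (1.0f / 256.0f));
    }

    int main() {
        std::printf("%g\n", AddWrap8(200.0f, 100.0f));  // 44 (300 mod 256)
        std::printf("%g\n", AddWrap8(255.0f, 1.0f));    // 0
        std::printf("%g\n", AddWrap8(10.0f, 20.0f));    // 30
        // The same trick fails for 24-bit values: their sums can need 25
        // bits, which single precision can no longer represent exactly.
        return 0;
    }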
Some GPUs need to emulate integers with floating-point operations, so you basically end up with the same shader bytecode as before (or even worse) when using integers.
At least we can all rest easy that GPU manufacturers will provide drivers which, if they claim to support integer math, will emulate it with uncompromising accuracy at the highest performance possible.
Very cool. I have to wonder if they've ever been contacted by Nintendo over this, though - many recent-ish games have been in a playable state for several years now, and you'd think functional Wii emulation would attract negative attention.
They have certainly noticed, but likely won't do anything about it. IANAL, but I'm not even sure emulation of this kind can be considered illegal. And if I were in their shoes, I would much prefer that a group of random people create a playable open source emulator of my old hardware: if they ever decide to leave the hardware business, it makes going down that path themselves much easier.
As long as you are doing black-box reverse engineering, that is. If you tried to disassemble the GameCube or Wii software, then you are likely breaking the law.
A similar situation comes up with Gnash, the GNU implementation of Flash. They require developers to have never installed Flash, since installing it requires accepting the EULA, which includes a clause about reverse-engineering the program.
Disassembly is actually explicitly legalized for the purposes of reverse-engineering. It's just distributing any software that circumvents copy protections that's illegal, whether it's the result of a disassembly or not.
People avoid disassembly in clean-room implementations out of an abundance of caution. If it's evident that the logic was ripped from disassembled executables, you'll have a harder time defending against patent claims or frivolous copyright claims.
On a related note, the branch browser feature of the Download Page [1] is completely broken, displaying only "Branch list" in place of what I assume should be an actual branch list.
Yeah, we recently moved away from Google Code to GitHub, and updating the website hasn't been a priority. We moved from a "branches in the main repository" model to a more classic, GitHub-style, "everyone has their own fork" model, so tracking branches is not really even possible anymore.
I'll probably just remove this link from the downloads page.