Could you go into a little more detail on the process for adding sounds and sprites to the game?
I am very interested in this type of work and have been hacking at trying to build a game for the SNES. All I've been able to do so far is to collect all relevant documentation for developing it: https://github.com/bttf/snes_dev
I'll start with the graphics part, since that's the easier part of the two.
Since the original GameBoy has only 4 shades of gray, all "bitmaps" are 2 bits per pixel, so you can squeeze a total of 4 pixels into 1 byte. The graphics are converted from PNG files via some JavaScript routines, which can be found in the convert.js file in the repository.
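To make the 2 bits per pixel idea concrete, here is a minimal sketch of packing one 8x8 tile into the 16 bytes the hardware expects. The hardware stores each row of 8 pixels as two bytes (one holding the low bit of every pixel, one holding the high bit), which still averages out to 4 pixels per byte. The function name and the input format (64 shade indices from 0 to 3) are assumptions for illustration; this is not necessarily how convert.js does it.

```javascript
// Encode one 8x8 tile into the Game Boy's 2 bits-per-pixel layout.
// `pixels` is a hypothetical input: 64 shade indices (0..3), row by row.
function encodeTile2bpp(pixels) {
    if (pixels.length !== 64) {
        throw new Error('expected 64 pixels for an 8x8 tile');
    }
    const bytes = new Uint8Array(16); // 2 bytes per row, 8 rows
    for (let y = 0; y < 8; y++) {
        let low = 0;
        let high = 0;
        for (let x = 0; x < 8; x++) {
            const shade = pixels[y * 8 + x] & 3;
            const bit = 7 - x; // the leftmost pixel lives in bit 7
            low |= (shade & 1) << bit;
            high |= ((shade >> 1) & 1) << bit;
        }
        bytes[y * 2] = low;      // first byte: low bits of the row
        bytes[y * 2 + 1] = high; // second byte: high bits of the row
    }
    return bytes;
}
```

A converter would call this once per 8x8 cell of the source PNG after mapping each pixel color to one of the 4 shades.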
The GameBoy uses 8x8 pixel hardware tiles on a scrollable "background" and supports up to 40 sprites which can be either 8x8 or 8x16 pixels in size.
There are 2 tile buffers in VRAM which partly overlap, and only the lower one can be used for sprites, which gives you 128 8x8 tiles for use with sprites and 256 for use with backgrounds.
Sprites can be mirrored, so this somewhat reduces the amount of graphics needed for the player character.
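As a rough illustration of the mirroring, each hardware sprite entry is 4 bytes (Y, X, tile index, attribute flags), and two of the attribute bits flip the tile horizontally or vertically. The sketch below builds such an entry so that one set of "facing right" tiles can also serve for "facing left". The function name and the facingLeft parameter are made up for this example and are not taken from the game's code.

```javascript
// Attribute flag bits of a Game Boy OAM (sprite) entry.
const OAM_FLIP_X = 0x20; // bit 5: mirror horizontally
const OAM_FLIP_Y = 0x40; // bit 6: mirror vertically (unused in this sketch)

// Build the 4 bytes of one sprite entry: Y, X, tile index, attributes.
// The hardware offsets Y by 16 and X by 8, so that sprites can be
// positioned partly off the top and left edges of the screen.
function makeSpriteEntry(screenX, screenY, tile, facingLeft) {
    return Uint8Array.of(
        screenY + 16,
        screenX + 8,
        tile,
        facingLeft ? OAM_FLIP_X : 0
    );
}
```

With this, the player only needs tiles drawn for one direction; the other direction is the same tile index with the flip bit set.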
In order to reduce the map data, each room uses 80 16x16 blocks which are served from a meta table in memory, where each index maps to 4 8x8 tile definitions of the block. When drawing the screen the code resolves the individual 8x8 tiles and sets their value into the current VRAM buffer for the screen (there are two, which are swapped, to minimize draw artefacts).
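A minimal sketch of that block-to-tile resolution could look like the following. It assumes the 80 blocks are laid out as 10 columns by 8 rows, that each meta table entry lists its tiles as top-left, top-right, bottom-left, bottom-right, and that the output is a compact 20-tile-wide buffer; all names and the layout are assumptions for illustration, not the game's actual format.

```javascript
const ROOM_WIDTH = 10; // blocks per row (assumed: 80 blocks = 10 x 8)
const ROOM_HEIGHT = 8; // block rows per room (assumed)

// blocks:    80 block indices, one byte each, row by row.
// metaTable: metaTable[blockIndex] = [topLeft, topRight, bottomLeft, bottomRight]
// Returns a 20x16 grid of 8x8 tile indices, row by row.
function expandRoom(blocks, metaTable) {
    const tilesPerRow = ROOM_WIDTH * 2;
    const tiles = new Uint8Array(tilesPerRow * ROOM_HEIGHT * 2);
    for (let by = 0; by < ROOM_HEIGHT; by++) {
        for (let bx = 0; bx < ROOM_WIDTH; bx++) {
            const [tl, tr, bl, br] = metaTable[blocks[by * ROOM_WIDTH + bx]];
            const top = (by * 2) * tilesPerRow + bx * 2; // top-left tile of this block
            tiles[top] = tl;
            tiles[top + 1] = tr;
            tiles[top + tilesPerRow] = bl;
            tiles[top + tilesPerRow + 1] = br;
        }
    }
    return tiles;
}
```

The saving is that a room costs 80 bytes of map data instead of 320, at the price of one table lookup per block when drawing.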
Since the meta table is also limited to 256 16x16 blocks (because the map format only uses 1 byte per block tile), there is some additional pre-computation magic going on when creating the map data. Basically, the block tiles are split into 4 x 8 rows, and each screen then maps onto 4 of these "rows". By packing often-used block tiles together, you can have more detail in certain rooms, since they don't need to use all of the different rows available and can basically swap some of them out.
Also, all graphics and map data are additionally compressed with a custom LZ-type compression routine and are decompressed on the fly. The exception is the player graphics, which are stored in a RAM buffer at start in order to avoid graphic glitches when switching out the sprite tile reference indexes and the graphics at the same time.
Overall, the timing is really important since you don't have too much time during the VBlank period (yes, it emulates a vblank!)
Now as for the sound, right now I'm really just setting the sound channel register values pretty much by hand, meaning that I pick some decent values (tested in some GBA sound test ROM for GameBoy sound channels I found somewhere; yes, the GBA still has the original GameBoy's 4 channels, since it's backwards compatible).
The sound data is stored in some kind of JSON format for easier editing; all the bit field stuff is then generated at compile time and baked into the ROM.
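To give an idea of what that bit field generation involves, here is a hedged sketch that packs a JSON-style description of a square-wave sound into four of the channel's register bytes (duty/length, volume envelope, frequency low byte, and trigger plus frequency high bits). The field names in the object are invented for this example, and the sweep register is left out for brevity; the real format in the repository may look quite different.

```javascript
// Hypothetical JSON-style description of a square-wave sound effect.
const jump = {
    duty: 2,           // 0..3, selects the wave duty cycle
    length: 0,         // 0..63, length timer value
    volume: 12,        // 0..15, initial envelope volume
    envelopeUp: false, // true = volume rises, false = volume falls
    envelopePace: 3,   // 0..7, envelope step pace
    frequency: 1750    // 0..2047, 11-bit hardware frequency value
};

// Pack the description into [NR11, NR12, NR13, NR14] register bytes.
function packSquareChannel(s) {
    const nr11 = ((s.duty & 3) << 6) | (s.length & 0x3f);
    const nr12 = ((s.volume & 0xf) << 4) | (s.envelopeUp ? 0x08 : 0) | (s.envelopePace & 7);
    const nr13 = s.frequency & 0xff;               // low 8 bits of the frequency
    const nr14 = 0x80 | ((s.frequency >> 8) & 7);  // bit 7 triggers the channel
    return Uint8Array.of(nr11, nr12, nr13, nr14);
}
```

Doing this at build time keeps the editable data readable while the ROM only carries the final bytes to copy into the sound registers.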
I've also gotten a prototype of a music engine running (not yet in the repository) where you'd play pre-defined sound "patterns" on virtual channels and "mix" them down. The sound / music stuff really is a whole story of its own, especially since there are nearly no references to find, except for some old, undocumented ASM sources of some Rare Ware GameBoy ports of Banjo / Jet Force Gemini which apparently got uploaded by one of the Sound Engineers at Rare.