2015/08/12

Self-Imposed Restrictions: The Sound of a Generation

As I hinted in my last post, I've been researching sound technology so I could finally complete the rule set simulating technical restrictions of the SNES and GBA. The fundamental problem I should have noticed the first time around is that, while the two systems had really similar graphical capabilities, their sound systems are radically different.

Which Generation, though?

The GBA had to be backwards compatible with the Game Boy and Game Boy Color, so it came with all the hardware of the GBC inside it, including its sound system. That meant four analog channels: one for white noise, two for square waves with varying duty cycles, and one "programmable channel", which was essentially a four-bit digital channel. It would have been a waste not to use these for GBA games as well, so all that was added on top were two digital 8-bit channels.
Compare that to the S-SMP, the audio processing unit in the SNES, which sported eight digital 16-bit channels at 32 kHz. That is almost CD quality, though it was never fully utilized because of limited memory and lossy compression. A cheesy echo effect was famously added as well, which saw quite a bit of overuse.

The two consoles encouraged completely different approaches to sound design. The GBA had all the utensils for high-quality chiptunes; the SNES was designed for sampled music. Lo-fi samples, but samples nonetheless.
This is the reason people barely complained about the music in Donkey Kong Country 3, which had its soundtrack remade entirely for the GBA port, while almost everyone complained about the "shitty audio" in, say, Final Fantasy VI.

When designing the rules for graphical limitations, I cherry-picked the better properties from both systems. My virtual controller has an X and Y button; my virtual screen is modeled after the SNES resolution; and I won't emulate the GBA's display, at least not by default.
The choice seems obvious: I'll design my rules around the S-SMP, leaving out GBA tech. Even if I decided on a chiptune soundtrack instead, you can make some kick-ass tunes with eight sampled channels, even more so than with four analog and two sampled ones.

The Rules

The rules themselves are quite simple, actually:
Since there are only eight channels, you can't play more than eight samples at a time. Since you need a channel or two for sound effects, that leaves you with two options: reserve a channel or two, or drop one out dynamically whenever an effect plays. I won't bother programming the latter, so we'll stick to the former: no more than six notes may be played at any one time in a song, though there will be no limit on the number of sound effects playing at a time. The six-note limit does, however, include the tails of notes: samples were generally chosen such that even when played in rapid succession, they wouldn't overlap with themselves.
The S-SMP's memory was also limited to a mere 64 kilobytes, and after compression, 9 bytes of data in there correspond to 32 bytes of sample data. Leaving some space for instructions, this means that no more than 110 kilobytes of sample data are permitted per song.
With so little memory, multisampling, i.e. using multiple samples of the same instrument playing different notes, wasn't really done. So only one sample per instrument. On the other hand, sounds based purely on white noise, such as a synthesized snare drum, do not count towards the data total, as the S-SMP came with a noise synthesizer.
As to the quality of the samples, the SNES could play back any sample rate up to 32 kHz, but because of the strict memory limitations, none of the games I looked at used sample rates above 8 kHz. Also noteworthy is the compression algorithm used: Bit Rate Reduction (BRR) produces very distinctive, audible artifacts. Luckily, I found a tool that BRR-compresses sample data.
We'll also make good use of the famous echo effect. It could be enabled independently for each channel, but all channels had to share the same settings. The delay could be set to any multiple of 16 ms from 16 to 240 ms. Even echo feedback is supported.
Finally, the eight channels could be "mastered": there are registers for pitch modulation and ADSR envelopes, as well as panning and volume.
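
The numeric rules above are easy enough to check automatically. Here is a rough sketch in Python of what such a check might look like. Every name in it (Note, peak_voices, brr_size, valid_echo_delay) is made up for illustration, and it assumes that a kilobyte means 1024 bytes and that note end times already include the tail. It is not part of any real tool.

```python
from dataclasses import dataclass

# Limits taken from the rules above; the 1024-byte kilobyte is an assumption.
MAX_MUSIC_VOICES = 6               # eight channels minus two reserved for effects
SAMPLE_BUDGET_BYTES = 110 * 1024   # uncompressed sample data allowed per song
PCM_BLOCK_BYTES = 32               # 32 bytes of sample data ...
BRR_BLOCK_BYTES = 9                # ... become 9 bytes after compression
ECHO_STEP_MS = 16
ECHO_MAX_MS = 240


@dataclass(frozen=True)
class Note:
    start: float  # seconds
    end: float    # seconds, tail included (hypothetical convention)


def peak_voices(notes):
    """Return the largest number of notes sounding at the same instant."""
    events = []
    for note in notes:
        events.append((note.start, 1))
        events.append((note.end, -1))
    # At equal times, -1 sorts before +1, so a note that ends exactly
    # when another begins does not count as an overlap.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


def brr_size(pcm_bytes):
    """Compressed size in bytes, rounded up to whole 9-byte blocks."""
    blocks = -(-pcm_bytes // PCM_BLOCK_BYTES)  # ceiling division
    return blocks * BRR_BLOCK_BYTES


def valid_echo_delay(ms):
    """True for multiples of 16 ms in the range 16..240 ms."""
    return ms % ECHO_STEP_MS == 0 and ECHO_STEP_MS <= ms <= ECHO_MAX_MS


if __name__ == "__main__":
    song = [Note(0.0, 1.0), Note(0.5, 1.5), Note(1.0, 2.0)]
    print(peak_voices(song) <= MAX_MUSIC_VOICES)
    print(brr_size(SAMPLE_BUDGET_BYTES))
    print(valid_echo_delay(240), valid_echo_delay(100))
```

Under these assumptions, the full 110 kilobyte budget compresses to 31,680 bytes, which is a bit less than half of the S-SMP's 64 kilobytes and leaves the rest for instructions and everything else.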

With all that in mind, I'll use a tracker to compose the music and render it into uncompressed *.wav files, and possibly master them with an equalizer. This would be wholly inauthentic, but it's what Shovel Knight did, and that game had a great sound.
Either way, this is entirely in asset production, so my sound engine code can be quite simple, and I can focus on other things, which is always good.

Problem is that I don't know jack about music theory, so I'll use inauthentic placeholder sounds until I (a) find someone willing to work under these conditions; or (b) learn music theory.
