Bringing Yourself Up to Speed with AAC, MP3, and Digital Audio - Why Do You Need to Compress Music?
In short, you must compress music files because they’re huge. For music files to sound passable, they need to contain a lot of data. For them to sound perfect to the human ear, they need to contain a huge amount of data. (More on what such perfection entails and on the precise quantities of data in “What Does CD-Quality Audio Mean?,” later in this chapter.)
As you’ll know if you’ve surfed the Web using a slow Internet connection, different types of content require different amounts of data to represent them. Typically, text requires the smallest amount because of the (relatively) minuscule number of possible characters in the Western European character set: letters (in uppercase and lowercase), letters with diacritical marks (such as å and ÿ), numbers, punctuation, symbols, and so on.
So, to represent text, you need to represent only the sequence of letters and any necessary formatting information. For text, basic ASCII uses seven bits of data for each character, giving 128 different combinations (2×2×2×2×2×2×2). Extended ASCII uses eight bits of data for each character, giving 256 different combinations (2×2×2×2×2×2×2×2), which is enough for the vast majority of Western languages. (By comparison, Chinese and Japanese, each of which has many thousands of characters, require larger numbers of bits. Unicode takes care of them by using more bits for each character: 16 bits for most characters in the UTF-16 encoding, and 32 bits for every character in UTF-32.) Then you need to represent the font name, size, and so on, but this information can be represented in text as well, so it doesn’t require much more data.
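The arithmetic is easy to check for yourself. The short Python sketch below is only an illustration: the function name `distinct_values` is made up for this example, and Latin-1 is assumed as one common flavor of "extended ASCII." It counts the values a given number of bits can hold and measures how many bytes a few sample characters occupy in standard encodings.

```python
def distinct_values(bits: int) -> int:
    """Return how many different values fit in the given number of bits."""
    return 2 ** bits

ascii_values = distinct_values(7)     # basic ASCII: seven bits per character
extended_values = distinct_values(8)  # extended ASCII: eight bits per character

# Bytes needed to store one sample character in several encodings.
# Latin-1 stands in for "extended ASCII" here (an assumption for illustration).
sizes = {
    "A in ASCII": len("A".encode("ascii")),
    "å in Latin-1": len("å".encode("latin-1")),
    "中 in UTF-16": len("中".encode("utf-16-le")),
    "中 in UTF-32": len("中".encode("utf-32-le")),
}

print(ascii_values, extended_values)
for label, size in sizes.items():
    print(f"{label}: {size} byte(s)")
```

Each extra bit doubles the number of possible values, which is why one more bit takes ASCII from 128 to 256 characters.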
Graphics, audio, and video require far more data than text because they contain so many more variables. A text character can be any of the, say, 256 characters in the extended ASCII character set. However, a graphic can show anything visible, a moment of audio can represent any sound, and a moment of video can represent a combination of the two.
Some of this data is more compressible than the rest. For example, high-resolution graphics contain a huge amount of data, but they can be highly compressible. To give a crude example, a compressed graphics format (or a compression program, such as StuffIt or WinZip) can give an instruction such as “use blue of the hue 0,64,192 for the next 2000 pixels” instead of saying “blue of the hue 0,64,192” 2000 times in succession. (Actually, the file just says “0,64,192” because it knows it’s talking about colors using Red, Green, Blue [RGB] data, but you get the idea.) The program that opens the compressed file then restores the compressed information to its former state, essentially re-creating the full picture. This is called lossless compression: the compressed file contains the information necessary to re-create the complete original file.
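The "next 2000 pixels" idea is usually called run-length encoding. Here is a minimal Python sketch of it, offered as an illustration rather than as the scheme any particular graphics format or compression program actually uses; the function names `rle_encode` and `rle_decode` are hypothetical.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse each run of identical pixels into a (count, pixel) pair."""
    return [(len(list(run)), pixel) for pixel, run in groupby(pixels)]

def rle_decode(runs):
    """Expand (count, pixel) pairs back into the full list of pixels."""
    pixels = []
    for count, pixel in runs:
        pixels.extend([pixel] * count)
    return pixels

blue = (0, 64, 192)      # the RGB value from the example above
white = (255, 255, 255)  # an arbitrary second color

row = [blue] * 2000 + [white] * 3
encoded = rle_encode(row)
restored = rle_decode(encoded)

print(len(row), "pixels stored as", len(encoded), "runs")
print("lossless round trip:", restored == row)
```

Because decoding reproduces the original row exactly, nothing is lost. That is what makes the compression lossless.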
But audio is much harder to compress than graphics, because it’s less static. (Video is even worse, because it has not only audio but also images changing at 25 to 30 frames per second—but luckily video isn’t our problem in this book.) You can perform some audio compression by reducing repeated information to a set of instructions for repeating it—hold this sound for three seconds, and so on. But to significantly reduce the amount of data needed to represent audio, usually you need to discard some of the data. This is called lossy compression, because the data discarded is lost. As you’d think, to make compressed audio sound as good as possible, the compression format must discard the data that’s least important to the listener. Hold that thought—we’ll get back to how encoding schemes such as AAC and MP3 do this in a page or two.
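To see what "lossy" means in practice, consider the crudest possible approach: throwing away the low-order half of every audio sample. The Python sketch below is a deliberately simple illustration and is not how AAC or MP3 work (they choose what to discard far more carefully); the function names and sample values are invented for the example.

```python
def to_8bit(samples16):
    """Keep only the top 8 bits of each signed 16-bit sample."""
    return [s >> 8 for s in samples16]

def back_to_16bit(samples8):
    """Scale 8-bit samples back up; the discarded low bits cannot be recovered."""
    return [s << 8 for s in samples8]

# A handful of made-up signed 16-bit sample values.
original = [0, 1234, -1234, 32767, -32768, 255]

compressed = to_8bit(original)        # half the storage: one byte per sample
restored = back_to_16bit(compressed)

errors = [o - r for o, r in zip(original, restored)]
print("restored:", restored)
print("errors:  ", errors)
print("identical to original:", restored == original)
```

The restored samples are close to the originals but not the same, and no amount of processing can bring the missing detail back.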
In the meantime, there is one means of stripping down music to a minimal set of information, other than writing it down as a score on paper. It is called MIDI (Musical Instrument Digital Interface), a format for representing music as a set of computer instructions. Essentially, MIDI assumes that each particular instrument sounds a given way and provides a set of instructions for how each of the instruments in the song should play.
For example, you can tell a MIDI file to play a piano note for a given time with given attack (how hard the note is struck), given sustain (how long the note is held), and so on. In this way, a minimal amount of data can produce a full-sounding track. Depending on the set of instruments used, the MIDI track will sound different on different equipment—but so will the same score played by the same pianist on a different Steinway, the same Beethoven symphony played by different orchestras (leaving the conductors out of the equation), or the same Beatles instrumental standard covered by different substandard tribute bands.
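To get a feel for how little data that is, the Python sketch below builds the two MIDI messages that start and stop a single note and compares their size with one second of uncompressed audio. Treat it as a rough illustration: the helper names are hypothetical, the timing data a real MIDI file stores between events is left out, and the audio figures assume the standard CD settings of 44,100 samples per second, 16 bits per sample, and two channels. In MIDI messages, how hard a note is struck is called its velocity.

```python
def note_on(channel, note, velocity):
    """Build a 3-byte MIDI Note On message (status byte 0x90 plus channel)."""
    return bytes([0x90 | channel, note, velocity])

def note_off(channel, note, velocity=0):
    """Build a 3-byte MIDI Note Off message (status byte 0x80 plus channel)."""
    return bytes([0x80 | channel, note, velocity])

MIDDLE_C = 60  # MIDI note number for middle C

# One note: start it, then stop it. How long it sounds depends on the
# time between the two messages, which this sketch does not store.
events = note_on(0, MIDDLE_C, 100) + note_off(0, MIDDLE_C)

# One second of uncompressed audio at the assumed CD settings.
SAMPLE_RATE = 44_100
BYTES_PER_SAMPLE = 2
CHANNELS = 2
pcm_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS

print("MIDI note:", len(events), "bytes")
print("One second of uncompressed audio:", pcm_bytes, "bytes")
```

The comparison isn’t entirely fair, because the MIDI messages describe what to play rather than what it sounds like, but that is exactly the trade-off the format makes.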
That all works well enough for music that can use generic instrument sounds—for example, a grand piano, a distorted electric guitar, or a regulation snare drum. But it doesn’t work for vocals, because you can’t effectively synthesize a voice. Even if you could, you couldn’t synthesize the right voice. And even if you could synthesize the right voice, you would need to describe the tone, expression, and delivery. You can’t exactly say “just like Bowie doing ‘Ziggy Stardust’ but with different words” or “Justin Timberlake an octave higher than sounds comfortable” and expect a computer to deliver it. So, compression for music needs to take a subtler approach—knocking out the less important parts of the music while preserving as much as possible.
But we’re getting ahead of ourselves here. First, let’s consider what audio quality is and what data is required to deliver it.
This is chapter three of How to Do Everything with Your iPod & iPod Mini, by Guy Hart-Davis (McGraw-Hill/Osborne, ISBN 0072254521, 2004). Check it out at your favorite bookstore today.