What MP3s do to Music

When I encoded this track to mp3 I noticed that the encoding degraded its quality particularly badly, more so than with other tracks. It got me thinking that a lot of people may not realise what effect mp3 compression has on a track, because they tend to have only the mp3 version. Similarly, very few people ever listen to music at higher than CD quality, with the exception of movie soundtracks on DVDs or Blu-rays.

So I decided to make a comparison that people can download and try out.

I produce my tracks at a sample rate of at least 96kHz. The software I use runs at 32/64-bit floating point (I think it switches to 64-bit for some specific tasks). Such high bit depth is necessary because of the amount of summing that has to happen. A track could be made of dozens of channels, each itself doing summing internally (sound generators may involve summing many channels, and effects processors will have things like dry / wet mixes).
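To make the summing argument concrete, here is a rough sketch in Python. It is not how any particular audio software works internally. The channel count, the random test signals and the helper names (quantize, rms) are all made up for illustration. It compares rounding every channel to 16 bits before summing against summing at high precision and rounding once at the end.

```python
import math
import random

def quantize(x, bits=16):
    """Round a sample in [-1, 1) to the nearest step of a signed integer grid."""
    scale = 2 ** (bits - 1)
    return round(x * scale) / scale

def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

random.seed(0)
n_channels, n_samples = 48, 2000  # assumed sizes, purely illustrative

# Hypothetical quiet channels, scaled so the mix stays within [-1, 1)
channels = [[random.uniform(-1, 1) / n_channels for _ in range(n_samples)]
            for _ in range(n_channels)]

# Reference mix: summed in 64-bit floating point, never rounded
exact = [sum(ch[i] for ch in channels) for i in range(n_samples)]

# Mix A: sum at high precision, round to 16 bits once at the end
quantize_once = [quantize(s) for s in exact]

# Mix B: round every channel to 16 bits first, then sum
quantize_each = [sum(quantize(ch[i]) for ch in channels)
                 for i in range(n_samples)]

err_once = rms([a - b for a, b in zip(quantize_once, exact)])
err_each = rms([a - b for a, b in zip(quantize_each, exact)])

print(f"error, rounding once at the end: {err_once:.3e}")
print(f"error, rounding every channel:   {err_each:.3e}")
```

Each rounding step adds a small error, and those errors pile up as more channels are summed, so the second figure comes out noticeably larger than the first. Working at high precision throughout keeps that error out of the mix until the final conversion.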

I personally do the final mix down through an analog mixing desk. The result is a 32-bit 96kHz wav file ‘master’. This is then downgraded to 44.1kHz 16-bit using r8Brain. That file is then turned into several mp3 files (I happen to use FL Studio for that, as it is the only thing I have that can encode mp3s, though it probably isn’t the best application for the job). NB: the file might not play on all sound cards. You’ll have to check. It’s also 94MB.

The above file contains the same 16 bars repeated 4 times:

1) The full 32-bit 96kHz version @ 6144 kbps

2) The CD quality version 16-bit 44.1kHz @ 1411 kbps

3) MP3 @ 320 kbps

4) MP3 @ 128 kbps (this is the quality SoundCloud uses)

These were sequenced and then upscaled back to the 96k/32-bit file you can download.
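The bit rates quoted for the two uncompressed versions can be checked with simple arithmetic: sample rate times bit depth times number of channels. The small sketch below assumes stereo (2 channels), which is what makes the numbers in the list work out. The function name is just for illustration.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels=2):
    """Bit rate of uncompressed PCM audio in kbps (stereo assumed by default)."""
    return sample_rate_hz * bit_depth * channels / 1000

master_kbps = pcm_bitrate_kbps(96_000, 32)  # the 32-bit 96kHz master
cd_kbps = pcm_bitrate_kbps(44_100, 16)      # the CD quality version

print(master_kbps)  # 6144.0
print(cd_kbps)      # 1411.2

# How much smaller each mp3 stream is than the CD quality stream
for mp3_kbps in (320, 128):
    print(mp3_kbps, round(cd_kbps / mp3_kbps, 1))
```

So the 320 kbps mp3 carries roughly a quarter of the data of the CD version, and the 128 kbps one roughly an eleventh.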

The first thing to note is that this track has a much bigger dynamic range than most modern music. The final full track has an average dynamic range of 12 (measured by this); most modern tracks probably have a dynamic range of <1dB or something ridiculous. MP3s seem to sound ‘better’ when there’s more going on and less dynamic range, simply because there’s more stuff to distract you, and less dynamic range means less actual information. Aliasing does become an issue with mp3s, but mostly at lower bit-rates.

The first thing I noticed is that you can hear a slight difference between 1 and 2. 1 has a high frequency granular texture to it that is more rounded off in 2, which makes sense as 2 must have had some low-pass filtering done. You can hear it if you focus on the reverb sound between the kicks, right in the middle of the stereo field. It’s hard to pick that out without knowing what to listen for though.
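The reason version 2 must have been low-pass filtered is that a sample rate can only represent frequencies up to half its value (the Nyquist limit), so anything above that has to be removed before downsampling. A minimal sketch of the numbers involved, with an illustrative function name:

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

master_limit = nyquist_hz(96_000)  # 48000.0 Hz for the 96kHz master
cd_limit = nyquist_hz(44_100)      # 22050.0 Hz for the CD quality version

print(master_limit, cd_limit)
```

Everything the master contains between 22.05kHz and 48kHz has to be filtered out on the way down to 44.1kHz.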

When you listen to the mp3s, the things you should be aiming your attention at are the transients. These are the short snappy sounds that usually happen at the onset of a sound. You’ll notice that mp3s really do ‘pixelate’ them. In the 128 kbps version the transients sound almost like they’ve been passed through a resonant envelope filter and have a tonal quality added to them, like a pitched poppy, zappy sound that is completely unintended. This is important because transients portray most of the rhythmic structure of the music. It sounds exactly like the audio analogue of the type of compression artifacts you get on compressed movies: details smudged, generalised and relocated.

The other area that you should listen to is the low frequencies. There are all kinds of things happening down there that are just added by the MP3 codec. The whole area is much muddier, and filled with random pulses and booms of sound that, again, aren’t intentional, and have their own tonal characteristics.

