At FeedPress we use and rely on our own product every day to host our podcasts and provide analytics. For those unaware, I run my own podcast network called Hologram Radio. If you’ve been following my tweets lately, it’s no secret that I’ve been very vocal about the general lack of consistency and just plain bad mastering going on with the bulk of the podcasts I review. In this article, I’m going to lay out some of my observations and provide good and bad examples of podcast audio.
Spoken word characteristics
There are lows and peaks of varying degrees in human speech. Let’s start with the most egregious of issues I see in podcasts—clipped audio. In the analogue realm, there can be headroom above 0 dB (dB = decibels, how we measure sound level). In the digital realm, there is zero headroom above 0 dB. The absolute max peak you should ever reach before clipping is -0.1 dB, though you should target a lower true peak value for additional headroom when exporting to a lossy format like MP3 (somewhere between -1 and -3 dB).
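To make the numbers above concrete, here is a minimal Python sketch that converts between dB and linear sample amplitude and flags likely clipping. It assumes floating-point samples where 1.0 is full scale (0 dB). The function names and the rule of thumb of "three or more consecutive full-scale samples" are illustrative assumptions, not an established detection standard.

```python
import math

def db_to_amplitude(db):
    """Convert a level in dB (relative to full scale) to linear amplitude."""
    return 10 ** (db / 20)

def amplitude_to_db(amplitude):
    """Convert a linear amplitude (1.0 = full scale) to dB."""
    return 20 * math.log10(amplitude) if amplitude > 0 else float("-inf")

def count_clipped_runs(samples, full_scale=1.0, min_run=3):
    """Count runs of at least min_run consecutive samples at or over full scale.

    Assumption: a flat run stuck at full scale is treated as a clipped peak.
    """
    runs = 0
    current = 0
    for s in samples:
        if abs(s) >= full_scale:
            current += 1
        else:
            if current >= min_run:
                runs += 1
            current = 0
    if current >= min_run:
        runs += 1
    return runs

# A sine that is 1.5x too hot, hard-limited at full scale (two cycles).
too_hot = [1.5 * math.sin(2 * math.pi * n / 80) for n in range(160)]
clipped = [max(-1.0, min(1.0, s)) for s in too_hot]

print(round(db_to_amplitude(-0.1), 4))   # amplitude of a -0.1 dB peak
print(count_clipped_runs(clipped))        # flattened peaks found
```

A -0.1 dB ceiling works out to roughly 98.9% of full scale, which shows how little room that leaves; a -2 dB ceiling is roughly 79%.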
There can be clipped audio that you aren’t even aware of when looking at a waveform or just by looking at a meter. Say you’re hitting 0 dB and you think you’re fine—until you check the true peak or have a true peak programme meter, you could be sending out bad audio with audible distortion.
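The gap between what a sample-peak meter shows and the true peak can be illustrated with a small Python sketch. It estimates the level between samples using 4x oversampling with a Hann-windowed sinc interpolator. This is a simplified stand-in for a real true peak meter, not a standards-compliant implementation; the function names, the tap count, and the test tone are all assumptions chosen for the demonstration.

```python
import math

def to_dbfs(amplitude):
    """Convert a linear amplitude (1.0 = full scale) to dB relative to full scale."""
    return 20 * math.log10(amplitude) if amplitude > 0 else float("-inf")

def sample_peak(samples):
    """Largest absolute sample value, which is what a plain peak meter reads."""
    return max(abs(s) for s in samples)

def estimated_true_peak(samples, oversample=4, taps=32):
    """Estimate the peak of the reconstructed waveform between samples.

    Uses a Hann-windowed sinc interpolator (an assumption for this sketch).
    """
    peak = sample_peak(samples)
    n = len(samples)
    for i in range(n - 1):
        for k in range(1, oversample):
            t = i + k / oversample  # fractional position between samples
            value = 0.0
            for j in range(max(0, i - taps + 1), min(n, i + taps + 1)):
                x = t - j
                window = 0.5 * (1 + math.cos(math.pi * x / taps))
                value += samples[j] * window * math.sin(math.pi * x) / (math.pi * x)
            peak = max(peak, abs(value))
    return peak

# Test tone: a full-scale sine at a quarter of the sample rate, phase-shifted
# so every sample lands at about 0.707 and never on the crest of the wave.
tone = [math.sin(math.pi / 2 * n + math.pi / 4) for n in range(64)]

sample_peak_db = to_dbfs(sample_peak(tone))
true_peak_db = to_dbfs(estimated_true_peak(tone))
print(f"sample peak: {sample_peak_db:.2f} dB, estimated true peak: {true_peak_db:.2f} dB")
```

On this tone the sample peak reads about -3 dB while the estimate lands close to 0 dB, so a file that looks safe on a sample-peak meter can already be at the limit once it is converted back to analogue or encoded.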
What is Loudness Compliance?
Loudness compliance means mastering your audio to a set of established standards for audio loudness. This is something that’s now being addressed in broadcasting, TV, and film, and it has expanded into the medium of podcasting. There is a subset of the ITU-R BS.1770-3 standard that many adhere to, including NPR, for podcasting. I’ll go through the details. You’ve likely found yourself in situations many times where you’re constantly adjusting the volume of your speakers or headphones, be it in the car listening to the radio, watching TV, or listening to music and podcasts. Compliance aims to fix this by giving the end user a more consistent and comfortable listening session.
The current recommended target for integrated loudness (the averaged and measured perceived loudness of the entire file) in podcasts or any mobile audio (meaning audio consumed on a mobile device, so that includes YouTube) is -16 LUFS for stereo files and -19 LUFS for mono files.
What the heck are LUFS?
LUFS stands for Loudness Units relative to Full Scale, where 1 LU (Loudness Unit) equals 1 dB. If you’re wondering why mono has a different target, it’s because a -3 dB offset is applied to account for the perceptible difference in volume of a mono audio source panned centre.
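One common way to see where the roughly 3 dB offset comes from is the channel-summing step of the ITU-R BS.1770 loudness formula, which adds up the power of each channel before converting to a log scale. The Python sketch below assumes only that step and deliberately leaves out the K-weighting filter and gating that a real loudness meter applies, so treat it as arithmetic, not a meter. The function name and the example power value are hypothetical.

```python
import math

def loudness_from_channel_powers(channel_powers):
    """Simplified BS.1770-style loudness: -0.691 + 10*log10(sum of channel powers).

    Assumption: each value is the mean-square power of one front channel,
    already filtered. K-weighting and gating are omitted in this sketch.
    """
    return -0.691 + 10 * math.log10(sum(channel_powers))

power = 0.01  # hypothetical mean-square power of one channel

mono_lufs = loudness_from_channel_powers([power])
dual_mono_lufs = loudness_from_channel_powers([power, power])

# The same voice on two identical channels measures about 3 LU higher.
print(round(dual_mono_lufs - mono_lufs, 2))
```

Under these assumptions, the same recording measures about 3 LU lower as a one-channel file than as a two-channel file with identical left and right channels, which lines up with the -19 and -16 LUFS targets.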
There are numerous third-party plugins that can analyze your audio and help you output a compliant audio file, ready for podcast consumption. These plugins are not cheap. You can find them from iZotope, Nugen Audio, TC Electronic, and Waves Audio. They range anywhere from $200 to $500 each. Auphonic makes an affordable desktop client for Mac/Windows that can process a compliant audio file for you. Another less expensive option is to use Adobe Audition CC.
Adobe was nice enough to licence a scaled down version of TC Electronic’s LM2 meter plugin. The loudness meter is useful to add into your master bus to see what the average perceived loudness of your project is. I recommend Audition to many podcasters on a budget, since you can pay the monthly subscription fee and have access to a powerful editing tool.
Here’s a couple of screenshots of how I’ve configured Adobe’s loudness meter. Note the “Peak” box at the top right. This will light up if the meter detects that any portion of the audio has gone over the max true peak you defined in its settings.
How do you fix the problem?
An engineer’s goal should be to optimally set input gain to avoid overloading an A/D (analog to digital) converter. If there are further issues after the recording, then processing must be applied to properly prepare the audio file before any loudness normalization occurs. What processing am I talking about? Namely, compression.
Compression was originally invented as a means to more efficiently control the overall level of a recorded source. Back in the day, engineers would have to ride the fader on their mixing consoles, in real-time, to prevent unnecessarily loud audio from being recorded. Compression solves this problem by automatically attenuating the audio whenever it rises above a set level, creating a more uniform sounding recording.
The loudest parts of the audio no longer will sound completely out of place in comparison to the softest parts. Compression, when used properly, is an incredibly useful and must-have tool.
Here’s an example of a typical single band compressor (read below the image for an explanation of settings):
Threshold: The point at which the compressor actually starts working. I have mine set fairly low at -25 dB. Compression is largely subjective and setting a lower threshold, as in the illustrated example I provided, can add some desirable character to your sound (you have to experiment with this, depending on the compressor you’re using).
Gain: Often referred to as make-up gain or just as “output.” This is additional gain applied after compression, to make up for the overall attenuation of the recorded audio. If you’re using heavy compression and your processed audio is significantly quieter, you may need to apply some make-up gain.
Ratio: The amount of gain reduction you wish to have. In this example, I have things set to 2:1, which means that for every 2 dB the signal rises above the threshold, we only allow 1 dB through. The higher the ratio, the more gain reduction is applied. Sometimes additional gain reduction is needed, depending on the programme material. For example, a news or sports announcer may talk considerably louder than someone who’s carrying a conversation in a podcast, so you may set the ratio to something like 6:1. Since this is a podcast I’m dealing with and it’s two people talking at typical speech volume when in close proximity to each other, I opted for a 2:1 ratio since I only need a little gain reduction to smooth things out.
Attack: The amount of time it takes for the compressor to reach 100% attenuation (gain reduction). I recommend a relatively quick attack for vocals. Be careful about setting it too quick or too slow, as that could negatively impact the natural transients in your voice that you would want to keep (you know, those natural peaks in your voice, as spoken word is highly dynamic audio).
Release: The amount of time it takes for the attenuation (gain reduction) to cease with the signal returning to its original level. You can play around with the release, which is typically in ms (milliseconds).
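The Threshold, Ratio, and Gain settings described above can be read as a simple input-to-output level curve. Here is a minimal Python sketch of that curve, assuming a hard-knee compressor and ignoring attack and release entirely (it only describes where the level settles, not how quickly it gets there). The function name is hypothetical, and real compressors differ in knee shape and level detection.

```python
def compress_level(input_db, threshold_db=-25.0, ratio=2.0, makeup_db=0.0):
    """Return the output level in dB for a given input level in dB.

    Assumption: hard knee, steady-state behaviour only (no attack or release).
    """
    if input_db <= threshold_db:
        # Below the threshold the compressor leaves the signal alone.
        return input_db + makeup_db
    # Above the threshold, only 1/ratio of the overshoot is let through.
    overshoot_db = input_db - threshold_db
    return threshold_db + overshoot_db / ratio + makeup_db

# With the 2:1 ratio and -25 dB threshold from the example settings:
print(compress_level(-30.0))             # below threshold, unchanged
print(compress_level(-15.0))             # 10 dB over becomes 5 dB over
print(compress_level(-13.0, ratio=6.0))  # 12 dB over becomes 2 dB over
print(compress_level(-15.0, makeup_db=3.0))
```

A peak that arrives 10 dB over the threshold leaves only 5 dB over it at 2:1, so the gap between the loudest and softest parts shrinks, and make-up gain then lifts the whole result by a fixed amount.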
Loudness normalization and compliance
After your file has been properly processed with compression, it’s ready for loudness normalization. During this stage, your entire audio file is analyzed and its integrated loudness is measured. Integrated loudness describes the overall average of the programme material, from the softest part to the loudest part, and it’s a value that any loudness tool will provide.
Since loudness measuring is based on an algorithm that builds on a study of subjective perception, in theory, program material that complies with the determined LRA and Program Loudness of a certain broadcast standard can in fact overload if normalized the traditional way (quasi-peak or sample-peak). Therefore, normalization is also part of many broadcast standards, and to comply, broadcasters must use a true-peak meter. — TC Electronic
In Adobe Audition, a true peak value can be specified. I have mine set to -2 dB (others set it anywhere from -1 to -3 dB), which gives me ample headroom for exporting to a lossy format such as MP3. The reason we need additional headroom when exporting to MP3 is that intersample peaks may be introduced during the encoding.
What happens to the final product to bring it up to spec depends on the state of the original file. If your bounced track is below or above the integrated -16 LUFS target for your programme material, a gain addition or subtraction may need to be applied during normalization to ensure it’s brought to the level that it should be.
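The gain arithmetic behind this step is simple enough to sketch in Python. The example assumes you already have an integrated loudness and a true peak reading from a loudness tool, and that normalization is a single static gain change. The function name, the returned field names, and the example readings are hypothetical; the -16/-19 LUFS targets and the -2 dB ceiling are the values used in this article.

```python
def plan_normalization(measured_lufs, measured_true_peak_db, channels=2, ceiling_db=-2.0):
    """Work out the static gain needed to hit the integrated loudness target.

    Assumption: one channel means mono (-19 LUFS); anything else is treated
    as stereo (-16 LUFS).
    """
    target_lufs = -19.0 if channels == 1 else -16.0
    gain_db = target_lufs - measured_lufs
    predicted_peak_db = measured_true_peak_db + gain_db
    return {
        "gain_db": gain_db,
        "linear_gain": 10 ** (gain_db / 20),
        "predicted_true_peak_db": predicted_peak_db,
        # If the peak would pass the ceiling, gain alone is not enough and
        # the peaks need compression or limiting first.
        "exceeds_ceiling": predicted_peak_db > ceiling_db,
    }

# A quiet stereo file: gain must be added, and the peaks would overshoot.
quiet = plan_normalization(measured_lufs=-21.0, measured_true_peak_db=-4.0)
# A hot stereo file: gain must be subtracted, and the peaks stay under the ceiling.
hot = plan_normalization(measured_lufs=-13.0, measured_true_peak_db=-0.5)

print(quiet)
print(hot)
```

Note that the same gain is applied to everything in the file, so in the quiet example a noise floor sitting at -60 dB would end up at -55 dB.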
Note: Be careful about your source audio and any noise levels present in the background. If your source material has a lot of background noise and it’s quieter than the targeted loudness, when gain is added, you’ll only amplify that noise. Care must be taken during recording to eliminate background noise!
With respect to my own podcasts, I’ve been focusing my efforts on creating a unified audio standard across my podcast network. I think it’s crucial to ensure listeners receive the most intelligible podcast we can possibly produce, at the same bit rates, and at the same loudness. I’ve opted for 192 kbps stereo MP3, 16 bit/44.1 kHz. Too often I see podcast networks that fail to accomplish this. Some episodes are stereo, others are mono. Some podcasts are 96 kbps, others 64 kbps. This is no good.
Examples of popular poorly mastered podcasts
Below are examples of two poorly mastered podcasts from a very popular podcast network. I’m not going to mention the name because it’s not important. I wanted to point out what the issues are and make note that even people who have been podcasting for a long time have a lot to learn about the final mastered product they send out into the world.
These two examples are poorly optimized, and as far as I’m concerned, completely unacceptable to be released into the world. Earlier in this post, I mentioned how there’s zero headroom above 0 dB in the digital realm. These are prime examples of excessively loud audio with many unreasonably loud peaks that are creating distortion (clipped audio). You can clearly tell here where the problems are. The clipped audio peaks can be very clearly identified if you look at where 0 dB is in the waveform. All of those sudden spikes in loudness are touching or exceeding where 0 dB is.
As for the integrated target, which, as previously mentioned, is -16 LUFS for podcasts, I’ll give them a pass for being only slightly off the mark.
Example of an optimized podcast
Below is an example of an optimized podcast. If you compare the waveform to one of the previous two examples, you can see there’s a clear and very stark contrast between them. The overall level of this waveform is far more tame and more uniform. It has zero clipped audio and meets our ideal integrated loudness target (-16 LUFS).
I’m passionate about helping others improve the quality of their podcasts. Even with inexpensive gear, it’s still possible to produce well-optimized audio with no perceptible distortion that’s highly intelligible and comfortable to listen to. If you want to learn more, stay tuned as we’re going to be releasing a series of tutorials on how to produce podcasts.