Loudness normalisation for podcast streaming

Jon Schorah

Published 1st April 2015

by Jon Schorah Issue 99 - March 2015

For media companies steeped in traditional workflows, the exploding demand for over-the-top (OTT) content on many different devices is offering up significant new opportunities as well as challenges. If content owners can re-purpose their assets effectively, they can reach existing and new audiences through channels that didn't even exist a decade ago. Although multi-screen video content is currently making all the headlines, there's also tremendous demand for audio-only OTT programming ranging from music and audiobooks to podcasts of popular radio shows.
Re-purposing audio for streaming and podcasting that was originally mixed for broadcast is not for the squeamish. There are key challenges related to audio quality and listener satisfaction, not the least of which is intelligibility in often less-than-ideal listening environments (think subway cars and park benches). Other factors come into play, such as the quality of the user's ear buds or laptop speakers, or lack thereof. Also, podcasts often utilize data compression techniques in order to maximize the use of limited available bandwidth, which can lead to distracting artefacts if measures aren't taken at the production stage. In this article, we'll discuss these challenges in more detail and describe several new tools that were designed specifically for re-purposing audio for streaming services.
The Balancing Act: Exciting Audio With Clear Dialog
One of the biggest challenges with audio re-purposing, especially for podcasts, is to ensure dialog clarity while reducing the dynamic range of material that was originally mixed for a much more optimal sound environment, such as home radio/TV or a cinema. Programs with the highest dynamic range, i.e. the widest difference between the softest and loudest sounds, are consequently some of the most difficult to repurpose. Since most people don't listen to podcasts in the quiet of their living rooms, ambient noise such as a passing subway train or a blaring siren requires the listener to turn up the volume to hear the soft sections, which results in discomfort during the louder sections. And, as we've mentioned, the wide variation in the quality of playback equipment is a major factor in the overall quality of the listening experience.
Podcast content with commercial breaks presents another layer of complexity. In one loudness normalisation method, an anchor element, most typically the human voice, is used as a loudness reference. However, for the most exciting mixes with the widest dynamic range, a variance exists between program loudness and the average level of voices. Depending on the balance of louder sections with average voice level in a mix, the average voice level can drop considerably within the mix after dynamic repurposing. Once again, the listener reaches for the volume knob to make the spoken dialog more understandable, and interrupting commercials that have been correctly set to the program loudness target are now perceived as irritatingly loud.
Until now, the best way to address these challenges has been to assign a mix engineer to remix the audio; i.e. manually go through the mix and turn the volume up for soft sounds and turn it down for louder sounds. However, not only is this process expensive and time-consuming, it can also be difficult to achieve a satisfactory result if the original audio stems are not available and traditional compression techniques are employed.

Loudness Normalisation Goals for Podcasts
One of the most important goals for content owners is to maintain control of loudness normalisation, because if they don't do it, the OTT services will. Many streaming services now employ loudness normalisation techniques (for instance, Sound Check in iTunes). If non-compliant audio is submitted to these services, the resulting processing can lead to transmission of audio that does not sound as originally intended.
It is therefore in the content owner's best interest to submit audio that's compliant and suitable for the target platform from the outset. This requires the ability to measure and adjust dynamic range based on loudness range (LRA) parameters as well as the peak-to-loudness ratio. In terms of loudness target, a rule of thumb for podcasts is -16 LUFS, compared to the -23 or -24 LUFS targets specified for television by recommendations built on ITU-R BS.1770 (EBU R128 and ATSC A/85 respectively).
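The arithmetic behind these figures is straightforward once meter readings are available. The Python sketch below is illustrative only: it assumes the integrated loudness and maximum true peak have already been measured by a BS.1770-style meter, and the function names and example readings are hypothetical rather than taken from any particular tool.

```python
def gain_to_target(measured_lufs: float, target_lufs: float = -16.0) -> float:
    """Static gain in dB that moves the integrated loudness onto the target."""
    return target_lufs - measured_lufs


def peak_to_loudness_ratio(true_peak_dbtp: float, integrated_lufs: float) -> float:
    """PLR in dB: distance between the maximum true peak and the integrated loudness."""
    return true_peak_dbtp - integrated_lufs


# Hypothetical readings for a broadcast mix delivered at -23 LUFS
measured_lufs = -23.0
true_peak_dbtp = -1.5

gain_db = gain_to_target(measured_lufs)                         # 7.0 dB of gain needed
plr_db = peak_to_loudness_ratio(true_peak_dbtp, measured_lufs)  # 21.5 dB
```

A static gain change moves the integrated loudness but leaves the PLR and LRA untouched, which is why the dynamic range has to be measured and adjusted separately.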
Another important consideration is true peak. For television, BS.1770-based specifications such as EBU R128 allow a maximum true peak level of -1 dBTP, but most streaming compression schemes require a lower maximum of around -3 dBTP to avoid downstream distortion once the chosen codec has been applied.
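Taken together, the loudness target and the true-peak ceiling set a budget for how far peaks may sit above the average level. The following Python sketch works through that arithmetic using the -16 LUFS and -3 dBTP figures quoted above; the function names and the example mix are assumptions made for illustration, not part of any standard or product.

```python
def max_deliverable_plr(target_lufs: float = -16.0, ceiling_dbtp: float = -3.0) -> float:
    """Largest peak-to-loudness ratio (dB) that fits under the ceiling at the target."""
    return ceiling_dbtp - target_lufs


def peak_after_gain(true_peak_dbtp: float, gain_db: float) -> float:
    """True peak level after a static gain change; dB values simply add."""
    return true_peak_dbtp + gain_db


def dbtp_to_linear(dbtp: float) -> float:
    """Convert a dBTP value to linear amplitude relative to full scale."""
    return 10.0 ** (dbtp / 20.0)


# Hypothetical broadcast mix: -23 LUFS integrated, -1.5 dBTP maximum true peak
gain_db = -16.0 - (-23.0)                  # 7.0 dB needed to reach -16 LUFS
new_peak = peak_after_gain(-1.5, gain_db)  # 5.5 dBTP, far above the ceiling
overshoot_db = new_peak - (-3.0)           # 8.5 dB of peak reduction still required
```

In this example the mix has a peak-to-loudness ratio of 21.5 dB against a budget of 13 dB, so a gain change alone cannot meet both figures; the dynamics themselves have to be reduced.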
Dynamically Adapting Audio
Dynamic adaptation algorithms introduce a promising new technique that accounts for the complex loudness normalisation requirements of podcasts. In short, dynamic adaptation algorithms address the challenges of re-purposing film for TV or TV/radio for streaming while respecting dialog levels and maintaining transitions in the full context of loudness compliance. NUGEN Audio has applied the concept in DynApt, a proprietary algorithm that automatically analyzes the audio and then carefully reduces the loudness range according to several measures, while preserving the short-term dynamics and "space" in the audio and ensuring the intelligibility of dialog. The end result is audio that can be re-purposed for streaming without damaging the content's original sound and feel, maintaining intentional dramatic transitions and preserving dialog clarity.
Dynamic adaptation checks and controls both voice level and program loudness in a way that restricts maximum variance, making the outcome more robust and suitable for automatic application in comparison with traditional multi-band compression techniques. Providing a new option for adapting content in a fast and efficient manner, the algorithm automatically adapts dynamic audio appropriately for different listening environments and playout systems. By enabling audio re-purposing and complex loudness-compliant dynamic adaptation (including LRA targeting) within the NLE or as an offline process, dynamic adaptation supports the creative process with minimal workflow disruption, without the need to employ heavy-handed blind processing at broadcast or to re-mix the work separately for each target platform.
To summarize, dynamic adaptation techniques can overcome the key challenges of adapting audio content for re-purposing: namely, how to provide optimal quality of experience in terms of dynamic audio contrasts within the context of loudness compliance and, last but not least, how to achieve these objectives in a highly automated, efficient, and cost-effective workflow. With dynamic adaptation, content owners can preserve the original creative intent of the audio while also ensuring that audio is suitably re-purposed for different listening and playout contexts. And importantly, a dynamic adaptation workflow can fit seamlessly next to the usual EBU R128/CALM-based normalisation.

© KitPlus (tv-bay limited). All trademarks recognised. Reproduction of this content is strictly prohibited without written consent.