Audio basics

Author: Larry Jordan

Published 1st June 2015, Issue 101 - May 2015

It is an absolutely true statement that the best way to improve the quality of your image is to improve the quality of your sound. Viewers will happily watch grainy, black-and-white images and consider it "art." However, nothing you can say will keep them in their seats if the audio is too loud or unintelligible.
Yet, far too often, video editors consider audio an afterthought, something someone else will worry about when the "real work" of editing is complete. It doesn't have to be that way. An understanding of basic audio concepts and terminology will help any editor improve their projects.
Images show us what's going on. Audio tells us how we should feel about it. Don't think so? Watch any scene that's heavy on effects with the sound turned off. The effects may be stunning, but they will feel lifeless and a bit hokey until you turn the sound back on.
NOTE: To prevent bar fights, I need to establish that video editing systems display peak audio levels. Audio engineers generally work with average audio levels, which are about 20 dB lower than peak levels. Both are valid ways to measure audio, but the numbers are not the same, which causes all kinds of confusion. In this article, I'm using peak values measured as dBFS (decibels Full Scale).
AUDIO 101
Let's start with some basics. Everything we hear is caused by pulses of air pressure vibrating against our eardrums. The rate of these vibrations is measured in Hertz (Hz), the number of vibrations per second. Low-pitched sounds vibrate slowly, while high-pitched sounds vibrate quickly.
We describe "normal human hearing" as a range from 20 to 20,000 Hz, where 20 Hz feels more like a vibration than a tone and 20,000 Hz sounds more like wind in a pine tree than a pitch. "Normal" is a relative term. A three-year-old will hear sounds at both extremes of this range. Older individuals often lose the ability to hear high frequencies.
NOTE: For comparison, the fundamental frequency of the lowest note on a piano vibrates at 27.5 Hz, while the highest note vibrates at 4,186 Hz. Overtones extend into higher frequencies.
What makes this frequency range even more interesting is that every sound we hear - speech, music and noise - falls somewhere within it. These sounds overlap, which makes it really tricky to adjust one without affecting the others, because all sounds are intermixed within this single range of frequencies.
Audio frequencies are logarithmic, which means that when the frequency of a sound doubles, the pitch goes up by an octave. Human hearing encompasses roughly a ten-octave frequency range.
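That ten-octave figure falls straight out of the math: each octave doubles the frequency, so the octave count between two frequencies is the base-2 logarithm of their ratio. A quick sketch (the helper function name is mine, for illustration):

```python
import math

# Each octave doubles the frequency, so the number of octaves between
# two frequencies is log base 2 of their ratio.
def octaves_between(low_hz: float, high_hz: float) -> float:
    return math.log2(high_hz / low_hz)

# The 20 Hz - 20,000 Hz range of human hearing spans almost exactly ten octaves.
print(round(octaves_between(20, 20_000), 2))  # ~9.97
```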
Although human hearing spans from 20 - 20,000 Hz, human speech does not. Human speech ranges from about 200 - 6,000 Hz for a man and about 400 - 8,000 Hz for a woman. Vowels are low-frequency sounds that give a voice its character, richness and sexiness. Consonants are high-frequency sounds that provide diction and clarity.

For instance, say the letters "F" and "S." Both are formed the same way, with air hissing between the tip of the tongue and the roof of the mouth. If you can hear the hiss, it's an "S." If you can't, it's an "F." For a man, that hiss is located at about 6,000 Hz; for a woman, it's closer to 8,000 Hz.
NOTE: Many individuals with hearing loss hear low-frequency sounds perfectly fine, but have trouble with higher frequencies. This means they can't tell the difference between an "F" and an "S," because they can't hear the frequencies in an "S," making both letters sound the same.
If you are creating a program for pre-schoolers, you have a wide latitude in your audio mix, because those kids can hear anything. If, on the other hand, you\'re creating programs for the retired set, you\'d be well-advised to boost the higher frequencies to help them follow the dialog more easily.
CONNECTING A COMPUTER
Now, in the real world, audio waves work just fine. But not for computers. This is principally because computers don't have ears. Instead, they need to convert all these pressure waves into something a computer can understand and store. Worse, computers don't like things like waves, with their smooth curves and infinite variations. They like measuring and storing things in chunks.
Engineers measure audio pressure waves as a variation in voltage from -1 to +1 volt. If you think of a sine wave, the low point of the wave is -1 volt, while the high point of the wave is +1 volt. Samples are used to convert waves into chunks of data that the computer can store.

A sample measures the average voltage for a very short period of time; for example, 1/48,000 of a second. This sample is then stored by the computer for later playback. The cool part is that the Nyquist-Shannon sampling theorem states that dividing the sample rate by 2 gives the maximum frequency response of an audio file. So, a 48,000 Hz sample rate yields a maximum frequency response of 24,000 Hz, which encompasses the full range of human hearing.
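The Nyquist limit is a simple division, which makes it easy to check against the sample rates you meet in practice. A minimal sketch (the function name is illustrative; the sample rates are the common standards for CD, video, and high-resolution audio):

```python
# Per the Nyquist-Shannon sampling theorem, the highest frequency a digital
# audio file can represent is half its sample rate.
def nyquist_limit(sample_rate_hz: int) -> float:
    return sample_rate_hz / 2

for rate in (44_100, 48_000, 96_000):
    print(rate, "samples/sec ->", nyquist_limit(rate), "Hz maximum")
```

Note that 44,100 Hz (CD audio) already covers the 20,000 Hz top of human hearing, with a little headroom to spare.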
However, audio is more than just frequencies; it also varies in loudness. That's where bit-depth comes in. Bit-depth determines the range between the softest and loudest portions of a digital audio clip. (I'd explain how, but I have a 1,000-word limit in this article. Trust me, it does.)
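For readers who want the short version the word limit squeezed out: each bit doubles the number of amplitude steps a sample can record, and each doubling adds 20*log10(2), about 6 dB, of dynamic range. A rough sketch of the standard rule of thumb (the function name is mine, for illustration):

```python
import math

# Each added bit doubles the number of amplitude steps, adding
# 20*log10(2) ~= 6.02 dB of dynamic range per bit.
def dynamic_range_db(bit_depth: int) -> float:
    return 20 * math.log10(2 ** bit_depth)

print(round(dynamic_range_db(16), 1))  # 16-bit (CD audio): ~96.3 dB
print(round(dynamic_range_db(24), 1))  # 24-bit: ~144.5 dB
```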
OTHER COOL THINGS TO KNOW
Just as audio frequencies are logarithmic, so are audio levels. 0 dB in a digital system represents the loudest your audio can be. Let's call that audio level 100%.
For every 6 dB you lower peak audio levels, the amplitude of the audio is cut in half. So, reducing peak levels to -6 dB reduces the signal to 50% of maximum. Lowering levels another 6 dB reduces it to 25% of maximum. And so on in 6 dB increments.
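The 6-dB-per-halving rule comes from the standard amplitude conversion, ratio = 10^(dB/20). A quick sketch to verify it (the helper name is illustrative):

```python
# Convert a change in peak level (dB) into an amplitude ratio.
# A -6 dB change is very close to half amplitude.
def db_to_ratio(db_change: float) -> float:
    return 10 ** (db_change / 20)

print(round(db_to_ratio(-6), 3))   # ~0.501 -> roughly 50%
print(round(db_to_ratio(-12), 3))  # ~0.251 -> roughly 25%
```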
NOTE: The absolute #1 rule of audio is that peak audio levels during export must not exceed 0 dB. Doing so causes audio distortion which is really, really difficult to fix once the damage has been done.
TYING IT ALL TOGETHER
So, in summary, here\'s how this applies to us:
- Sample rate determines frequency response.
- Bit-depth determines the range between the softest and loudest portions of a digital audio clip.
- The human voice is a subset of the range of human hearing.
- Doubling the audio frequency raises the pitch by one octave.
- Reducing levels by 6 dB cuts the amplitude in half.
We can use this knowledge to improve our mixes, boost intelligibility, integrate music without making narration hard to understand, prevent distortion, and keep our audience glued to their seats. I\'ll write more about audio in future issues.


© KitPlus (tv-bay limited). All trademarks recognised. Reproduction of this content is strictly prohibited without written consent.