While 2008 may not be fondly remembered as a classic year if you work in certain industries, it's a fascinating time to be working in video media production or broadcast television. The steady move to high definition at all stages of the video production process is causing the biggest shake-up in technology and working practices since the introduction of colour television in the late 1960s. Of course, much is made of how this affects the visual requirements of broadcasters and film-makers, but there's an equally interesting story to be told about the effect that the move to HD is having on sound.
5.1 & The Future Of Broadcast Audio
From being something of a latecomer to moving pictures in the late 1920s, sound for picture has come on a great deal over the last 80 years. Cinemas now offer various types of 'surround sound' for realistic-sounding 3D effects, and for the last couple of years even television, thanks to the introduction of high-definition broadcast, has begun to offer viewers what is known as '5.1' surround sound in the comfort of their own homes (assuming they have a home cinema system). Change is continuing to come thick and fast. At this year's IBC show in Amsterdam, the Japanese state broadcaster NHK, in association with the Italian state broadcaster RAI and our very own BBC, demonstrated Super Hi Vision, a glimpse of 'the TV broadcast of the future'. You'll no doubt have read elsewhere about Super Hi Vision's unprecedented visual resolution of 7680 pixels by 4320 lines, which is 16 times the pixel count of 1080p HD video, or its 60fps frame capture rate. But were you aware that it also offers viewers 24-channel 'immersive' audio, via a lower layer of three speakers at audience level, a middle layer of 10 speakers, and a top layer of a further nine speakers up by the ceiling, plus two subwoofers for the reproduction of the deepest bass tones? Not only does Super Hi Vision's sound system literally surround viewers with sound, it also creates a sense of height with its triple-layer arrangement, instead of operating only in a flat plane like previous widely adopted cinema or broadcast surround systems.
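The figures quoted above are easy to check. The short sketch below does the arithmetic; the 1920 x 1080 frame size for 1080p HD is not stated in the text and is supplied here, and the variable and function names are purely illustrative.

```python
# Frame sizes: Super Hi Vision as quoted above; 1080p HD assumed to be 1920 x 1080.
SHV_WIDTH, SHV_HEIGHT = 7680, 4320
HD_WIDTH, HD_HEIGHT = 1920, 1080

# Speaker counts for each layer of the 24-channel layout described above.
LAYERS = {"lower": 3, "middle": 10, "top": 9, "subwoofers": 2}


def pixel_ratio():
    """How many times more pixels a Super Hi Vision frame holds than a 1080p frame."""
    return (SHV_WIDTH * SHV_HEIGHT) / (HD_WIDTH * HD_HEIGHT)


def total_channels():
    """Total number of loudspeaker channels across all layers."""
    return sum(LAYERS.values())


print(pixel_ratio())     # 16.0
print(total_channels())  # 24
```

Note that the factor of 16 refers to the total pixel count; each dimension of the picture is four times that of 1080p.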
But don't go and order yourself a couple of dozen speakers just yet. Clearly, it will be quite a while before the majority of programmes are made in anything approaching Super Hi Vision, if ever. Even NHK, who invented it, admitted at IBC 2008 that they wouldn't begin regular tests of Super Hi Vision for six or seven years, and that public broadcasts in the format could not be expected until at least 2025. For the moment, the agreed sound standard for HD broadcast is six-channel surround (popularly known as '5.1'), and even this is a recently agreed convention which many broadcasters have not yet embraced. After all, it was only in the early 1990s that two-channel stereo took over from mono as the standard for UK TV.
Back To Mono
But let's back-track a moment. If you're like many video-orientated types, even talk of 'two-channel stereo' or 'six-channel surround' may have lost you. It's worth a recap, as the systems that enable us to create surround soundtracks for broadcast today all have their roots in developments in audio recording and reproduction from the 1930s and 1940s.
The earliest experiments in sound recording in the latter half of the 19th century were all mono: that is, recorded by a single microphone and reproduced through a single speaker. Recorded sound remained almost exclusively mono for the next half-century. But as early as 1881, a French audio engineer, Clément Ader, who was using telephone receivers to transmit audio from opera performances in Paris to listeners situated elsewhere, noticed that the experience was much more realistic when he used two receivers and transmitted the results to two earpieces, one for each ear. The idea of using two receivers or microphones to reproduce a musical performance via twin earpieces or loudspeakers, one for the left and one for the right, continued to interest people sporadically over the next 50 years, presumably because two microphones and two speakers accord with our natural provision of two organic listening devices: our ears! The breakthrough came in the late 1920s and early 1930s, when Alan Blumlein, a British engineer at the recording company EMI, worked out proper methods for recording with two microphones and reproducing the results as 'two-channel' sound. Blumlein initially called two-channel audio 'binaural' sound, but within a few years most people, including him, were calling it 'stereophonic', or stereo for short, from the Greek for 'solid sound'.
Coincident Or Spaced?
Almost immediately, recording engineers began to disagree about the best way to record in stereo. Mono recording was relatively simple: with only one microphone, you just set it up to capture the best possible sound you could and began recording. But stereo was a different ball game. You can completely change the sound of a stereo recording depending on how you set up your microphones, whether you put them close together or space them apart, and what angle you place them at relative to one another. Blumlein, after extensive experimentation, concluded that the most realistic stereo recordings were to be obtained by angling a pair of microphones at 90 degrees to one another and placing them as close together as possible, so that their capsules (the recording part of the microphone) were as near to being in the same place as was physically possible. This is called recording with a 'coincident stereo pair' of microphones. However (perhaps again influenced by the way our ears are physically arranged), many engineers began to favour using microphones placed some distance apart (the so-called 'spaced pair' approach), claiming that it gave a more realistic-sounding result. Others still hedged their bets, and began using arrays of multiple microphones to make their stereo recordings, some of which would be in coincident angled pairs in accordance with Blumlein's ideas, while others were arranged in spaced pairs. From these differences of opinion stem the variant approaches to making surround-sound recordings for broadcast today.
Blumlein's argument was partly mathematical, and had to do with the concept of phase. This can be hard to understand, but basically, if a sound is recorded simultaneously from more than one location and the signals from each microphone are added together, they can partly cancel out, because the sound has taken a different amount of time to reach each microphone. If your phase problems are bad enough, this can create a hollow, thin-sounding recording lacking in bass. Far better, Blumlein argued, for recordings to be made from the same point in space, which completely eliminates the possibility of phase problems. Of course, getting two different microphones to occupy exactly the same point in space is physically impossible, even if they're angled at 90 degrees as in Blumlein's coincident pair, which is why he suggested that the capsules of the recording mics be placed as close together as possible. The counter-argument was that the human hearing system makes good use of phase differences to help localise sounds, and that the phase differences that stem from recording with spaced mics create a better sense of space than a coincident pair. This can also be true. However, it is certainly fair to say that the more spaced mics you use, the greater the potential for phase problems to affect the sound, especially if the two-channel stereo signal is combined to mono at some point.
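To put rough numbers on the cancellation described above, here is a minimal sketch of what happens when two microphone signals carrying the same steady tone are summed to mono. It assumes an idealised case that the text does not spell out: a single pure tone, equal level at both mics, and a speed of sound of about 343 metres per second. The function names and the 0.5 m path difference are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second; approximate figure, assumed for illustration


def arrival_delay(path_difference_m):
    """Extra time in seconds the sound needs to reach the farther microphone."""
    return path_difference_m / SPEED_OF_SOUND


def mono_sum_gain(freq_hz, delay_s):
    """Relative level of (mic A + mic B) / 2 for a steady tone at freq_hz.

    1.0 means the two signals reinforce fully; 0.0 means they cancel completely.
    """
    return abs(math.cos(math.pi * freq_hz * delay_s))


def first_null_hz(delay_s):
    """Lowest frequency at which the two signals cancel completely (needs delay_s > 0)."""
    return 1.0 / (2.0 * delay_s)


# Hypothetical spaced pair: the source is 0.5 m farther from one mic than the other.
delay = arrival_delay(0.5)
for freq in (50, 100, 200, 343, 686):
    print(freq, "Hz ->", round(mono_sum_gain(freq, delay), 3))
```

With a delay of zero, which is the ideal a coincident pair aims for, `mono_sum_gain` returns 1.0 at every frequency, so nothing is lost when the channels are combined.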
How Many Speakers?
At around the same time as these arguments were taking place, cinema sound engineers in the USA also began to experiment with systems comprising more than just two loudspeakers, in a bid to create a sound system that could completely surround audiences with music, dialogue or sound effects, like natural 360-degree hearing. Walt Disney's multimedia extravaganza Fantasia, from 1940, holds the record for being the first ever surround-sound moving-picture production. The film was originally designed to be played through the custom-built three-channel 'Fantasound' sound system, which fed two front speakers, two rear speakers, and a centre-front speaker, although the Fantasound system was constantly being revised during its short life. The custom sound system was abandoned when the USA entered World War II, and subsequent releases of the film were simplified into standard stereo. By the 1950s, there were several mutually incompatible multiple-speaker cinema surround sound formats in the USA, such as WarnerSound, Cinerama, and CinemaScope, all of which eventually failed commercially. Through gradual evolution from stereo, cinema sound had developed by the 1990s into a format quite similar to the original Fantasound: a left and right speaker were joined by a centre-front speaker for film dialogue, a left and right rear speaker pair were added to allow sounds to emanate from behind, and a simple low-frequency speaker was introduced to handle really deep bass effects, such as earthquake rumblings and explosions. This made the six-channel surround sound we know today: five full channels plus a limited bass channel, or a '5.1' system. There have been many variants on this idea with extra speakers, including so-called 6.1, 7.1, 9.1 and 11.2 systems, and now the 24-channel 22.2 system proposed for Super Hi Vision, but 5.1 became the most commonly recommended choice of format to accompany HD video for broadcast just a couple of years ago.
How Do You Create Six Channels?
Seventy-five years on, the coincident-pair versus spaced-pair recording arguments echo on in surround recording techniques. Some surround recording microphone systems, such as those made by SPL and Microtech Gefell, have developed the concept of the spaced pair, and comprise spaced, angled arrays of five microphones or microphone capsules, with each one effectively providing an output to one of the five main loudspeakers to make six-channel 5.1 audio (the sixth, the 'point one' bass channel, is then made by filtering a copy of the others to leave only the deepest bass frequencies). Just as with spaced-pair stereo recordings, however, the quality of six-channel surround audio captured with such multi-capsule spaced-array systems can suffer if the six channels are combined to make stereo or mono signals.
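The derivation of the 'point one' channel mentioned above can be sketched in a few lines. This is only an illustration of the principle: the 120 Hz cutoff, the 48 kHz sample rate, the equal-weight sum of the five main channels and the very gentle one-pole filter are all assumptions made for the example, not details of any particular product, and real systems typically use much steeper filters.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """Gentle (6 dB per octave) low-pass filter, chosen here only for brevity."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)  # move part of the way towards the input
        out.append(state)
    return out


def derive_lfe(main_channels, cutoff_hz=120.0, sample_rate_hz=48000):
    """Average the main channels sample by sample, then keep only the low end."""
    mixed = [sum(frame) / len(main_channels) for frame in zip(*main_channels)]
    return one_pole_lowpass(mixed, cutoff_hz, sample_rate_hz)


def sine(freq_hz, n_samples, sample_rate_hz=48000):
    """Test tone with a peak level of 1.0."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


def peak(samples):
    return max(abs(s) for s in samples)


# Feed the same test tone to all five mains and compare what reaches the bass channel.
rumble = derive_lfe([sine(40, 12000)] * 5)    # deep 40 Hz rumble
speech = derive_lfe([sine(4000, 12000)] * 5)  # 4 kHz, well above the bass range
print("40 Hz peak:", round(peak(rumble[6000:]), 2))
print("4 kHz peak:", round(peak(speech[6000:]), 2))
```

The low tone passes almost untouched while the high tone is heavily attenuated, which is the behaviour the text describes: only the deepest frequencies are left in the sixth channel.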
A different way to approach surround recording has been pioneered by the company SoundField. It also involves microphones with multiple capsules — effectively several microphones in one — but operates by means of an ingenious extension of Blumlein's coincident pair concept. The concept of the SoundField microphone was originally designed in the 1970s by the late Dr Michael Gerzon (see separate box). Gerzon's design, which he developed with fellow Oxford maths graduate Peter Craven, completely eliminates phase problems and also cleverly allows the output of the microphone to be decoded and converted into any audio format, from mono to stereo, 5.1 and beyond if required, and simultaneously if necessary. This format flexibility has made SoundField's microphones a popular choice amongst broadcasters wishing to capture audio in 5.1 for HD broadcast, particularly if they also need to supply stereo sound at the same time for backwards-compatible standard-definition broadcasts.
This article is designed merely to scratch the surface of surround audio and its implementation for HD broadcast. However, there are plenty of free online resources detailing stereo and surround mic techniques, the development of multi-channel cinema sound systems, and Michael Gerzon's groundbreaking work. Happy surfing!
Box: Michael Gerzon & The SoundField Concept
A brilliant mathematician and quantum physicist, Michael Gerzon was fascinated by music and recording techniques as a hobby while at Oxford, and put his mind to work applying advanced mathematics to improving the recordings he made with a like-minded group of recording engineers.
Gerzon was a fan of Alan Blumlein's microphone techniques, and while still a student wrote impassioned but well-argued articles for the recording magazines of the day about the technical and subjective superiority of coincident miking. Together with Peter Craven, he worked out the minimum theoretical microphone requirements for capturing a three-dimensional soundfield in its entirety, including height information, despite the fact that there were no loudspeaker systems capable of reproducing such a recording with height information in those days. He concluded that an array of three directional microphones arranged at right angles to one another, one covering each three-dimensional axis (i.e. up/down, left/right, and back/front), plus a fourth non-directional microphone providing a reference signal, would work for the capture of 3D sound, as long as the mics used were perfectly coincident.
Of course, placing four microphones in exactly the same position is a physical impossibility, so Gerzon and Craven worked out a mathematical way to recreate the impossible. They designed a tetrahedral array of four perfectly matched identical capsules, each of which is arranged a precise, very short distance from the exact centre of the array. Because each capsule's orientation and distance from the array's centre are known constants, it is possible to derive mathematically, and apply, audio processing to the output of each capsule, and thereby recreate the sound that the four capsules would capture if they were all at the heart of the array.
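The first step of that processing is often described as a simple sum-and-difference matrix, sketched below. This is an illustration of the general idea rather than SoundField's actual implementation: the capsule labels, the function name and the 0.5 scaling are assumptions, and the frequency-dependent correction that compensates for the small spacing between capsules is deliberately left out.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Combine four tetrahedral capsule samples into (W, X, Y, Z).

    Assumed capsule labels: flu = front-left-up, frd = front-right-down,
    bld = back-left-down, bru = back-right-up.
    Spacing-correction filters are omitted in this sketch.
    """
    w = 0.5 * (flu + frd + bld + bru)  # non-directional reference signal
    x = 0.5 * (flu + frd - bld - bru)  # front minus back
    y = 0.5 * (flu - frd + bld - bru)  # left minus right
    z = 0.5 * (flu - frd - bld + bru)  # up minus down
    return w, x, y, z


# A sound reaching all four capsules equally carries no directional information.
print(a_to_b_format(1.0, 1.0, 1.0, 1.0))  # (2.0, 0.0, 0.0, 0.0)

# A sound picked up only by the two front capsules shows up on the front/back axis.
print(a_to_b_format(1.0, 1.0, 0.0, 0.0))  # (1.0, 1.0, 0.0, 0.0)
```

The three difference signals correspond to the three directional microphones, one per axis, and the sum corresponds to the non-directional reference described in the previous paragraph.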
The four processed signals from the array are collectively known as SoundField B-Format, and unlike the signals from spaced microphone arrays, they can be combined without any phase problems, as they have effectively been recorded from the same location in space — they are said to be 'phase-coherent'. Using the volume information in the fourth signal and the directional information in the first three, audio recorded in SoundField B-Format can recreate a three-dimensional image of the acoustic space around the array with great accuracy, and with further processing, audio can be derived in any format, from phase-coherent mono through stereo, 5.1 surround and beyond. This provides another advantage over recordings made with spaced surround arrays, which can provide decent enough 5.1 recordings, but cannot easily be reconfigured into other formats. SoundField's B-Format, however, simply needs to be decoded correctly. It is an audio storage format that is independent of its output format, and as such, is neatly future-proof.
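To give a flavour of how one four-signal recording can be decoded to different formats, the sketch below encodes a single sound arriving from a known direction and then derives mono and stereo from it by pointing 'virtual microphones' at the stored soundfield. It follows a commonly used first-order convention in which the non-directional signal W is stored about 3 dB lower than the other three; that convention, the function names, and the choice of a crossed figure-of-eight pair at plus and minus 45 degrees (in the spirit of Blumlein's coincident pair) are assumptions for illustration, not a description of SoundField's own decoders.

```python
import math


def encode_b_format(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample arriving from a given direction as (W, X, Y, Z).

    Azimuth is measured anticlockwise from straight ahead, so +90 is hard left.
    """
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # non-directional, stored 3 dB down
    x = sample * math.cos(az) * math.cos(el)  # front/back
    y = sample * math.sin(az) * math.cos(el)  # left/right
    z = sample * math.sin(el)                 # up/down
    return w, x, y, z


def virtual_mic(b_format, azimuth_deg, elevation_deg=0.0, pattern=0.5):
    """Output of an imaginary microphone aimed in the given direction.

    pattern: 1.0 = omnidirectional, 0.5 = cardioid, 0.0 = figure-of-eight.
    """
    w, x, y, z = b_format
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    directional = (x * math.cos(az) * math.cos(el)
                   + y * math.sin(az) * math.cos(el)
                   + z * math.sin(el))
    return pattern * math.sqrt(2.0) * w + (1.0 - pattern) * directional


def decode_mono(b_format):
    """Mono is simply an omnidirectional virtual microphone."""
    return virtual_mic(b_format, 0.0, pattern=1.0)


def decode_stereo(b_format, half_angle_deg=45.0, pattern=0.0):
    """Two coincident virtual microphones angled left and right of centre."""
    left = virtual_mic(b_format, +half_angle_deg, pattern=pattern)
    right = virtual_mic(b_format, -half_angle_deg, pattern=pattern)
    return left, right


# One recording of a sound 45 degrees to the left, decoded two different ways.
recording = encode_b_format(1.0, azimuth_deg=45.0)
print("mono:  ", round(decode_mono(recording), 3))
print("stereo:", tuple(round(v, 3) for v in decode_stereo(recording)))
```

The same stored signals yield both outputs, and because every virtual microphone sits at the same imaginary point, the outputs can be summed without the cancellation that affects spaced arrays. A 5.1 decode would follow the same pattern with five virtual microphones, though practical surround decoders are more sophisticated than this.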
The theory of SoundField microphones was just the beginning of an entire ambitious system of three-dimensional recording part-formulated by Michael Gerzon, with the help of others, during the 1970s, and called Ambisonics. It never took off commercially, and Gerzon tragically died following a severe asthma attack at the age of 50 in the mid-1990s. But his innovative work lives on today, redesigned using 21st-century, low-noise components, in the series of microphones developed from his basic concepts by SoundField Ltd. These range from affordable models that use computer workstation-compatible software plug-ins to do the necessary processing and decoding, to rugged, digital microphone systems with separate hardware decoder boxes for use in professional recording and broadcast applications.
A typical layout for a six-channel 5.1 audio monitoring setup, with five full-bandwidth speakers