Broadcasting Audio in 5.1 Format

Author: Dennis Lennie

Published 1st January 2010


One of the positive consequences of digital television transmission is the ability to include fully embedded multichannel audio with suitable metadata to control both channel configuration and even sound levels within the domestic environment.
While at the receiving end of the transmission chain there are many innovations and protocols to make life easier for the consumer to set up and enjoy surround sound, life is considerably more taxing at the recording and post-production end of the chain.
For example, to quote Dolby:
‘Dialnorm (dialogue normalization) is a metadata encoding protocol within Dolby Digital used to control loudness. Dolby Digital has always included this mechanism and it is used by broadcasters to control the reproduce level in the home.’
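To make the quote concrete, here is a minimal sketch of how a decoder acts on dialnorm. It assumes the commonly documented Dolby Digital behaviour, in which dialnorm is carried as a value from -1 to -31 dB and the decoder attenuates the programme by 31 dB minus the magnitude of that value, so that dialogue from every programme lands near a common -31 dB target. The function names are hypothetical, for illustration only.

```python
def dialnorm_attenuation_db(dialnorm: int) -> int:
    """Attenuation (dB) a decoder applies for a dialnorm value of -1 to -31.

    Assumes the commonly documented behaviour: the decoder attenuates by
    31 dB minus the magnitude of dialnorm, so correctly labelled dialogue
    is reproduced at about -31 dB regardless of how loudly it was mixed.
    """
    if not -31 <= dialnorm <= -1:
        raise ValueError("dialnorm must lie between -31 and -1")
    return 31 + dialnorm  # e.g. -27 gives 4 dB of attenuation


def reproduced_dialogue_level_db(measured_dialogue_db: float, dialnorm: int) -> float:
    """Dialogue level after decoding, given the level actually measured in the mix."""
    return measured_dialogue_db - dialnorm_attenuation_db(dialnorm)


# Hypothetical programme whose dialogue measures -27 dB.
print(reproduced_dialogue_level_db(-27.0, -27))  # correctly labelled: -31.0
print(reproduced_dialogue_level_db(-27.0, -31))  # mislabelled: -27.0, i.e. 4 dB louder than intended
```

The second call is the mastering problem in miniature: a wrong metadata value changes what the viewer hears, even though the mix itself is untouched.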
It is all very well to add controls and enhancements to the broadcasting system, but they change the way the sound is heard after the original mix is completed, and unless provision is made for monitoring and assessing every element of the ‘mastering’ process, those changes cannot be checked before the programme reaches the home.
With the growth of interest in surround sound encoding, there has been a great deal of confusion regarding the best way to set up a mastering facility to accurately monitor the audio channels in a given format.
It is useful to return to the roots of surround sound in order to define exactly what is expected of the new media in terms of creating a three-dimensional sound picture.
The modern form of surround sound was created in the mid-1970s by Dolby Labs in San Francisco. The trick was to use the optical soundtrack space on 35mm film to carry an encoded, analogue stereo signal which could then be processed to produce a centre front channel, left and right stereo channels, plus a discrete sub bass channel and a ‘surround’ channel.
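The matrix idea can be sketched with phasors. This is a simplified 4:2:4 model, not Dolby's actual encoder or decoder: centre is folded into both tracks at -3 dB, surround is added at -3 dB with opposite 90 degree phase shifts in each track, and a passive sum/difference matrix recovers centre and surround. Real decoders add active steering logic, and the surround feed is also delayed and band-limited; none of that is modelled here, and the names are hypothetical.

```python
import math
from typing import Tuple

G = 1.0 / math.sqrt(2.0)  # -3 dB fold-down gain (assumed simplified value)


def encode(l: complex, c: complex, r: complex, s: complex) -> Tuple[complex, complex]:
    """Fold L, C, R, S phasors into a two-track Lt/Rt pair (simplified 4:2:4 matrix)."""
    lt = l + G * c + 1j * G * s  # surround shifted +90 degrees in Lt
    rt = r + G * c - 1j * G * s  # and -90 degrees in Rt
    return lt, rt


def passive_decode(lt: complex, rt: complex) -> Tuple[complex, complex, complex, complex]:
    """Recover L, C, R, S with a sum/difference matrix and no steering logic."""
    centre = G * (lt + rt)          # in-phase content
    surround = -1j * G * (lt - rt)  # undo the +/-90 degree shift on the difference
    return lt, centre, rt, surround


def level_db(x: complex) -> float:
    """Magnitude of a phasor in dB relative to 1 (deep nulls clipped to -120 dB)."""
    return 20.0 * math.log10(max(abs(x), 1e-6))


for label, inputs in (("centre only", (0, 1, 0, 0)),
                      ("surround only", (0, 0, 0, 1)),
                      ("left only", (1, 0, 0, 0))):
    outputs = passive_decode(*encode(*inputs))
    print(label, [f"{name} {level_db(v):+.1f} dB" for name, v in zip(("L", "C", "R", "S"), outputs)])
```

Feeding a left-only signal through shows the weakness of a purely passive matrix: it leaks into centre and surround at about -3 dB, which is the leakage that active steering logic is designed to suppress.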
The new format brought several benefits which apply equally well to current multichannel modes:
  • The centre channel contained all the dialogue and those on-screen effects which need maximum spatial stability. This is the equivalent of the original mono optical track, and the Dolby Stereo track remains mono compatible to this day. The single sound source ensures that there is no phasing or ‘combing’ of the dialogue for listeners seated off the centre line of the theatre, which would be detrimental to intelligibility and naturalness of reproduction. The centre channel is the absolute reference source, and all other channels must be phase and amplitude aligned to it.
  • The left and right channels reproduce the music and sound effects in such a way as to create a sense of space and distance which matches the on-screen action as closely as possible. Some effects are also fed to the centre channel for maximum impact. Occasionally dialogue will be panned into the stereo ‘picture’ to create the effect of movement or drama.
  • The surround channel was, and is, used for off-screen sound effects and atmosphere, such as wind and rain, spooky echoes, etc. The bandwidth of the surround channel was limited to 7 kHz for improved noise and distortion performance, but with 5.1 formatting this is no longer the case. The whole point of the surround channels is to create a sense of involvement in the action or atmosphere of the picture. It is not ‘rear stereo’.
  • The sub channel was used to good effect in movies such as Earthquake, but it was always considered an option as far as film processing was concerned. In domestic formats the sub speaker mostly allows the main speakers to be made smaller, by filtering all signals below 120 Hz out of them and recombining those signals in a single channel. In large cinema systems the sub channel has come into its own, and the advent of Dolby SR and SRD allowed much greater use of dynamic range and high energy at low frequencies. Little of this quest for power carries over to the average home system, and most subs struggle to compensate for pathetically small main speakers. There are exceptions, but at a price.
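The domestic bass management described above can be sketched as a simple crossover: each main channel is split into a low part, which is summed with the LFE channel into the sub feed, and a high part, which stays with its own speaker. This sketch assumes a first-order (one-pole) low-pass filter at the 120 Hz figure quoted in the text and takes the high part as the signal minus the low part, so the split adds back exactly; commercial bass management commonly uses steeper filters and often a lower crossover. All names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def one_pole_lowpass(x: List[float], cutoff_hz: float, fs: float) -> List[float]:
    """First-order low-pass filter: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y: List[float] = []
    state = 0.0
    for sample in x:
        state += a * (sample - state)
        y.append(state)
    return y


def bass_manage(
    mains: Dict[str, List[float]],
    lfe: List[float],
    crossover_hz: float = 120.0,
    fs: float = 48000.0,
) -> Tuple[Dict[str, List[float]], List[float]]:
    """Split each main channel at the crossover and add the low parts to the LFE feed.

    Returns (high-passed mains, sub feed). Because each high part is the input
    minus its low part, highs + sub always add back to mains + LFE.
    """
    sub = list(lfe)
    highs: Dict[str, List[float]] = {}
    for name, signal in mains.items():
        low = one_pole_lowpass(signal, crossover_hz, fs)
        highs[name] = [s - lo for s, lo in zip(signal, low)]
        sub = [acc + lo for acc, lo in zip(sub, low)]
    return highs, sub


def rms(signal: List[float]) -> float:
    return math.sqrt(sum(s * s for s in signal) / len(signal))


# Illustrative: a 40 Hz tone in the left channel ends up mostly in the sub feed.
fs = 48000.0
tone_40hz = [math.sin(2 * math.pi * 40 * i / fs) for i in range(4800)]
highs, sub = bass_manage({"L": tone_40hz}, lfe=[0.0] * 4800, fs=fs)
print(f"sub RMS {rms(sub):.2f}, left RMS {rms(highs['L']):.2f}")
```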

The point I am making is that 5.1 is a format born of the movie industry that has been shoehorned into the world of television, and it is therefore something of a cuckoo in the cozy nest of the BBC and other broadcasting organizations. This may explain to some extent why the whole process is so often badly approached and delivered by the industry.
It is not my place to comment on how any individual organization handles the business of surround sound, but let’s just say it’s not often impressive.
Monitoring Format
The essential question at the moment is how do the smaller control room environments used for television mixing compare with the traditional film dubbing theatres that have been mixing 5.1 for several years?
To judge this it is important to recognise the relationship between the direct and reverberant sound in each room.
The person judging the sound balance and quality must hear the correct blend of direct energy from each speaker and a predetermined amount of room reflection and reverberant energy. The ratio between the two can be controlled by the following factors:
  • Speaker directivity over the required frequency range
  • Distance from the speaker to the listening position
  • Room volume and geometry
  • Room acoustic and absorption

It is by manipulating these factors that it is possible to obtain a balance in a smaller room of, say, 50 square metres that is similar to that of a much larger theatre with highly directional horns, at least in the speech frequency band, which is the most important from an intelligibility point of view. This is particularly important in the reproduction of sound in 5.1 format. The object of the exercise is to recreate a realistic sound field based on what we perceive as natural. That means discrete, visually interactive sounds from the front channels and indirect or ambient sound from the rear, with the occasional directional effect to give interest to the mix.
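The trade-off between the four factors can be put into numbers with the standard diffuse-field approximation. Sabine's formula gives the room's equivalent absorption area A = 0.161 V / RT60, the critical distance at which direct and reverberant energy are equal is r_c = sqrt(Q A / 16π) (about 0.057 sqrt(Q V / RT60) metres), and the direct-to-reverberant ratio at a distance r is 20 log10(r_c / r) dB. The sketch below assumes that simple model (no early reflections, a perfectly diffuse field), and the room volumes, directivity factors Q, listening distances and reverberation times in the example are illustrative guesses, not measurements.

```python
import math


def sabine_absorption_area(volume_m3: float, rt60_s: float) -> float:
    """Equivalent absorption area A (m^2) implied by Sabine's RT60 = 0.161 * V / A."""
    return 0.161 * volume_m3 / rt60_s


def critical_distance_m(q: float, volume_m3: float, rt60_s: float) -> float:
    """Distance at which direct and reverberant energy are equal: sqrt(Q * A / (16 * pi))."""
    return math.sqrt(q * sabine_absorption_area(volume_m3, rt60_s) / (16.0 * math.pi))


def direct_to_reverberant_db(q: float, distance_m: float, volume_m3: float, rt60_s: float) -> float:
    """Direct-to-reverberant ratio at the listening distance (diffuse-field approximation)."""
    return 20.0 * math.log10(critical_distance_m(q, volume_m3, rt60_s) / distance_m)


# Illustrative numbers only: a ~50 m^2, 3 m high control room with moderately directional
# monitors, and a large dubbing theatre with horn-loaded screen speakers.
small_room = dict(q=5.0, distance_m=2.0, volume_m3=150.0, rt60_s=0.29)
theatre = dict(q=10.0, distance_m=12.0, volume_m3=3000.0, rt60_s=0.8)
for name, room in (("small room", small_room), ("theatre", theatre)):
    print(f"{name}: D/R = {direct_to_reverberant_db(**room):+.1f} dB")
```

Varying one factor at a time shows how each shifts the balance: doubling Q or halving the listening distance raises the direct sound relative to the reverberant field, while a longer reverberation time lowers it.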
This is a far cry from some of the audio-only 5.1 mixes I have heard, which could cause severe neck ache, not to mention a low irritation threshold once the novelty of the lead guitarist hanging on the back wall wears off. I am reminded of the early days of stereo LPs, when ping-ponging sounds and trains panning across the living room were the order of the day. The most pleasing results occur when the room (monitoring or listening) is not too dead and the speakers are very open and are given some space to radiate before early reflections contrive to destroy the spatial images of the mix. I have noticed that multichannel digital sound has created a demand for mixing theatres which are, if anything, more live than the equivalent old analogue rooms. I can only explain this in terms of naturalness and adherence to audio fidelity, which I think bodes very well for the future.
Many people are unsure about the best way to set out speakers for multi-channel mastering. I can only give a clear answer in the context of mixing to picture and for this format I strongly recommend the layout shown by Dolby, in their excellent literature.
As DVD mastering uses the Dolby Digital format for most applications, it is logical to set up a room and monitoring system as Dolby recommends. The ITU-R specification has also been adopted by broadcast engineers and is used for critical listening rooms.
This internationally recognised document is full of obscure references to methods of appraising audio but is recommended for its depth and weight!
The front three speakers should be positioned equidistant from the mix position, with the left and right speakers subtending an angle of 60 degrees. All three should be identical models with a matched response within ±3 dB in any 1/3 octave band between 250 Hz and 2 kHz, measured at the mix position. The limits can be relaxed slightly above and below this range, but for mastering purposes this would not be advisable.
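One way to apply the matching criterion is to compare each speaker's 1/3 octave levels, measured at the mix position, against the centre speaker, which the text treats as the absolute reference. The sketch assumes base-10 1/3 octave bands (centre frequencies 1000 × 10^(n/10) Hz, nominally 250 Hz to 2 kHz) and reads ‘±3 dB’ as each speaker's deviation from the centre in every band; the function names and the measurement data are hypothetical.

```python
from typing import Dict, List, Tuple


def third_octave_centres(lo_hz: float = 250.0, hi_hz: float = 2000.0) -> List[float]:
    """Exact base-10 1/3 octave centre frequencies, 1000 * 10**(n/10) Hz, from lo_hz to hi_hz.

    A 1% margin lets the nominal limits (250 Hz, 2 kHz) match the exact
    values (about 251.2 Hz and 1995.3 Hz).
    """
    centres = []
    for n in range(-20, 14):  # roughly 10 Hz to 20 kHz
        f = 1000.0 * 10 ** (n / 10)
        if lo_hz * 0.99 <= f <= hi_hz * 1.01:
            centres.append(f)
    return centres


def mismatched_bands(
    levels_db: Dict[str, List[float]], reference: str = "C", tolerance_db: float = 3.0
) -> List[Tuple[str, int, float]]:
    """List (speaker, band centre in Hz, deviation in dB) wherever a speaker departs
    from the reference speaker by more than tolerance_db."""
    centres = third_octave_centres()
    ref = levels_db[reference]
    failures = []
    for name, levels in levels_db.items():
        if name == reference:
            continue
        for centre, level, ref_level in zip(centres, levels, ref):
            deviation = level - ref_level
            if abs(deviation) > tolerance_db:
                failures.append((name, round(centre), deviation))
    return failures


measured = {  # hypothetical dB readings at the mix position, one per band, 250 Hz to 2 kHz
    "C": [0.0, 0.5, 0.0, -0.5, 0.0, 0.0, 0.5, 0.0, -0.5, 0.0],
    "L": [0.5, 1.0, 0.0, 0.0, -1.0, 0.0, 0.5, 1.0, 0.0, 0.5],
    "R": [1.0, 0.5, 0.0, -0.5, 0.0, 0.0, 1.0, 0.5, 0.0, 3.5],
}
print(mismatched_bands(measured))  # [('R', 1995, 3.5)]
```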
The surround speakers should be positioned at the same distance from the mix position as the main front speakers, at an angle of 110 degrees from the centre line. If this is not possible, a digital delay should be used to bring the speakers back into time coincidence. Ideally the surround units will be the same as the front ones, but if that is not possible they should be from the same manufacturer and use identical driver technology. The polarity and phase summing of each speaker should be such that any two together produce an increase in sound level of between 3 and 6 dB at any frequency up to 10 kHz. Above that frequency it is difficult to place the microphone so precisely that phase coincidence is perfect (but many purists will insist on trying).
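The geometry and delay compensation can be sketched as follows. The sketch places the five speakers on a circle around the mix position at the angles given above (centre at 0 degrees, left and right at ±30 degrees so that together they subtend 60 degrees, surrounds at ±110 degrees) and computes the delay each nearer speaker needs so that its sound arrives at the same time as that of the farthest one. The speed of sound of 343 m/s, the distances and the function names are assumptions for illustration.

```python
import math
from typing import Dict, Tuple

SPEED_OF_SOUND_M_S = 343.0  # assumed: roughly the value in air at 20 degrees C

# Nominal azimuths in degrees from the centre line; negative is to the listener's left.
NOMINAL_AZIMUTH_DEG = {"C": 0.0, "L": -30.0, "R": 30.0, "Ls": -110.0, "Rs": 110.0}


def speaker_positions(radius_m: float) -> Dict[str, Tuple[float, float]]:
    """(x, y) in metres for each speaker on a circle around the mix position.

    The mix position is at the origin facing +y; x is positive to the right.
    """
    return {
        name: (radius_m * math.sin(math.radians(az)), radius_m * math.cos(math.radians(az)))
        for name, az in NOMINAL_AZIMUTH_DEG.items()
    }


def alignment_delays_ms(distances_m: Dict[str, float]) -> Dict[str, float]:
    """Delay (ms) to add to each speaker so every arrival coincides with the farthest speaker."""
    farthest = max(distances_m.values())
    return {name: 1000.0 * (farthest - d) / SPEED_OF_SOUND_M_S for name, d in distances_m.items()}


# Hypothetical room: fronts at 2.0 m, surrounds could only go 1.6 m from the mix position.
measured = {"C": 2.0, "L": 2.0, "R": 2.0, "Ls": 1.6, "Rs": 1.6}
for name, delay in alignment_delays_ms(measured).items():
    print(f"{name}: {delay:.2f} ms")
```

In the example, surrounds at 1.6 m with the front speakers at 2.0 m need a delay of a little over 1 ms. Level differences caused by the shorter distance would still have to be trimmed separately.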
The whole point is that by producing a phase coherent monitoring environment it is possible to find and eliminate phase anomalies in a 5.1 mix and then, ironically, introduce the required degree of diffusion and chaos to replicate a given environment.
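The ‘3 to 6 dB’ summing criterion follows from simple phasor arithmetic, if it is read as two equal, coherent signals adding with some phase difference: in phase they sum by 20 log10(2), about 6 dB; at 90 degrees by about 3 dB; and at 180 degrees they cancel. The same arithmetic shows why the measurement gets hard at high frequencies, since keeping the phase error within 90 degrees at 10 kHz leaves less than a centimetre of tolerance in path length. That reading of the criterion, the 343 m/s speed of sound and the function names are assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def summed_level_gain_db(phase_deg: float) -> float:
    """Level change (dB) when two equal, coherent signals with this phase difference are added.

    |1 + e^(j*phi)| = 2|cos(phi/2)|, so 0 degrees gives about +6 dB and
    90 degrees about +3 dB; near 180 degrees the result becomes a deep null.
    """
    magnitude = abs(2.0 * math.cos(math.radians(phase_deg) / 2.0))
    return 20.0 * math.log10(magnitude) if magnitude > 0.0 else float("-inf")


def max_path_error_mm(freq_hz: float, max_phase_deg: float = 90.0) -> float:
    """Largest path-length difference (mm) that keeps the phase error within max_phase_deg."""
    wavelength_m = SPEED_OF_SOUND_M_S / freq_hz
    return 1000.0 * wavelength_m * max_phase_deg / 360.0


for phase in (0, 45, 90, 135):
    print(f"{phase:3d} deg: {summed_level_gain_db(phase):+.1f} dB")
for freq in (1000, 10000):
    print(f"{freq} Hz: path error must stay within {max_path_error_mm(freq):.1f} mm")
```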
Several people have asked me how to ensure that all five speakers sound the same, and the answer basically comes down to a few key factors:
  • The room must be symmetrical about the centre speaker axis, with windows and doors placed so as to steer sound reflections away from the mix position. Early reflections (arriving within 15 ms of the direct sound) should be at least 10 dB below the direct sound, especially in any band between 250 Hz and 2 kHz.
  • Each speaker should be placed at least 1 metre from any wall and not equidistant from two walls (or be mounted completely flush with the wall, as in large soffit-mounted systems).
  • The room acoustic should be as diffuse as possible, with a reverberation time of 0.25 × (V/100)^(1/3) seconds, where V is the room volume in cubic metres. In other words, the reference value is 0.25 seconds for a 100 m³ room, increasing with room size. My favourite spaces are often about this value, so I would argue that the ISO/MPEG/SMPTE guys have got it about right.
  • Room dimensions should not be vastly different from one another, but they should never be equal, and a cubic room is a disaster. Room modes will always dominate the low-frequency performance, closely followed by first reflections from the floor and ceiling. 7 m by 5 m by 3 m is a good starting point for mastering rooms.
  • Background noise should be judged according to material and the final medium.
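Three of the room criteria above lend themselves to quick arithmetic checks: the reference reverberation time from the formula quoted, the axial room modes that dominate the low-frequency performance, and whether an early reflection arrives within 15 ms without being at least 10 dB down. The sketch assumes a rectangular room, a speed of sound of 343 m/s, and inverse-square spreading plus a user-supplied surface loss for reflections; it ignores tangential and oblique modes and frequency-dependent absorption, and the function names are hypothetical.

```python
import math
from typing import List

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def target_rt_s(volume_m3: float) -> float:
    """Reference reverberation time, 0.25 * (V / 100)**(1/3) seconds, V in cubic metres."""
    return 0.25 * (volume_m3 / 100.0) ** (1.0 / 3.0)


def axial_modes_hz(length_m: float, width_m: float, height_m: float,
                   max_hz: float = 200.0) -> List[float]:
    """Axial mode frequencies n * c / (2 * d) for each room dimension d, up to max_hz, sorted."""
    modes = []
    for dim in (length_m, width_m, height_m):
        n = 1
        while n * SPEED_OF_SOUND_M_S / (2.0 * dim) <= max_hz:
            modes.append(round(n * SPEED_OF_SOUND_M_S / (2.0 * dim), 1))
            n += 1
    return sorted(modes)


def early_reflection_ok(direct_m: float, reflected_m: float, surface_loss_db: float = 0.0,
                        window_ms: float = 15.0, margin_db: float = 10.0) -> bool:
    """False if a reflection arrives within window_ms of the direct sound but less than
    margin_db below it. Level uses inverse-square spreading plus an assumed surface loss."""
    delay_ms = 1000.0 * (reflected_m - direct_m) / SPEED_OF_SOUND_M_S
    attenuation_db = 20.0 * math.log10(reflected_m / direct_m) + surface_loss_db
    return delay_ms > window_ms or attenuation_db >= margin_db


print(f"target RT: {target_rt_s(7 * 5 * 3):.3f} s")
print("axial modes (Hz):", axial_modes_hz(7.0, 5.0, 3.0))
print("side-wall reflection OK untreated?", early_reflection_ok(2.0, 3.0))
print("... with 7 dB of absorption?", early_reflection_ok(2.0, 3.0, surface_loss_db=7.0))
```

For the 7 m × 5 m × 3 m room suggested above, the check gives a reference reverberation time of about 0.25 s, and it also shows that all three dimensions share an axial mode at 171.5 Hz (half the speed of sound), the kind of coincidence worth examining when refining a starting point. A side-wall reflection with a 3 m path against a 2 m direct path arrives only about 3 ms late and is only about 3.5 dB down from spreading alone, so it needs roughly 6.5 dB of extra loss from absorption or diffusion to meet the criterion.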

Film and pop music are often mixed in rooms with a noise floor of NR30 because a mix that is too wide in dynamic range will cause problems on playback (noisy theatres, car radios, Walkman in the street etc.). A critical listening room needs to be very quiet so as to judge the limits of other systems’ dynamics and NR10 to NR15 might be called for in some circumstances.
5.1 for television programmes is a new and exciting medium with huge potential for extending the audio experience beyond anything the average consumer has experienced outside the cinema. The industry is suffering greatly at the moment because production inexperience and consumer electronics wars are diluting the message. There have been too many half-cocked proposals put about lately, so let’s just get on with it before gangrene sets in.


© KitPlus (tv-bay limited). All trademarks recognised. Reproduction of this content is strictly prohibited without written consent.