In 2012, the Oculus Rift Kickstarter campaign burst onto the scene, initiating a new wave of public interest in sense-enveloping immersive experiences. Five years later, the consumer reality is mixed: some very public let-downs, like Google Glass (which is coincidentally enjoying a re-birth at the time of writing, now as a technical tool in the workplace), sit alongside technologies such as Dolby Atmos® that have become almost commonplace experiences. What does this mean for the audio professional, and how is the near future shaping up in 2018?
Now that the initial hype has settled down, it's clear that some immersive experiences fit well as a natural extension of existing technologies, while others require the consumer to embrace new equipment and services, with the latter being somewhat slower to come to practical fruition than the former.
Perhaps the most successful application today is that of movie theatre sound. Generally the domain of the seasoned professional, with an accompanying budget, this market has seen the adoption of technologies such as Dolby Atmos, DTS:X and Auro-3D® be both rapid and welcomed by the consumer. While it's true to say that initial production of audio in these formats has been something of a custom set-up, popular editing software is catching up fast, and with Pro Tools® now enabled for native 7.1.2 tracks, the process has become much easier. However, there is still work to be done here. Most modern NLE/DAW systems do not support native object tracks, so at the moment the technology remains an opportunity that appeals to the more dedicated or adventurous producer.
If the widespread adoption of immersive audio in the cinema market can be put down in large part to the negligible effort required on the part of the consumer (who doesn't want an improvement for nothing?), then the same factor could explain the somewhat stalled start that appears to have materialized in the gaming market. It could be argued, of course, that game audio has always been object-based and to some extent immersive, in that the sound has been delivered in a manner that reflects the position of the player's character on screen. At first inspection, gaming and headset sound and vision would seem to be a perfect fit, and initial consumer interest was certainly in this direction. However, mass market adoption, even in this narrow arena, has yet to materialize. Reasons for this are open to debate, but cost is a significant factor. Game developers are certainly more than capable of delivering a quality experience, but the cost to the consumer remains stubbornly high.
360 video, on the other hand, is immensely accessible, with viewers needing nothing more than a smartphone or tablet and a pair of earbuds to enjoy an immersive experience for, in most cases, very little additional outlay. Samsung recently released figures indicating that 50 percent of its VR content consumed was in fact 360 video, with gaming actually on the decrease, at around 35 percent. It's easy to produce content for this medium, with ambisonic audio getting a new lease of life as the dominant format for producing a fully immersive 3D audio environment. As first-order ambisonic audio is generally employed, precise localization is somewhat difficult. However, in mobile settings, the listening environment is often compromised, and the sense of space, even at this level, can deliver a real improvement over stereo.
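To make the first-order limitation concrete, here is a minimal sketch of how a mono sample can be encoded into the four channels of first-order ambisonics. It assumes the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization); the function name `encode_first_order` is hypothetical and not taken from any particular tool.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order ambisonics.

    Assumes AmbiX conventions: ACN channel order (W, Y, Z, X), SN3D
    normalization, azimuth 0 = front and positive to the left,
    elevation 0 = horizon and positive upwards.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)  # left-right component
    z = sample * math.sin(el)                 # up-down component
    x = sample * math.cos(az) * math.cos(el)  # front-back component
    return (w, y, z, x)

# A source directly to the listener's left, on the horizon.
left = encode_first_order(1.0, 90.0, 0.0)
```

Because the whole sound field is described by only these four channels, the directional resolution is coarse, which is one way to see why precise localization is difficult at first order.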
In between the movie theatre and 360 mobile video lies TV. Here, several factors combine: production, delivery mechanisms and consumer equipment. On the production side, in theory, audio produced in an object-based format will automatically translate to the home environment, as the decoders configure the audio to the available speaker arrangement at the set top box. This, however, pre-supposes the availability of an object-based mix. It's still early days, but the tide is turning, with many post-production studios installing 7.1.2 and 7.1.4 rigs in preparation for the inevitable transition away from a channel-based workflow to the more flexible object-based proposition, even as the editing software rushes to keep pace with the demand.
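The idea of a decoder adapting an object to whatever speakers are present can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: it uses plain constant-power panning between the two nearest speakers on a horizontal ring, ignores height speakers and the LFE channel, and is not the rendering method of Dolby Atmos or any other commercial format. The function name `render_object` and the example layouts are hypothetical.

```python
import math

def render_object(azimuth_deg, speaker_azimuths_deg):
    """Return one gain per speaker for an object at the given azimuth.

    Simplified sketch: constant-power panning between the two adjacent
    speakers on a horizontal ring (0 = front, positive = left).
    """
    n = len(speaker_azimuths_deg)
    if n == 1:
        return [1.0]  # a single speaker simply receives the whole object
    # Visit the speakers in order of their position around the ring.
    order = sorted(range(n), key=lambda i: speaker_azimuths_deg[i] % 360)
    target = azimuth_deg % 360
    gains = [0.0] * n
    for k in range(n):
        a, b = order[k], order[(k + 1) % n]
        start = speaker_azimuths_deg[a] % 360
        span = (speaker_azimuths_deg[b] - speaker_azimuths_deg[a]) % 360 or 360
        offset = (target - start) % 360
        if offset <= span:
            frac = offset / span  # 0 at speaker a, 1 at speaker b
            gains[a] = math.cos(frac * math.pi / 2)
            gains[b] = math.sin(frac * math.pi / 2)
            break
    return gains

# The same object (dead centre) rendered to two different layouts.
stereo_gains = render_object(0.0, [30.0, -30.0])
surround_gains = render_object(0.0, [30.0, -30.0, 0.0, 110.0, -110.0])
```

With the stereo layout the object is shared equally between left and right as a phantom centre; with the five-speaker layout the same object lands entirely on the centre speaker. The mix data is unchanged in both cases, which is the flexibility that the object-based approach promises.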
Production pre-dates consumer adoption, but even if an immersive soundtrack is available, the systems needed to replay it have to be in the home: there is little to be gained from an immersive soundtrack in a stereo playback environment. New speaker developments, including upward-firing technologies that avoid the requirement for ceiling speakers, and specially developed sound bars may indicate a way forward, bringing a true immersive enhancement to the everyday home listener.
What does all this mean for today's audio post-production engineer? With some applications moving forward at high speed and others still in their infancy, it can be difficult to know where to invest your time and energy. As with all technological transitions, the key is flexibility. DAW and NLE audio technologies are still catching up with the demands of immersive audio production. Concepts are wide and varied, with object, ambisonic, high-channel-count and binaural audio all enjoying their own applications. Within these fundamental choices there are many formats, as well as myriad encoding options to deliver the resulting mix to the consumer.
Translating a mix from stereo to a 3D setting can itself be a challenge, as much of the recorded source material is stereo, as are most sound design libraries, digital instruments and FX. New tools are emerging rapidly to support these requirements, including our own Halo Upmix, now available with a 3D extension capable of upmixing to 7.1.2 and ambisonic formats. Tools such as Halo Upmix are now making 3D immersive audio possible within a traditional post-production workflow and DAW/NLE template. The need to configure a complex hybrid system is fading into the past, and immersive audio is at the edge of mainstream production for those who need it. Where these applications will ultimately take hold in the consumer market remains to be seen.