

2013/11/15 Next Generation Audio for UltraHD
By Michelle Abraham
Just as decisions remain on the video specifications for UltraHD, such as whether to use greater dynamic range, higher frame rates, and a wider color gamut, discussion is ongoing around the audio formats as well. The goal for UltraHD audio is to improve the experience and draw the audience more deeply into the story.
Today, audio for movies and TV, whether on disc or delivered OTT or via a traditional TV network, is often based on delivering a certain number of separate audio channels. It may be stereo audio with two channels or surround sound with 5.1 channels.
The audio tracks are mixed into the various channels. If a sound needs to seem like an object traveling across the listening space, the mixer places it in a specific channel at a specific time. The mix targets the midplane of the room rather than the entire space above and below the midplane. Audio mixers do this to the best of their ability, but they cannot control how the speakers are set up in the home. If speakers are not where they should be, the audio does not sound as the mixer intended. Equipment like sound bars is designed to simulate the impact of those channels.
New technologies have been developed to enable a higher quality audio experience. The cinema is at the forefront of these technologies, with Barco Auro 11.1 and Dolby Atmos competing for the theaters. Over 200 theaters are currently equipped for Dolby Atmos, and about 50 films have been mixed for it. By the end of 2013, over 20 movies will have been released with audio mixed in Auro 11.1, with about 40 theaters equipped. Looking at these new technologies provides a sense of what is likely to be used in the home.
Barco’s Auro 3D provides 11.1 channels with height speakers and overhead speakers adding to the traditional 5.1 channel midplane.
Figure 1. Barco’s Auro 3D Sound System
Source: Barco
The speaker placement is based on studies showing that height speakers at 30 degrees above the ears create a more immersive sound experience, similar to what people hear in everyday life.
Figure 2. Auro 11.1 in a Cinema Setting
Source: Barco
Dolby’s Atmos adds object audio to the traditional channel method. In addition to objects, Atmos also offers channels, referred to as beds. A total of 128 tracks, a combination of objects and beds, can be delivered. The renderer can send the output to as many as 64 speakers.
Object audio treats a sound as an object that moves through space and time. To form the object, the sound is combined with metadata that describes how it should be reproduced. The renderer in the space receives the objects and determines how to render them based on the speaker set-up in that space. This approach is customized to the space where the content will be viewed: the final mixing happens at the end-user part of the chain, in the living room or in the headphones, rather than at the front end.
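The rendering step described above can be illustrated with a minimal sketch. It is not Dolby's renderer or any vendor's algorithm; it is a simple distance-based panner chosen only to show the idea that one object, carrying position metadata, can be rendered to whatever speaker layout is present. The function name `render_gains`, the `rolloff` parameter, the object fields, and both speaker layouts are assumptions made for illustration.

```python
import math

def render_gains(obj_pos, speakers, rolloff=2.0):
    """Return one gain per speaker for a single audio object.

    Hypothetical distance-based panner: closer speakers get more signal.
    Gains are normalized so their squares sum to 1 (constant power).
    """
    weights = [1.0 / (math.dist(obj_pos, spk) ** rolloff + 1e-6) for spk in speakers]
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]

# Speaker positions as (x, y, z) with the listener at the origin (assumed layouts).
STEREO = [(-1.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
WITH_HEIGHT = STEREO + [(-1.0, 1.0, 1.0), (1.0, 1.0, 1.0)]

# One object: the audio samples are omitted; only the position metadata is used.
flyover = {"name": "helicopter", "position": (-0.5, 1.0, 0.8)}

# The same object is rendered differently depending on the room's speakers.
for layout in (STEREO, WITH_HEIGHT):
    gains = render_gains(flyover["position"], layout)
    print([round(g, 3) for g in gains])
```

The content carries the object once, and each room computes its own gains. With only two speakers the sound is panned left; when height speakers are available, most of the signal goes to the upper-left speaker nearest the object.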
Object-oriented audio provides more options for the creative team putting the audio together. The dialog can be an object so that it can be changed at the time of viewing. New business models may become available; for example, a different language for the dialog could be sold separately. The object metadata could also give the viewer more control over the audio than was previously offered.
Not every sound in an audio track will be made an object, so there is likely to be a channel base for the background sounds that do not need to be located in space. Audio objects can be laid on top. Object audio will require more bandwidth than audio uses today. That need for bandwidth will have to be balanced against the other bandwidth demands in any linear production.
Digital Cinema Initiatives (DCI) does not use proprietary standards, so it specifies uncompressed audio today. It uses Pulse Code Modulation (PCM), which represents analog audio as digital samples. DCI standards allow up to 16 uncompressed audio channels in the PCM format in one digital cinema package. Dolby Atmos object audio files are part of the Digital Cinema Package when Atmos is used.
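To give a sense of scale for uncompressed PCM, the data rate is simply channels multiplied by sample rate and bits per sample. The sketch below works this out for 16 channels. The 48 kHz sample rate and 24-bit depth are assumptions used for illustration, since the article does not state them, and the helper name `pcm_bitrate_bps` is hypothetical.

```python
def pcm_bitrate_bps(channels, sample_rate_hz, bits_per_sample):
    """Raw PCM data rate in bits per second (no container overhead)."""
    return channels * sample_rate_hz * bits_per_sample

# Assumed parameters: 48 kHz sampling, 24 bits per sample.
SAMPLE_RATE_HZ = 48_000
BITS_PER_SAMPLE = 24

cinema = pcm_bitrate_bps(16, SAMPLE_RATE_HZ, BITS_PER_SAMPLE)   # 16 channels
surround = pcm_bitrate_bps(6, SAMPLE_RATE_HZ, BITS_PER_SAMPLE)  # 5.1 = 6 channels

print(f"16 channels: {cinema / 1_000_000:.3f} Mbit/s")
print(f"5.1 channels: {surround / 1_000_000:.3f} Mbit/s")
```

Under these assumptions, 16 uncompressed channels come to about 18.4 Mbit/s, which is manageable in a cinema package but far more than bandwidth-sensitive delivery networks to the home can spare for audio.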
A specification comparable to PCM audio is needed for the multi-dimensional audio space, though audio for digital cinema is still likely to remain uncompressed. Barco has partnered with Auro Technologies and DTS to develop an open format to enable object-based immersive sound in the cinema.
In order to offer more immersive audio in the home, just increasing the number of channels is unlikely to work since few homeowners will want to install a large number of speakers. The diagram below shows additional room speakers that would be required.
Figure 3. Speaker Configuration for Immersive Audio in the Home
Source: DTS
That’s where it is helpful to have object-oriented audio. The object data will need to be placed alongside the channels. For delivery to the home, the audio will need to be compressed. Some of the delivery networks are very bandwidth sensitive so the audio will need to fit in a small pipe.
Dolby is working on what can be done to bring the more immersive Atmos experience of the cinema to other distribution channels both stationary and mobile. DTS’ multi-dimensional audio authoring platform also supports audio objects and channels.
In January 2013, DTS announced a solution for UltraHD called DTS UHD that integrates Neo:X and Headphone:X. Headphone:X replicates a surround sound experience on headphones. Neo:X provides up to 11.1 upmixed channels, with speakers placed above the listener in the front and the remaining channels at seat height to the front, sides, and rear of the room.
Standards-setting organizations are looking at these new technologies and will determine what will be used in the broadcast and disc arena. The standards are expected to be published in 2015 and 2016. The standards bodies may choose a single format or choose to define a toolbox that offers several options. Broadcast standards bodies do not always choose the same thing. One group, Future of Broadcast TV (FOBTV), would like to develop a worldwide digital TV standard, but it needs the agreement of many stakeholders.
In OTT and traditional pay-TV, the audio format will be determined by the provider. For digital downloads, it will be up to the content owner to determine how the content is sold. 4K resolution video may be accompanied by today’s 5.1 channel audio with higher bit rates for improved quality. Some providers already deliver content with 7.1 audio. The compression chosen is likely to be one considered in the standards committees, but the providers ultimately control the choice based on which solution they prefer. File delivery does not have the same bandwidth constraints as linear delivery, so higher bandwidth audio could show up in file delivery first.
MRG Analysis: The migration to HEVC video compression is driving the industry to consider new audio compression technology as well. However, it is not only a matter of choosing a compression technology. There is also the matter of choosing the type of immersive audio used.
Simply adding more channels is unlikely to work outside the cinema or public viewing places because consumers are unlikely to install that many speakers in their viewing area at home. Object-oriented audio seems to be the solution, but it will require greater bandwidth and therefore a new way to compress the audio stream.
Audio technology companies are working with standards committees to propose their solutions for immersive audio to accompany 4K video in UltraHD standards. Those standards decisions take time and standards are unlikely to be ready before 2016.
OTT and pay-TV providers will make their own decisions on the audio they provide with 4K video. They may continue to offer only 5.1 or 7.1-channel surround sound with 4K video until standards have been determined and audio/video receivers in the home can be upgraded. It is a wait-and-see game at this point.