Object-Based Audio (OBA) offers a new kind of listening experience: audiences can personalize and customize playback, choosing what they hear and how they hear it. OBA can be applied across platforms such as broadcasting, streaming, and cinema sound. This paper presents a novel production-side approach to creating object-based audio, called Sample-by-Sample Object-Based Audio (SSOBA) embedding. SSOBA arranges audio object samples so that listeners can easily individualize their chosen audio sources according to their interests and needs. SSOBA is an additional service rather than a replacement, so it remains compatible with legacy audio players. Its biggest advantage is that it requires no special additional hardware in the broadcasting chain, which makes it easy to implement and to extend legacy players and decoders with the enhanced capability. The number of input audio objects, the number of output channels, and the sampling rate are the three main factors that affect SSOBA performance and determine whether it is lossless or lossy. SSOBA uses interpolation at the decoder side to compensate for eliminated samples. Both subjective and objective experiments are carried out to evaluate the output at each step. MUSHRA subjective experiments conducted after the encoding step show good-quality performance of SSOBA with up to five objects. SNR measurements and objective experiments, performed after decoding and interpolation, show significant success in recovering and separating the audio objects. The experimental results indicate that a minimum sampling rate of 96 kHz is needed to encode up to five objects in a stereo channel while achieving good subjective and objective results simultaneously.
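To make the encode, interpolate, and measure pipeline concrete, the following is a minimal sketch under stated assumptions, not the SSOBA scheme itself. The abstract does not give the embedding layout, so the sketch assumes a simple round-robin layout on a single channel: output slot n carries the sample of object n mod K, and the other objects' samples at that instant are eliminated. The decoder then fills the gaps by linear interpolation, and an SNR is computed per object. The function names `encode_round_robin`, `decode_with_interpolation`, and `snr_db`, as well as the test tones, are hypothetical and chosen only for illustration.

```python
import math


def encode_round_robin(objects):
    """Assumed layout: slot n of the stream keeps objects[n % K][n]."""
    k = len(objects)
    n = len(objects[0])
    return [objects[i % k][i] for i in range(n)]


def decode_with_interpolation(stream, k):
    """Recover K objects; eliminated samples are linearly interpolated."""
    n = len(stream)
    recovered = []
    for obj in range(k):
        kept = list(range(obj, n, k))  # slots that carried this object
        out = [0.0] * n
        # Interpolate between each pair of neighbouring kept samples.
        for a, b in zip(kept, kept[1:]):
            for i in range(a, b):
                t = (i - a) / (b - a)
                out[i] = (1 - t) * stream[a] + t * stream[b]
        # Outside the first/last kept sample, hold the nearest kept value.
        for i in range(0, kept[0]):
            out[i] = stream[kept[0]]
        for i in range(kept[-1], n):
            out[i] = stream[kept[-1]]
        recovered.append(out)
    return recovered


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between an object and its recovery."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, estimate))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


# Illustrative run: three sine-tone "objects", 10 ms at 96 kHz.
fs = 96_000
n = 960
tones = [220.0, 440.0, 880.0]
objects = [[math.sin(2 * math.pi * f * i / fs) for i in range(n)] for f in tones]

stream = encode_round_robin(objects)
recovered = decode_with_interpolation(stream, len(objects))
for f, ref, est in zip(tones, objects, recovered):
    print(f"{f:.0f} Hz object: SNR = {snr_db(ref, est):.1f} dB")
```

Under this assumed layout, a single object is carried losslessly, while K > 1 objects each keep only every K-th sample, so the quality of the recovery depends on how densely each object is still sampled. This is consistent with the abstract's observation that the number of objects, the number of channels, and the sampling rate jointly determine whether the result is lossless or lossy.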