Dialogue Enhancement (DE) lets listeners of broadcast audio rebalance dialogue and background sounds to suit their personal preferences and needs. When the individual audio stems are unavailable from production, Dialogue Separation (DS) can be applied to the final audio mixture to obtain estimates of these stems. This work focuses on Preferred Loudness Differences (PLDs) between dialogue and background sounds. Whereas previous studies determined the PLD through listening tests that used original stems from production, the present study uses stems estimated by DS and considers a larger variety of signal classes. PLDs vary substantially across individuals (average interquartile range: 5.7 LU). Despite this variability, PLDs are found to depend strongly on the signal type under consideration, and median PLDs are shown to be predictable from objective intelligibility metrics. Two existing baseline prediction methods, both intended for use with original stems, yield Mean Absolute Errors (MAEs) of 7.5 LU and 5 LU, respectively. A modified baseline (MAE: 3.2 LU) and an alternative approach (MAE: 2.5 LU) are proposed. The results support the viability of processing final broadcast mixtures with DS and offering an alternative remix that accounts for median PLDs.
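The summary statistics quoted in the abstract (per-item median PLD, interquartile range across listeners, and the MAE of a predictor against the median PLDs) can be illustrated with a minimal sketch. All item names, ratings, and predicted values below are hypothetical and chosen only for illustration; they are not data from the study, and the quartile convention used here (Python's "inclusive" method) is an assumption, since the abstract does not state how quartiles were computed.

```python
from statistics import mean, median, quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1), using the 'inclusive' quartile method (an assumption)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def mean_absolute_error(predicted, observed):
    """Mean absolute difference between paired predicted and observed values."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return mean(abs(p - o) for p, o in zip(predicted, observed))


# Hypothetical PLD ratings in LU: one list of listener responses per test item.
ratings = {
    "item_a": [2.0, 4.0, 6.0, 8.0, 10.0],
    "item_b": [0.0, 1.0, 2.0, 3.0, 4.0],
}

# Hypothetical output of some PLD predictor, in LU, for the same items.
predicted = {"item_a": 5.0, "item_b": 3.5}

# Median PLD per item: the quantity the prediction methods are evaluated against.
median_plds = {item: median(values) for item, values in ratings.items()}

# Spread across listeners, averaged over items.
average_iqr = mean(iqr(values) for values in ratings.values())

# Prediction error against the median PLDs, with items paired by name.
items = sorted(ratings)
mae = mean_absolute_error(
    [predicted[item] for item in items],
    [median_plds[item] for item in items],
)

print(f"median PLDs: {median_plds}")
print(f"average IQR: {average_iqr:.2f} LU")
print(f"MAE: {mae:.2f} LU")
```

In this sketch the IQR describes disagreement between listeners on a single item, while the MAE describes how far a predictor lands from the per-item median, so the two figures measure different things even though both are expressed in LU.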