Dialog Enhancement (DE) is a feature that allows a user to increase the level of dialog in TV or movie content relative to non-dialog sounds. When only the original mix is available, DE is "unguided" and requires source separation. In this paper, we describe the DeepSpace system, which performs source separation using both dynamic spatial cues and source cues to support unguided DE. Its technologies include spatio-level filtering (SLF) and deep-learning-based dialog classification and denoising. Using subjective listening tests, we show that DeepSpace achieves significantly better overall performance than the state-of-the-art systems that were available for testing. We also explore the feasibility of using existing automated metrics to evaluate unguided DE systems.