This work aims to advance Chinese opera research in both the musical and speech domains, with a primary focus on overcoming data limitations. We introduce KunquDB, a relatively large-scale, well-annotated audio-visual dataset comprising 339 speakers and 128 hours of content. Drawn from the Kunqu Opera Art Canon (Kunqu yishu dadian), KunquDB is structured by dialogue line and provides explicit annotations, including character names, speaker names, gender, and vocal manner classifications, together with preliminary text transcriptions. KunquDB offers a versatile foundation for role-centric acoustic studies and for speech-related research, including Automatic Speaker Verification (ASV). Beyond enriching opera research, the dataset helps bridge artistic expression and technological innovation. To pioneer the exploration of ASV in Chinese opera, we construct four test trials that account for two distinct vocal manners in opera voices: stage speech (ST) and singing (S). Domain adaptation methods effectively mitigate the domain mismatch induced by these vocal manner variations, although the benchmark results leave room for further improvement.
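As an illustration of how trial lists could be derived from per-line annotations of speaker and vocal manner, the following minimal sketch pairs enrollment and test utterances by manner. It assumes the four trials correspond to the 2x2 combinations of enrollment and test manner (ST-ST, ST-S, S-ST, S-S); the abstract does not state this layout. The record format, the sample data, and the names `build_trials` and `trial_sets` are hypothetical and are not taken from KunquDB.

```python
from itertools import product

# Hypothetical utterance records: (utterance_id, speaker, vocal_manner).
# The real KunquDB annotation schema may differ.
UTTERANCES = [
    ("u1", "spk_a", "ST"),
    ("u2", "spk_a", "S"),
    ("u3", "spk_b", "ST"),
    ("u4", "spk_b", "S"),
]


def build_trials(utterances, enroll_manner, test_manner):
    """Return (enroll_id, test_id, is_target) tuples for one manner condition."""
    enroll = [u for u in utterances if u[2] == enroll_manner]
    test = [u for u in utterances if u[2] == test_manner]
    trials = []
    for e, t in product(enroll, test):
        if e[0] == t[0]:
            continue  # never compare an utterance with itself
        # A trial is a target trial when both utterances share a speaker.
        trials.append((e[0], t[0], e[1] == t[1]))
    return trials


# Assumed 2x2 layout: every (enrollment manner, test manner) combination.
CONDITIONS = list(product(("ST", "S"), repeat=2))
trial_sets = {
    f"{enroll}-{test}": build_trials(UTTERANCES, enroll, test)
    for enroll, test in CONDITIONS
}
```

Under this assumed layout, the cross-manner conditions (ST-S and S-ST) are the ones where a vocal-manner mismatch between enrollment and test would appear.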