This paper introduces the system submitted by the dun_oscar team to the ICPR MSR Challenge. We describe three subsystems, one each for task 1 to task 3. For task 1, we develop a visual system consisting of an OCR model, a text tracker, and an NLP classifier that distinguishes subtitles from non-subtitles. For task 2, we employ an automatic speech recognition (ASR) system consisting of an 18-layer acoustic model (AM) and a 4-gram language model (LM). Semi-supervised learning on unlabeled data is also vital. For task 3, we use the ASR system to improve the visual system: a fusion module corrects some of the false subtitles.