Audio description (AD) is a crucial accessibility service for blind persons and persons with visual impairment, designed to convey visual information in acoustic form. Despite recent advances in multilingual machine translation research, the lack of well-crafted and time-synchronized AD data impedes the development of audio description translation (ADT) systems that address the needs of multilingual countries such as Switzerland. Furthermore, since the majority of ADT systems rely solely on text, it remains unclear whether incorporating visual information from the corresponding video clips can enhance the quality of ADT outputs. In this work, we present SwissADT, the first ADT system implemented for the three main Swiss languages and English. By collecting well-crafted AD data augmented with video clips in German, French, Italian, and English, and leveraging the power of Large Language Models (LLMs), we aim to enhance information accessibility for diverse language populations in Switzerland by automatically translating AD scripts into the desired Swiss language. Our extensive experiments, comprising both automatic and human evaluations of ADT quality, demonstrate the promising capability of SwissADT for the ADT task. We believe that combining human expertise with the generative power of LLMs can further enhance the performance of ADT systems, ultimately benefiting a larger multilingual target population.