Developing tools that automatically detect check-worthy claims in political debates and speeches can greatly help debate moderators, journalists, and fact-checkers. While previous work on this problem has focused exclusively on the text modality, here we explore the utility of the audio modality as an additional input. We create a new multimodal dataset (text and audio in English) containing 48 hours of speech from past political debates in the USA. We then experimentally demonstrate that, in the case of multiple speakers, adding the audio modality yields sizable improvements over using the text modality alone; moreover, for a single speaker, an audio-only model can outperform a text-only one. To enable future research, we make all our data and code publicly available at https://github.com/petar-iv/audio-checkworthiness-detection.