Debating is part of daily life, whether in study, work, casual conversation, televised political debates, or online discussions on social networks. Because debates vary so widely in application, structure, and format, building corpora that account for this variation is challenging, and debate corpora remain scarce in the literature. For this reason, this research proposes the DEBISS corpus: a collection of spoken and individual debates with semi-structured features. The corpus is annotated for a broad range of NLP tasks, such as speech-to-text, speaker diarization, argument mining, and debater quality assessment.