The automatic annotation of direct speech (AADS) in written text has often been used in computational narrative understanding. Methods based on either rules or deep neural networks have been explored, in particular for English and German. For French, our target language, few works exist. Our goal is to create a unified framework to design and evaluate AADS models in French. To this end, we consolidated the largest French narrative dataset to date annotated with DS at the word level; we adapted various baselines, drawn from generic sequence labelling or from AADS in other languages; and we designed and conducted an extensive evaluation focused on generalisation. Results show that the task still requires substantial effort, and they highlight the characteristics of each baseline. Although this framework could be improved, it is a step towards encouraging more research on the topic.
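Framed as word-level sequence labelling, AADS assigns each token a binary label (1 = direct speech, 0 = narration). The sketch below makes that format concrete under stated assumptions: whitespace tokenisation, French guillemets (« ») as the only DS marker, and token-level F1 on the DS class as the score. The names `tag_direct_speech` and `token_f1` are hypothetical; this is not one of the baselines or the evaluation protocol of this work, and real French prose also marks dialogue turns with dashes (tirets) and inserts incises such as "dit-il" inside quotes, which these rules ignore.

```python
from typing import List, Tuple


def tag_direct_speech(text: str) -> Tuple[List[str], List[int]]:
    """Hypothetical rule baseline: label each whitespace token 1 if it lies
    between an opening « and a closing », else 0.

    The guillemet tokens themselves are labelled 1. Dialogue dashes and
    incises inside quotes are NOT handled (simplifying assumption).
    """
    tokens = text.split()
    labels: List[int] = []
    inside = False
    for tok in tokens:
        if "«" in tok:  # opening quote starts a DS span
            inside = True
        labels.append(1 if inside else 0)
        if "»" in tok:  # closing quote (possibly glued to punctuation, e.g. "»,")
            inside = False
    return tokens, labels


def token_f1(gold: List[int], pred: List[int]) -> float:
    """Token-level F1 on the DS class (label 1); an assumed metric."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tokens, labels = tag_direct_speech("Il sourit. « Je viens demain », dit-il.")
    print(list(zip(tokens, labels)))
    print(labels)  # [0, 0, 1, 1, 1, 1, 1, 0]
```

Scoring at the token level rewards partially correct spans, which suits per-word annotation; span-level exact match would be a stricter alternative.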