The automatic annotation of direct speech (AADS) in written text has often been used in computational narrative understanding. Methods based on either rules or deep neural networks have been explored, in particular for English and German. Yet for French, our target language, few works exist. Our goal is to create a unified framework to design and evaluate AADS models in French. To this end, we consolidated the largest-to-date French narrative dataset annotated with direct speech (DS) per word; we adapted various baselines, drawn from sequence labelling or from AADS in other languages; and we designed and conducted an extensive evaluation focused on generalisation. Results show that the task still requires substantial effort and highlight the characteristics of each baseline. Although this framework could be improved, it is a step towards encouraging more research on the topic.