Drawing on feedback from UMR TETIS members and on the scientific literature, this document provides a generic methodology for creating annotation guidelines and annotated textual datasets (corpora). It covers the methodological aspects of annotation as well as the storage, sharing, and valorization of the resulting data. Definitions and examples illustrate each step of the process, giving a comprehensive framework to support the creation and use of corpora in various research contexts.