This paper has two goals. First, we present the turn-taking annotation layers created for 95 minutes of conversational speech from the Graz Corpus of Read and Spontaneous Speech (GRASS), which are available to the scientific community. Second, we describe the annotation system and the annotation process in detail, so that other researchers may apply it to their own conversational data. The annotation system was developed with interdisciplinary applications in mind and had to meet three requirements. It should be based on sequential criteria from Conversation Analysis. It should be suitable for subsequent phonetic analysis, which is why the annotations were time-aligned in Praat. And it should be suitable for automatic classification, which required continuous annotation of the speech and a label inventory that is not too large and yields high inter-rater agreement. Turn-taking was annotated on two layers: Inter-Pausal Units (IPU) and points of potential completion (PCOMP; similar to transition relevance places). We provide a detailed description of the annotation process and of the segmentation and labelling criteria. A detailed analysis of inter-rater agreement and common confusions shows that agreement is near-perfect for IPU annotation and substantial for PCOMP annotation, and that disagreements are often either partial or can be explained by an alternative analysis of a sequence that also has merit. The annotation system can be applied to a variety of conversational data for linguistic studies and technological applications, and we hope that the annotations, as well as the annotation system, will contribute to stronger cross-fertilization between these disciplines.
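The abstract does not name the agreement statistic, but "substantial" and "near-perfect" match the category labels of the Landis and Koch scale, which is commonly used to interpret Cohen's kappa. The following is a minimal sketch, assuming pairwise Cohen's kappa computed over label sequences from two raters that have already been aligned item by item. The function names `cohens_kappa` and `landis_koch`, as well as the label values "C", "I" and "N" in the usage example, are hypothetical and are not the GRASS label inventory.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters who labelled the same aligned items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def landis_koch(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch descriptive category."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical labels for ten candidate points, one label per point and rater.
rater_a = ["C", "C", "I", "I", "C", "N", "C", "I", "N", "C"]
rater_b = ["C", "C", "I", "C", "C", "N", "C", "I", "I", "C"]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.3f} ({landis_koch(kappa)})")
```

Here the raters agree on 8 of 10 items (observed agreement 0.80), chance agreement from the marginals is 0.41, and kappa is therefore 0.39 / 0.59 ≈ 0.661, which falls in the "substantial" band. Kappa is used in this sketch rather than raw percent agreement because it discounts the agreement expected by chance, which matters when one label dominates.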