Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded, contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We describe in detail the multi-round incremental annotation process, the choice of a clustering algorithm to group usages into senses, and possible diachronic and synchronic uses of this dataset.