We present an analysis of mutual intelligibility among related languages, applied to the Romance family. We introduce a novel computational metric that estimates intelligibility from lexical similarity, combining the surface and semantic similarity of related words, and use it to measure mutual intelligibility for the five main Romance languages (French, Italian, Portuguese, Spanish, and Romanian). We compare results obtained using both the orthographic and phonetic forms of words, as well as different parallel corpora and vector-space models of word meaning. The resulting intelligibility scores confirm intuitions about intelligibility asymmetry across languages and correlate significantly with the results of cloze tests in human experiments.
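To make the idea of combining surface and semantic similarity concrete, the following is a minimal illustrative sketch and not the metric defined in the paper, whose exact formulation the abstract does not give. It assumes that surface similarity is a normalized Levenshtein similarity over word forms (orthographic or phonetic strings), that semantic similarity is the cosine of word vectors living in a shared cross-lingual space, and that the two are multiplied and averaged with a per-pair weight such as the word's frequency in the language being read. All function names, the multiplicative combination, and the weighting scheme are hypothetical.

```python
from math import sqrt


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def surface_similarity(a, b):
    """Assumed surface measure: 1 minus length-normalized edit distance."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def intelligibility(pairs):
    """Weighted mean of surface * semantic similarity over related word pairs.

    pairs: iterable of (word_a, word_b, vec_a, vec_b, weight).
    The weight is a hypothetical per-pair importance, e.g. the frequency
    of word_b in the language being read; it is one possible source of
    asymmetry between the two directions of a language pair.
    """
    pairs = list(pairs)
    total = sum(weight for *_, weight in pairs)
    if total == 0:
        return 0.0
    score = sum(
        weight * surface_similarity(a, b) * max(0.0, cosine(va, vb))
        for a, b, va, vb, weight in pairs
    )
    return score / total
```

Under these assumptions, passing phonetic transcriptions instead of spellings as `word_a` and `word_b` gives the phonetic variant of the score, and swapping the embedding source changes only the vectors supplied.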