The statistical over-representation of phonological features in the basic vocabulary of languages is often interpreted as reflecting potentially universal sound symbolic patterns. However, most of these results have not been explicitly tested for reproducibility and may be prone to biases in the study samples or models. Many studies on the topic do not adequately control for genealogical and areal dependencies between sampled languages, casting doubt on the robustness of their results. In this study, we test the robustness of a recent study on sound symbolism in basic vocabulary concepts, which analyzed 245 languages. The new sample includes data on 2864 languages from Lexibank. We modify the original model by adding statistical controls for spatial and phylogenetic dependencies between languages. The new results show that most of the previously observed patterns are not robust, and many disappear completely once the genealogical and areal controls are added. A small number of patterns, however, emerge as highly stable even with the new sample. Through the new analysis, we are able to assess the distribution of sound symbolism on a larger scale than was previously possible. The study further highlights the need to test all universal claims about language for robustness on various levels.