Bayesian approaches to reconstructing the evolutionary history of languages rely on the tree model, which assumes that the languages descended from a common ancestor and underwent modifications over time. However, language contact and other factors can violate this assumption to varying degrees. Knowing how severely the assumption is violated is crucial for validating phylolinguistic inference. In this paper, we propose a simple sanity check: projecting a reconstructed tree onto a space generated by principal component analysis. Using both synthetic and real data, we demonstrate that our method effectively visualizes anomalies, particularly in the form of jogging.
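As a rough illustration of the projection step, the following minimal sketch fits a principal component analysis on the leaf (observed) languages and then maps every node of a reconstructed tree, including internal nodes, into the same two-dimensional space. The toy feature matrix, the tree topology, the choice to fit the components on leaves only, and the helper names `fit_pca`, `project`, and `tree_segments` are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def fit_pca(leaf_states, n_components=2):
    """Fit PCA via SVD on the leaf feature matrix (rows = languages)."""
    mean = leaf_states.mean(axis=0)
    centered = leaf_states - mean
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(states, mean, components):
    """Map feature vectors into the space spanned by the principal axes."""
    return (states - mean) @ components.T

def tree_segments(coords, parent):
    """Return one (parent_point, child_point) pair per tree edge."""
    return [(coords[p], coords[c]) for c, p in parent.items()]

# Hypothetical toy data: 4 leaves followed by 3 internal nodes,
# each described by 6 binary (or reconstructed, fractional) features.
names = ["A", "B", "C", "D", "AB", "CD", "root"]
states = np.array([
    [1, 1, 0, 0, 1, 0],        # A
    [1, 1, 0, 0, 0, 0],        # B
    [0, 0, 1, 1, 0, 1],        # C
    [0, 0, 1, 1, 0, 0],        # D
    [1, 1, 0, 0, 0.5, 0],      # AB (assumed reconstructed state)
    [0, 0, 1, 1, 0, 0.5],      # CD (assumed reconstructed state)
    [0.5, 0.5, 0.5, 0.5, 0, 0] # root (assumed reconstructed state)
], dtype=float)
n_leaves = 4
parent = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6}  # child index -> parent index

mean, components = fit_pca(states[:n_leaves])
coords = project(states, mean, components)
segments = tree_segments(coords, parent)

for name, (x, y) in zip(names, coords):
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

Drawing each pair in `segments` as a line then gives a picture of the tree in the principal component space, which can be inspected by eye for branches that behave unexpectedly.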