Speech-based clinical tools are increasingly deployed in multilingual settings, yet it is unclear whether pathological speech markers remain geometrically separable from accent variation. Without that separation, systems may misclassify healthy non-native speakers as pathological or miss pathology in multilingual patients. We propose a four-metric clustering framework to evaluate the geometric disentanglement of emotional, linguistic, and pathological speech features across six corpora and eight dataset combinations. A consistent hierarchy emerges: emotional features form the tightest clusters (Silhouette 0.250), followed by pathological (0.141) and linguistic (0.077) features. Confound analysis shows that pathological-linguistic overlap stays below 0.21, which is above the permutation null but bounded for clinical deployment. Trustworthiness analysis confirms the fidelity of the embeddings and the robustness of the geometric conclusions. Our framework provides actionable guidelines for equitable and reliable speech health systems across diverse populations.
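To make the comparison against a permutation null concrete, the following is a minimal sketch of how a Silhouette score can be tested against shuffled labels. It is not the authors' pipeline: the abstract does not say how the null is constructed, so label shuffling is an assumption, as are the helper names `silhouette` and `permutation_null`, the Euclidean distance, and the synthetic two-dimensional points that stand in for speech embeddings.

```python
import math
import random


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # Mean distance from p to the members of each cluster, excluding p itself.
        mean_dist = {}
        for c in clusters:
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            mean_dist[c] = sum(d) / len(d) if d else None
        a = mean_dist[labels[i]]
        if a is None:  # p is alone in its cluster: score is 0 by convention
            scores.append(0.0)
            continue
        # Nearest other cluster, measured by mean distance.
        b = min(v for c, v in mean_dist.items() if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def permutation_null(points, labels, n_perm=100, seed=0):
    """Silhouette scores obtained after shuffling labels (assumed null model)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(silhouette(points, shuffled))
    return null


# Synthetic stand-in for embeddings: two Gaussian blobs, 20 points each.
rng = random.Random(42)
points = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(20)]
points += [(rng.gauss(4.0, 1.0), rng.gauss(4.0, 1.0)) for _ in range(20)]
labels = [0] * 20 + [1] * 20

observed = silhouette(points, labels)
null = permutation_null(points, labels)
# One-sided permutation p-value with the usual +1 correction.
p_value = (1 + sum(s >= observed for s in null)) / (len(null) + 1)
print(f"observed={observed:.3f}  null_max={max(null):.3f}  p={p_value:.3f}")
```

A score that clearly exceeds every shuffled-label score indicates structure beyond chance. The same comparison could in principle be applied to an overlap measure, although the abstract does not define how its overlap figure is computed.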