Speech signals encode emotional, linguistic, and pathological information within a shared acoustic channel; however, disentanglement is typically assessed only indirectly, through downstream task performance. We introduce an information-theoretic framework that quantifies cross-dimension statistical dependence in handcrafted acoustic features by combining bounded neural mutual information (MI) estimation with non-parametric validation. Across six corpora, cross-dimension MI remains low, with tight estimation bounds ($< 0.15$ nats), indicating weak statistical coupling in the data considered, whereas source--filter MI is substantially higher (0.47 nats). An attribution analysis, which measures the proportion of total MI attributable to source versus filter components, reveals source dominance for emotional dimensions (80\%) and filter dominance for linguistic and pathological dimensions (60\% and 58\%, respectively). These findings provide a principled framework for quantifying dimensional independence in speech.
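The non-parametric validation step described above can be illustrated with a toy sketch. This is not the paper's neural (bounded) MI estimator; it is a simple plug-in histogram MI estimate in nats, compared against a permutation null that destroys any real dependence, on hypothetical synthetic data. Bin count, sample size, and noise level are illustrative assumptions.

```python
import math
import random

def mutual_information(xs, ys, bins=8):
    """Plug-in MI estimate (in nats) from a 2-D histogram.

    A crude stand-in for the MI estimators discussed in the text;
    the histogram plug-in estimator is biased upward for finite samples.
    """
    n = len(xs)

    def bin_idx(v, lo, hi):
        # Map a value to one of `bins` equal-width bins over [lo, hi].
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    lox, hix = min(xs), max(xs)
    loy, hiy = min(ys), max(ys)
    joint = [[0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        joint[bin_idx(x, lox, hix)][bin_idx(y, loy, hiy)] += 1
    px = [sum(row) / n for row in joint]
    py = [sum(joint[i][j] for i in range(bins)) / n for j in range(bins)]
    mi = 0.0
    for i in range(bins):
        for j in range(bins):
            pxy = joint[i][j] / n
            if pxy > 0:
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi

def permutation_null(xs, ys, trials=50, seed=0):
    """Null MI values: shuffling ys breaks any genuine dependence."""
    rng = random.Random(seed)
    ys = list(ys)
    null = []
    for _ in range(trials):
        rng.shuffle(ys)
        null.append(mutual_information(xs, ys))
    return null

rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(5000)]
y_dep = [xi + 0.3 * rng.gauss(0, 1) for xi in x]   # strongly coupled with x
y_ind = [rng.gauss(0, 1) for _ in range(5000)]     # independent of x

mi_dep = mutual_information(x, y_dep)
mi_ind = mutual_information(x, y_ind)
null = permutation_null(x, y_dep)
print(f"MI(coupled)={mi_dep:.3f} nats, MI(independent)={mi_ind:.3f} nats, "
      f"null max={max(null):.3f} nats")
```

In the same spirit as the abstract's comparison, the coupled pair yields MI well above the permutation null, while the independent pair sits near it; a measured cross-dimension MI that falls inside the null distribution is evidence of weak statistical coupling.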