Variational decomposition autoencoding improves disentanglement of latent representations

Understanding the structure of complex, nonstationary, high-dimensional time-evolving signals is a central challenge in scientific data analysis. In many domains, such as speech and biomedical signal processing, the ability to learn disentangled and interpretable representations is critical for uncovering latent generative mechanisms. Traditional approaches to unsupervised representation learning, including variational autoencoders (VAEs), often struggle to capture the temporal and spectral diversity inherent in such data. Here we introduce variational decomposition autoencoding (VDA), a framework that extends VAEs by incorporating a strong structural bias toward signal decomposition. VDA is instantiated through variational decomposition autoencoders (DecVAEs), i.e., encoder-only neural networks that combine a signal decomposition model, a contrastive self-supervised task, and variational prior approximation to learn multiple latent subspaces aligned with time-frequency characteristics. We evaluate DecVAEs on simulated data and three publicly available scientific datasets, spanning speech recognition, dysarthria severity evaluation, and emotional speech classification. The results show that DecVAEs surpass state-of-the-art VAE-based methods in disentanglement quality, generalization across tasks, and interpretability of the latent encodings. These findings suggest that decomposition-aware architectures can serve as robust tools for extracting structured representations from dynamic signals, with potential applications in clinical diagnostics, human-computer interaction, and adaptive neurotechnologies.
