Unsupervised source separation involves unraveling an unknown set of source signals recorded through a mixing operator, given limited prior knowledge about the sources and access only to a dataset of signal mixtures. This problem is inherently ill-posed and is made harder by the variety of time-scales that sources exhibit in time-series data. Existing methods typically rely on a preselected window size, which limits their capacity to handle multi-scale sources. To address this issue, instead of operating in the time domain, we propose an unsupervised multi-scale clustering and source separation framework that leverages wavelet scattering covariances, which provide a low-dimensional representation of stochastic processes capable of distinguishing between different non-Gaussian stochastic processes. Nested within this representation space, we develop a factorial Gaussian-mixture variational autoencoder that is trained to (1) probabilistically cluster sources at different time-scales and (2) independently sample scattering covariance representations associated with each cluster. Using samples from each cluster as prior information, we formulate source separation as an optimization problem in the wavelet scattering covariance representation space, which yields separated sources in the time domain. When applied to seismic data recorded during the NASA InSight mission on Mars, our multi-scale nested approach proves to be a powerful tool for discriminating between sources that vary greatly in time-scale, e.g., minute-long transient one-sided pulses (known as "glitches") and structured ambient noise resulting from atmospheric activity that typically lasts for tens of minutes. These results provide an opportunity to conduct further investigations into the isolated sources related to atmospheric-surface interactions, thermal relaxations, and other complex phenomena.
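As a rough illustration of the optimization step described above (a sketch under stated assumptions, not the authors' exact objective): suppose the mixture is additive, x = s + n, let \Phi denote the wavelet scattering covariance operator, and let \hat{\Phi}_1, ..., \hat{\Phi}_J be representations sampled from the cluster associated with the source n. The symbols x, s, n, \Phi, J, the additive mixing model, and the squared-error loss are all illustrative choices that the abstract does not specify. One plausible objective matches the representation of a time-domain candidate to the sampled representations:

\[
\hat{n} \in \arg\min_{n} \; \frac{1}{J} \sum_{j=1}^{J} \bigl\| \Phi(n) - \hat{\Phi}_j \bigr\|_2^2 ,
\qquad
\hat{s} = x - \hat{n} .
\]

Because the optimization variable n lives in the time domain while the loss is measured in the representation space, a minimizer directly gives a time-domain estimate. In practice, an additional term tying the residual x - n to the statistics of the observed mixtures would likely be needed to keep the problem well constrained.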