Unsupervised source separation involves unraveling an unknown set of source signals recorded through a mixing operator, with limited prior knowledge about the sources and access only to a dataset of signal mixtures. This problem is inherently ill-posed and is further complicated by the variety of time-scales exhibited by sources in time series data. Existing methods typically rely on a preselected window size, which limits their capacity to handle multi-scale sources. To address this issue, instead of operating in the time domain, we propose an unsupervised multi-scale clustering and source separation framework that leverages wavelet scattering covariances, which provide a low-dimensional representation of stochastic processes capable of distinguishing between different non-Gaussian stochastic processes. Nested within this representation space, we develop a factorial Gaussian-mixture variational autoencoder that is trained to (1) probabilistically cluster sources at different time-scales and (2) independently sample scattering covariance representations associated with each cluster. Using samples from each cluster as prior information, we formulate source separation as an optimization problem in the wavelet scattering covariance representation space, resulting in separated sources in the time domain. When applied to seismic data recorded during the NASA InSight mission on Mars, our multi-scale nested approach proves to be a powerful tool for discriminating between sources varying greatly in time-scale, e.g., minute-long transient one-sided pulses (known as ``glitches'') and structured ambient noise resulting from atmospheric activity that typically lasts for tens of minutes. These results provide an opportunity to conduct further investigations into the isolated sources related to atmospheric-surface interactions, thermal relaxations, and other complex phenomena.
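As a rough illustration of what "source separation as an optimization problem in the representation space" could look like, the sketch below writes down one plausible objective. It is an assumed formulation for exposition only and is not taken from the abstract: the symbols $x$ (an observed mixture), $s$ (the source to be recovered), $\Phi$ (the wavelet scattering covariance operator), $\varphi^{(c)}_k$ (the $k$-th representation sampled from cluster $c$ by the variational autoencoder), $c'$ (the cluster assumed to describe the remaining signal), and $K$, $K'$ (the numbers of samples) are all hypothetical notation.

```latex
% Assumed additive mixture: the observation is the source of interest plus everything else.
x = s + n

% Hypothetical objective: the estimate \hat{s} should have a representation close to
% samples from its own cluster c, while the residual x - s should look like samples
% from the cluster c' describing the remaining signal.
\hat{s} \;=\; \arg\min_{s}\;
  \frac{1}{K} \sum_{k=1}^{K} \bigl\| \Phi(s) - \varphi^{(c)}_{k} \bigr\|_2^2
  \;+\;
  \frac{1}{K'} \sum_{k'=1}^{K'} \bigl\| \Phi(x - s) - \varphi^{(c')}_{k'} \bigr\|_2^2
```

Under these assumptions the unknown $s$ lives in the time domain while the loss is evaluated in the representation space, which matches the statement that the optimization yields separated sources in the time domain. The exact loss, weighting, and constraints used by the authors may differ.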