Unsupervised source separation involves unraveling an unknown set of source signals recorded through a mixing operator, with limited prior knowledge about the sources and access only to a dataset of signal mixtures. This problem is inherently ill-posed and is made harder by the variety of timescales that sources exhibit. Existing methods typically rely on a preselected window size that fixes their operating timescale, which limits their capacity to handle multi-scale sources. To address this issue, we propose an unsupervised multi-scale clustering and source separation framework that leverages wavelet scattering spectra, which provide a low-dimensional representation of stochastic processes capable of distinguishing between different non-Gaussian stochastic processes. Nested within this representation space, we develop a factorial Gaussian-mixture variational autoencoder that is trained to (1) probabilistically cluster sources at different timescales and (2) independently sample scattering spectra representations associated with each cluster. As the final stage, using samples from each cluster as prior information, we formulate source separation as an optimization problem in the wavelet scattering spectra representation space, with the aim of separating the sources in the time domain. When applied to the entire seismic dataset recorded during the NASA InSight mission on Mars, which contains sources varying greatly in timescale, our multi-scale nested approach proves to be a powerful tool for disentangling such disparate sources, e.g., minute-long transient one-sided pulses (known as ``glitches'') and structured ambient noise resulting from atmospheric activity that typically lasts for tens of minutes. These results provide an opportunity to conduct further investigations into the isolated sources related to atmospheric-surface interactions, thermal relaxations, and other complex phenomena.