Unsupervised source separation involves unraveling an unknown set of source signals recorded through a mixing operator, given limited prior knowledge about the sources and access only to a dataset of signal mixtures. This problem is inherently ill-posed and is further challenged by the variety of timescales exhibited by sources in time series data from planetary space missions. As such, a systematic multi-scale unsupervised approach is needed to identify and separate sources at different timescales. Existing methods typically rely on a preselected window size that determines their operating timescale, limiting their capacity to handle multi-scale sources. To address this issue, we propose an unsupervised multi-scale clustering and source separation framework that leverages wavelet scattering spectra, which provide a low-dimensional representation of stochastic processes and can distinguish between different non-Gaussian stochastic processes. Nested within this representation space, we develop a factorial variational autoencoder that is trained to probabilistically cluster sources at different timescales. To perform source separation, we use samples from clusters at multiple timescales obtained via the factorial variational autoencoder as prior information and formulate an optimization problem in the wavelet scattering spectra representation space. When applied to the entire seismic dataset recorded during the NASA InSight mission on Mars, which contains sources varying greatly in timescale, our approach disentangles these sources, e.g., minute-long transient one-sided pulses (known as "glitches") and structured ambient noise resulting from atmospheric activity that typically lasts for tens of minutes, and provides an opportunity to conduct further investigations into the isolated sources.
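
The limitation of a single preselected window size can be illustrated with a minimal sketch. This is not the framework described above: it uses plain window energy as a stand-in feature instead of wavelet scattering spectra, and the synthetic signal, the pulse position, and the helper names `split_into_windows` and `window_energies` are all assumptions made for illustration. It only shows that the same short transient fills very different fractions of a window depending on the window size chosen.

```python
import random


def split_into_windows(signal, window_size):
    """Split a sequence into non-overlapping windows of equal length.

    Trailing samples that do not fill a whole window are dropped.
    """
    n_windows = len(signal) // window_size
    return [signal[i * window_size:(i + 1) * window_size] for i in range(n_windows)]


def window_energies(signal, window_size):
    """Sum of squared samples in each window (a stand-in feature only)."""
    return [sum(v * v for v in w) for w in split_into_windows(signal, window_size)]


# Hypothetical synthetic mixture: weak Gaussian noise plus one short pulse.
random.seed(0)
PULSE_START, PULSE_LENGTH = 1000, 16
signal = [random.gauss(0.0, 0.1) for _ in range(4096)]
for i in range(PULSE_START, PULSE_START + PULSE_LENGTH):
    signal[i] += 5.0

# Analyze the same signal at several window sizes (timescales).
summary = {}
for window_size in (64, 512, 2048):
    energies = window_energies(signal, window_size)
    peak_window = max(range(len(energies)), key=energies.__getitem__)
    summary[window_size] = {
        "n_windows": len(energies),
        "peak_window": peak_window,
        # Fraction of the window that the pulse occupies.
        "pulse_fraction": PULSE_LENGTH / window_size,
    }

for window_size, info in summary.items():
    print(window_size, info)
```

At the smallest window size the pulse occupies a quarter of its window, while at the largest it occupies under one percent, so a feature computed at one fixed scale describes either the transient or the slower background well, but not both.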