Leveraging multimodal information from biosignals is vital for building a comprehensive representation of people's physical and mental states. However, multimodal biosignals often exhibit substantial distributional shifts between pretraining and inference datasets, stemming from changes in task specification or variations in modality composition. To achieve effective pretraining in the presence of such distributional shifts, we propose a frequency-aware masked autoencoder ($\texttt{bio}$FAME) that learns to parameterize the representation of biosignals in the frequency space. $\texttt{bio}$FAME incorporates a frequency-aware transformer, which uses a fixed-size Fourier-based operator for global token mixing, independent of the length and sampling rate of the inputs. To maintain the frequency components within each input channel, we further employ a frequency-maintain pretraining strategy that performs masked autoencoding in the latent space. The resulting architecture makes effective use of multimodal information during pretraining and can be adapted to diverse tasks and modalities at test time, regardless of input size and order. We evaluated our approach on a diverse set of transfer experiments on unimodal time series, achieving an average improvement of 5.5% in classification accuracy over the previous state-of-the-art. Furthermore, we showed that our architecture is robust in modality mismatch scenarios, including unpredicted modality dropout or substitution, demonstrating its practical utility in real-world applications. Code will be available soon.
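To make the idea of a fixed-size Fourier-based operator for global token mixing concrete, the following is a minimal NumPy sketch. It is not the authors' implementation: the function name `fourier_token_mixer`, the choice to keep only the lowest `num_modes` frequency bins, and the random weights standing in for learned parameters are all assumptions made for illustration. The point it shows is that the same fixed-size weight array can be applied to token sequences of different lengths.

```python
import numpy as np


def fourier_token_mixer(tokens, weights):
    """Mix tokens globally by filtering their lowest Fourier modes.

    tokens:  real array of shape (num_tokens, dim)
    weights: complex array of shape (num_modes, dim); its size is fixed
             and does not depend on num_tokens (hypothetical stand-in for
             learned parameters).
    """
    num_tokens = tokens.shape[0]
    num_modes = weights.shape[0]

    # Transform along the token axis: shape (num_tokens // 2 + 1, dim).
    spectrum = np.fft.rfft(tokens, axis=0)

    # Keep at most num_modes low-frequency bins and zero out the rest.
    kept = min(num_modes, spectrum.shape[0])
    filtered = np.zeros_like(spectrum)
    filtered[:kept] = spectrum[:kept] * weights[:kept]

    # Return to the token domain at the original sequence length.
    return np.fft.irfft(filtered, n=num_tokens, axis=0)


rng = np.random.default_rng(0)
weights = rng.standard_normal((8, 16)) + 1j * rng.standard_normal((8, 16))

# The same weights are applied to inputs with 20 and 75 tokens.
short = fourier_token_mixer(rng.standard_normal((20, 16)), weights)
long = fourier_token_mixer(rng.standard_normal((75, 16)), weights)
print(short.shape, long.shape)
```

Because each output token depends on every input token through the transform, the mixing is global, while the parameter count is set only by `num_modes` and `dim`.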