The limited availability of labeled data hampers progress in machine learning on biomedical time-series. Self-supervised learning (SSL) is a promising approach to learning data representations without labels. However, current SSL methods require expensive computations for negative pairs and are designed for single modalities, which limits their versatility. To overcome these limitations, we introduce CroSSL (Cross-modal SSL), which rests on two novel concepts: masking intermediate embeddings from modality-specific encoders, and aggregating them into a global embedding using a cross-modal aggregator. This enables the handling of missing modalities and end-to-end learning of cross-modal patterns without prior data preprocessing or time-consuming negative-pair sampling. We evaluate CroSSL on various multimodal time-series benchmarks, including both medical-grade and consumer biosignals. Our results show that, with minimal labeled data, CroSSL outperforms previous SSL techniques and supervised benchmarks. We additionally analyze the impact of different masking ratios and strategies, and assess the robustness of the learned representations to missing modalities. Overall, our work achieves state-of-the-art performance while highlighting the benefits of masking latent embeddings for cross-modal learning in temporal health data.
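The two concepts named in the abstract, masking the embeddings produced by modality-specific encoders and aggregating them into one global embedding, can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration: the function names (`encode`, `mask_modalities`, `aggregate`), the linear encoders with mean pooling, the concatenate-then-project aggregator, the three example modalities, and all dimensions. The abstract does not specify the architecture or the training objective, so the objective is omitted and this should not be read as the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


def encode(x, w):
    """Hypothetical modality-specific encoder: linear map, tanh, mean-pool over time."""
    # x: (batch, time, channels), w: (channels, dim) -> (batch, dim)
    return np.tanh(x @ w).mean(axis=1)


def mask_modalities(embs, ratio, rng):
    """Zero out whole modality embeddings at random, keeping at least one per sample."""
    # embs: (batch, n_modalities, dim)
    batch, n_mod, _ = embs.shape
    keep = rng.random((batch, n_mod)) >= ratio
    # If a sample lost every modality, restore one chosen at random.
    rows = np.flatnonzero(~keep.any(axis=1))
    keep[rows, rng.integers(0, n_mod, size=rows.size)] = True
    return embs * keep[:, :, None], keep


def aggregate(embs, w_agg):
    """Hypothetical cross-modal aggregator: concatenate embeddings, apply one linear layer."""
    return embs.reshape(embs.shape[0], -1) @ w_agg


# Assumed toy setup: 3 modalities with different channel counts, 100 time steps.
batch, time_steps, dim, dim_out = 4, 100, 8, 16
channels = {"accelerometer": 3, "ecg": 1, "eda": 1}

inputs = {m: rng.normal(size=(batch, time_steps, c)) for m, c in channels.items()}
weights = {m: rng.normal(size=(c, dim)) for m, c in channels.items()}
w_agg = rng.normal(size=(len(channels) * dim, dim_out))

# 1) Encode each modality separately, then stack: (batch, n_modalities, dim).
embs = np.stack([encode(inputs[m], weights[m]) for m in channels], axis=1)

# 2) Mask in latent space rather than on the raw signals.
masked, keep = mask_modalities(embs, ratio=0.5, rng=rng)

# 3) Aggregate the surviving embeddings into one global embedding per sample.
global_emb = aggregate(masked, w_agg)
print(global_emb.shape)
```

Because masking acts on embeddings rather than raw signals, a modality that is absent at inference time could plausibly be handled the same way, by zeroing its slot before aggregation. That reading is an inference from the abstract's claim about missing modalities, not a confirmed detail of the method.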