Rethinking Large Language Models for Irregular Time Series Classification in Critical Care

Time series data from the Intensive Care Unit (ICU) provides critical information for patient monitoring. While recent advances in applying Large Language Models (LLMs) to time series modeling (TSM) have shown great promise, their effectiveness on irregular ICU data, which is characterized by particularly high rates of missing values, remains largely unexplored. This work investigates two key components underlying the success of LLMs for TSM: the time series encoder and the multimodal alignment strategy. To this end, we establish a systematic testbed that evaluates the impact of both components across various state-of-the-art LLM-based methods on benchmark ICU datasets, comparing against strong supervised and self-supervised baselines. Results reveal that the encoder design is more critical than the alignment strategy. Encoders that explicitly model irregularity achieve substantial performance gains, yielding an average AUPRC increase of $12.8\%$ over the vanilla Transformer. While less impactful, the alignment strategy is also noteworthy: the best-performing strategy, a semantically rich, fusion-based one, achieves a modest $2.9\%$ improvement over cross-attention. However, LLM-based methods require at least 10$\times$ longer training than the best-performing supervised models designed for irregular data, while delivering only comparable performance. They also underperform in data-scarce few-shot learning settings. These findings highlight both the promise and current limitations of LLMs for irregular ICU time series. The code is available at https://github.com/mHealthUnimelb/LLMTS.
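To make "explicitly modeling irregularity" concrete, the sketch below shows one common way an irregularly sampled channel is presented to an encoder: the carried-forward value, a binary observation mask, and the time elapsed since the last real observation. This is an illustrative assumption, not the representation used by the methods evaluated in this work; the function name `encode_irregular` and the example readings are hypothetical.

```python
from typing import List, Optional, Tuple


def encode_irregular(
    times: List[float],
    values: List[Optional[float]],
) -> Tuple[List[float], List[int], List[float]]:
    """Turn one sparsely observed channel into (filled values, mask, time gaps).

    times  : timestamps in hours, increasing
    values : the reading at each timestamp, or None when nothing was recorded
    """
    filled: List[float] = []
    mask: List[int] = []
    delta: List[float] = []
    last_value = 0.0                    # placeholder before the first observation
    last_seen: Optional[float] = None   # time of the most recent real observation

    for t, v in zip(times, values):
        # Gap since the last real observation (0 until something has been seen).
        delta.append(0.0 if last_seen is None else t - last_seen)
        if v is None:
            filled.append(last_value)   # carry the last observation forward
            mask.append(0)              # 0 marks a missing reading
        else:
            filled.append(v)
            mask.append(1)              # 1 marks a real reading
            last_value, last_seen = v, t
    return filled, mask, delta


# Hypothetical heart-rate channel with two missing readings.
filled, mask, delta = encode_irregular(
    times=[0.0, 0.5, 2.0, 5.0],
    values=[80.0, None, 92.0, None],
)
print(filled, mask, delta)
```

An encoder that receives only `filled` cannot tell a measured 92 from a carried-forward 92; supplying `mask` and `delta` alongside it is what lets the model account for missingness and uneven sampling.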
