In this paper, we propose the first VL$\underline{\textbf{M}}$ $\underline{\textbf{a}}$gentic $\underline{\textbf{r}}$easoning framework for few-$\underline{\textbf{s}}$hot multimodal $\underline{\textbf{T}}$ime $\underline{\textbf{S}}$eries $\underline{\textbf{C}}$lassification ($\textbf{MarsTSC}$), which introduces a self-evolving knowledge bank that serves as a dynamic context and is iteratively refined via reflective agentic reasoning. The framework comprises three collaborating roles: i) the Generator performs reliable classification via reasoning; ii) the Reflector diagnoses the root causes of reasoning errors and yields discriminative insights targeting the temporal features that the Generator overlooked; iii) the Modifier applies verified updates to the knowledge bank to prevent context collapse. We further introduce a test-time update strategy that refines the knowledge bank cautiously and continuously, mitigating few-shot bias and distribution shift. Extensive experiments on 12 mainstream time series benchmarks demonstrate that $\textbf{MarsTSC}$ delivers substantial and consistent performance gains across 6 VLM backbones, outperforming both classical and foundation model-based time series baselines under few-shot conditions, while producing interpretable rationales that ground each classification decision in human-readable feature evidence.
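The interplay of the three roles can be illustrated with a minimal sketch. It is not the MarsTSC implementation: the VLM calls are replaced by toy rule-based stand-ins, and every name here (`Insight`, `FEATURES`, `generator`, `reflector`, `modifier`, `evolve`), the two hand-picked features, and the acceptance criterion (keep an update only if accuracy on the few-shot support set strictly improves) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Series = Sequence[float]
Example = Tuple[Series, str]


@dataclass(frozen=True)
class Insight:
    """Hypothetical knowledge-bank entry: if feature > threshold, predict label."""
    feature: str
    threshold: float
    label: str


# Toy temporal features (assumed; a real system would reason over plots and text).
FEATURES = {
    "slope": lambda s: s[-1] - s[0],
    "range": lambda s: max(s) - min(s),
}


def generator(series: Series, bank: List[Insight], default: str = "flat") -> Tuple[str, str]:
    """Stand-in for the Generator: the first matching insight decides the label."""
    for ins in bank:
        if FEATURES[ins.feature](series) > ins.threshold:
            return ins.label, f"{ins.feature} > {ins.threshold}"
    return default, "no insight matched"


def reflector(series: Series, predicted: str, truth: str) -> Optional[Insight]:
    """Stand-in for the Reflector: on an error, propose a rule on the most salient feature."""
    if predicted == truth:
        return None
    name = max(FEATURES, key=lambda f: FEATURES[f](series))
    value = FEATURES[name](series)
    if value <= 0:
        return None  # this toy rule form cannot express the correction
    return Insight(name, value / 2, truth)


def accuracy(bank: List[Insight], data: Sequence[Example]) -> float:
    return sum(generator(s, bank)[0] == y for s, y in data) / len(data)


def modifier(bank: List[Insight], candidate: Optional[Insight],
             support: Sequence[Example]) -> List[Insight]:
    """Stand-in for the Modifier: apply the update only if it is verified on the support set."""
    if candidate is None or candidate in bank:
        return bank
    trial = bank + [candidate]
    return trial if accuracy(trial, support) > accuracy(bank, support) else bank


def evolve(support: Sequence[Example], rounds: int = 3) -> List[Insight]:
    """Iterate generate -> reflect -> modify over the few-shot support set."""
    bank: List[Insight] = []
    for _ in range(rounds):
        for series, truth in support:
            predicted, _rationale = generator(series, bank)
            bank = modifier(bank, reflector(series, predicted, truth), support)
    return bank


support = [
    ([1, 1, 1, 1], "flat"),
    ([0, 1, 2, 3], "trend"),
    ([2, 2, 2, 2], "flat"),
    ([0, 2, 4, 6], "trend"),
]
bank = evolve(support)
```

In this toy setting the bank starts empty and grows only when a proposed insight fixes errors without breaking other support examples, which mirrors the idea of verified updates. The returned rationale string plays the role of the human-readable evidence attached to each decision.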