Plots Unlock Time-Series Understanding in Multimodal Models

Mayank Daswani, Mathias M. J. Bellaiche, Marc Wilson, Desislav Ivanov, Mikhail Papkov, Eva Schnider, Jing Tang, Kay Lamerigts, Gabriela Botea, Michael A. Sanchez, Yojan Patel, Shruthi Prabhakara, Shravya Shetty, Umesh Telang

From arXiv, 49 pages

While multimodal foundation models can now natively work with data beyond text, they remain underused for analyzing the large volumes of multi-dimensional time-series data found in fields such as healthcare, finance, and the social sciences, which is a missed opportunity for richer, data-driven insights. This paper proposes a simple but effective method that uses the existing vision encoders of these models to "see" time-series data via plots, avoiding the need for additional, potentially costly, model training. Our empirical evaluations show that this approach outperforms providing the raw time-series data as text, with the additional benefit that visual time-series representations reduce model API costs by up to 90%. We validate our hypothesis through synthetic data tasks of increasing complexity, progressing from simple functional-form identification on clean data to extracting trends from noisy scatter plots. To demonstrate generalizability from synthetic tasks with clear reasoning steps to more complex, real-world scenarios, we apply our approach to consumer health tasks (specifically fall detection, activity recognition, and readiness assessment), which involve heterogeneous, noisy data and multi-step reasoning. The overall advantage of plots over text across both the GPT and Gemini model families (up to a 120% performance increase on zero-shot synthetic tasks and up to a 150% increase on real-world tasks) highlights our approach's potential for making the best use of the native capabilities of foundation models.
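
The contrast between the two input formats, and the arithmetic behind figures such as "90% cost reduction" or "120% performance increase", can be illustrated with a minimal sketch. It assumes things the abstract does not specify: text tokens are estimated with a rough 4-characters-per-token heuristic, each image is charged a hypothetical fixed 258 tokens (real APIs price images by model and resolution), and the plot is rasterized with a tiny standard-library PNG writer rather than whatever plotting tool the authors used. All function names are hypothetical, and this is not the paper's pipeline.

```python
import math
import struct
import zlib

# ASSUMPTION: ~4 characters per token is a rough heuristic; real tokenizers
# split digits differently, so treat text token counts as estimates only.
CHARS_PER_TOKEN = 4
# ASSUMPTION: hypothetical fixed token charge per image; substitute the
# figure your provider documents for your model and image size.
IMAGE_TOKENS = 258


def serialize_as_text(values, decimals=3):
    """Text baseline: the raw series as a comma-separated string."""
    return ",".join(f"{v:.{decimals}f}" for v in values)


def estimate_text_tokens(text):
    """Heuristic token count for a text prompt (see CHARS_PER_TOKEN)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def render_line_plot_png(values, width=256, height=128):
    """Rasterize a series as a black polyline on white; return PNG bytes."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat series
    n = len(values)
    canvas = [[255] * width for _ in range(height)]  # 8-bit grayscale, white

    def to_px(i, v):
        x = round(i * (width - 1) / max(n - 1, 1))
        y = round((hi - v) / span * (height - 1))  # larger values drawn higher
        return x, y

    points = [to_px(i, v) for i, v in enumerate(values)]
    for (x0, y0), (x1, y1) in zip(points, points[1:] or points):
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for s in range(steps + 1):  # linear interpolation between samples
            x = round(x0 + (x1 - x0) * s / steps)
            y = round(y0 + (y1 - y0) * s / steps)
            canvas[y][x] = 0

    # Each PNG scanline starts with filter-type byte 0 (no filtering).
    raw = b"".join(b"\x00" + bytes(row) for row in canvas)

    def chunk(tag, data):
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

    # IHDR: width, height, bit depth 8, color type 0 (grayscale), no interlace.
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", header)
            + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))


def relative_change(new, old):
    """(new - old) / old: 1.2 is a 120% increase, -0.9 a 90% reduction."""
    return (new - old) / old


if __name__ == "__main__":
    series = [math.sin(2 * math.pi * t / 200) for t in range(1000)]
    text_tokens = estimate_text_tokens(serialize_as_text(series))
    png = render_line_plot_png(series)
    print(f"text prompt: ~{text_tokens} tokens (heuristic)")
    print(f"plot prompt: {IMAGE_TOKENS} tokens (assumed), {len(png)} PNG bytes")
    print(f"input-token change: {relative_change(IMAGE_TOKENS, text_tokens):.0%}")
```

The design point the sketch highlights is scaling: under these assumptions the text prompt grows linearly with the number of samples, while the image cost stays fixed, so longer series favor plots. Actual savings depend on the provider's tokenizer and image pricing.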

