Charts are important for presenting and explaining complex data relationships. Recently, multimodal large language models (MLLMs) have shown remarkable capabilities in various chart understanding tasks. However, the large parameter counts and computational requirements of these models limit their use in resource-constrained environments. In this paper, we present TinyChart, an efficient MLLM for chart understanding with only 3B parameters. TinyChart addresses two key challenges in efficient chart understanding: (1) it reduces the burden of learning numerical computations through a Program-of-Thoughts (PoT) learning strategy, which trains the model to generate Python programs for numerical calculations, and (2) it shortens the lengthy vision feature sequences produced by the vision transformer for high-resolution images through a Vision Token Merging module, which gradually merges the most similar vision tokens. Extensive experiments demonstrate that our 3B TinyChart achieves state-of-the-art (SOTA) performance on a variety of chart understanding benchmarks, including ChartQA, Chart-to-Text, Chart-to-Table, OpenCQA, and ChartX. It outperforms several chart understanding MLLMs with up to 13B parameters, such as ChartLlama and ChartAst, as well as the closed-source general-purpose MLLM GPT-4V on ChartQA. It is also more efficient, achieving higher inference throughput thanks to its smaller model scale and more efficient vision encoding. Our code and model are available at https://github.com/X-PLUG/mPLUG-DocOwl/tree/main/TinyChart.
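To make the Program-of-Thoughts idea concrete, the following is a minimal sketch of what answering a numerical chart question by generating and running a Python program could look like. It is not taken from the TinyChart code base: the question, the chart values, the `generated_program` string, and the `run_program` helper are all hypothetical and serve only to illustrate the general pattern of delegating arithmetic to an interpreter.

```python
# Illustrative only: a program of the kind a PoT-trained model might emit for the
# question "What is the difference between the highest and lowest bar?".
# The chart values below are made up for this example.
generated_program = """
values = {"2019": 42.0, "2020": 35.5, "2021": 51.0}
highest = max(values.values())
lowest = min(values.values())
answer = highest - lowest
"""


def run_program(program: str):
    """Execute a generated program and return the value it stores in `answer`.

    Hypothetical helper. Limiting the builtins keeps the example tidy, but it
    is NOT a security sandbox; untrusted code needs real isolation.
    """
    allowed = {"max": max, "min": min, "sum": sum, "len": len,
               "abs": abs, "round": round}
    namespace = {"__builtins__": allowed}
    exec(program, namespace)
    return namespace.get("answer")


if __name__ == "__main__":
    print(run_program(generated_program))  # prints 15.5
```

The point of the pattern is that the model only has to produce the right program; the arithmetic itself is carried out exactly by the Python interpreter.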
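The idea of merging the most similar vision tokens can be illustrated with a small sketch. The abstract does not specify how similarity is measured or how pairs are matched, so the code below makes its own assumptions: cosine similarity, a greedy search for the single most similar pair at each step, and size-weighted averaging of merged tokens. The function names `cosine` and `merge_most_similar` are hypothetical, and this should not be read as the actual Vision Token Merging module.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_most_similar(tokens, num_merges):
    """Shorten a token sequence by greedily merging the most similar pair.

    Assumed scheme for illustration: each step finds the pair with the highest
    cosine similarity and replaces it with the size-weighted mean of the two.
    """
    # Track (vector sum, number of original tokens) for each current token.
    items = [(list(t), 1) for t in tokens]
    for _ in range(min(num_merges, len(items) - 1)):
        best = None
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                sim = cosine(items[i][0], items[j][0])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        (vec_i, count_i), (vec_j, count_j) = items[i], items[j]
        items[i] = ([x + y for x, y in zip(vec_i, vec_j)], count_i + count_j)
        del items[j]
    # Convert the sums back into means.
    return [[x / count for x in vec] for vec, count in items]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(merge_most_similar(tokens, 1))  # two tokens remain
```

Each merge shortens the sequence by one token, so applying a fixed number of merges at several stages shrinks the sequence gradually. The exhaustive pair search here is quadratic in the number of tokens and is chosen for clarity, not efficiency.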