Uni-SMART: Universal Science Multimodal Analysis and Research Transformer

Hengxing Cai, Xiaochen Cai, Shuwen Yang, Jiankun Wang, Lin Yao, Zhifeng Gao, Junhan Chang, Sihang Li, Mingjun Xu, Changxin Wang, Hongshuai Wang, Yongge Li, Mujie Lin, Yaqi Li, Yuqi Yin, Linfeng Zhang, Guolin Ke

In scientific research and its applications, literature analysis is crucial because it allows researchers to build on the work of others. However, the rapid growth of scientific knowledge has produced a massive increase in scholarly articles, making in-depth literature analysis increasingly challenging and time-consuming. The emergence of Large Language Models (LLMs) offers a new way to address this challenge. Known for their strong text-summarization abilities, LLMs are seen as a potential tool for improving the analysis of scientific literature. However, existing LLMs have their own limitations. Scientific literature often includes a wide range of multimodal elements, such as molecular structures, tables, and charts, which text-focused LLMs find hard to understand and analyze. This points to an urgent need for new solutions that can fully understand and analyze multimodal content in scientific literature. To meet this need, we present Uni-SMART (Universal Science Multimodal Analysis and Research Transformer), an innovative model designed for in-depth understanding of multimodal scientific literature. In rigorous quantitative evaluations across several domains, Uni-SMART outperforms leading text-focused LLMs. We further explore practical applications, including patent infringement detection and nuanced analysis of charts. These applications highlight not only Uni-SMART's adaptability but also its potential to revolutionize how we interact with scientific literature.
