In the contemporary information era, the proliferation of scientific literature, significantly accelerated by the advent of large language models (LLMs), has reached unprecedented levels. Researchers urgently need efficient tools for reading and summarizing academic papers, discovering relevant literature, and applying diverse interpretive methods. Automated scientific literature interpretation systems have therefore become essential to meeting this growing demand. However, prevailing models, both commercial and open-source, face notable challenges: they often overlook multimodal data, struggle to summarize overly long texts, and lack diverse user interfaces. In response, we introduce an open-source multimodal automated academic paper interpretation system (MMAPIS) that uses LLMs within a three-stage pipeline. First, a hybrid modality preprocessing and alignment module separately extracts plain text, tables, and figures from documents. It then aligns this information by the names of the sections it belongs to, so that data sharing a section name are grouped under the same section. Second, a hierarchical discourse-aware summarization method uses the extracted section names to divide the article into shorter text segments, which LLMs guided by dedicated prompts then summarize both within and across sections. Finally, we design four types of user interfaces (paper recommendation, multimodal Q&A, audio broadcasting, and interpretation blogs) that can be applied across a wide range of scenarios. Our qualitative and quantitative evaluations demonstrate the system's advantages, especially in scientific summarization, where it outperforms solutions relying solely on GPT-4.
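The abstract does not give implementation details, so the following is only a minimal Python sketch, under our own assumptions, of the two core ideas it describes: grouping extracted text and tables/figures by section name, and a two-level summarization that first summarizes within each section and then across the section summaries. The names `Section`, `align_by_section`, `hierarchical_summary`, and `first_sentence_stub`, as well as the prompt strings, are hypothetical. The stub summarizer simply keeps the first sentence and stands in for an LLM call; it is not the MMAPIS implementation.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical summarizer signature: (prompt, text) -> summary.
# A real system would wrap an LLM call here; this sketch uses a stub.
Summarizer = Callable[[str, str], str]


@dataclass
class Section:
    name: str
    text: str = ""
    media: list[str] = field(default_factory=list)  # table/figure references


def align_by_section(text_blocks, media_blocks):
    """Merge extracted text and tables/figures that share a section name."""
    sections: dict[str, Section] = {}  # dicts keep insertion (document) order
    for name, text in text_blocks:
        sec = sections.setdefault(name, Section(name))
        sec.text = f"{sec.text} {text}".strip()
    for name, item in media_blocks:
        sections.setdefault(name, Section(name)).media.append(item)
    return list(sections.values())


def hierarchical_summary(sections, summarize: Summarizer):
    """Summarize each section, then summarize across the section summaries."""
    per_section = []
    for s in sections:
        # Within-section pass: each segment is short enough for one call.
        prompt = f"Summarize the {s.name} section."  # hypothetical prompt
        per_section.append(f"{s.name}: {summarize(prompt, s.text)}")
    # Between-section pass: combine the section-level summaries.
    overall = summarize("Summarize the whole paper.", "\n".join(per_section))
    return per_section, overall


def first_sentence_stub(prompt: str, text: str) -> str:
    """Stand-in for an LLM: ignores the prompt, keeps the first sentence."""
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]


# Usage: text and media blocks tagged with the section they came from.
text_blocks = [
    ("Introduction", "LLMs accelerate publishing."),
    ("Method", "We align text by section."),
    ("Introduction", "Readers need better tools."),
]
media_blocks = [("Method", "Figure 1")]

sections = align_by_section(text_blocks, media_blocks)
per_section, overall = hierarchical_summary(sections, first_sentence_stub)
```

Splitting on section names keeps each within-section call short, which is how this style of approach sidesteps context-length limits; the cross-section pass then only sees the much shorter section summaries.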