Bridging Research and Readers: A Multi-Modal Automated Academic Papers Interpretation System

The volume of scientific literature is growing at an unprecedented rate, a trend accelerated by the advent of large language models (LLMs). Researchers urgently need efficient tools for reading and summarizing academic papers, discovering significant work, and interpreting it in diverse ways. Automated scientific literature interpretation systems have therefore become essential. However, prevailing systems, both commercial and open-source, face notable challenges: they often overlook multimodal data, struggle to summarize over-length texts, and lack diverse user interfaces. In response, we introduce an open-source multi-modal automated academic paper interpretation system (MMAPIS) built as a three-stage pipeline that incorporates LLMs to augment its functionality. First, a hybrid modality preprocessing and alignment module extracts plain text and tables or figures from documents separately. It then aligns this information by the section names it belongs to, so that data with identical section names are grouped under the same section. Second, we introduce a hierarchical discourse-aware summarization method. It uses the extracted section names to divide the article into shorter text segments, and applies LLMs with specific prompts to summarize both within and between sections. Finally, we have designed four types of user interfaces, namely paper recommendation, multimodal Q&A, audio broadcasting, and interpretation blog, which can be applied across a wide range of scenarios. Our qualitative and quantitative evaluations underscore the system's superiority, especially in scientific summarization, where it outperforms solutions relying solely on GPT-4.
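
The abstract gives no implementation details, so the following is only a minimal Python sketch of the two ideas it describes: grouping extracted text and figures or tables under a shared section name, then summarizing within sections before merging across them. Every name here (`align_by_section`, `hierarchical_summary`, `first_sentence`, the prompt strings) is hypothetical and not taken from MMAPIS, and the `summarize` callable stands in for an LLM call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an LLM call: (prompt, text) -> summary.
Summarizer = Callable[[str, str], str]

Sections = Dict[str, Dict[str, List[str]]]


def align_by_section(
    text_blocks: List[Tuple[str, str]],
    media_blocks: List[Tuple[str, str]],
) -> Sections:
    """Group text and figure/table items under the section name they share."""
    sections: Sections = {}
    for name, body in text_blocks:
        sections.setdefault(name, {"text": [], "media": []})["text"].append(body)
    for name, item in media_blocks:
        sections.setdefault(name, {"text": [], "media": []})["media"].append(item)
    return sections


def hierarchical_summary(sections: Sections, summarize: Summarizer) -> str:
    """Summarize each section on its own, then merge the section summaries."""
    section_summaries = []
    for name, content in sections.items():
        body = "\n".join(content["text"])
        # Within-section step: each short segment gets its own prompt.
        summary = summarize(f"Summarize the '{name}' section.", body)
        section_summaries.append(f"{name}: {summary}")
    # Between-section step: one more call over the short summaries only.
    return summarize(
        "Integrate these section summaries.", "\n".join(section_summaries)
    )


def first_sentence(prompt: str, text: str) -> str:
    """Toy summarizer (assumption) so the sketch runs without an LLM."""
    return text.split(".")[0].strip() + "."
```

Splitting by section keeps each call's input short, which is how the abstract says the method copes with over-length texts; in practice `first_sentence` would be replaced by a real model call.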

