Recent advances in vision-language models have enabled autonomous agents capable of complex reasoning, tool use, and document understanding. However, existing document agents mainly transform papers into static artifacts such as summaries, webpages, or slides. These formats cannot adequately convey technical papers whose contributions involve dynamic mechanisms and state transitions. In this work, we propose a Paper-to-Interactive-System Agent that converts research papers into executable interactive web systems. Given a paper as a PDF, the agent operates end to end without human intervention, covering paper understanding, system modeling, and interactive webpage synthesis, so that users can manipulate inputs and observe the resulting dynamic behavior. To evaluate this task, we introduce a benchmark of 19 research papers, each paired with an expert-built interactive system that serves as ground truth. We further propose PaperVoyager, a structured generation framework that explicitly models a paper's mechanisms and interaction logic during synthesis. Experiments show that PaperVoyager significantly improves the quality of the generated interactive systems, offering a new paradigm for interactive understanding of scientific papers.