Scientific knowledge discovery increasingly relies on large language models, yet many existing scholarly assistants depend on proprietary systems with tens or hundreds of billions of parameters. Such reliance limits reproducibility and accessibility for the research community. In this work, we ask a simple question: do scientific applications need bigger models? Specifically, we investigate to what extent carefully designed retrieval pipelines can compensate for reduced model scale. We design a lightweight retrieval-augmented framework that performs task-aware routing, selecting a specialized retrieval strategy based on the input query. The system integrates evidence from full-text scientific papers and structured scholarly metadata, and employs compact instruction-tuned language models to generate responses with citations. We evaluate the framework on several scholarly tasks: scholarly question answering (QA) in single- and multi-document scenarios, biomedical QA under domain shift, and scientific text compression. Our findings demonstrate that retrieval and model scale are complementary rather than interchangeable: retrieval design can partially compensate for smaller models, but model capacity remains important for complex reasoning tasks. This work highlights retrieval and task-aware design as key factors for building practical and reproducible scholarly assistants.