Scientific knowledge discovery increasingly relies on large language models, yet many existing scholarly assistants depend on proprietary systems with tens or hundreds of billions of parameters. Such reliance limits reproducibility and accessibility for the research community. In this work, we ask a simple question: do scientific applications need bigger models? Specifically, we investigate to what extent carefully designed retrieval pipelines can compensate for reduced model scale. We design a lightweight retrieval-augmented framework that uses task-aware routing to select a specialized retrieval strategy for each input query. The system integrates evidence from full-text scientific papers and structured scholarly metadata, and employs compact instruction-tuned language models to generate responses with citations. We evaluate the framework across several scholarly tasks, focusing on scholarly question answering (QA) in single- and multi-document scenarios, as well as biomedical QA under domain shift and scientific text compression. Our findings show that retrieval and model scale are complementary rather than interchangeable: retrieval design can partially compensate for smaller models, but model capacity remains important for complex reasoning tasks. This work highlights retrieval and task-aware design as key factors for building practical and reproducible scholarly assistants.
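To make the idea of task-aware routing concrete, the following is a minimal sketch and not the framework's actual implementation. It assumes a keyword-based router that chooses between a full-text retriever and a metadata retriever; the toy corpus, the cue words, and every function name (`route`, `retrieve_full_text`, `retrieve_metadata`, `answer_evidence`) are hypothetical and introduced only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy corpus; none of these entries come from the paper.
PAPERS = [
    {"id": "p1", "year": 2021,
     "text": "dense retrieval improves scientific question answering"},
    {"id": "p2", "year": 2023,
     "text": "compact instruction tuned models generate answers with citations"},
]

# Assumed cue words that signal a metadata-style query.
METADATA_CUES = ("published", "year", "author", "venue", "cited")


def retrieve_full_text(query: str) -> List[str]:
    """Rank papers by word overlap between the query and the paper text."""
    words = set(query.lower().split())
    scored = [(len(words & set(p["text"].split())), p["id"]) for p in PAPERS]
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [pid for score, pid in ranked if score > 0]


def retrieve_metadata(query: str) -> List[str]:
    """Return papers whose publication year is mentioned in the query."""
    return [p["id"] for p in PAPERS if str(p["year"]) in query]


STRATEGIES: Dict[str, Callable[[str], List[str]]] = {
    "full_text": retrieve_full_text,
    "metadata": retrieve_metadata,
}


def route(query: str) -> str:
    """Pick a retrieval strategy name from surface cues in the query."""
    q = query.lower()
    return "metadata" if any(cue in q for cue in METADATA_CUES) else "full_text"


def answer_evidence(query: str) -> Tuple[str, List[str]]:
    """Route the query, then run the selected retriever."""
    strategy = route(query)
    return strategy, STRATEGIES[strategy](query)


if __name__ == "__main__":
    print(answer_evidence("Which papers were published in 2023?"))
    print(answer_evidence("How does dense retrieval help question answering?"))
```

In a real system the router would more plausibly be a learned classifier or a prompted compact model, and the retrieved passages would be passed to the generator together with their identifiers so that the response can cite them.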