Eliot: Interactively $\underline{E}$xploring Fast-Changing Scientific $\underline{Li}$terature Trends with $\underline{O}$nline Da$\underline{t}$a and Learning


Bernardo A. Denkvitts, Nitin Gupta, Biplav Srivastava

From arXiv; under review at CIKM Applied Research 2026

The rapid growth of scientific publishing has made it increasingly difficult to track how fast-moving areas evolve. Search engines and LLM-based assistants retrieve or summarize papers, but often hide how the corpus was selected, organized, or connected to temporal patterns. We present $\texttt{Eliot}$, a publicly deployed interactive system for traceable exploration of evolving scientific literature. Motivated by two studies on Large Language Models (LLMs) and Automated Planning and Scheduling (APS), $\texttt{Eliot}$ generalizes literature-evolution analysis beyond hand-built taxonomies and domain-specific scripts. Given explicit query terms and filters, it retrieves arXiv papers at query time, represents each paper by title and abstract, clusters the corpus into themes, assigns representative keywords, and visualizes each cluster's publication-year distribution. We evaluate $\texttt{Eliot}$ as both an applied system and an interactive research aid. An offline configuration study across eight arXiv domains compares document representations, dimensionality reduction methods, and clustering algorithms using intrinsic clustering and topic-coherence metrics; the results support MiniLM embeddings with 10-dimensional UMAP and Agglomerative Clustering as a practical default. A scenario-based survey and expert focus group assess interpretability and use contexts: participants rated cluster labels as meaningful in 85% of scenario responses, and feedback indicated that $\texttt{Eliot}$ is most valuable for auditable overviews of rapidly changing technical areas. These results suggest that query-time clustering and temporal inspection can complement search and generation tools by helping researchers inspect and refine the evidence behind literature trends.
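The cluster-then-inspect pipeline described in the abstract can be illustrated with a small, dependency-free sketch. It is not the $\texttt{Eliot}$ implementation, and several parts are assumptions made so the example runs on the standard library alone: the corpus is a hypothetical hard-coded list rather than arXiv results fetched at query time, TF-IDF vectors stand in for MiniLM embeddings, the UMAP reduction step is skipped entirely, and the function names (`tfidf`, `agglomerative`, `summarise`) are invented for illustration. Only the overall shape follows the abstract: represent each paper by its text, cluster agglomeratively, pick representative keywords, and count publications per year in each cluster.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (title + abstract text, publication year).
PAPERS = [
    ("language model prompting for reasoning tasks", 2023),
    ("large language model reasoning with prompting", 2024),
    ("prompting language model agents", 2024),
    ("classical planning heuristics for scheduling", 2019),
    ("scheduling with classical planning search", 2020),
    ("heuristics search for planning domains", 2021),
]

def tfidf(texts):
    """Return one sparse, L2-normalised TF-IDF dict per text (stand-in for embeddings)."""
    docs = [Counter(t.lower().split()) for t in texts]
    df = Counter(w for d in docs for w in d)  # document frequency of each word
    n = len(docs)
    vecs = []
    for d in docs:
        v = {w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in d.items()}
        norm = math.sqrt(sum(x * x for x in v.values()))
        vecs.append({w: x / norm for w, x in v.items()})
    return vecs

def cosine_distance(a, b):
    """Cosine distance between two unit-length sparse vectors."""
    return 1.0 - sum(x * b.get(w, 0.0) for w, x in a.items())

def agglomerative(vecs, k):
    """Average-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [[i] for i in range(len(vecs))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vecs[i], vecs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge the second into the first.
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters

def summarise(papers, k=2, top=3):
    """Cluster papers, then report keywords and a per-year count for each cluster."""
    vecs = tfidf([text for text, _ in papers])
    report = []
    for members in agglomerative(vecs, k):
        weight = Counter()
        for i in members:
            for word, x in vecs[i].items():
                weight[word] += x  # keywords = words with the largest summed weight
        years = Counter(papers[i][1] for i in members)
        report.append({
            "members": sorted(members),
            "keywords": [w for w, _ in weight.most_common(top)],
            "years": dict(sorted(years.items())),
        })
    return report

if __name__ == "__main__":
    for cluster in summarise(PAPERS):
        print(cluster)
```

The per-cluster `years` dictionary is the data a publication-year histogram would be drawn from. Swapping `tfidf` for a sentence-embedding model and adding a dimensionality-reduction step before `agglomerative` would bring the sketch closer to the default configuration the abstract reports, though the exact settings used by the authors are not given here.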
