AutoDiscover: A reinforcement learning recommendation system for the cold-start imbalance challenge in active learning, powered by graph-aware Thompson sampling


Parsa Vares

From arXiv. Master's thesis, University of Luxembourg, in collaboration with the Luxembourg Institute of Science and Technology (LIST). Supervised by Prof. Jun Pang and Dr. Eloi Durant.

Systematic literature reviews (SLRs) are fundamental to evidence-based research, but manual screening is a growing bottleneck as scientific output increases. Screening is characterized by a low prevalence of relevant studies and by scarce, costly expert decisions. Traditional active learning (AL) systems help, yet they typically rely on fixed query strategies for selecting the next unlabeled documents. These static strategies do not adapt over time and ignore the relational structure of scientific literature networks. This thesis introduces AutoDiscover, a framework that reframes AL as an online decision-making problem driven by an adaptive agent. Literature is modeled as a heterogeneous graph capturing relationships among documents, authors, and metadata. A Heterogeneous Graph Attention Network (HAN) learns node representations, which a Discounted Thompson Sampling (DTS) agent uses to dynamically manage a portfolio of query strategies. With real-time human-in-the-loop labels, the agent balances exploration and exploitation under non-stationary review dynamics, where strategy utility changes over time. On the 26-dataset SYNERGY benchmark, AutoDiscover achieves higher screening efficiency than static AL baselines. Crucially, the agent mitigates cold start by bootstrapping discovery from minimal initial labels where static approaches fail. We also introduce TS-Insight, an open-source visual analytics dashboard to interpret, verify, and diagnose the agent's decisions. Together, these contributions accelerate SLR screening under scarce expert labels and low prevalence of relevant studies.
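
To make the idea of a Discounted Thompson Sampling agent managing a portfolio of query strategies more concrete, the following is a minimal sketch, not the thesis's implementation. It assumes a Bernoulli reward (1 if the queried document is labeled relevant, 0 otherwise), a Beta posterior per strategy, and a common form of discounting in which all success and failure counts are multiplied by a factor `gamma` before each update. The class name, the strategy names, the `gamma` value, and the toy reward dynamics are all hypothetical and chosen only for illustration; the HAN embeddings that the real strategies would consume are omitted.

```python
import random


class DiscountedThompsonSampler:
    """Bernoulli Thompson sampling with exponentially discounted counts.

    Illustrative sketch only: names and defaults are assumptions,
    not taken from the AutoDiscover thesis.
    """

    def __init__(self, arms, gamma=0.95, prior=1.0, seed=None):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        self.arms = list(arms)
        self.gamma = gamma    # discount factor; 1.0 recovers plain Thompson sampling
        self.prior = prior    # symmetric Beta(prior, prior) prior for every arm
        self.successes = {arm: 0.0 for arm in self.arms}
        self.failures = {arm: 0.0 for arm in self.arms}
        self.rng = random.Random(seed)

    def select(self):
        """Draw one sample per arm from its Beta posterior and pick the largest."""
        samples = {
            arm: self.rng.betavariate(self.successes[arm] + self.prior,
                                      self.failures[arm] + self.prior)
            for arm in self.arms
        }
        return max(samples, key=samples.get)

    def update(self, chosen, reward):
        """Discount all arms, then credit the chosen arm with the new label."""
        for arm in self.arms:
            self.successes[arm] *= self.gamma
            self.failures[arm] *= self.gamma
        if reward:
            self.successes[chosen] += 1.0
        else:
            self.failures[chosen] += 1.0

    def posterior_mean(self, arm):
        """Mean of the arm's current Beta posterior."""
        a = self.successes[arm] + self.prior
        b = self.failures[arm] + self.prior
        return a / (a + b)


if __name__ == "__main__":
    # Hypothetical strategy portfolio and toy non-stationary environment.
    agent = DiscountedThompsonSampler(
        ["uncertainty", "max_relevance", "random"], gamma=0.95, seed=0
    )
    env = random.Random(1)
    for step in range(200):
        # Assumed dynamics: the most useful strategy switches halfway through.
        best = "max_relevance" if step < 100 else "uncertainty"
        arm = agent.select()
        found_relevant = env.random() < (0.6 if arm == best else 0.1)
        agent.update(arm, found_relevant)
    print({arm: round(agent.posterior_mean(arm), 2) for arm in agent.arms})
```

Because old evidence decays geometrically, a strategy that stopped paying off loses its advantage after a few dozen queries, and its posterior widens again so that it can be re-explored. This is one simple way to handle the non-stationary strategy utility described in the abstract. With `gamma=1.0` the sketch reduces to standard Thompson sampling.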
