Identifying and ranking impacted files within software repositories is a key challenge in change impact analysis. Existing deterministic approaches that combine heuristic signals, semantic similarity measures, and graph-based centrality metrics are effective at narrowing the candidate search space, yet their recall plateaus. This limitation stems from treating features as independent, linearly combined contributors, which ignores the contextual dependencies and relationships between metrics that characterize expert reasoning. To address it, we propose applying Multi-Head Self-Attention as a post-deterministic score refinement mechanism. Our approach learns contextual weightings between features and dynamically adjusts the importance of each file based on the relational behavior exhibited across the candidate file set. The attention mechanism produces context-aware adjustments that are additively combined with the deterministic scores, preserving interpretability while enabling reasoning similar to that of experts reviewing change surfaces. We focus on recall rather than precision, because false negatives (missed impacted files) are far more costly than false positives (irrelevant files that can be quickly dismissed during review). Empirical evaluation on 200 test cases shows that introducing self-attention improves Top-50 recall from approximately 62-65% to 78-82%, depending on repository complexity and structure, achieving 80% recall at Top-50. Expert validation shows an improvement from 6.5/10 to 8.6/10 in subjective accuracy alignment. This refinement narrows the reasoning gap between deterministic automation and expert judgment, improving recall in repository-aware effort estimation.
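As an illustration of the additive refinement described above, the following is a minimal NumPy sketch rather than the evaluated implementation. It assumes each candidate file is represented by a feature vector of width `d_model`, that self-attention runs across the files of one candidate set, and that the attended representation is projected to a scalar adjustment added to the deterministic score. The parameter names (`Wq`, `Wk`, `Wv`, `Wo`, `w_out`), the mixing factor `alpha`, and the helper `recall_at_k` are hypothetical; in practice the weights would be learned, whereas random values stand in for them here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_adjustment(features, params, num_heads):
    """Multi-head self-attention over one candidate file set.

    features: (n_files, d_model) array of per-file feature vectors.
    Returns a (n_files,) array of context-aware score adjustments.
    """
    n, d = features.shape
    assert d % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d // num_heads

    def split(x):
        # (n, d_model) -> (num_heads, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q = split(features @ params["Wq"])
    k = split(features @ params["Wk"])
    v = split(features @ params["Wv"])

    # Scaled dot-product attention: every file attends to every other file.
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))  # (heads, n, n)
    context = (weights @ v).transpose(1, 0, 2).reshape(n, d)       # merge heads

    # Output projection, then a linear map to one scalar per file.
    return (context @ params["Wo"]) @ params["w_out"]

def refine_scores(det_scores, features, params, num_heads, alpha=1.0):
    # Additive combination: the deterministic score stays visible as-is,
    # and the attention term only shifts it (alpha is an assumed knob).
    return det_scores + alpha * attention_adjustment(features, params, num_heads)

def recall_at_k(scores, impacted, k):
    # Fraction of truly impacted file indices found among the k top-scored files.
    top_k = np.argsort(-scores)[:k]
    return len(set(top_k.tolist()) & set(impacted)) / len(impacted)

# Toy usage with random stand-ins for learned weights and real features.
rng = np.random.default_rng(0)
n_files, d_model, heads = 8, 4, 2
params = {name: rng.normal(scale=0.1, size=(d_model, d_model))
          for name in ("Wq", "Wk", "Wv", "Wo")}
params["w_out"] = rng.normal(scale=0.1, size=d_model)
features = rng.normal(size=(n_files, d_model))
det_scores = rng.uniform(size=n_files)
refined = refine_scores(det_scores, features, params, heads)
print(recall_at_k(refined, impacted={1, 5}, k=3))
```

Keeping the attention output as a separate additive term means each final score can still be decomposed into its deterministic part and its contextual adjustment, which is the property the interpretability claim relies on.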