Software engineering teams increasingly depend on GitHub issue threads to coordinate work, report bugs, and negotiate technical decisions, yet most repository health tools focus on code metrics and ignore the conversational dynamics that drive or stall development. This paper presents SentTrack, a dual-lens framework for detecting socio-technical bottlenecks from GitHub issue discussions. Applied to the AvaloniaUI open-source repository across approximately 9,000 issue threads, the framework addresses three questions: how to automate workflow-inefficiency detection from real-time conversational data, whether sentiment signals can surface risk earlier than traditional label-based methods, and how to isolate human narrative from machine-generated noise in mixed-media issue text. SentTrack combines two complementary pipelines. A horizontal pipeline translates raw issue reports into clean summaries using a large language model, extracts mid-level concern phrases, and clusters them through UMAP and HDBSCAN, producing 613 semantic clusters from the first 3,608 issues processed. A vertical pipeline applies the ABCDE collaborative interaction framework to classify each comment and infer thread-level outcomes. Across the full corpus, 49\% of threads ended in stagnation and only 13\% reached resolution, with the resolution gap identified as the dominant bottleneck signal. A weighted scoring engine that combines negativity, stagnation, resolution gap, and thread length gives maintainers an interpretable prioritization tool for high-friction discussions before they stall development.
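The weighted scoring engine named in the abstract can be illustrated with a minimal sketch. The abstract lists only the four inputs (negativity, stagnation, resolution gap, thread length), so everything else here is an assumption made for illustration: the `ThreadSignals` fields and their encodings, the `WEIGHTS` values, the `LENGTH_CAP` normalization, and the function names `friction_score` and `rank_threads` are all hypothetical and are not taken from SentTrack itself.

```python
from dataclasses import dataclass


@dataclass
class ThreadSignals:
    """Per-thread inputs; field encodings are assumptions, not SentTrack's own."""
    negativity: float      # assumed: share of comments classed as negative, in [0, 1]
    stagnation: float      # assumed: 1.0 if the thread ended in stagnation, else 0.0
    resolution_gap: float  # assumed: 1.0 if no resolution was reached, else 0.0
    length: int            # number of comments in the thread


# Hypothetical weights that sum to 1.0; the abstract does not report actual values.
WEIGHTS = {
    "negativity": 0.30,
    "stagnation": 0.25,
    "resolution_gap": 0.35,
    "length": 0.10,
}

# Hypothetical cap so that very long threads do not dominate the score.
LENGTH_CAP = 50


def friction_score(signals: ThreadSignals) -> float:
    """Return a weighted sum of the four signals, scaled to [0, 1]."""
    length_norm = min(signals.length, LENGTH_CAP) / LENGTH_CAP
    return (
        WEIGHTS["negativity"] * signals.negativity
        + WEIGHTS["stagnation"] * signals.stagnation
        + WEIGHTS["resolution_gap"] * signals.resolution_gap
        + WEIGHTS["length"] * length_norm
    )


def rank_threads(threads: dict[int, ThreadSignals]) -> list[tuple[int, float]]:
    """Sort thread ids by descending friction score."""
    scored = ((thread_id, friction_score(s)) for thread_id, s in threads.items())
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up example threads, keyed by hypothetical issue numbers.
    example = {
        101: ThreadSignals(negativity=0.6, stagnation=1.0, resolution_gap=1.0, length=40),
        102: ThreadSignals(negativity=0.1, stagnation=0.0, resolution_gap=0.0, length=5),
    }
    for thread_id, score in rank_threads(example):
        print(thread_id, round(score, 3))
```

Because the score is a plain weighted sum, each term's contribution can be read off directly, which is one way such a tool could remain interpretable to maintainers; any real weights would need to be calibrated against the corpus.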