Towards Refining Developer Questions using LLM-Based Named Entity Recognition for Developer Chatroom Conversations

In software engineering chatrooms, communication is often hindered by imprecise questions that cannot be answered. Recognizing key entities can be essential for improving question clarity and facilitating better exchange. However, existing research using natural language processing techniques often overlooks these software-specific nuances. In this paper, we introduce Software-specific Named Entity Recognition, Intent Detection, and Resolution Classification (SENIR), a labeling approach that leverages a Large Language Model to annotate entities, intents, and resolution status in developer chatroom conversations. To offer quantitative guidance for improving question clarity and resolvability, we build a resolution prediction model that leverages SENIR's entity and intent labels along with additional predictive features. We evaluate SENIR on the DISCO dataset using a subset of annotated chatroom dialogues. SENIR achieves an 86% F-score for entity recognition, a 71% F-score for intent detection, and an 89% F-score for resolution status classification. Furthermore, our resolution prediction model, tested with various sampling strategies (random undersampling and oversampling with SMOTE) and evaluation methods (5-fold cross-validation, 10-fold cross-validation, and bootstrapping), demonstrates AUC values ranging from 0.7 to 0.8. Key factors influencing resolution include positive sentiment and entities such as Programming Language and User Variable across multiple intents, while diagnostic entities are more relevant in error-related questions. Moreover, resolution rates vary significantly by intent: questions about API Usage and API Change achieve higher resolution rates, whereas Discrepancy and Review have lower resolution rates. A Chi-Square analysis confirms the statistical significance of these differences.
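The Chi-Square analysis mentioned above can be illustrated with a minimal sketch. The counts below are hypothetical, invented for illustration, and are not taken from the paper or the DISCO dataset. The function name `chi_square_statistic` is likewise an assumption, not the authors' code. It computes the Pearson chi-square statistic for an intent-by-resolution contingency table and compares it against the standard 0.05 critical value for 3 degrees of freedom.

```python
# Hypothetical counts, invented for illustration only (not from the paper).
# Keys: question intent; values: (resolved, unresolved) question counts.
observed = {
    "API Usage": (60, 40),
    "API Change": (55, 45),
    "Discrepancy": (35, 65),
    "Review": (30, 70),
}


def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for r, row in enumerate(rows):
        for c, obs in enumerate(row):
            # Expected count if intent and resolution were independent.
            expected = row_totals[r] * col_totals[c] / grand_total
            statistic += (obs - expected) ** 2 / expected

    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


statistic, dof = chi_square_statistic(observed)

# Critical value of the chi-square distribution at alpha = 0.05, df = 3.
CRITICAL_0_05_DF3 = 7.815
differs_by_intent = statistic > CRITICAL_0_05_DF3
```

With these made-up counts the statistic exceeds the critical value, so the null hypothesis that resolution is independent of intent would be rejected at the 0.05 level. A statistics library would normally also report the exact p-value.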
