Characterizing and Classifying Developer Forum Posts with their Intentions

With the rapid growth of the developer community, the number of posts on online technical forums has increased sharply, making it difficult for users to filter useful posts and find important information. Tags provide a concise feature dimension that helps users locate posts of interest and helps search engines index the posts most relevant to a query. However, most tags focus only on the technical perspective (e.g., programming language, platform, tool). In most cases, posts in online developer communities also reveal the author's intention: to solve a problem, ask for advice, share information, etc. Modeling post intentions can therefore add an extra dimension to the current tag taxonomy. Drawing on previous studies and industrial perspectives, we create a refined taxonomy for the intentions of technical forum posts. Through manual labeling and analysis of a sampled dataset of posts extracted from online forums, we examine the relationship between the composition of posts (code, error messages) and their intentions. Furthermore, informed by our manual study, we design a pre-trained transformer-based model to automatically predict post intentions. The best variant of our intention prediction framework achieves a Micro F1-score of 0.589, Top-1 to Top-3 accuracy of 62.6% to 87.8%, and an average AUC of 0.787, outperforming the state-of-the-art baseline approach. Our characterization and automated classification of forum posts by intention may help forum maintainers and third-party tool developers improve the organization and retrieval of posts on technical forums. We have released our annotated dataset and code in our supplementary material package.
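
The reported metrics (Micro F1 and Top-1 to Top-3 accuracy) suggest a setting in which a post may carry more than one intention. As a minimal sketch, assuming each post has a set of gold intention labels and the model outputs one score per label, the two metrics could be computed as below. The function names and the label names ("problem", "advice", "share") are hypothetical illustrations, and the top-k rule (a post counts as a hit if any of its k highest-scoring labels is a gold label) is an assumption, not necessarily the exact definition used in the study.

```python
from typing import Dict, Sequence, Set


def micro_f1(true_labels: Sequence[Set[str]], pred_labels: Sequence[Set[str]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN counts over all posts and all labels."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        tp += len(truth & pred)  # predicted labels that are correct
        fp += len(pred - truth)  # predicted labels that are wrong
        fn += len(truth - pred)  # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_k_accuracy(true_labels: Sequence[Set[str]],
                   scores: Sequence[Dict[str, float]], k: int) -> float:
    """Share of posts whose k highest-scoring labels include at least one gold label.

    Assumed definition (hypothetical); ties keep the dict's insertion order.
    """
    hits = 0
    for truth, post_scores in zip(true_labels, scores):
        ranked = sorted(post_scores, key=post_scores.get, reverse=True)[:k]
        if truth & set(ranked):
            hits += 1
    return hits / len(true_labels)


# Toy data with hypothetical intention labels.
gold = [{"problem"}, {"advice", "share"}, {"share"}]
predicted = [{"problem"}, {"advice"}, {"problem"}]
label_scores = [
    {"problem": 0.9, "advice": 0.05, "share": 0.05},
    {"problem": 0.1, "advice": 0.6, "share": 0.3},
    {"problem": 0.5, "advice": 0.1, "share": 0.4},
]

print(micro_f1(gold, predicted))                # TP=2, FP=1, FN=2 -> 4/7
print(top_k_accuracy(gold, label_scores, k=1))  # 2 of 3 posts -> 2/3
print(top_k_accuracy(gold, label_scores, k=2))  # all 3 posts -> 1.0
```

Micro averaging pools counts across labels, so frequent intention categories weigh more than rare ones; a per-label (macro) average would treat every category equally.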