A First Look at Information Highlighting in Stack Overflow Answers

Context: Navigating the knowledge of Stack Overflow (SO) remains challenging. To make posts more vivid to users, SO allows users to write and edit posts with Markdown or HTML, so that they can use various formatting styles (e.g., bold, italic, and code) to highlight important information. Nonetheless, there have been limited studies on the highlighted information. Objective: In our recent study, we carried out the first large-scale exploratory study of the information highlighted in SO answers. To extend that study, we develop approaches to automatically recommend highlighted content with formatting styles, using neural network architectures initially designed for the Named Entity Recognition (NER) task. Method: In this paper, we studied 31,169,429 Stack Overflow answers. To train the recommendation models, we chose CNN and BERT models for each type of formatting (i.e., Bold, Italic, Code, and Heading), using the information highlighting dataset we collected from SO answers. Results: Our CNN-based models achieve precision ranging from 0.71 to 0.82. The trained model for automatic code content highlighting achieves a recall of 0.73 and an F1 score of 0.71, outperforming the trained models for the other formatting styles. The BERT models have even lower recall and F1 scores than the CNN models. Our analysis of failure cases indicates that the majority of failures are missing identifications (i.e., the model misses content that is supposed to be highlighted), because the models tend to learn frequently highlighted words while struggling to learn less frequent ones. Conclusion: Our findings suggest that it is possible to develop recommendation models that highlight information with different formatting styles in Stack Overflow answers.
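The abstract frames highlight recommendation as an NER-style sequence labeling task and reports precision, recall, and F1. As a minimal sketch, assuming a BIO tagging scheme, plain whitespace tokenization, and Markdown `**bold**` markup (none of which the abstract specifies), the code below shows how a highlighted answer sentence could be turned into token labels and how exact-match span-level precision, recall, and F1 could be computed. The helper names `bio_labels`, `spans`, and `prf` are hypothetical, and this is not the authors' pipeline.

```python
import re

# Assumption: bold highlights are written as Markdown **text** (non-greedy match).
BOLD = re.compile(r"\*\*(.+?)\*\*")


def bio_labels(markdown: str) -> list[tuple[str, str]]:
    """Hypothetical helper: label each whitespace token with a BIO tag.

    The first token inside **...** gets B-BOLD, later ones get I-BOLD,
    and every token outside a bold span gets O.
    """
    pairs: list[tuple[str, str]] = []
    pos = 0
    for m in BOLD.finditer(markdown):
        # Plain (unhighlighted) text before this bold span.
        pairs += [(tok, "O") for tok in markdown[pos:m.start()].split()]
        inner = m.group(1).split()
        pairs += [(tok, "B-BOLD" if i == 0 else "I-BOLD") for i, tok in enumerate(inner)]
        pos = m.end()
    # Plain text after the last bold span.
    pairs += [(tok, "O") for tok in markdown[pos:].split()]
    return pairs


def spans(tags: list[str]) -> set[tuple[int, int]]:
    """Hypothetical helper: collect (start, end) token spans, end exclusive.

    A stray I- tag with no preceding B- tag is ignored.
    """
    found: set[tuple[int, int]] = set()
    start = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if tag.startswith("I-") and start is not None:
            continue  # extend the currently open span
        if start is not None:
            found.add((start, i))
            start = None
        if tag.startswith("B-"):
            start = i
    return found


def prf(gold: list[str], pred: list[str]) -> tuple[float, float, float]:
    """Hypothetical helper: exact-match span precision, recall, and F1."""
    g, p = spans(gold), spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


pairs = bio_labels("Use **pip install** to add it, then **restart** the kernel.")
gold = [tag for _, tag in pairs]
# Simulate a "missing identification": the model fails to tag "restart".
pred = list(gold)
pred[7] = "O"
print(prf(gold, pred))  # precision 1.0, recall 0.5, F1 about 0.67
```

In this toy case, the missed span leaves precision untouched and lowers only recall, which is the pattern one would expect if most failures are missing identifications rather than spurious highlights.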