Studying and Recommending Information Highlighting in Stack Overflow Answers

Context: Navigating the knowledge on Stack Overflow (SO) remains challenging. To make posts more readable, SO allows users to write and edit posts in Markdown or HTML, so that they can use various formatting styles (e.g., bold, italic, and code) to highlight important information. Nonetheless, the highlighted information has received little study. Objective: In our recent work, we carried out the first large-scale exploratory study of the information highlighted in SO answers. To extend that study, we develop approaches that automatically recommend content to highlight with a given formatting style, using neural network architectures originally designed for the Named Entity Recognition (NER) task. Method: In this paper, we study 31,169,429 Stack Overflow answers. Using the information highlighting dataset we collected from these answers, we train CNN-based and BERT-based recommendation models for each formatting type (i.e., Bold, Italic, Code, and Heading). Results: Our models achieve precision ranging from 0.50 to 0.72 across formatting types. Building a model to recommend Code is easier than building one for the other types. Models for the text formatting types (i.e., Heading, Bold, and Italic) suffer from low recall. Our analysis of failure cases indicates that most failures are due to missed identifications. One explanation is that the models easily learn frequently highlighted words but struggle to learn less frequent ones (i.e., long-tail knowledge). Conclusion: Our findings suggest that it is feasible to develop models that recommend which information to highlight, with different formatting styles, in Stack Overflow answers.
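Because the approach reuses NER architectures, recommending highlights can be viewed as sequence labeling over the tokens of an answer. The following is a minimal sketch under stated assumptions, not the authors' pipeline. It assumes a BIO tagging scheme, plain whitespace tokenization, and a regex that only recognizes Markdown `**bold**` (ignoring `__bold__` and HTML `<b>`/`<strong>`). It also assumes exact-match span-level evaluation. The functions `bold_to_bio`, `spans`, and `span_precision_recall` are hypothetical names introduced for illustration. The example shows why missed identifications lower recall while leaving precision intact.

```python
import re
from typing import List, Optional, Set, Tuple

# Assumption: only Markdown **bold** is recognized; other syntaxes are ignored.
BOLD = re.compile(r"\*\*(.+?)\*\*")


def bold_to_bio(markdown: str) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split text on whitespace and label bold tokens with BIO tags."""
    tokens: List[str] = []
    labels: List[str] = []
    pos = 0
    for match in BOLD.finditer(markdown):
        # Plain text before the bold span is labeled O (outside).
        for tok in markdown[pos:match.start()].split():
            tokens.append(tok)
            labels.append("O")
        # First token of the bold span is B-BOLD; the remaining tokens are I-BOLD.
        for i, tok in enumerate(match.group(1).split()):
            tokens.append(tok)
            labels.append("B-BOLD" if i == 0 else "I-BOLD")
        pos = match.end()
    # Trailing plain text after the last bold span.
    for tok in markdown[pos:].split():
        tokens.append(tok)
        labels.append("O")
    return tokens, labels


def spans(labels: List[str]) -> Set[Tuple[int, int]]:
    """Collect (start, end) token spans, end exclusive; orphan I- tags are treated as O."""
    result: Set[Tuple[int, int]] = set()
    start: Optional[int] = None
    for i, lab in enumerate(labels):
        if lab.startswith("B-"):
            if start is not None:
                result.add((start, i))
            start = i
        elif lab.startswith("I-") and start is not None:
            continue  # the current span keeps growing
        else:
            if start is not None:
                result.add((start, i))
            start = None
    if start is not None:
        result.add((start, len(labels)))
    return result


def span_precision_recall(gold: List[str], pred: List[str]) -> Tuple[float, float]:
    """Exact-match span precision and recall (an assumed evaluation scheme)."""
    g, p = spans(gold), spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    return precision, recall


if __name__ == "__main__":
    tokens, gold = bold_to_bio("Use **git rebase** instead of **merge** here.")
    # A model that finds "git rebase" but misses "merge" (a missed identification).
    pred = ["O", "B-BOLD", "I-BOLD", "O", "O", "O", "O"]
    print(span_precision_recall(gold, pred))  # prints (1.0, 0.5)
```

In this toy case, the only predicted span is correct, so precision stays at 1.0. Recall drops to 0.5 because one highlighted span is never identified. Failures dominated by missed identifications would produce this pattern.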
