Automated Identification of Sexual Orientation and Gender Identity Discriminatory Texts from Issue Comments

In an industry dominated by straight men, many developers representing other gender identities and sexual orientations often encounter hateful or discriminatory messages. Such communications pose barriers to participation for women and LGBTQ+ persons. Due to the sheer volume of messages, manually inspecting all communications for discriminatory content is infeasible for a large-scale Free/Libre Open-Source Software (FLOSS) community. To address this challenge, this study aims to develop an automated mechanism to identify Sexual orientation and Gender Identity Discriminatory (SGID) texts in software developers' communications. Toward this goal, we trained and evaluated SGID4SE (Sexual orientation and Gender Identity Discriminatory text identification for (4) Software Engineering texts), a supervised learning-based SGID detection tool. SGID4SE incorporates six preprocessing steps and ten state-of-the-art algorithms, and it implements six different strategies to improve performance on the minority class. We empirically evaluated each strategy and identified an optimum configuration for each algorithm. In our ten-fold cross-validation-based evaluations, a BERT-based model achieves the best performance, with 85.9% precision, 80.0% recall, and an 82.9% F1-score for the SGID class. This model achieves 95.7% accuracy and an 80.4% Matthews Correlation Coefficient. Our dataset and tool establish a foundation for further research in this direction.
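
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how precision, recall, F1-score, accuracy, and the Matthews Correlation Coefficient can be computed for the minority (SGID) class from a binary confusion matrix. The function name and the confusion counts are hypothetical and chosen only for illustration; they are not taken from the paper's dataset, and this is not the authors' evaluation code.

```python
import math


def minority_class_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary metrics, treating the minority class as positive.

    tp/fp/fn/tn are confusion-matrix counts (hypothetical inputs here).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / (tp + fp + fn + tn)

    # The Matthews Correlation Coefficient uses all four cells, so it stays
    # informative when the classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for an imbalanced dataset: 100 positive, 900 negative.
example = minority_class_metrics(tp=80, fp=20, fn=20, tn=880)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

In a ten-fold cross-validation setting, metrics like these are typically computed per fold and then aggregated, so an averaged F1-score need not equal the harmonic mean of the averaged precision and recall.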

