Public figures receive a disproportionate amount of abuse on social media, which affects their active participation in public life. Automated systems can identify abuse at scale, but labelling training data is expensive, complex and potentially harmful. It is therefore desirable that systems are efficient and generalisable, handling both shared and specific aspects of online abuse. We explore the dynamics of cross-group text classification to understand how well classifiers trained on one domain or demographic transfer to others, with a view to building more generalisable abuse classifiers. We fine-tune language models to classify tweets targeted at public figures across DOmains (sport and politics) and DemOgraphics (women and men) using our novel DODO dataset, which contains 28,000 labelled entries split equally across four domain-demographic pairs. We find that (i) small amounts of diverse data are highly beneficial to generalisation and model adaptation; (ii) models transfer more easily across demographics, but models trained on cross-domain data are more generalisable; (iii) some groups contribute more to generalisability than others; and (iv) dataset similarity is a signal of transferability.
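To make the cross-group setup concrete, the following is a minimal sketch of the train/test grid implied by the abstract: two domains crossed with two demographics give four groups, and every ordered (train, test) pair can be labelled by what changes between the two groups. The names `GROUPS`, `ENTRIES_PER_GROUP`, `transfer_type` and `GRID` are illustrative assumptions and do not come from the DODO dataset or its code; no model training or real data is involved.

```python
from itertools import product

# Group axes and dataset size as stated in the abstract.
DOMAINS = ("sport", "politics")
DEMOGRAPHICS = ("women", "men")
TOTAL_ENTRIES = 28_000

# The four domain-demographic pairs (hypothetical representation as tuples).
GROUPS = list(product(DOMAINS, DEMOGRAPHICS))

# "Split equally across four domain-demographic pairs".
ENTRIES_PER_GROUP = TOTAL_ENTRIES // len(GROUPS)


def transfer_type(train, test):
    """Label a (train group, test group) pair by what differs between them."""
    same_domain = train[0] == test[0]
    same_demographic = train[1] == test[1]
    if same_domain and same_demographic:
        return "in-group"
    if same_domain:
        return "cross-demographic"
    if same_demographic:
        return "cross-domain"
    return "cross-domain and cross-demographic"


# Every train/test combination a cross-group evaluation would need to cover.
GRID = {
    (train, test): transfer_type(train, test)
    for train, test in product(GROUPS, repeat=2)
}

if __name__ == "__main__":
    print("entries per group:", ENTRIES_PER_GROUP)
    for (train, test), kind in GRID.items():
        print(train, "->", test, ":", kind)
```

In an actual experiment, each cell of such a grid would hold an evaluation score for a classifier trained on the first group and tested on the second; the sketch only enumerates the cells.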