Public figures receive a disproportionate amount of abuse on social media, which affects their active participation in public life. Automated systems can identify abuse at scale, but labelling training data is expensive, complex and potentially harmful. It is therefore desirable that systems be efficient and generalisable, handling both shared and specific aspects of online abuse. We explore the dynamics of cross-group text classification to understand how well classifiers trained on one domain or demographic transfer to others, with a view to building more generalisable abuse classifiers. We fine-tune language models to classify tweets targeted at public figures across DOmains (sport and politics) and DemOgraphics (women and men) using our novel DODO dataset, which contains 28,000 labelled entries split equally across four domain-demographic pairs. We find that (i) small amounts of diverse data are hugely beneficial to generalisation and model adaptation; (ii) models transfer more easily across demographics, but models trained on cross-domain data are more generalisable; (iii) some groups contribute more to generalisability than others; and (iv) dataset similarity is a signal of transferability.