Public figures receive a disproportionate amount of abuse on social media, which impacts their active participation in public life. Automated systems can identify abuse at scale, but labelling training data is expensive, complex and potentially harmful. It is therefore desirable that systems be efficient and generalisable, handling both shared and specific aspects of online abuse. We explore the dynamics of cross-group text classification to understand how well classifiers trained on one domain or demographic transfer to others, with a view to building more generalisable abuse classifiers. We fine-tune language models to classify tweets targeted at public figures across DOmains (sport and politics) and DemOgraphics (women and men) using our novel DODO dataset, which contains 28,000 labelled entries split equally across four domain-demographic pairs. We find that (i) small amounts of diverse data are hugely beneficial to generalisation and model adaptation; (ii) models transfer more easily across demographics, but models trained on cross-domain data are more generalisable; (iii) some groups contribute more to generalisability than others; and (iv) dataset similarity is a signal of transferability.
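The cross-group setup described above can be pictured as a grid of train-to-test pairs over the four domain-demographic groups. The sketch below is only an illustration of that structure, assuming each classifier is trained on a single group and evaluated on every group; the abstract does not specify the exact evaluation protocol, and the labels `in-group`, `cross-demographic`, `cross-domain` and `cross-both` (and the helper `transfer_type`) are hypothetical names introduced here.

```python
from itertools import product

# The two domains and two demographics named in the abstract.
DOMAINS = ("sport", "politics")
DEMOGRAPHICS = ("women", "men")
# Four (domain, demographic) groups.
GROUPS = list(product(DOMAINS, DEMOGRAPHICS))

TOTAL_ENTRIES = 28_000
# The abstract states the entries are split equally across the four groups.
PER_GROUP = TOTAL_ENTRIES // len(GROUPS)


def transfer_type(train, test):
    """Hypothetical helper: label a train -> test pair by which attribute changes."""
    same_domain = train[0] == test[0]
    same_demo = train[1] == test[1]
    if same_domain and same_demo:
        return "in-group"
    if same_domain:
        return "cross-demographic"  # same domain, different demographic
    if same_demo:
        return "cross-domain"  # same demographic, different domain
    return "cross-both"  # both domain and demographic differ


# Every train -> test combination, assuming single-group training (an assumption).
TRANSFER_GRID = {
    (train, test): transfer_type(train, test)
    for train, test in product(GROUPS, GROUPS)
}

if __name__ == "__main__":
    print(f"{PER_GROUP} entries per group")
    for (train, test), kind in TRANSFER_GRID.items():
        print(f"{'/'.join(train):>16} -> {'/'.join(test):<16} {kind}")
```

Each of the four transfer types covers four of the sixteen cells, which makes it easy to compare, for example, cross-demographic against cross-domain transfer as in finding (ii).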