Understanding writing style in social media with a supervised contrastively pre-trained transformer

Online social networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation. Malicious actors now have unprecedented freedom to misbehave, leading to severe societal unrest and dire consequences, as exemplified by the assault on the US Capitol following the 2020 presidential election and the anti-vaccine movement during the COVID-19 pandemic. Understanding online language has become more pressing than ever. While existing work predominantly focuses on content analysis, we aim to shift the focus towards understanding harmful behavior by relating content to its authors. Numerous recent approaches attempt to learn the stylistic features of authors from text, but many of them are constrained by small datasets or sub-optimal training losses. To overcome these limitations, we introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus, derived from public sources, of 4.5 × 10^6 authored texts by 70k heterogeneous authors. Our model is trained with a Supervised Contrastive Loss that teaches it to minimize the distance between texts written by the same individual. This authorship pretext pre-training task yields competitive zero-shot performance on the PAN attribution and clustering challenges. Additionally, we attain promising results on the PAN verification challenges using a single dense layer, with our model serving as an embedding encoder. Finally, we present results on our Reddit test partition: using a support base of 8 documents of 512 tokens each, we can discern authors from sets of up to 1616 authors with at least 80% accuracy. We share our pre-trained model on Hugging Face (https://huggingface.co/AIDA-UPM/star), and our code is available at https://github.com/jahuerta92/star.
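
The supervised contrastive objective mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the commonly used formulation in which every text in a batch acts as an anchor, texts by the same author are positives, and all other texts in the batch are negatives. The function names, the temperature of 0.1, and the toy 2-dimensional embeddings are illustrative assumptions, and plain Python lists stand in for real transformer embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def supervised_contrastive_loss(embeddings, authors, temperature=0.1):
    """Average supervised contrastive loss over anchors with >= 1 positive.

    embeddings:  list of equal-length float lists, one per text.
    authors:     list of author labels, aligned with `embeddings`.
    temperature: assumed value; the abstract does not state one.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives: other texts in the batch written by the same author.
        positives = [j for j in range(n) if j != i and authors[j] == authors[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other text.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor texts.
        top = max(logits.values())
        log_denom = top + math.log(sum(math.exp(v - top) for v in logits.values()))
        # Mean negative log-probability of the positives for this anchor.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two authors whose texts form two tight groups in embedding space.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supervised_contrastive_loss(toy, ["a", "a", "b", "b"])
misaligned = supervised_contrastive_loss(toy, ["a", "b", "a", "b"])
print(aligned < misaligned)
```

The loss is low when same-author texts are already close and high when the labels cut across the groups, which is the pressure that pulls same-author embeddings together during training.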
