Social media posts may go viral and reach large numbers of people within a short period of time. Such posts may threaten public dialogue if they contain misleading content, which makes their early detection crucial. Previous works proposed their own metrics to annotate whether a tweet is viral so that viral tweets can later be detected automatically. However, such metrics may not accurately represent viral tweets or may introduce too many false positives. In this work, we use the ground truth data provided by Twitter's "Viral Tweets" topic to review the current metrics and to propose our own metric. We find that a tweet is more likely to be classified as viral by Twitter if the ratio of its retweets to its author's followers exceeds a threshold, which is 2.16 in our experiments. This rule results in fewer false positives, although it favors smaller accounts. We also propose a transformer-based model for the early detection of viral tweets, which achieves an F1 score of 0.79. The code and the tweet IDs are publicly available at: https://github.com/tugrulz/ViralTweets
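As a minimal sketch of the retweet-to-follower rule described above, the snippet below flags a tweet when its retweet count divided by its author's follower count exceeds 2.16. The function names and the handling of accounts with zero followers are illustrative assumptions, not taken from the released code.

```python
# Threshold reported in the abstract for the retweets-to-followers ratio.
VIRAL_RATIO_THRESHOLD = 2.16


def retweet_follower_ratio(retweets: int, followers: int) -> float:
    """Return retweets divided by the author's follower count.

    Assumption: an author with zero followers yields an infinite ratio
    if the tweet has any retweets, and 0.0 otherwise.
    """
    if followers <= 0:
        return float("inf") if retweets > 0 else 0.0
    return retweets / followers


def is_likely_viral(retweets: int, followers: int,
                    threshold: float = VIRAL_RATIO_THRESHOLD) -> bool:
    """Flag a tweet whose ratio strictly exceeds the threshold."""
    return retweet_follower_ratio(retweets, followers) > threshold


# A small account clears the threshold with few retweets (ratio 5.0) ...
small_account = is_likely_viral(retweets=500, followers=100)
# ... while a large account with ten times the retweets does not (ratio 0.05).
large_account = is_likely_viral(retweets=5000, followers=100_000)
```

The two calls illustrate why the rule favors smaller accounts: the same absolute number of retweets yields a much lower ratio for an author with many followers.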