With its remarkable capability to reach the public instantly, social media has become integral to sharing scholarly articles and to gauging the public response to them. Because spamming by bots on social media can steer the conversation and create a false impression of public interest in a given piece of research, which in turn can affect policies that shape people's lives, the topic warrants critical study. We combined the Altmetric dataset with data collected through the Twitter Application Programming Interface (API) and the Botometer API into an extensive dataset of academic articles, each with several article-level features and a label indicating whether the article had excessive bot activity on Twitter. We analyzed how the likelihood of bot activity varies with different characteristics of an article. We also trained machine-learning models on this dataset to identify possible bot activity around any given article; the models achieved an accuracy of 0.70. We also found that articles related to "Health and Human Science" are more prone to bot activity than those in other research areas. Without arguing whether the bot activity is malicious, our work presents a tool for identifying bot activity in the dissemination of an academic article and establishes a baseline for future research in this direction.
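The labeling step described in the abstract (marking an article as having excessive bot activity or not) can be illustrated with a minimal sketch. It assumes each account that tweeted an article has a bot-likelihood score in [0, 1], and it flags the article when a large enough share of those accounts look like bots. The `Article` type, both cutoff values, and the example DOI are hypothetical and chosen only for illustration; the abstract does not state the criteria actually used.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoffs for illustration only; the abstract does not specify them.
BOT_SCORE_CUTOFF = 0.5  # an account counts as a likely bot at or above this score
BOT_SHARE_CUTOFF = 0.3  # an article is flagged when this share of accounts are likely bots


@dataclass
class Article:
    doi: str
    account_scores: List[float]  # one assumed bot-likelihood score per tweeting account


def bot_share(article: Article) -> float:
    """Return the fraction of tweeting accounts whose score meets the bot cutoff."""
    if not article.account_scores:
        return 0.0  # no tweets means no measurable bot share
    likely_bots = sum(1 for s in article.account_scores if s >= BOT_SCORE_CUTOFF)
    return likely_bots / len(article.account_scores)


def has_excessive_bot_activity(article: Article) -> bool:
    """Binary label: True when the bot share reaches the article-level cutoff."""
    return bot_share(article) >= BOT_SHARE_CUTOFF


if __name__ == "__main__":
    example = Article(doi="10.0000/placeholder", account_scores=[0.9, 0.8, 0.1, 0.2])
    print(bot_share(example), has_excessive_bot_activity(example))
```

A label produced this way could serve as the target for a classifier trained on article-level features, which is the general shape of the pipeline the abstract describes; the choice of cutoffs directly controls how many articles end up in the positive class.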