Labeled Datasets for Research on Information Operations

Social media platforms have become a hub for political activities and discussions, democratizing participation in these endeavors. However, they have also become an incubator for manipulation campaigns such as information operations (IOs). Some social media platforms have released datasets about IOs originating from different countries, but we still lack comprehensive control data that would enable the development of IO detection methods. To bridge this gap, we present new labeled datasets covering 26 campaigns. Together, they contain both IO posts verified by a social media platform and, as control data, over 13M posts by 303k accounts that discussed similar topics in the same time frames. The datasets will facilitate the study of narratives, network interactions, and engagement strategies employed by coordinated accounts across various campaigns and countries. By comparing these coordinated accounts against organic ones, researchers can develop and benchmark IO detection algorithms.
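Because each campaign pairs platform-verified IO accounts with control accounts that discussed similar topics in the same time frames, a detector's output can be scored against those labels campaign by campaign. The following is a minimal sketch of that evaluation loop, assuming a simplified record layout. The `Account` record, its field names, and the `copy_paste_detector` baseline are hypothetical illustrations. They are not the datasets' actual schema or the authors' method.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    # Hypothetical record layout; the released datasets' actual schema may differ.
    account_id: str
    campaign: str           # campaign this account was collected for
    is_io: bool             # True: platform-verified IO account; False: control account
    posts: tuple[str, ...]  # texts of the account's posts


def copy_paste_detector(accounts):
    """Toy, hypothetical baseline: flag an account if any of its posts was also
    posted verbatim by a different account in the same campaign.
    Returns a set of (campaign, account_id) keys."""
    authors = defaultdict(set)  # (campaign, text) -> ids of accounts that posted it
    for acc in accounts:
        for text in acc.posts:
            authors[(acc.campaign, text)].add(acc.account_id)
    return {
        (acc.campaign, acc.account_id)
        for acc in accounts
        if any(len(authors[(acc.campaign, text)]) > 1 for text in acc.posts)
    }


def benchmark(accounts, flagged):
    """Per-campaign precision, recall and F1 of `flagged` against the IO labels."""
    counts = defaultdict(Counter)
    for acc in accounts:
        c = counts[acc.campaign]  # touch the entry so every campaign is reported
        hit = (acc.campaign, acc.account_id) in flagged
        if hit and acc.is_io:
            c["tp"] += 1
        elif hit:
            c["fp"] += 1
        elif acc.is_io:
            c["fn"] += 1
    results = {}
    for campaign, c in counts.items():
        p = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
        r = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        results[campaign] = (p, r, f1)
    return results


if __name__ == "__main__":
    # Made-up toy records, for illustration only.
    sample = [
        Account("a1", "X", True, ("Vote no!", "Hello")),
        Account("a2", "X", True, ("Vote no!",)),
        Account("c1", "X", False, ("Nice weather",)),
        Account("c2", "X", False, ("Hello",)),
    ]
    print(benchmark(sample, copy_paste_detector(sample)))
```

In the toy sample, the control account `c2` is flagged because it shares an ordinary post with an IO account. This shows why organic control data matters: without it, a detector's false positives would go unmeasured. The verbatim-duplicate heuristic is only a placeholder for a real detection method.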


