Modelling and forecasting real-life human behaviour using online social media is an active area of interest in politics, government, academia, and industry. Since its creation in 2006, Twitter has been proposed as a potential laboratory for gauging and predicting social behaviour. During the last decade, the user base of Twitter has grown and become more representative of the general population. Here we analyse this user base in the context of the 2021 Mexican Legislative Election. To do so, we use a dataset of 15 million election-related tweets posted in the six months preceding election day. We explore different election models that assign political preference to either the ruling parties or the opposition. We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods. These results demonstrate that analysis of public online data can outperform conventional polling methods, and that political analysis and general forecasting would likely benefit from incorporating such data in the immediate future. Moreover, the same Twitter dataset with geographical attributes is positively correlated with results from official census data on population and internet usage in Mexico. These findings suggest that we have reached a point at which online activity, appropriately curated, can provide an accurate representation of offline behaviour.