Recent decisions to discontinue access to social media APIs are having detrimental effects on Internet research and on the field of computational social science as a whole. This loss of data access has been dubbed the Post-API era of Internet research. Fortunately, popular search engines can crawl, capture, and surface social media data on their Search Engine Results Pages (SERP) when given a suitable search query, and so may offer a solution to this dilemma. In the present work we ask: does SERP provide a complete and unbiased sample of social media data? Is SERP a viable alternative to direct API access? To answer these questions, we perform a comparative analysis between (Google) SERP results and nonsampled data from Reddit and Twitter/X. We find that SERP results are highly biased in favor of popular posts and against political, pornographic, and vulgar posts; are more positive in sentiment; and have large topical gaps. Overall, we conclude that SERP is not a viable alternative to social media API access.