To prevent the spread of disinformation on Instagram, we need to study the accounts and content of disinformation actors. However, because accounts that spread disinformation are malicious, Instagram often bans them, which makes them inaccessible from the live web. The only way to study the content of banned accounts is through public web archives such as the Internet Archive. Archiving Instagram pages, however, raises many issues. We focus on one of them: many Wayback Machine mementos of Instagram pages redirect to the Instagram login page. In this study, we determined that mementos of Instagram account pages on the Wayback Machine began redirecting to the Instagram login page in August 2019. We also found that Instagram pages are poorly archived on Archive.today, Arquivo.pt, and Perma.cc in terms of both quantity and quality. Moreover, all of our attempts to archive Katy Perry's Instagram account page on Archive.today, Arquivo.pt, and Conifer were unsuccessful. Although they are in the minority, replayable Instagram mementos exist in public archives and contain valuable data for studying disinformation on Instagram. With that in mind, we developed a Python script to scrape Instagram mementos. As of August 2023, the script can scrape Wayback Machine mementos of Instagram account pages captured between November 7, 2012 and June 8, 2018.
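As a rough illustration of the date window stated above, the following minimal sketch selects capture timestamps that fall between November 7, 2012 and June 8, 2018 and builds Wayback Machine replay URLs for them. It is not the script described in this work. The function names, the inclusive treatment of the end date, and the sample timestamps are assumptions made for illustration; only the 14-digit `YYYYMMDDhhmmss` timestamp and the `https://web.archive.org/web/<timestamp>/<url>` replay pattern reflect how the Wayback Machine addresses mementos.

```python
from datetime import datetime, timezone

WAYBACK_PREFIX = "https://web.archive.org/web"

# Window taken from the text; treating both endpoints as inclusive is an assumption.
RANGE_START = datetime(2012, 11, 7, 0, 0, 0, tzinfo=timezone.utc)
RANGE_END = datetime(2018, 6, 8, 23, 59, 59, tzinfo=timezone.utc)


def parse_timestamp(ts: str) -> datetime:
    """Parse a 14-digit Wayback Machine timestamp (YYYYMMDDhhmmss, UTC)."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def in_scrapable_range(ts: str) -> bool:
    """Return True if the capture time lies inside the stated window."""
    return RANGE_START <= parse_timestamp(ts) <= RANGE_END


def memento_url(ts: str, account: str) -> str:
    """Build the replay URL for an Instagram account page memento."""
    return f"{WAYBACK_PREFIX}/{ts}/https://www.instagram.com/{account}/"


def scrapable_mementos(timestamps: list[str], account: str) -> list[str]:
    """Keep only mementos inside the window, preserving input order."""
    return [memento_url(ts, account) for ts in timestamps if in_scrapable_range(ts)]


if __name__ == "__main__":
    # Hypothetical capture timestamps, for illustration only.
    sample = ["20121106235959", "20150315120000", "20180608180000", "20190801000000"]
    for url in scrapable_mementos(sample, "katyperry"):
        print(url)
```

In practice the list of timestamps would come from an archive index rather than being hard-coded; the sketch deliberately leaves that step out so that it runs without network access.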