The Archive Query Log (AQL) is a previously unused, comprehensive query log collected at the Internet Archive over the last 25 years. Its first version includes 356 million queries, 166 million search result pages, and 1.7 billion search results across 550 search providers. Although many query logs have been studied in the literature, the search providers that own them generally do not publish them, in order to protect user privacy and vital business data. Of the few query logs that are publicly available, none combines size, scope, and diversity. The AQL is the first to do so, enabling research on new retrieval models and (diachronic) search engine analyses. Provided in a privacy-preserving manner, it promotes open research as well as greater transparency and accountability in the search industry.