The Archive Query Log (AQL) is a previously unused, comprehensive query log collected at the Internet Archive over the last 25 years. Its first version includes 356 million queries, 166 million search result pages, and 1.7 billion search results across 550 search providers. Although many query logs have been studied in the literature, the search providers that own them generally do not publish their logs to protect user privacy and vital business data. Of the few query logs publicly available, none combines size, scope, and diversity. The AQL is the first to do so, enabling research on new retrieval models and (diachronic) search engine analyses. Provided in a privacy-preserving manner, it promotes open research as well as more transparency and accountability in the search industry.