This publication describes the motivation for and generation of $Q_{bias}$, a large dataset of Google and Bing search queries, together with a scraping tool and dataset for biased news articles and language models for investigating bias in online search. Web search engines are a major and trusted source of information, especially in the political domain. However, biased information can influence opinion formation and lead to biased opinions. Users interact with search engines by formulating search queries and by engaging with the query suggestions that the engines provide. A lack of datasets on search queries inhibits research on the subject. We use $Q_{bias}$ to evaluate different approaches to fine-tuning transformer-based language models, with the goal of producing models capable of biasing text toward a left or right political stance. In addition, we provide datasets and language models for biasing texts that enable further research on bias in online information search.