Conflicts of interest often arise between data sources and their users over how the data source should interpret the users' information needs. For example, an online product search may rank certain products higher in its list of results to increase its revenue, even when this ordering departs from the ranking the user expressed in their query. The research community has proposed schemes that data systems can implement to ensure unbiased results. However, data systems and services usually have little or no incentive to implement these measures, since such biases often increase their profits. In this paper, we propose a novel formal framework for querying in settings where the data source has incentives to intentionally return biased answers because of a conflict of interest between the user and the data source. We propose efficient algorithms to detect whether users can extract relevant information from biased data sources, and efficient methods to detect biased information in the results of a query. We also propose algorithms that reformulate input queries to increase the amount of relevant information in the results returned by biased data sources. Using experiments on real-world datasets, we show that our algorithms are efficient and return relevant information over large data.