Break Out of a Pigeonhole: A Unified Framework for Examining Miscalibration, Bias, and Stereotype in Recommender Systems

Despite the benefits of tailoring items and information to users' needs, recommender systems have been found to introduce biases that favor popular items, certain item categories, and dominant user groups. In this study, we characterize the systematic errors of a recommender system and how they manifest as accountability issues such as stereotypes, biases, and miscalibration. We propose a unified framework that decomposes the sources of prediction error into a set of key measures quantifying the various types of system-induced effects, at both the individual and collective levels. Using this measurement framework, we examine the most widely adopted algorithms in the context of movie recommendation. Our research reveals three important findings: (1) Differences between algorithms: recommendations generated by simpler algorithms tend to be more stereotypical but less biased than those generated by more complex algorithms. (2) Disparate impact on groups and individuals: system-induced biases and stereotypes have a disproportionate effect on atypical users and minority groups (e.g., women and older users). (3) Mitigation opportunity: using structural equation modeling, we identify the interactions between user characteristics (typicality and diversity), system-induced effects, and miscalibration. We further investigate mitigating system-induced effects by oversampling underrepresented groups and individuals, which we found to be effective in reducing stereotypes and improving recommendation quality. Our research is the first systematic examination of not only system-induced effects and miscalibration but also stereotyping in recommender systems.
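The oversampling mitigation mentioned in finding (3) can be illustrated with a minimal sketch. The abstract does not specify the procedure, so everything here is an assumption made for illustration: the function name `oversample_groups`, the `(user_id, item_id, rating)` schema, the `group_of` mapping, and the choice to balance groups by interaction count through sampling with replacement.

```python
import random
from collections import defaultdict


def oversample_groups(interactions, group_of, seed=0):
    """Resample smaller groups' interactions until all groups are the same size.

    interactions: list of (user_id, item_id, rating) tuples (hypothetical schema).
    group_of: dict mapping user_id -> group label (hypothetical).
    """
    rng = random.Random(seed)

    # Bucket the interactions by the group of the user who produced them.
    by_group = defaultdict(list)
    for row in interactions:
        by_group[group_of[row[0]]].append(row)

    # Assumption: every group is raised to the size of the largest group.
    target = max(len(rows) for rows in by_group.values())

    balanced = []
    for label in sorted(by_group):
        rows = by_group[label]
        balanced.extend(rows)
        # Sample with replacement to fill the gap to the largest group.
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    return balanced


# Toy data: group "A" has four interactions, group "B" has one.
interactions = [
    ("u1", "m1", 4.0), ("u1", "m2", 5.0), ("u2", "m1", 3.0),
    ("u2", "m3", 4.0), ("u3", "m2", 2.0),
]
group_of = {"u1": "A", "u2": "A", "u3": "B"}
balanced = oversample_groups(interactions, group_of)
```

In this sketch the balanced training set would then be passed to whichever recommendation algorithm is under study. Balancing by user count, or oversampling atypical individuals rather than whole groups, would be equally plausible readings of the abstract.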
