Empirical investigations into unintended model behavior often show that an algorithm is predicting an outcome other than the one intended. These exposés highlight the need to identify when algorithms predict unintended quantities, ideally before they are deployed in consequential settings. We propose a falsification framework that provides a principled statistical test for discriminant validity: the requirement that an algorithm predict intended outcomes better than impermissible ones. Drawing on falsification practices from causal inference, econometrics, and psychometrics, our framework compares calibrated prediction losses across outcomes to assess whether the algorithm exhibits discriminant validity with respect to a specified impermissible proxy. In settings where the target outcome is difficult to observe, multiple permissible proxy outcomes may be available; our framework accommodates both this setting and the case of a single permissible proxy. Throughout, we use nonparametric hypothesis testing methods that make minimal assumptions about the data-generating process. We illustrate the method in an admissions setting, where the framework establishes discriminant validity with respect to gender but fails to establish it with respect to race. This demonstrates how falsification can serve as an early validity check, prior to fairness or robustness analyses. We also provide an analysis in a criminal justice setting, where we highlight the limitations of our framework and emphasize the need for complementary approaches to assess other aspects of construct validity and external validity.
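To make the idea of comparing calibrated prediction losses across outcomes concrete, the following is a minimal illustrative sketch, not the test statistic or procedure used in the paper. It assumes binary outcomes, calibrates a single score to each outcome by histogram binning on one half of the data, measures squared loss on the other half, and applies a one-sided paired sign-flip test. The function names `calibrate_by_binning` and `discriminant_validity_pvalue`, the choice of loss, the calibration method, the test, and the synthetic data are all assumptions introduced here for illustration.

```python
import numpy as np


def calibrate_by_binning(score_fit, y_fit, score_eval, n_bins=10):
    """Histogram-binning calibration (an assumed choice): map each score
    to the mean outcome observed in its quantile bin on the fitting split."""
    # Interior quantile edges give bin indices 0 .. n_bins - 1.
    edges = np.quantile(score_fit, np.linspace(0, 1, n_bins + 1))[1:-1]
    bins_fit = np.searchsorted(edges, score_fit, side="right")
    bins_eval = np.searchsorted(edges, score_eval, side="right")
    overall = y_fit.mean()  # fallback for empty bins
    bin_means = np.array([
        y_fit[bins_fit == b].mean() if np.any(bins_fit == b) else overall
        for b in range(n_bins)
    ])
    return bin_means[bins_eval]


def discriminant_validity_pvalue(score, y_intended, y_impermissible,
                                 n_flips=2000, seed=0):
    """One-sided paired sign-flip test of whether the calibrated loss on the
    impermissible outcome exceeds the calibrated loss on the intended outcome.
    The sign-flip null assumes loss differences are symmetric about zero."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(score))
    fit, ev = idx[: len(idx) // 2], idx[len(idx) // 2:]

    losses = {}
    for name, y in (("intended", y_intended),
                    ("impermissible", y_impermissible)):
        p_hat = calibrate_by_binning(score[fit], y[fit], score[ev])
        losses[name] = (p_hat - y[ev]) ** 2  # per-sample squared loss

    diff = losses["impermissible"] - losses["intended"]
    observed = diff.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_flips, diff.size))
    null_means = (signs * diff).mean(axis=1)
    p_value = (1 + np.sum(null_means >= observed)) / (1 + n_flips)
    return observed, p_value


# Synthetic data (hypothetical): the score drives the intended outcome
# and is independent of the impermissible one.
rng = np.random.default_rng(1)
n = 2000
score = rng.uniform(size=n)
y_intended = (rng.uniform(size=n) < score).astype(float)
y_impermissible = rng.integers(0, 2, size=n).astype(float)

gap, p_value = discriminant_validity_pvalue(score, y_intended, y_impermissible)
print(f"mean loss gap = {gap:.3f}, one-sided p-value = {p_value:.4f}")
```

In this sketch, a small p-value is evidence that the score predicts the intended outcome better than the impermissible one. Raw squared losses are not directly comparable when the two outcomes have very different base rates, so a real analysis would need to account for that.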