Harvard Eye Fairness: A Large-Scale 3D Imaging Dataset for Equitable Eye Diseases Screening and Fair Identity Scaling

Fairness, or equity, in machine learning is profoundly important for societal well-being, but progress is hindered by the scarcity of public datasets, especially in medicine, which is one of the most important application areas for fairness learning. Currently, no large-scale public medical dataset with 3D imaging data is available for fairness learning, even though 3D imaging is a standard test for disease diagnosis in modern clinics. In addition, existing medical fairness datasets are repurposed datasets, so they typically offer limited demographic identity attributes for fairness modeling: at most three (age, gender, and race). To address this gap, we introduce the Harvard Eye Fairness dataset (Harvard-EF), with 30,000 subjects covering three major eye diseases (age-related macular degeneration, diabetic retinopathy, and glaucoma) that affect 380 million patients globally. Harvard-EF includes both 2D fundus photos and 3D optical coherence tomography scans, with six demographic identity attributes: age, gender, race, ethnicity, preferred language, and marital status. We also propose a fair identity scaling (FIS) approach that combines group and individual scaling to improve model fairness. We compare FIS with various state-of-the-art fairness learning methods and observe superior performance on racial, gender, and ethnicity fairness tasks with both 2D and 3D imaging data, which demonstrates the utility of Harvard-EF for fairness learning. To facilitate fairness comparisons between models, we propose performance-scaled disparity measures, which compare model fairness while accounting for overall performance levels. The dataset and code are publicly accessible via https://ophai.hms.harvard.edu/datasets/harvard-ef30k.

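The abstract does not state the formula for the performance-scaled disparity measures, so the following is only a minimal sketch of one plausible form, under the assumption that a disparity across demographic groups (here the standard deviation, or the max-min gap, of a per-group metric such as AUC) is divided by the overall metric. The function name `performance_scaled_disparity` and both normalizations are illustrative assumptions, not the released implementation.

```python
from statistics import pstdev


def performance_scaled_disparity(overall, by_group):
    """Return (mean_psd, max_psd) for a per-group performance metric.

    ASSUMPTION: this is an illustrative definition, not taken from the
    abstract. Both measures divide a between-group disparity by the
    overall performance, so that the same absolute gap counts for more
    when the model performs worse overall.

    overall  -- metric (e.g. AUC) computed on all subjects, must be > 0
    by_group -- dict mapping group name to the metric on that group
    """
    if overall <= 0:
        raise ValueError("overall performance must be positive")
    if len(by_group) < 2:
        raise ValueError("need at least two groups to measure disparity")

    values = list(by_group.values())
    # Spread of group performance, scaled by overall performance.
    mean_psd = pstdev(values) / overall
    # Gap between best and worst group, scaled by overall performance.
    max_psd = (max(values) - min(values)) / overall
    return mean_psd, max_psd


if __name__ == "__main__":
    # Hypothetical numbers: both models have the same 0.05 gap between
    # groups, but the weaker model receives the larger scaled disparity.
    strong = performance_scaled_disparity(0.90, {"A": 0.92, "B": 0.87})
    weak = performance_scaled_disparity(0.60, {"A": 0.62, "B": 0.57})
    print("strong model (mean PSD, max PSD):", strong)
    print("weak model   (mean PSD, max PSD):", weak)
```

Dividing by overall performance is what lets two models with different accuracy levels be compared on one fairness scale: an equal absolute gap between groups is penalized more heavily for the model with lower overall performance.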