Among the many aspects of responsibly designing AI tools for healthcare applications, fairness has been a key focus area. Given the widespread availability of electronic health record (EHR) data and their potential to inform a wide range of clinical decision support tasks, improving fairness in EHR-based health AI tools is especially important. Although this broad problem (that is, mitigating unfairness in EHR-based AI models) has been tackled with various methods, task- and model-agnostic methods remain noticeably rare. In this study, we address this gap by presenting a new pipeline that generates synthetic EHR data that are not only faithful to the real EHR data but, when combined with the real data, also reduce end-user-defined fairness concerns in downstream tasks. We demonstrate the effectiveness of the proposed pipeline across various downstream tasks and two different EHR datasets. The pipeline adds a widely applicable tool that complements existing methods for addressing fairness in health AI applications, such as those that modify the design of a downstream model. The codebase for our project is available at https://github.com/healthylaife/FairSynth