The complexity of black-box algorithms can lead to various challenges, including the introduction of biases, which pose immediate risks when these algorithms are applied. For instance, neural networks have been shown to infer racial information solely from a patient's X-ray scan, a task beyond the capability of medical experts. If the medical expert is unaware of this, automatic decision-making based on such an algorithm could lead to prescribing a treatment based purely on racial information. Existing methods can "orthogonalize" or "normalize" neural networks with respect to such information, but they are grounded in linear models. This paper extends them by introducing corrections for non-linearities such as ReLU activations. Our approach also covers both scalar and tensor-valued predictions, which facilitates its integration into neural network architectures. Through extensive experiments, we validate its effectiveness in protecting sensitive data in generalized linear models, normalizing convolutional neural networks with respect to metadata, and removing undesired attributes from pre-existing embeddings.
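To make the linear setting concrete, here is a minimal sketch, assuming NumPy, of one common form of linear orthogonalization: subtracting from the features their least-squares projection onto the protected attributes. It also shows why a ReLU applied afterwards can reintroduce a dependence on the protected attribute. This is an illustration only, not the paper's non-linear correction; the function `orthogonalize`, the synthetic data, and the variables `group` and `Z` are hypothetical.

```python
import numpy as np

def orthogonalize(features, protected):
    """Return `features` minus its least-squares projection onto span(protected).

    features:  array of shape (n_samples, n_features), e.g. pre-activations
    protected: array of shape (n_samples, n_protected), e.g. sensitive metadata
    (Hypothetical helper illustrating linear orthogonalization only.)
    """
    # Coefficients B minimizing ||protected @ B - features||; the residual is
    # orthogonal to every column of `protected`.
    coef, *_ = np.linalg.lstsq(protected, features, rcond=None)
    return features - protected @ coef

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, size=n).astype(float)  # hypothetical binary attribute
# Synthetic pre-activations: zero mean in both groups, but group 1 has 3x the spread.
pre_act = rng.standard_normal((n, 1)) * (1.0 + 2.0 * group[:, None])
Z = np.column_stack([np.ones(n), group])  # intercept + protected attribute

pre_perp = orthogonalize(pre_act, Z)
# Linear dependence on the protected attribute is removed (up to float error).
print("max |Z^T residual|:", np.abs(Z.T @ pre_perp).max())

post_act = np.maximum(pre_perp, 0.0)  # ReLU
corr = np.corrcoef(post_act[:, 0], group)[0, 1]
# The ReLU output is still correlated with the group.
print("corr(ReLU output, group):", round(corr, 3))
```

The linear residual is uncorrelated with the protected attribute by construction, yet the ReLU output still differs between groups because the groups' residuals differ in spread rather than in mean. A purely linear correction cannot remove this kind of leakage through a non-linearity.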