As generative AI increasingly produces useful artifacts for the real world, from text to images and beyond, the risk of reinforcing or even exacerbating societal biases and inequalities will increase significantly. We address these issues by formally characterizing the notion of fairness for generative AI as a basis for monitoring and enforcing fairness. We define two levels of fairness using the notion of infinite sequences of abstractions of AI-generated artifacts such as text or images. The first is fairness as demonstrated on the generated sequences, which is evaluated only on the outputs and is agnostic to the prompts and models used. The second is the inherent fairness of the generative AI model, which requires that fairness be manifested when input prompts are neutral, that is, when they do not explicitly instruct the generative AI to produce a particular type of output. We also study relative intersectional fairness, which counteracts the combinatorial explosion that arises when fairness is considered across multiple categories together, as well as lazy fairness enforcement. Finally, we test fairness monitoring and enforcement on several current generative AI models.