The risk of reinforcing or exacerbating societal biases and inequalities is growing as generative AI increasingly produces content that resembles human output, from text to images and beyond. Here we formally characterize the notion of fairness for generative AI as a basis for monitoring and enforcing fairness. We define two levels of fairness using the concept of infinite words. The first is the fairness exhibited by the generated sequences, which is evaluated only on the outputs and is agnostic to the prompts or models used. The second is the inherent fairness of the generative AI model, which requires that fairness manifest when input prompts are neutral, that is, when they do not explicitly instruct the generative AI to produce a particular type of output. We also study relative intersectional fairness, which counteracts the combinatorial explosion that arises when fairness is considered over multiple categories together, as well as lazy fairness enforcement. Our implemented specification monitoring and enforcement tool shows interesting results when tested against several generative AI models.
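To make the output-level notion more concrete, here is a minimal sketch of a monitor over a stream of generated outputs. Each output is assumed to be already mapped to a category label (for example, the perceived gender of a person in a generated image). For illustration only, it assumes that fairness on an infinite word can be approximated on each finite prefix by requiring every category's empirical frequency to stay within a tolerance `epsilon` of the uniform share. This is not the formal definition from the paper. The names `is_prefix_fair`, `monitor`, `epsilon`, and `warmup` are hypothetical and do not come from the authors' tool.

```python
from collections import Counter
from typing import Hashable, Iterable, Iterator, Sequence, Tuple


def _balanced(counts: Counter, n: int, categories: Sequence[Hashable], epsilon: float) -> bool:
    # Assumed criterion (illustrative): each category's share of the prefix
    # must lie within epsilon of the uniform share 1/len(categories).
    target = 1.0 / len(categories)
    return all(abs(counts[c] / n - target) <= epsilon for c in categories)


def is_prefix_fair(labels: Sequence[Hashable], categories: Sequence[Hashable], epsilon: float) -> bool:
    """Check a single finite prefix of category labels against the assumed criterion."""
    if not labels:
        return True  # an empty prefix carries no evidence of unfairness
    return _balanced(Counter(labels), len(labels), categories, epsilon)


def monitor(
    stream: Iterable[Hashable],
    categories: Sequence[Hashable],
    epsilon: float = 0.1,
    warmup: int = 10,
) -> Iterator[Tuple[int, bool]]:
    """Incrementally monitor a (conceptually infinite) stream of labels.

    Yields (prefix_length, verdict) for every prefix of length >= warmup,
    updating the counts in O(1) per new output instead of recounting.
    """
    counts: Counter = Counter()
    for n, label in enumerate(stream, start=1):
        counts[label] += 1
        if n >= warmup:
            yield n, _balanced(counts, n, categories, epsilon)


if __name__ == "__main__":
    # Hypothetical label streams: one balanced, one heavily skewed.
    balanced = ["male", "female"] * 10
    skewed = ["male"] * 20
    print(all(v for _, v in monitor(balanced, ["male", "female"])))
    print(any(v for _, v in monitor(skewed, ["male", "female"])))
```

The `warmup` parameter reflects a design choice in this sketch. Very short prefixes are always far from uniform, so verdicts are only reported once enough outputs have been observed. A real monitor for infinite words might instead rely on statistical guarantees, which this sketch does not attempt.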