With the rapid development and deployment of generative language models in global settings, there is an urgent need to scale our measurements of harm as well, not just in the number and types of harms covered, but also in how well they account for local cultural contexts, including marginalized identities and the social biases they experience. Current evaluation paradigms are limited in their ability to address this, as they are not representative of diverse, locally situated but global, socio-cultural perspectives. It is imperative that our evaluation resources be enhanced and calibrated by including people and experiences from different cultures and societies worldwide, in order to prevent gross underestimation of, or skew in, measurements of harm. In this work, we demonstrate a socio-culturally aware expansion of evaluation resources in the Indian societal context, specifically for the harm of stereotyping. We devise a community-engaged effort to build a resource that contains stereotypes for axes of disparity that are uniquely present in India. The resulting resource adds over 1000 stereotypes, spanning many unique identities, to those known for and in the Indian context. We also demonstrate the utility and effectiveness of such expanded resources for evaluations of language models. CONTENT WARNING: This paper contains examples of stereotypes that may be offensive.