Unequal representation of cultures and socioeconomic groups in AI is a significant and challenging problem that often leads to uneven model performance. As a step toward addressing this issue, we formulate three kinds of prompts (translated non-English prompts, geographic integrated prompts, and socioeconomic integrated prompts) and evaluate their impact on vision-language (VL) model performance on data from different countries and income groups. Our findings show that geographic and socioeconomic integrated prompts improve VL model performance on lower-income data and favor the retrieval of topic appearances commonly found in data from low-income households. From our analyses, we identify and highlight the contexts in which these strategies yield the largest improvements. Our model analysis code is publicly available at https://github.com/Anniejoan/Uplifting-Lower-income-data.