The rapid progress of large language models (LLMs) raises concerns about cultural bias, fairness, and performance across diverse languages and underrepresented regions. Addressing these gaps requires large-scale resources grounded in multilingual, local, and cultural contexts. We systematize the earlier NativQA framework and extend it to multimodality by adding image, audio, and video support, enabling scalable construction of culturally and regionally aligned QA datasets in native languages. Given user-defined seed queries, the framework uses search engines to collect location-specific everyday information. We evaluate it across 39 locations in 24 countries and 7 languages, spanning extremely low-resource to high-resource settings, and collect $\sim$300K text QA pairs, $\sim$312K images, and $\sim$29K videos with associated audio. The resulting resources can be used for LLM benchmarking and fine-tuning. The framework is publicly available to the community (https://gitlab.com/nativqa/nativqa-framework). A demo video is available at \href{https://shorturl.at/DAVn9}{https://shorturl.at/DAVn9}.
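The seed-query collection step can be illustrated with a minimal sketch. It assumes a hypothetical `search(query, location)` function that returns question-answer pairs together with related queries for a given location, and expands the seed queries breadth-first for a fixed number of rounds. The names `collect_qa`, `search`, `fake_search`, and `max_iterations` are illustrative assumptions, not the framework's actual API, and the stubbed search returns placeholder data only.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Set, Tuple

# Hypothetical search interface (an assumption, not the framework's API):
# given (query, location), return QA pairs and related follow-up queries.
SearchFn = Callable[[str, str], Tuple[List[Tuple[str, str]], List[str]]]


def collect_qa(seed_queries: List[str], location: str, search: SearchFn,
               max_iterations: int = 2) -> Dict[str, str]:
    """Expand seed queries breadth-first and gather deduplicated QA pairs."""
    qa_pairs: Dict[str, str] = {}
    seen: Set[str] = set(seed_queries)
    # Each frontier entry holds a query and the expansion round it belongs to.
    frontier: Deque[Tuple[str, int]] = deque((q, 0) for q in seed_queries)
    while frontier:
        query, depth = frontier.popleft()
        pairs, related = search(query, location)
        for question, answer in pairs:
            # Keep the first answer seen for each question; drop duplicates.
            qa_pairs.setdefault(question, answer)
        # Only enqueue related queries if another round is allowed.
        if depth + 1 < max_iterations:
            for r in related:
                if r not in seen:
                    seen.add(r)
                    frontier.append((r, depth + 1))
    return qa_pairs


def fake_search(query: str, location: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Stub search engine with placeholder results, for illustration only."""
    table = {
        "seed": ([(f"Q1 about {location}", "A1")], ["related"]),
        "related": ([(f"Q2 about {location}", "A2")], ["deeper"]),
        "deeper": ([(f"Q3 about {location}", "A3")], []),
    }
    return table.get(query, ([], []))


if __name__ == "__main__":
    print(collect_qa(["seed"], "Doha", fake_search, max_iterations=2))
```

Bounding the expansion by `max_iterations` and tracking `seen` queries keeps the crawl finite even when related queries loop back on each other; a real deployment would also need rate limiting and per-language filtering, which this sketch omits.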