Large Language Models (LLMs) often show reduced performance, cultural alignment, and safety robustness in non-English languages, partly because English dominates both pre-training data and human preference alignment datasets. Training methods such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) require human preference data, which remains scarce and largely non-public for many languages beyond English. To address this gap, we introduce compar:IA, an open-source digital public service developed inside the French government and designed to collect large-scale human preference data from a predominantly French-speaking general audience. The platform uses a blind pairwise comparison interface to capture unconstrained, real-world prompts and user judgments across a diverse set of language models, while keeping participation friction low and applying privacy-preserving automated filtering. As of 2026-02-07, compar:IA has collected over 600,000 free-form prompts and 250,000 preference votes, with approximately 89% of the data in French. We release three complementary datasets -- conversations, votes, and reactions -- under open licenses, and present initial analyses, including a French-language model leaderboard and user interaction patterns. Beyond the French context, compar:IA is evolving toward an international digital public good, offering reusable infrastructure for multilingual model training, evaluation, and the study of human-AI interaction.