Recent trends in LLM development clearly show growing interest in the use and application of sovereign LLMs. The global debate over sovereign LLMs highlights the need for governments to develop their own LLMs, tailored to their unique socio-cultural and historical contexts. However, there remains a shortage of frameworks and datasets to verify two critical questions: (1) how well these models align with users' socio-cultural backgrounds, and (2) whether they maintain safety and technical robustness without exposing users to potential harms and risks. To address this gap, we construct a new dataset and introduce an analytic framework for extracting and evaluating the socio-cultural elements of sovereign LLMs, alongside assessments of their technical robustness. Our experimental results demonstrate that while sovereign LLMs play a meaningful role in supporting low-resource languages, they do not always live up to the popular claim that such models serve their target users well. We also show that pursuing this untested claim may lead to underestimating critical quality attributes such as safety. Our study suggests that advancing sovereign LLMs requires more extensive evaluation that incorporates a broader range of well-grounded and practical criteria.