Generative Foundation Models (GenFMs) have emerged as transformative tools. However, their widespread adoption raises critical concerns about trustworthiness across multiple dimensions. This paper presents a comprehensive framework to address these challenges through three key contributions. First, we systematically review global AI governance laws and policies from governments and regulatory bodies, as well as industry practices and standards. Based on this analysis, we propose a set of guiding principles for GenFMs, developed through extensive multidisciplinary collaboration that integrates technical, ethical, legal, and societal perspectives. Second, we introduce TrustGen, the first dynamic benchmarking platform designed to evaluate trustworthiness across multiple dimensions and model types, including text-to-image, large language, and vision-language models. TrustGen leverages modular components (metadata curation, test case generation, and contextual variation) to enable adaptive, iterative assessments that overcome the limitations of static evaluation methods. Using TrustGen, we reveal significant progress in trustworthiness while identifying persistent challenges. Finally, we provide an in-depth discussion of the challenges and future directions for trustworthy GenFMs. This discussion reveals the complex, evolving nature of trustworthiness, highlights the nuanced trade-offs between utility and trustworthiness, considers implications for various downstream applications, and provides a strategic roadmap for future research. This work establishes a holistic framework for advancing trustworthiness in generative AI, paving the way for the safer and more responsible integration of GenFMs into critical applications. To support progress in the community, we release our dynamic evaluation toolkit.
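The three modular stages named in the abstract (metadata curation, test case generation, and contextual variation) can be illustrated with a minimal sketch. This is a hypothetical pipeline for intuition only: the function names, metadata fields, and variation strategy are illustrative assumptions, not TrustGen's actual API.

```python
import random

# Illustrative sketch of a dynamic benchmark pipeline (all names are
# hypothetical, not the real TrustGen interface).

def curate_metadata(seed_topics):
    """Metadata curation: pair each seed topic with a trust dimension."""
    # Trust dimensions here are placeholder examples.
    return [{"topic": t, "dimension": d}
            for t in seed_topics
            for d in ("privacy", "fairness")]

def generate_test_case(meta):
    """Test case generation: turn curated metadata into a prompt."""
    return (f"Evaluate the model on a {meta['dimension']} "
            f"scenario about {meta['topic']}.")

def vary_context(prompt, rng):
    """Contextual variation: perturb phrasing so each benchmark run
    differs, avoiding the staleness of a fixed test set."""
    prefixes = ["", "In a professional setting, ", "As a casual question, "]
    return rng.choice(prefixes) + prompt

def build_benchmark(seed_topics, seed=0):
    """Compose the three stages into one adaptive generation pass."""
    rng = random.Random(seed)  # seeded for reproducible variation
    return [vary_context(generate_test_case(m), rng)
            for m in curate_metadata(seed_topics)]

cases = build_benchmark(["medical records", "hiring decisions"])
```

The key design point the sketch captures is that regenerating with a new seed (or fresh metadata) yields a new test set, which is what distinguishes a dynamic benchmark from a static one.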