The field of AI alignment is increasingly concerned with how values are integrated into the design of generative AI systems and how their integration shapes the social consequences of AI. However, existing transparency frameworks focus on the informational aspects of AI models, data, and procedures, while the institutional and organizational forces that shape alignment decisions and their downstream effects remain underexamined in both research and practice. To address this gap, we develop a framework of \emph{structural transparency} for analyzing organizational and institutional decisions concerning AI alignment, drawing on the theoretical lens of Institutional Logics. We categorize the organizational decisions involved in the governance of AI alignment and provide an explicit analytical approach for examining them. We operationalize the framework through five analytical components, each with an accompanying "analyst recipe"; together, these components identify the primary institutional logics and their internal relationships, external disruptions to existing social orders, and, finally, how the structural risks of each institutional logic map onto a catalogue of sociotechnical harms. The proposed concept of structural transparency enables analysts to complement existing approaches based on informational transparency with macro-level analyses that capture the institutional dynamics and consequences of decisions regarding AI alignment.