Conversational AI is increasingly used for advice, interpretation, reassurance, and decision support in contexts where users may be vulnerable, uncertain, or dependent on the system's apparent competence. Existing alignment work often focuses on model objectives, preference optimization, or output correctness. Yet many harms arise through interaction: how systems frame authority, express uncertainty, simulate empathy, support reasoning, and make boundaries legible. This paper introduces the Layered Cognitive Alignment Model (LCAM), a conceptual and normative framework for diagnosing interactional alignment failures in conversational AI. LCAM defines alignment as a calibrated fit among system behavior, user goals, task demands, and normative context. It distinguishes five layers of fit (perceptual, semantic, affective, cognitive, and ethical) and two diagnostic polarities of misalignment (underfit and overreach). We apply LCAM to a published LLM counseling example, showing how an apparently supportive response can reinforce harmful beliefs, simulate inappropriate care, and obscure role boundaries. By translating conversational failures into audit and governance questions concerning over-reliance, false intimacy, autonomy erosion, boundary confusion, and inappropriate trust, LCAM offers a theoretical and normative lens for evaluating conversational AI beyond accuracy, helpfulness, or trust.