AI emotional companions face a safety-rapport paradox: restrictive safeguards can damage the supportive alliance with users, while permissive systems risk user harm. We present SLIP (Staged Layers of Intervention Protocol), a four-stage graduated methodology that derives interventions (none, soft, or hard) from two structured qualitative indicators, affect intensity (a) and narrative dynamism (m). We pair it with ETHICS (Emergent Taxonomy for Human-AI Interaction Context Signals), a "signals, not labels" taxonomy. Our evaluation combines a small-scale production deployment (N=68 entries, 10 users, 10 weeks) with a synthetic persona battery (N=91, 5 behavioral-risk profiles). SLIP achieved 0% false positives for the flow persona and showed the expected escalation patterns for the crisis-oriented personas. However, in the initial results, 8 consecutive days of high-energy elevation triggered no interventions (0/8), exposing a boundary where the "do not pathologize" principle conflicts with safety. A subsequent stress test across three models showed that greater model capability raised detection of this pattern from 0/8 to 6/8, while the largest model still produced no flow false positives (0/10). These findings are preliminary. They position graduated intervention as a design direction for navigating, rather than resolving, the safety-rapport tension in affective computing.
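To make the reported boundary concrete, the sketch below shows one way a per-entry graduated rule can miss a sustained pattern. It is a minimal illustration under stated assumptions, not the published SLIP rules. The three-level `Level` scale, the thresholds inside the hypothetical `slip_sketch` function, and the hypothetical `sustained_elevation` window check are all invented for illustration. Under these assumptions, eight high-energy days with a moving narrative each map to "none", just as the flow persona does. A simple windowed check does flag the streak, but it would also flag a long flow streak. That is the pathologizing risk the abstract describes.

```python
from enum import IntEnum
from typing import Sequence


class Level(IntEnum):
    """Hypothetical three-level ordinal scale for affect (a) and dynamism (m)."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def slip_sketch(a: Level, m: Level) -> str:
    """Hypothetical per-entry mapping from (a, m) to an intervention tier.

    The thresholds are illustrative assumptions, not the published SLIP rules.
    """
    if a == Level.HIGH and m == Level.LOW:
        return "hard"  # intense affect with a stalled narrative
    if a >= Level.MEDIUM and m == Level.LOW:
        return "soft"  # moderate affect with a stalled narrative: gentle check-in
    return "none"      # includes high affect with a moving narrative ("flow")


def sustained_elevation(daily_a: Sequence[Level], days: int = 3) -> bool:
    """Hypothetical longitudinal check: True if a is HIGH on `days` consecutive entries."""
    run = 0
    for a in daily_a:
        run = run + 1 if a == Level.HIGH else 0  # reset the streak on any non-HIGH day
        if run >= days:
            return True
    return False


# Eight consecutive high-energy days with a moving narrative (a flow-like pattern).
streak = [Level.HIGH] * 8
per_entry = [slip_sketch(a, Level.HIGH) for a in streak]
print(per_entry.count("none"), "of", len(per_entry), "entries -> none")
print("sustained elevation flagged:", sustained_elevation(streak))
```

Run as-is, this prints `8 of 8 entries -> none` followed by `sustained elevation flagged: True`. Per-entry evaluation cannot separate the elevated streak from flow. The window check can catch the streak, but it cannot tell the two apart either.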