Activation steering has emerged as a powerful tool for shaping the behaviour of large language models at inference time, yet most prior work injects a \emph{single} semantic direction into the residual stream. We study the richer setting in which two semantically opposing steering vectors are superimposed, a regime we call \textbf{Creative Collision}. Concretely, we construct directorial persona vectors for Steven Spielberg (optimistic, redemptive moral valence) and Martin Scorsese (dark, morally ambiguous valence) by mean-difference activation contrast over curated, screenplay-derived corpora. We then interpolate between the two vectors with a scalar mixing parameter $\alpha \in [0,1]$ and scale the result by a steering coefficient $\lambda$. Across five evaluation axes (moral valence, generation coherence, surface style, directional dominance, and vector geometry), three principal findings emerge: (i)~Spielberg's representational signature exhibits robust \emph{directional dominance}, suppressing Scorsese's moral influence across almost the entire interpolation range; (ii)~at high $\lambda$, intermediate collision points paradoxically \emph{improve} generation coherence relative to pure single-director steering; and (iii)~both persona vectors are most strongly localised at layer~28 of a 40-layer decoder-only transformer, revealing a shared \emph{moral-tone substrate}. These results illuminate the geometry of competing semantic directions in transformer residual streams and have direct implications for controllable creative generation and value-aligned narrative synthesis.
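The construction summarised above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the implementation used in the study: the function names are hypothetical, activations are assumed to be pre-collected at a single layer as arrays of shape (n_examples, d_model), the contrast baseline is assumed to be a neutral corpus, and $\alpha = 1$ is assumed to mean pure Spielberg while $\alpha = 0$ means pure Scorsese.

```python
import numpy as np


def mean_difference_vector(persona_acts: np.ndarray, baseline_acts: np.ndarray) -> np.ndarray:
    """Mean persona activation minus mean baseline activation.

    Both arrays have shape (n_examples, d_model) and are assumed to hold
    residual-stream activations captured at the same layer.
    """
    return persona_acts.mean(axis=0) - baseline_acts.mean(axis=0)


def collide(h: np.ndarray, v_spielberg: np.ndarray, v_scorsese: np.ndarray,
            alpha: float, lam: float) -> np.ndarray:
    """Add a lam-scaled convex mix of two steering vectors to hidden state(s) h.

    ASSUMPTION: alpha = 1 is pure Spielberg and alpha = 0 is pure Scorsese;
    the vectors are mixed as-is, without normalisation.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    mix = alpha * v_spielberg + (1.0 - alpha) * v_scorsese
    return h + lam * mix  # broadcasts over any leading (batch, position) axes


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity: one simple probe of the two vectors' geometry."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy usage with random stand-in activations (d_model = 16), not real model data.
rng = np.random.default_rng(0)
v_s = mean_difference_vector(rng.normal(1.0, 1.0, (32, 16)), rng.normal(0.0, 1.0, (32, 16)))
v_c = mean_difference_vector(rng.normal(-1.0, 1.0, (32, 16)), rng.normal(0.0, 1.0, (32, 16)))
h = rng.normal(size=(4, 16))  # hidden states for four token positions
steered = collide(h, v_s, v_c, alpha=0.5, lam=4.0)
print(steered.shape, round(cosine(v_s, v_c), 3))
```

Whether $\alpha$ mixes raw or unit-normalised vectors matters when studying dominance: in a raw convex mix, the vector with the larger norm contributes more to the steered state at every $\alpha$ except the endpoints. The sketch leaves the vectors raw only for simplicity.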