Training modern large language models (LLMs) now draws on a wide array of algorithms and datasets designed to elicit particular behaviors, making it critical to develop techniques for understanding the effects of datasets on a model's properties. This need is exacerbated by recent experiments showing that datasets can transmit signals that are not directly observable from individual datapoints, posing a conceptual challenge for dataset-centric understandings of LLM training and suggesting that a fundamental account of such phenomena is missing. Towards understanding these effects, and inspired by recent work on the linear structure of LLMs, we uncover a general mechanism through which hidden subtexts can arise in generic datasets. We introduce Logit-Linear-Selection (LLS), a method that prescribes how to select subsets of a generic preference dataset to elicit a wide range of hidden effects. We apply LLS to discover subsets of real-world datasets such that models trained on them exhibit behaviors ranging from holding specific preferences, to responding to prompts in a language not present in the dataset, to taking on a different persona. Crucially, the effect of a selected subset persists across models with varying architectures, supporting the generality of the mechanism.
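The abstract does not spell out the selection procedure, but the core idea of logit-linear subset selection can be illustrated with a minimal sketch: score each preference example by how strongly its logit-difference features align with a linear direction encoding the desired hidden behavior, then keep the top-scoring examples. All names here (`select_subset`, the feature representation, the target direction) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of logit-linear subset selection.
# Assumption: each preference example is summarized by a vector of
# logit differences (chosen minus rejected), and the hidden behavior
# corresponds to a linear direction in that same space.
from typing import List


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def select_subset(
    logit_diffs: List[List[float]],   # per-example logit-difference features
    target_direction: List[float],    # direction encoding the target behavior
    k: int,
) -> List[int]:
    """Return indices of the k examples whose logit-difference features
    align most strongly with the target behavior direction."""
    scores = [dot(x, target_direction) for x in logit_diffs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy example: 4 examples with 2-d features, target direction along axis 0.
feats = [[1.0, 0.0], [0.2, 0.9], [-0.5, 0.1], [0.8, -0.3]]
idx = select_subset(feats, [1.0, 0.0], k=2)
# idx == [0, 3]: the two examples most aligned with the target direction
```

Training on only the selected subset would then, per the paper's claim, bias the model toward the target behavior even though no individual datapoint reveals it.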