Despite their scale and success, modern transformers are usually trained as single-minded systems: optimization yields one deterministic set of parameters, i.e., a single functional hypothesis about the data. Motivated by the analogy to human populations, in which collective intelligence emerges from diverse individual behaviors, we propose Population Bayesian Transformers (B-Trans), which sample diverse yet coherent instances (hereafter 'minds') of a transformer large language model from a single pre-trained LLM. B-Trans introduces a Bayesian-inspired posterior proxy by injecting stochasticity directly into the normalization layers, avoiding the prohibitive cost of training a full Bayesian neural network. Sampling from this proxy yields a population of minds with diverse behaviors that retain general competence. When generating each response, we draw a single realization from the noise distribution and hold it fixed for the entire response, ensuring temporal consistency and coherent reasoning. Experiments on zero-shot generation and Reinforcement Learning with Verifiable Rewards (RLVR) demonstrate that B-Trans effectively exploits stochastic model diversity, producing more diverse responses while outperforming deterministic baselines on task performance.
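The core mechanism described above can be illustrated with a minimal sketch. The class below is a hypothetical NumPy stand-in, not the paper's implementation: it assumes the stochasticity takes the form of a multiplicative Gaussian perturbation on a LayerNorm gain, with one realization drawn per response and held fixed across all tokens of that response. The class name, `noise_std` parameter, and `sample_mind` method are illustrative assumptions.

```python
import numpy as np


class StochasticLayerNorm:
    """Hypothetical sketch of a B-Trans-style noisy normalization layer.

    Assumes a multiplicative Gaussian perturbation of the learned gain;
    the paper's exact posterior-proxy parameterization is not specified here.
    """

    def __init__(self, dim, noise_std=0.1, seed=None):
        self.gamma = np.ones(dim)       # learned gain (from the pretrained model)
        self.beta = np.zeros(dim)       # learned bias
        self.noise_std = noise_std      # scale of the posterior-proxy noise
        self.rng = np.random.default_rng(seed)
        self.eps = 1e-5
        self._gamma_sample = self.gamma.copy()  # current realization of the gain

    def sample_mind(self):
        """Draw one realization of the noisy gain ("mind").

        Called once per response; the realization is held fixed while every
        token of that response is generated, which keeps the sampled mind
        temporally consistent.
        """
        noise = self.rng.normal(0.0, self.noise_std, size=self.gamma.shape)
        self._gamma_sample = self.gamma * (1.0 + noise)

    def __call__(self, x):
        # Standard LayerNorm, but scaled by the sampled gain realization.
        mu = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        x_hat = (x - mu) / np.sqrt(var + self.eps)
        return self._gamma_sample * x_hat + self.beta
```

In use, each response would begin with one `sample_mind()` call; distinct calls yield distinct minds, while reusing the same realization within a response preserves reasoning coherence.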