Nous: An Attempt to Extract and Inject the Cognition Behind Prediction-Market Behavior

From arXiv. 37 pages, 1 figure, 7 tables. Reproduction artifacts (code, frozen profiles, prompts, model outputs): https://github.com/WillChienT/nous-paper

As LLM agents proliferate in prediction markets and collective decision-making, they risk a cognitive monoculture: agents built on shared foundation models produce correlated forecasts, and recent measurement finds frontier-model errors correlated at r ≈ 0.77. We ask whether human cognitive diversity can be recovered from behavior and transferred to LLM agents. Nous extracts a structured eight-dimension behavioral profile from real Polymarket trading activity and injects it into agents through prompts. Our central finding is a dissociation between the two halves of that pipeline. Extraction partially works: across 100 wallets, 8 of 14 parameters are temporally stable (split-half ICC ≥ 0.5 with bootstrap CI lower bound > 0.3; contrarian score reaches ICC ≈ 0.9); wallets are identifiable from their profiles well above chance (top-1 retrieval 17–22% vs. 1% chance); and two of four pre-specified dimensions rank-correlate with future realized profit out of sample, though these correlations do not survive controls for behavioral confounds. Prompt-level injection, by contrast, does not measurably transmit the extracted profile: on a semantic-embedding metric, structured injection shows no significant advantage over a length-matched control on any model, and the diversity it induces neither reduces ensemble error correlation nor improves Brier score. This null persists across exploratory checks on sampling temperature, profile diversity, and question difficulty. Measuring the prompts themselves locates the compression upstream of the model: the structure-to-narrative translator emits near-uniform prompts whose spread does not track profile spread. We position Nous as a measurement of the cognitive-monoculture problem and of the limits of a prompt-level remedy, motivating deeper, below-the-prompt injection (fine-tuning, activation steering). Code, frozen profiles, prompts, and model outputs: https://github.com/WillChienT/nous-paper
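The abstract reports several statistics (split-half ICC with a bootstrap lower bound, top-1 profile retrieval, Brier score, ensemble error correlation) without giving formulas. The following is a minimal NumPy sketch of how such quantities could be computed, not the paper's code. Assumptions: the ICC form is taken to be ICC(2,1) (two-way random effects, absolute agreement, single measure), since the abstract does not say which form was used; the bootstrap is a percentile bootstrap over wallets; retrieval uses Euclidean nearest neighbors on z-scored profiles, which is an illustrative choice of distance. All function names (`icc_2_1`, `bootstrap_ci_lower`, `is_stable`, `top1_retrieval`, `brier`, `mean_error_correlation`) are hypothetical and do not come from the nous-paper repository.

```python
import numpy as np


def icc_2_1(x: np.ndarray) -> float:
    """Assumed ICC(2,1). x has shape (n_wallets, k), one column per split half."""
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1, keepdims=True)  # per-wallet means
    col_means = x.mean(axis=0, keepdims=True)  # per-half means
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)  # between-wallet mean square
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)  # between-half mean square
    resid = x - row_means - col_means + grand
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))        # residual mean square
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))


def bootstrap_ci_lower(x: np.ndarray, n_boot: int = 2000,
                       alpha: float = 0.05, seed: int = 0) -> float:
    """Percentile-bootstrap lower CI bound for the ICC, resampling wallets (rows)."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    stats = [icc_2_1(x[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    return float(np.quantile(stats, alpha / 2))


def is_stable(x: np.ndarray) -> bool:
    """Stability criterion as stated in the abstract: ICC >= 0.5 and CI lower bound > 0.3."""
    return icc_2_1(x) >= 0.5 and bootstrap_ci_lower(x) > 0.3


def top1_retrieval(a: np.ndarray, b: np.ndarray) -> float:
    """Share of wallets whose half-A profile is nearest to their own half-B profile."""
    pooled = np.vstack([a, b])
    mu, sd = pooled.mean(axis=0), pooled.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)  # leave constant columns unscaled instead of dividing by 0
    za, zb = (a - mu) / sd, (b - mu) / sd
    dist = np.linalg.norm(za[:, None, :] - zb[None, :, :], axis=-1)
    return float((dist.argmin(axis=1) == np.arange(len(a))).mean())


def brier(p, y) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return float(np.mean((np.asarray(p, float) - np.asarray(y, float)) ** 2))


def mean_error_correlation(preds, y) -> float:
    """Mean pairwise Pearson correlation of forecast errors across agents.

    preds has shape (n_agents, n_questions); y holds 0/1 outcomes.
    """
    errors = np.asarray(preds, float) - np.asarray(y, float)
    r = np.corrcoef(errors)                 # agent-by-agent correlation matrix
    iu = np.triu_indices_from(r, k=1)       # distinct agent pairs only
    return float(r[iu].mean())
```

With 100 wallets, chance-level top-1 retrieval is 1/100 = 1%, which is the baseline the abstract compares 17–22% against. A usage sketch on synthetic data (a hypothetical stable trait measured with noise in each half): `halves = np.column_stack([t + 0.3 * rng.normal(size=100), t + 0.3 * rng.normal(size=100)])` for a trait vector `t`, then `icc_2_1(halves)` and `is_stable(halves)`.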