CRE-T1 Preview Technical Report: Beyond Contrastive Learning for Reasoning-Intensive Retrieval

The central challenge of reasoning-intensive retrieval lies in identifying implicit reasoning relationships between queries and documents, rather than superficial semantic or lexical similarity. The contrastive learning paradigm is fundamentally a static representation consolidation technique: during training, it encodes hierarchical relevance concepts into fixed geometric structures in the vector space, and at inference time it cannot dynamically adjust relevance judgments according to the specific reasoning demands of each query. Consequently, performance degrades noticeably when vocabulary mismatch exists between queries and documents or when implicit reasoning is required to establish relevance. This paper proposes Thought 1 (T1), a generative retrieval model that shifts relevance modeling from static alignment to dynamic reasoning. On the query side, T1 dynamically generates intermediate reasoning trajectories for each query to bridge implicit reasoning relationships and uses <embtoken> as a semantic aggregation point for the reasoning output. On the document side, it employs an instruction + text + <embtoken> encoding format to support high-throughput indexing. To internalize dynamic reasoning capabilities into vector representations, we adopt a three-stage training curriculum and introduce GRPO in the third stage, enabling the model to learn optimal derivation strategies for different queries through trial-and-error reinforcement learning. On the BRIGHT benchmark, T1-4B exhibits strong performance under the original query setting, outperforming larger models trained with contrastive learning overall, and achieving performance comparable to multi-stage retrieval pipelines. The results demonstrate that replacing static representation alignment with dynamic reasoning generation can effectively improve reasoning-intensive retrieval performance.
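As a rough illustration of the two encoding formats and the reinforcement signal described above, the following Python sketch builds the document-side and query-side input strings and computes group-relative advantages in the style commonly used with GRPO. It rests on several assumptions not stated in the abstract: the newline separators, the helper names (`format_document`, `format_query`, `group_relative_advantages`), and the use of a per-trajectory scalar reward (for example, a retrieval metric) are all hypothetical. It should be read as a sketch, not as the T1 implementation.

```python
import math
from typing import List

EMB_TOKEN = "<embtoken>"  # special token named in the abstract


def format_document(text: str, instruction: str) -> str:
    """Document side: instruction + text + <embtoken>, with no generation step.

    The newline separator is an assumption made for this sketch.
    """
    return f"{instruction}\n{text}{EMB_TOKEN}"


def format_query(query: str, reasoning: str) -> str:
    """Query side: the generated reasoning trajectory precedes <embtoken>.

    `reasoning` stands in for text the model would generate; the embedding
    would then be read at the <embtoken> position.
    """
    return f"{query}\n{reasoning}{EMB_TOKEN}"


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of trajectories sampled for a query.

    Each advantage is (reward - group mean) / (group std + eps), so that
    trajectories are scored relative to their siblings.
    """
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    doc = format_document("Mitochondria produce ATP.", "Represent this passage for retrieval.")
    qry = format_query("Why do cells need oxygen?", "Oxygen is the final electron acceptor ...")
    print(doc)
    print(qry)
    # Hypothetical rewards for four sampled reasoning trajectories of one query.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Under these assumptions, documents can be indexed in a single forward pass because nothing is generated on that side, while trajectories whose reward exceeds the group mean receive a positive advantage and the rest a negative one.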
