Large language models exhibit a systematic tendency toward early semantic commitment: given ambiguous input, they collapse multiple valid interpretations into a single response before sufficient context is available. This premature collapse discards information that may prove essential as dialogue evolves. We present a formal framework for text-to-state mapping (phi: T -> S) that transforms natural language into a non-collapsing state space where multiple interpretations coexist. The mapping decomposes into three stages: conflict detection, interpretation extraction, and state construction. We instantiate phi with a hybrid extraction pipeline that combines rule-based segmentation for explicit conflict markers (adversative conjunctions, hedging expressions) with LLM-based enumeration of implicit ambiguity (epistemic, lexical, structural). On a test set of 68 ambiguous sentences, the resulting states preserve interpretive multiplicity: using hybrid extraction, we obtain mean state entropy H = 1.087 bits across ambiguity categories, compared to H = 0 for collapse-based baselines that commit to a single interpretation. We additionally instantiate the rule-based conflict detector for Japanese markers (kedo, kamoshirenai, etc.) to illustrate cross-lingual portability of the conflict detection stage. This framework extends Non-Resolution Reasoning (NRR) by providing the missing algorithmic bridge between text and the NRR state space, enabling architectural collapse deferment in LLM inference. Design principles for state-to-state transformations are detailed in the Appendix, with empirical validation on 580 test cases (180 single states, 200 contradictory pairs, 200 temporal pairs), demonstrating 0% collapse for principle-satisfying operators versus up to 17.8% for violating operators.
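To make the three stages concrete, the following is a minimal sketch of the rule-based path only: it segments a sentence on explicit adversative markers, builds a state in which every segment is kept as a live interpretation, and computes the state entropy in bits. Several details are assumptions for illustration and are not taken from the abstract: the marker list, the names `split_on_conflict`, `build_state`, and `state_entropy`, the uniform weighting of interpretations, and the reading of H as Shannon entropy over normalized interpretation weights. The LLM-based enumeration of implicit ambiguity is omitted.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical marker list for illustration; the actual rule set is not
# specified in the abstract. A non-capturing group keeps the markers
# themselves out of the split result.
ADVERSATIVE = re.compile(r"\b(?:but|however|although|yet)\b", re.IGNORECASE)


@dataclass
class Interpretation:
    text: str
    weight: float


def split_on_conflict(sentence: str) -> list[str]:
    """Conflict detection: segment on explicit adversative markers."""
    parts = (p.strip(" ,.;") for p in ADVERSATIVE.split(sentence))
    return [p for p in parts if p]


def build_state(segments: list[str]) -> list[Interpretation]:
    """State construction: keep every segment, with uniform weight (assumed)."""
    return [Interpretation(text=s, weight=1.0) for s in segments]


def state_entropy(state: list[Interpretation]) -> float:
    """Shannon entropy in bits over normalized weights (assumed definition)."""
    total = sum(i.weight for i in state)
    probs = [i.weight / total for i in state if i.weight > 0]
    return sum(p * math.log2(1.0 / p) for p in probs)


if __name__ == "__main__":
    segments = split_on_conflict("I want to go, but I am tired.")
    print(segments)                                # ['I want to go', 'I am tired']
    print(state_entropy(build_state(segments)))    # 1.0 (two coexisting readings)
    print(state_entropy(build_state(["I want to go"])))  # 0.0 (collapsed state)
```

Under these assumptions, a state holding two equally weighted interpretations has H = 1 bit, while a state committed to a single interpretation has H = 0, which matches the contrast the abstract draws between non-collapsing states and collapse-based baselines.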