Large language models exhibit a systematic tendency toward early semantic commitment: given ambiguous input, they collapse multiple valid interpretations into a single response before sufficient context is available. We present a formal framework for text-to-state mapping ($\phi: \mathcal{T} \to \mathcal{S}$) that transforms natural language into a non-collapsing state space where multiple interpretations coexist. The mapping decomposes into three stages: conflict detection, interpretation extraction, and state construction. We instantiate $\phi$ with a hybrid extraction pipeline combining rule-based segmentation for explicit conflict markers (adversative conjunctions, hedging expressions) with LLM-based enumeration of implicit ambiguity (epistemic, lexical, structural). On a test set of 68 ambiguous sentences, the resulting states preserve interpretive multiplicity: mean state entropy $H = 1.087$ bits across ambiguity categories, compared to $H = 0$ for collapse-based baselines. We additionally instantiate the rule-based conflict detector for Japanese markers to illustrate cross-lingual portability. This framework extends Non-Resolution Reasoning (NRR) by providing the missing algorithmic bridge between text and the NRR state space, enabling architectural collapse deferment in LLM inference.
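As an illustration only, the three stages and the entropy measure can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the pipeline evaluated above: the marker list `ADVERSATIVE_MARKERS`, the `State` class, and the helper names are hypothetical; interpretations are weighted uniformly, which is an assumption; and the LLM-based enumeration of implicit ambiguity is omitted entirely, so only explicit adversative markers are handled.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical marker list for illustration; the actual rule set is not given here.
ADVERSATIVE_MARKERS = ("but", "however", "although", "whereas")
_MARKER_RE = re.compile(
    r"\b(?:" + "|".join(ADVERSATIVE_MARKERS) + r")\b", flags=re.IGNORECASE
)


@dataclass
class State:
    """A non-collapsed state: several interpretations with normalized weights."""
    interpretations: list[str]
    weights: list[float]

    def entropy(self) -> float:
        """Shannon entropy of the weight distribution, in bits."""
        return sum(-w * math.log2(w) for w in self.weights if w > 0)


def detect_conflict(text: str) -> bool:
    """Stage 1: rule-based check for an explicit conflict marker."""
    return _MARKER_RE.search(text) is not None


def extract_interpretations(text: str) -> list[str]:
    """Stage 2 (rule-based part only): split the text at conflict markers."""
    parts = (p.strip(" ,.;") for p in _MARKER_RE.split(text))
    return [p for p in parts if p]


def construct_state(interpretations: list[str]) -> State:
    """Stage 3: keep every interpretation; uniform weights are an assumption."""
    n = len(interpretations)
    return State(interpretations, [1.0 / n] * n)


def phi(text: str) -> State:
    """Sketch of the text-to-state mapping: detect, extract, construct."""
    interpretations = []
    if detect_conflict(text):
        interpretations = extract_interpretations(text)
    # Fall back to a single interpretation when nothing can be segmented.
    return construct_state(interpretations or [text.strip()])


if __name__ == "__main__":
    state = phi("I want to go, but I am tired.")
    print(state.interpretations)  # ['I want to go', 'I am tired']
    print(state.entropy())        # 1.0
```

With two equally weighted interpretations the state carries exactly 1 bit of entropy, whereas a single retained interpretation yields $H = 0$, which is the collapsed case that the baselines represent.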