NRR-Phi: Text-to-State Mapping for Ambiguity Preservation in LLM Inference

From arXiv. 25 pages, 5 figures, 7 tables. Replacement synced to repository snapshot v38. Added a direct series-hub link in the abstract for cross-paper navigation (https://github.com/kei-saito-research/nrr-series-hub). Series numbering policy: paper3 is intentionally skipped and never reused.

Large language models exhibit a systematic tendency toward early semantic commitment: given ambiguous input, they collapse multiple valid interpretations into a single response before sufficient context is available. This premature collapse discards information that may prove essential as dialogue evolves. We present a formal framework for text-to-state mapping (phi: T -> S) that transforms natural language into a non-collapsing state space where multiple interpretations coexist. The mapping decomposes into three stages: conflict detection, interpretation extraction, and state construction. We instantiate phi with a hybrid extraction pipeline that combines rule-based segmentation for explicit conflict markers (adversative conjunctions, hedging expressions) with LLM-based enumeration of implicit ambiguity (epistemic, lexical, structural). On a test set of 68 ambiguous sentences, the resulting states preserve interpretive multiplicity: using hybrid extraction, we obtain mean state entropy H = 1.087 bits across ambiguity categories, compared to H = 0 for collapse-based baselines that commit to a single interpretation. We additionally instantiate the rule-based conflict detector for Japanese markers (kedo, kamoshirenai, etc.) to illustrate cross-lingual portability of the conflict detection stage. This framework extends Non-Resolution Reasoning (NRR) by providing the missing algorithmic bridge between text and the NRR state space, enabling architectural collapse deferment in LLM inference. Design principles for state-to-state transformations are detailed in the Appendix, with empirical validation on 580 test cases (180 single states, 200 contradictory pairs, 200 temporal pairs), demonstrating 0% collapse for principle-satisfying operators versus up to 17.8% for violating operators.
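The three stages of phi and the entropy comparison can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the marker lists, the `Interpretation` class, the function names, and the uniform weighting are all assumptions made for illustration, and only the rule-based half of the hybrid extraction is shown (the LLM-based enumeration of implicit ambiguity is omitted). State entropy is assumed here to be Shannon entropy over normalized interpretation weights; the abstract does not spell out its definition.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical marker lists; the paper's actual rule set is not given in the abstract.
ADVERSATIVE_MARKERS = ("but", "however", "although")
HEDGING_MARKERS = ("maybe", "perhaps", "might", "possibly")


@dataclass(frozen=True)
class Interpretation:
    text: str
    weight: float


def detect_conflict(text: str) -> list[str]:
    """Stage 1: return the explicit conflict markers found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    markers = ADVERSATIVE_MARKERS + HEDGING_MARKERS
    return [t for t in tokens if t in markers]


def extract_interpretations(text: str) -> list[str]:
    """Stage 2 (rule-based part only): split the text at adversative markers."""
    pattern = r"\b(?:" + "|".join(ADVERSATIVE_MARKERS) + r")\b"
    parts = re.split(pattern, text, flags=re.IGNORECASE)
    cleaned = [p.strip(" ,.;") for p in parts]
    return [p for p in cleaned if p]


def build_state(segments: list[str]) -> list[Interpretation]:
    """Stage 3: keep every segment as a coexisting interpretation.

    Uniform weights are an assumption of this sketch.
    """
    if not segments:
        return []
    weight = 1.0 / len(segments)
    return [Interpretation(s, weight) for s in segments]


def state_entropy(state: list[Interpretation]) -> float:
    """Shannon entropy in bits over the interpretation weights."""
    return sum(i.weight * math.log2(1.0 / i.weight) for i in state if i.weight > 0)


def phi(text: str) -> list[Interpretation]:
    """Text-to-state mapping: extraction followed by state construction."""
    return build_state(extract_interpretations(text))


if __name__ == "__main__":
    ambiguous = "I like the plan, but the budget worries me."
    print(detect_conflict(ambiguous))
    print(phi(ambiguous))
    print(state_entropy(phi(ambiguous)))
    print(state_entropy(phi("The meeting is at noon.")))
```

Under these assumptions, a sentence split into two equally weighted interpretations has entropy of exactly 1 bit, while a state holding a single interpretation has entropy 0, which is the collapse-baseline case described above. The reported mean of H = 1.087 bits is the paper's own measurement over its 68-sentence test set and is not reproduced by this toy example.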
