Shannon's rate-distortion theory treats source symbols as unstructured labels. When the source is a knowledge base equipped with a logical proof system, a natural fidelity criterion is closure fidelity: a reconstruction is acceptable if it preserves the deductive closure of the original. This paper develops a rate-distortion theory under this criterion. Central to the theory is the irredundant core, a canonical generating set extracted by a fixed-order deletion procedure, from which the full deductive closure can be rederived. We prove that the zero-distortion semantic rate equals a quantity that is strictly below the classical entropy rate whenever the knowledge base contains redundant states. More generally, the full semantic rate-distortion function depends only on the core; redundant states are invisible to both rate and distortion. We derive a semantic source-channel separation theorem showing a semantic leverage phenomenon: under closure fidelity, the required source rate is reduced by an asymptotic leverage factor greater than one, allowing the same knowledge base to be communicated with proportionally fewer channel uses. This does not violate Shannon capacity; it arises because redundant states become free. We also prove a strengthened Fano inequality that exploits core structure. For heterogeneous multi-agent communication, an overlap decomposition gives necessary and sufficient conditions for closure-reliable transmission and identifies a semantic bottleneck in broadcast settings that persists even over noiseless channels. All results are verified on Datalog instances with up to 24,000 base facts.
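To make the notion of an irredundant core concrete, the following is a minimal sketch of a fixed-order deletion procedure. It is an illustration under stated assumptions, not the paper's construction: the abstract does not give the formal definitions, so the sketch assumes propositional Horn rules, computes closure by naive forward chaining, and deletes a fact whenever the remaining facts still generate the original closure. The names `closure`, `irredundant_core`, and the toy rules are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Assumption: a rule is a propositional Horn clause (premises, conclusion).
Rule = Tuple[FrozenSet[str], str]


def closure(facts: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Deductive closure by naive forward chaining to a fixed point."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and premises <= derived:
                derived.add(conclusion)
                changed = True
    return derived


def irredundant_core(facts: List[str], rules: List[Rule]) -> List[str]:
    """Scan facts in their given (fixed) order and drop each fact whose
    removal leaves the deductive closure unchanged."""
    target = closure(facts, rules)
    core = list(facts)
    for fact in facts:
        candidate = [f for f in core if f != fact]
        if closure(candidate, rules) == target:
            core = candidate  # fact is redundant given what remains
    return core


# Toy knowledge base (hypothetical): c follows from a and b, d follows from c.
rules = [(frozenset({"a", "b"}), "c"), (frozenset({"c"}), "d")]
facts = ["a", "b", "c", "d"]
core = irredundant_core(facts, rules)
print(core)  # ['a', 'b']
```

In this toy instance two of the four facts are redundant: transmitting only `a` and `b` lets the receiver rederive the full closure, which is the intuition behind redundant states being free under closure fidelity. The result depends on the scan order in general, which is why the procedure fixes one.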