Assessing whether two datasets are distributionally consistent is central to modern scientific analysis, particularly as generative artificial intelligence produces synthetic data whose fidelity must be validated against real observations in increasingly high-dimensional settings. Existing approaches are typically relative: they determine whether one dataset is more consistent with a reference than another, but do not provide a physically grounded absolute standard for fidelity. We propose an information-theoretic approach in which lossless compression via arithmetic coding provides an operational measure of dataset fidelity under a physics-informed probabilistic representation. Datasets sharing the same underlying physical correlations admit comparable optimal descriptions, while discrepancies arising from miscalibration, mismodeling, or bias manifest as an irreducible excess in codelength relative to the Shannon-optimal limit defined by the physics itself. This excess codelength defines an absolute fidelity metric, quantified directly in bits. Unlike conventional measures, which lack an intrinsic scale, the excess codelength has a well-defined and physically meaningful zero point corresponding to consistency with the underlying distribution. We show that this metric is global, interpretable, additive across components, and asymptotically optimal, with differences in codelength corresponding to differences in expected negative log-likelihood under a common reference model. As a byproduct, our approach achieves improved compression relative to standard general-purpose algorithms such as gzip. These results establish arithmetic coding not merely as a compression tool, but as a measurement instrument for absolute, physics-grounded assessment of distributional fidelity.
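The following is a minimal illustrative sketch, not the paper's pipeline: for discrete data, an arithmetic coder driven by a reference model q assigns close to -log2 q(x) bits per symbol, so the excess over the Shannon limit H(p) of the true distribution p equals the cross-entropy minus the entropy, i.e. the KL divergence D(p || q) in bits. The distributions and dataset below are hypothetical stand-ins for a physics-informed model and real or synthetic observations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "physics" distribution p over 4 symbols and a miscalibrated
# reference model q (assumed for illustration only).
p = np.array([0.40, 0.30, 0.20, 0.10])
q = np.array([0.25, 0.25, 0.25, 0.25])

# Draw a synthetic dataset from p.
data = rng.choice(4, size=100_000, p=p)

# Average codelength per symbol that an (ideal) arithmetic coder would
# spend when driven by the model q.
codelength_q = -np.log2(q[data]).mean()

# Shannon-optimal limit defined by p itself.
entropy_p = -(p * np.log2(p)).sum()

# Excess codelength per symbol, approximately KL(p || q) in bits;
# it vanishes when the model matches the underlying distribution.
excess_bits = codelength_q - entropy_p
print(f"codelength under q : {codelength_q:.4f} bits/symbol")
print(f"Shannon limit H(p) : {entropy_p:.4f} bits/symbol")
print(f"excess codelength  : {excess_bits:.4f} bits/symbol")
```

In this toy setting the excess is strictly positive because q is miscalibrated relative to p; a model consistent with the underlying distribution would drive the excess toward zero, which is the absolute target the abstract refers to.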