Kernel Two-Sample Tests for Manifold Data

Manifold · Sample · Low-dimensional manifold · Statistic · High-dimensional

March 25, 2023


Xiuyuan Cheng, Yao Xie

We present a study of a kernel-based two-sample test statistic related to the Maximum Mean Discrepancy (MMD) in the manifold data setting, assuming that high-dimensional observations lie close to a low-dimensional manifold. We characterize the test level and power in relation to the kernel bandwidth, the number of samples, and the intrinsic dimensionality of the manifold. Specifically, we show that when data densities are supported on a $d$-dimensional sub-manifold $\mathcal{M}$ embedded in an $m$-dimensional space, the kernel two-sample test for data sampled from a pair of distributions $p$ and $q$ that are Hölder with order $\beta$ (up to 2) is powerful when the number of samples $n$ is large enough that $\Delta_2 \gtrsim n^{-2\beta/(d+4\beta)}$, where $\Delta_2$ is the squared $L^2$-divergence between $p$ and $q$ on the manifold. We establish a lower bound on the test power for finite but sufficiently large $n$, where the kernel bandwidth parameter $\gamma$ scales as $n^{-1/(d+4\beta)}$. The analysis extends to cases where the manifold has a boundary and where the data samples contain high-dimensional additive noise. Our results indicate that the kernel two-sample test does not suffer from the curse of dimensionality when the data lie on or near a low-dimensional manifold. We validate our theory and the properties of the kernel test for manifold data through a series of numerical experiments.
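To make the bandwidth scaling in the abstract concrete, the following is a minimal sketch of an MMD-style kernel two-sample test on data lying on a one-dimensional closed curve ($d = 1$) embedded in $\mathbb{R}^3$ ($m = 3$). Everything here is an illustrative assumption rather than the authors' procedure: the Gaussian kernel, the biased (V-statistic) estimate of the squared MMD, the permutation-based calibration, the constant `c` in the bandwidth rule, and the helper names `gaussian_kernel`, `bandwidth`, `permutation_test`, and `sample_curve` are all hypothetical. The paper's exact statistic and threshold may differ.

```python
import numpy as np


def gaussian_kernel(a, b, gamma):
    """Gaussian kernel matrix exp(-||a_i - b_j||^2 / (2 * gamma^2))."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * gamma**2))


def mmd2_biased(K, idx_x, idx_y):
    """Biased squared-MMD estimate from a precomputed kernel matrix."""
    kxx = K[np.ix_(idx_x, idx_x)].mean()
    kyy = K[np.ix_(idx_y, idx_y)].mean()
    kxy = K[np.ix_(idx_x, idx_y)].mean()
    return kxx + kyy - 2.0 * kxy


def bandwidth(n, d, beta, c=1.0):
    """Bandwidth gamma = c * n^(-1/(d + 4*beta)); c is an assumed constant."""
    return c * n ** (-1.0 / (d + 4.0 * beta))


def permutation_test(x, y, gamma, n_perm=200, seed=0):
    """Return (statistic, p-value) using a label-permutation null."""
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    K = gaussian_kernel(z, z, gamma)  # computed once, reused per permutation
    n, total = len(x), len(z)
    idx = np.arange(total)
    stat = mmd2_biased(K, idx[:n], idx[n:])
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(total)
        null[i] = mmd2_biased(K, perm[:n], perm[n:])
    p_value = (1 + np.sum(null >= stat)) / (1 + n_perm)
    return stat, p_value


def sample_curve(n, rng, kappa=None):
    """Sample n points on a closed curve in R^3 (intrinsic dimension 1).

    kappa=None gives a uniform angle; otherwise the angle follows a
    von Mises distribution with concentration kappa.
    """
    if kappa is None:
        t = rng.uniform(-np.pi, np.pi, size=n)
    else:
        t = rng.vonmises(0.0, kappa, size=n)
    return np.column_stack([np.cos(t), np.sin(t), 0.5 * np.sin(2.0 * t)])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 200
    x = sample_curve(n, rng)             # samples from p (uniform angle)
    y = sample_curve(n, rng, kappa=2.0)  # samples from q (concentrated angle)
    gamma = bandwidth(n, d=1, beta=2.0)  # uses intrinsic d, not ambient m
    stat, p_value = permutation_test(x, y, gamma)
    print(f"gamma={gamma:.3f}  MMD^2={stat:.4f}  p-value={p_value:.4f}")
```

Note that the bandwidth rule takes the intrinsic dimension $d$ as input rather than the ambient dimension $m$, which is the practical content of the abstract's claim that the test avoids the curse of dimensionality. In practice $d$ and $\beta$ are usually unknown, so the values passed here are placeholders for illustration.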

