Distributional Random Forests for Complex Survey Designs on Reproducing Kernel Hilbert Spaces

Yating Zou, Marcos Matabuena, Michael R. Kosorok

We study estimation of the conditional law $P(Y \mid X = x)$ and of continuous functionals $\Psi(P(Y \mid X = x))$ when $Y$ takes values in a locally compact Polish space, $X \in \mathbb{R}^p$, and the observations arise from a complex survey design. We propose a survey-calibrated distributional random forest (SDRF) that incorporates complex-design features via a pseudo-population bootstrap, honesty at the primary sampling unit (PSU) level, and a Maximum Mean Discrepancy (MMD) split criterion computed from kernel mean embeddings of Hájek-type (design-weighted) node distributions. We provide a framework for analyzing forest-style estimators under survey designs and establish design consistency for the finite-population target and model consistency for the super-population target under explicit conditions on the design, kernel, resampling multipliers, and tree partitions. As far as we are aware, these are the first results on model-free estimation of conditional distributions under survey designs. Simulations under a stratified two-stage cluster design illustrate finite-sample performance and demonstrate the statistical cost of ignoring the survey design. We demonstrate the broad applicability of SDRF using NHANES: we estimate tolerance regions of the conditional joint distribution of two diabetes biomarkers, illustrating how distributional heterogeneity can support subgroup-specific risk profiling for diabetes mellitus in the U.S. population.
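To make the split criterion more concrete, the following is a minimal sketch of a squared MMD between the design-weighted response distributions of two candidate child nodes. It is not the authors' implementation: the Gaussian kernel, the `bandwidth` parameter, the function names `gaussian_gram` and `hajek_weighted_mmd2`, and the V-statistic form of the estimator are all assumptions made for illustration. The only idea taken from the abstract is that each node distribution is a Hájek-type weighted empirical measure, that is, the survey weights are normalized to sum to one within the node before the kernel mean embeddings are compared.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian kernel matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2)).

    `a` and `b` are arrays of shape (n, d) and (m, d).
    The kernel choice is an assumption of this sketch.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))


def hajek_weighted_mmd2(y_left, w_left, y_right, w_right, bandwidth=1.0):
    """Squared MMD between the weighted response distributions of two nodes.

    Hypothetical helper: the weights are normalized within each node
    (Hajek-type), and the squared RKHS distance between the two weighted
    kernel mean embeddings is computed in V-statistic form.
    """
    p = np.asarray(w_left, dtype=float)
    q = np.asarray(w_right, dtype=float)
    p = p / p.sum()  # Hajek normalization in the left node
    q = q / q.sum()  # Hajek normalization in the right node
    k_ll = gaussian_gram(y_left, y_left, bandwidth)
    k_rr = gaussian_gram(y_right, y_right, bandwidth)
    k_lr = gaussian_gram(y_left, y_right, bandwidth)
    # ||mu_left - mu_right||^2 expanded with the kernel trick.
    return float(p @ k_ll @ p + q @ k_rr @ q - 2.0 * (p @ k_lr @ q))


# Toy usage with simulated bivariate responses and made-up survey weights.
rng = np.random.default_rng(0)
y_left = rng.normal(0.0, 1.0, size=(40, 2))
y_right = rng.normal(1.5, 1.0, size=(60, 2))
w_left = rng.uniform(1.0, 5.0, size=40)
w_right = rng.uniform(1.0, 5.0, size=60)
score = hajek_weighted_mmd2(y_left, w_left, y_right, w_right)
print(score)
```

Because the weights are normalized within each node, rescaling all weights in a node by a constant leaves the value unchanged. A tree-growing routine would evaluate such a quantity over candidate splits and prefer splits with larger values; how SDRF scales or combines this quantity with node sizes is not stated in the abstract and is left out of the sketch.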

