Tree ensemble methods such as Random Forests naturally induce supervised similarity measures through their decision tree structure, but existing implementations of proximities derived from tree ensembles typically suffer from quadratic time or memory complexity in the number of samples, limiting their scalability. In this work, we introduce a general framework for efficient proximity computation by defining a family of Separable Weighted Leaf-Collision Proximities. We show that any proximity measure in this family admits an exact sparse matrix factorization, restricting computation to leaf-level collisions and avoiding explicit pairwise comparisons. This formulation enables low-memory, scalable proximity computation using sparse linear algebra in Python. Empirical benchmarks demonstrate substantial runtime and memory improvements over traditional approaches, allowing tree ensemble proximities to scale efficiently to datasets with hundreds of thousands of samples on standard CPU hardware.
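As a rough illustration of the leaf-collision factorization idea, the following is a minimal sketch that computes the classical unweighted Random Forest proximity, the simplest member of this family, via a sparse product P = (1/T) M Mᵀ, where M is a one-hot sample-by-leaf membership matrix. It uses scikit-learn's RandomForestClassifier.apply and SciPy sparse matrices; it is an assumed illustration under those tools, not the implementation described in this work, and the separable weighted variants would replace the 0/1 entries of M with per-sample and per-leaf weight factors.

```python
# Sketch only: classical unweighted RF proximity via sparse factorization.
# Not the paper's implementation; names and parameters are illustrative.
import numpy as np
from scipy import sparse
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# leaf_ids[i, t] = index of the leaf that sample i falls into in tree t
leaf_ids = rf.apply(X)                      # shape (n_samples, n_trees)
n_samples, n_trees = leaf_ids.shape

# Offset leaf indices so every (tree, leaf) pair gets a unique column;
# collisions can then only occur between samples sharing a leaf in the same tree.
offsets = np.concatenate(([0], np.cumsum(leaf_ids.max(axis=0) + 1)[:-1]))
cols = (leaf_ids + offsets).ravel()
rows = np.repeat(np.arange(n_samples), n_trees)
data = np.ones(rows.size)

# Sparse membership matrix M: exactly one nonzero per (sample, tree).
M = sparse.csr_matrix((data, (rows, cols)),
                      shape=(n_samples, int(cols.max()) + 1))

# Exact proximity matrix from a sparse product; no explicit pairwise loop.
P = (M @ M.T) / n_trees
print(P.shape, P.nnz)
```

For very large datasets one would presumably retain only the sparse factor M (or its weighted analogues) and apply it to vectors on demand, rather than materializing the full proximity matrix P, which is what keeps the memory footprint low.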