We propose methods for density estimation and data synthesis using a novel form of unsupervised random forests. Inspired by generative adversarial networks, we implement a recursive procedure in which trees gradually learn structural properties of the data through alternating rounds of generation and discrimination. The method is provably consistent under minimal assumptions. Unlike classic tree-based alternatives, our approach provides smooth (un)conditional densities and allows for fully synthetic data generation. We achieve comparable or superior performance to state-of-the-art probabilistic circuits and deep learning models on various tabular data benchmarks while executing about two orders of magnitude faster on average. An accompanying $\texttt{R}$ package, $\texttt{arf}$, is available on $\texttt{CRAN}$.
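To make the generation half of one such round concrete, the following is a minimal sketch rather than the authors' implementation. It assumes, beyond what is stated here, that a first batch of synthetic data can be drawn by shuffling each feature independently. This preserves every marginal distribution while destroying the dependence between features, so a discriminator can only tell real from synthetic rows by learning the joint structure. The helper names `permute_columns` and `pearson` and the toy data are hypothetical.

```python
import random


def permute_columns(rows, rng):
    """Shuffle every column independently of the others (assumed generation step).

    Each column keeps exactly the same values, so marginals are preserved,
    but values that shared a row are no longer aligned, which breaks the
    dependence between features.
    """
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [tuple(values) for values in zip(*columns)]


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)

# Toy "real" data: the second feature closely tracks the first.
real = []
for _ in range(500):
    x1 = rng.random()
    real.append((x1, x1 + rng.gauss(0.0, 0.1)))

# Synthetic data with the same marginals but (nearly) independent features.
synthetic = permute_columns(real, rng)

real_corr = pearson(*zip(*real))
synthetic_corr = pearson(*zip(*synthetic))
print(f"real: {real_corr:.2f}, synthetic: {synthetic_corr:.2f}")
```

In this sketch the real rows are strongly correlated and the shuffled rows are close to uncorrelated. That gap is the kind of structure a discriminating forest would have to pick up before the next round of generation.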