GCondNet: A Novel Method for Improving Neural Networks on Small High-Dimensional Tabular Data

Neural network models often struggle with tabular datasets that are high-dimensional but have few samples. One reason is that current weight initialisation methods assume independence between weights, which can be problematic when there are insufficient samples to estimate the model's parameters accurately. In such small-data scenarios, leveraging additional structure can improve the model's training stability and performance. To address this, we propose GCondNet, a general approach to enhance neural networks by leveraging implicit structures present in tabular data. For each data dimension, we create a graph between samples, then use Graph Neural Networks (GNNs) to extract this implicit structure and to condition the parameters of the first layer of an underlying predictor MLP network. By creating many small graphs, GCondNet exploits the data's high dimensionality, and thus improves the performance of the underlying predictor network. We demonstrate the effectiveness of our method on nine real-world datasets, where GCondNet outperforms 14 standard and state-of-the-art methods. The results show that GCondNet is robust and can be applied to any small sample-size and high-dimensional tabular learning task.
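
The pipeline described above (one graph between samples per feature, a GNN that summarises each graph, and first-layer weights conditioned on those summaries) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration rather than a detail taken from the abstract: the k-nearest-neighbour graph construction, the use of the raw feature value as the node feature, the single mean-aggregation message-passing step, and the interpolation weight `alpha` that mixes the graph-derived weights with freely initialised ones. The function names are hypothetical.

```python
import numpy as np

def knn_graph(values, k):
    """Symmetric k-nearest-neighbour adjacency between samples for ONE feature.
    Assumption: neighbours are chosen by absolute difference in that feature."""
    n = values.shape[0]
    dist = np.abs(values[:, None] - values[None, :])
    np.fill_diagonal(dist, np.inf)            # never pick a sample as its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]    # indices of the k closest samples
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)             # make the graph undirected

def graph_embedding(values, adj, w_gnn):
    """One mean-aggregation message-passing step, then mean-pool all nodes
    into a single vector of size `hidden` for this feature's graph."""
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)   # row-normalise
    h = np.tanh(a_hat @ values[:, None] @ w_gnn)       # (n, hidden)
    return h.mean(axis=0)                              # (hidden,)

def conditioned_first_layer(X, w_gnn, w_free, k=3, alpha=0.5):
    """Build first-layer weights of shape (hidden, d): column j comes from
    the graph of feature j, mixed with free weights (mixing is an assumption)."""
    d = X.shape[1]
    cols = [graph_embedding(X[:, j], knn_graph(X[:, j], k), w_gnn) for j in range(d)]
    w_graph = np.stack(cols, axis=1)                   # (hidden, d)
    return alpha * w_graph + (1.0 - alpha) * w_free

# Toy usage: 20 samples, 100 features, hidden size 8 (random, untrained weights).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))
w_gnn = rng.normal(size=(1, 8))                # shared GNN weights (node feature dim 1)
w_free = rng.normal(size=(8, 100)) * 0.1       # independently initialised weights
W1 = conditioned_first_layer(X, w_gnn, w_free)
hidden = np.maximum(X @ W1.T, 0.0)             # ReLU output of the predictor's first layer
print(W1.shape, hidden.shape)
```

In this sketch the GNN weights are shared across all feature graphs, so the number of GNN parameters does not grow with the number of features, while each feature still receives its own column of first-layer weights. In a real system the GNN and the predictor would be trained jointly with an autodiff framework; the sketch only shows the forward computation.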

