Many modern applications rely on heterogeneous tabular data, for which regression and classification remain challenging. Many approaches have been proposed to adapt neural networks to such data, but boosted and bagged decision trees are still the best-performing methods. In this paper, we show that a binomially initialized neural network can be used effectively on tabular data. The proposed method is a simple but effective way of initializing the first hidden layer of a neural network. We also show that this initialization scheme can be used to jointly train ensembles by applying gradient masking to batch entries and using the binomial initialization for the last layer of the network. For this purpose, we modified the binary hinge loss and the softmax loss to make them applicable to joint ensemble training. We evaluate our approach on multiple public datasets and show improved performance compared to other neural network-based approaches. In addition, we discuss the limitations of our approach and possible directions for further research on improving the applicability of neural networks to tabular data. Link: https://es-cloud.cs.uni-tuebingen.de/d/8e2ab8c3fdd444e1a135/?p=%2FInitializationNeuronalNetworksTabularData&mode=list
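The abstract does not spell out how the binomial initialization or the gradient masking is computed, so the following is only a rough sketch of one possible reading, not the authors' implementation. It assumes that first-layer weights are drawn from a binomial distribution with a single trial (so each hidden unit starts connected to a random subset of features, loosely analogous to feature bagging in tree ensembles) and that each ensemble member receives gradients from a random subset of batch entries. The function names `binomial_init`, `batch_gradient_mask`, and `masked_mean_loss`, and the parameters `p` and `keep_prob`, are hypothetical.

```python
import numpy as np


def binomial_init(n_features, n_hidden, p=0.5, seed=0):
    """Assumed scheme: each first-layer weight is a Binomial(1, p) draw (0 or 1)."""
    rng = np.random.default_rng(seed)
    return rng.binomial(1, p, size=(n_features, n_hidden)).astype(np.float32)


def batch_gradient_mask(batch_size, n_members, keep_prob=0.5, seed=0):
    """Assumed scheme: entry (i, m) is 1 if batch entry i contributes to member m."""
    rng = np.random.default_rng(seed)
    return rng.binomial(1, keep_prob, size=(batch_size, n_members)).astype(np.float32)


def masked_mean_loss(per_sample_losses, mask):
    """Average the losses of each ensemble member over its unmasked batch entries.

    per_sample_losses and mask both have shape (batch_size, n_members).
    Masked entries contribute zero loss and therefore zero gradient.
    """
    kept = mask.sum(axis=0)
    # Guard against a member whose mask dropped every batch entry.
    return (per_sample_losses * mask).sum(axis=0) / np.maximum(kept, 1.0)


if __name__ == "__main__":
    weights = binomial_init(n_features=8, n_hidden=16)
    mask = batch_gradient_mask(batch_size=4, n_members=3)
    losses = np.ones((4, 3), dtype=np.float32)
    print(weights.shape, mask.shape, masked_mean_loss(losses, mask).shape)
```

In an actual training loop the mask would be multiplied into the per-sample loss before backpropagation, so that each ensemble member sees a different subsample of every batch; how the paper's modified hinge and softmax losses incorporate this is not stated in the abstract.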