Deep learning methods have demonstrated outstanding performance on classification and regression tasks involving homogeneous data types (e.g., image, audio, and text data). However, tabular data still pose a challenge, with classic machine learning approaches often being computationally cheaper than, and as effective as, increasingly complex deep learning architectures. The challenge arises because, in tabular data, the correlation among features is weaker than that induced by spatial or semantic relationships in images or natural language, and the dependency structures must be modeled without any prior information. In this work, we propose a novel deep learning architecture that exploits the structural organization of the data through topologically constrained network representations to extract relational information from sparse tabular inputs. The resulting model leverages the power of convolution and is centered on a limited number of concepts from network topology to guarantee: (i) a data-centric and deterministic building pipeline; (ii) a high level of interpretability over the inference process; and (iii) adequate room for scalability. We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models, demonstrating that our approach reaches state-of-the-art performance on these challenging datasets. The code to reproduce all our experiments is provided at https://github.com/FinancialComputingUCL/HomologicalCNN.