A new random-forest-based model, called Soft Tree Ensemble MIL (STE-MIL), is proposed for solving the Multiple Instance Learning (MIL) problem on small tabular data. It uses a new type of soft decision tree, which is similar to the well-known soft oblique tree but has fewer trainable parameters. To train the trees, they are converted into neural networks of a specific form that approximate the tree functions. The instance and bag embeddings (output vectors) are aggregated by means of the attention mechanism. The whole STE-MIL model, including the soft decision trees, neural networks, attention mechanism, and classifier, is trained end to end. Numerical experiments with tabular datasets illustrate STE-MIL. Code implementing the model is publicly available.
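To make the two main ingredients concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard depth-1 soft oblique node (sigmoid gating over a linear split), rather than the reduced-parameter tree proposed in the paper, and a generic tanh-based attention pooling over instance embeddings. All names (`soft_node_embedding`, `attention_pool`, `w_split`, `V`, `u`) and all dimensions are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(scores):
    # Subtract the maximum for numerical stability.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def soft_node_embedding(X, w_split, b_split, leaf_left, leaf_right):
    """Depth-1 soft oblique tree (assumed form, not the paper's variant).

    X: (n_instances, n_features); leaf_left, leaf_right: (d,) leaf vectors.
    Each instance is routed to the right leaf with probability
    sigmoid(x . w_split + b_split); the output is the expected leaf vector.
    """
    p_right = sigmoid(X @ w_split + b_split)              # (n_instances,)
    return np.outer(1.0 - p_right, leaf_left) + np.outer(p_right, leaf_right)

def attention_pool(E, V, u):
    """Generic attention pooling over instance embeddings E: (n_instances, d)."""
    scores = np.tanh(E @ V) @ u                           # (n_instances,)
    weights = softmax(scores)                             # sums to 1
    return weights @ E, weights                           # bag embedding: (d,)

# Hypothetical toy bag: 5 instances, 4 tabular features, 3-dim embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
E = soft_node_embedding(
    X,
    w_split=rng.normal(size=4),
    b_split=0.0,
    leaf_left=rng.normal(size=3),
    leaf_right=rng.normal(size=3),
)
bag_embedding, weights = attention_pool(
    E, V=rng.normal(size=(3, 2)), u=rng.normal(size=2)
)
```

Because every operation here is differentiable, the split parameters, leaf vectors, and attention parameters could all be fitted jointly by gradient descent once a classifier is placed on top of `bag_embedding`, which is the sense in which such a pipeline can be trained end to end.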