For tabular data generated by Industrial Internet of Things (IIoT) devices, traditional machine learning (ML) techniques based on decision trees have been employed. However, these methods are limited when processing tabular data dominated by real-valued attributes. To address this issue, DeepInsight, REFINED, and IGTD were proposed to convert tabular data into images so that convolutional neural networks (CNNs) can be applied. These methods gather similar features in specific regions of an image so that the converted image resembles a natural image. Gathering similar features contrasts with traditional ML practice for tabular data, which drops some highly correlated attributes to avoid overfitting. In addition, previous conversion methods fix the image size, so pixels are either wasted or insufficient depending on the number of attributes in the tabular data. This paper therefore proposes a new conversion method, Vortex Feature Positioning (VFP). VFP takes the correlation between features into account and places similar features far away from each other. Features are positioned in a vortex shape starting from the center of the image, and the number of attributes determines the image size. VFP shows better test performance than traditional ML techniques for tabular data and previous conversion methods on five datasets with different numbers of attributes: Iris, Wine, Dry Bean, Epileptic Seizure, and SECOM.
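The vortex layout described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`image_side`, `spiral_coords`, `place_features`), the choice of the smallest odd side length so that the grid has a single center pixel, the clockwise direction, and the zero fill for unused pixels are all assumptions made for illustration. The sketch also assumes the caller has already ordered the features according to their correlations, since the abstract does not specify that ordering rule.

```python
import math


def image_side(n_features):
    """Smallest odd side length whose square can hold n_features pixels.

    Assumption: an odd side is chosen so the grid has a unique center cell.
    """
    side = math.isqrt(n_features - 1) + 1 if n_features > 0 else 1  # ceil(sqrt(n))
    return side if side % 2 == 1 else side + 1


def spiral_coords(side):
    """Yield (row, col) cells of a side x side grid, spiralling out from the center."""
    r = c = side // 2
    yield r, c
    dr, dc = 0, 1          # first move is to the right
    step, emitted, total = 1, 1, side * side
    while emitted < total:
        for _ in range(2):                 # two legs share each step length
            for _ in range(step):
                r, c = r + dr, c + dc
                if 0 <= r < side and 0 <= c < side:
                    yield r, c
                    emitted += 1
            dr, dc = dc, -dr               # turn: right -> down -> left -> up
        step += 1


def place_features(values, fill=0.0):
    """Place pre-ordered feature values along the spiral; unused pixels get `fill`."""
    side = image_side(len(values))
    image = [[fill] * side for _ in range(side)]
    for value, (r, c) in zip(values, spiral_coords(side)):
        image[r][c] = value
    return image


if __name__ == "__main__":
    # Four attributes (as in Iris) fit in a 3 x 3 image under these assumptions.
    for row in place_features([1, 2, 3, 4]):
        print(row)
```

Under these assumptions the image size grows with the attribute count instead of being fixed, which is the property the abstract contrasts with earlier conversion methods.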