FMLFS: A Federated Multi-Label Feature Selection Based on Information Theory in IoT Environment

In emerging applications such as health-monitoring wearables and traffic monitoring systems, Internet-of-Things (IoT) devices generate or collect large volumes of multi-label data, in which each instance is associated with a set of labels. Noisy, redundant, or irrelevant features in these datasets, together with the curse of dimensionality, pose challenges for multi-label classifiers. Feature selection (FS) is an effective strategy for improving classifier performance and addressing these challenges. Yet no multi-label FS method in the literature is suited to multi-label datasets distributed across devices in IoT environments. This paper introduces FMLFS, the first federated multi-label feature selection method. Mutual information between features and labels serves as the relevancy metric, while the correlation distance between features, derived from mutual information and joint entropy, serves as the redundancy measure. These metrics are aggregated on the edge server, the features are sorted using Pareto-based bi-objective and crowding distance strategies, and the sorted features are then sent back to the IoT devices. The proposed method is evaluated in two scenarios: 1) transmitting reduced-size datasets to the edge server for use by a centralized classifier, and 2) employing federated learning with reduced-size datasets. Evaluation on three real-world datasets across three metrics (performance, time complexity, and communication cost) shows that FMLFS outperforms five comparable methods from the literature and provides a good trade-off.

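The pipeline described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give formulas, so the correlation distance is assumed here to be 1 - I(X;Y)/H(X,Y), the edge-server aggregation is assumed to be a sample-size-weighted average of per-device scores, features are assumed to be discretized, and all function names are hypothetical. The crowding distance step used to order features within a front is omitted for brevity.

```python
import numpy as np

def joint_counts(x, y):
    """Contingency table of two discrete 1-D vectors."""
    _, xi = np.unique(np.asarray(x), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    xi, yi = xi.ravel(), yi.ravel()
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)
    return table

def entropy(counts):
    """Shannon entropy in bits of a table of counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y); used as the relevancy score."""
    t = joint_counts(x, y)
    return entropy(t.sum(axis=1)) + entropy(t.sum(axis=0)) - entropy(t)

def correlation_distance(x, y):
    """Assumed form: 1 - I(X;Y) / H(X,Y). Larger means less redundant."""
    h_xy = entropy(joint_counts(x, y))
    if h_xy == 0.0:
        return 0.0  # both variables are constant
    return 1.0 - mutual_information(x, y) / h_xy

def aggregate(device_scores, device_sizes):
    """Assumed edge-server rule: sample-size-weighted mean of device scores."""
    scores = np.asarray(device_scores, dtype=float)
    return np.average(scores, axis=0, weights=device_sizes)

def pareto_fronts(relevance, distance):
    """Peel successive non-dominated fronts; both objectives are maximized."""
    pts = np.column_stack([relevance, distance])
    remaining = list(range(len(pts)))
    fronts = []
    while remaining:
        front = [
            i for i in remaining
            if not any(
                np.all(pts[j] >= pts[i]) and np.any(pts[j] > pts[i])
                for j in remaining
            )
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts
```

In this sketch, each device would compute a relevancy score per feature (for example, mutual information with each label, averaged over labels) and a redundancy score per feature (for example, its mean correlation distance to the other features), and send only those score vectors to the edge server. The server would call `aggregate` on each set of scores and then `pareto_fronts` to rank the features.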
Feature Selection

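Feature selection, also called feature subset selection (FSS) or attribute selection, is the process of choosing N of the M available features so that a specified system metric is optimized. By keeping only the most effective of the original features, it reduces the dimensionality of the dataset. It is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are key to training the model.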
