Automated feature engineering (AutoFE) automatically creates new features from the original features to improve predictive performance without requiring significant human intervention or expertise. Many AutoFE algorithms exist, but very few approaches address the federated learning (FL) setting, where data is distributed across many clients and is not shared among clients or with a central server. We introduce AutoFE algorithms for the horizontal, vertical, and hybrid FL settings, which differ in how the data is distributed across clients. To the best of our knowledge, we are the first to develop AutoFE algorithms for the horizontal and hybrid FL cases. We also show that the downstream model performance of federated AutoFE is similar to the performance obtained when the data is held centrally and AutoFE is performed centrally.
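The three FL settings named above differ in how one conceptual data table is split among clients. The following is a minimal sketch that assumes the common convention: horizontal FL splits samples (rows), vertical FL splits features (columns), and hybrid FL splits both. The toy table, the function names, and the grid layout used for the hybrid case are hypothetical illustrations. They are not the authors' AutoFE algorithms.

```python
"""Toy illustration of how one table is partitioned in horizontal,
vertical, and hybrid federated learning (FL). All names and values
here are hypothetical and for illustration only."""

from typing import Dict, List, Sequence, Tuple

Row = List[float]
Table = List[Row]

# Hypothetical centralized table: rows are samples, columns are features.
FEATURES = ["age", "income", "visits", "spend"]
DATA: Table = [
    [25, 40_000, 3, 120.0],
    [32, 55_000, 5, 310.5],
    [47, 72_000, 1, 80.0],
    [51, 61_000, 4, 220.0],
]


def horizontal_split(rows: Table, n_clients: int) -> List[Table]:
    """Horizontal FL: clients hold disjoint samples (rows) over the SAME features."""
    # Round-robin assignment of rows to clients.
    return [rows[k::n_clients] for k in range(n_clients)]


def vertical_split(rows: Table, feature_groups: Sequence[Sequence[int]]) -> List[Table]:
    """Vertical FL: clients hold disjoint features (columns) for the SAME samples."""
    return [[[row[j] for j in cols] for row in rows] for cols in feature_groups]


def hybrid_split(
    rows: Table,
    row_groups: Sequence[Sequence[int]],
    feature_groups: Sequence[Sequence[int]],
) -> Dict[Tuple[int, int], Table]:
    """Hybrid FL (one simple grid layout, assumed here): client (r, c) holds
    the samples in row_groups[r] restricted to the features in feature_groups[c]."""
    return {
        (r, c): [[rows[i][j] for j in cols] for i in rids]
        for r, rids in enumerate(row_groups)
        for c, cols in enumerate(feature_groups)
    }


if __name__ == "__main__":
    for k, part in enumerate(horizontal_split(DATA, 2)):
        print("horizontal client", k, "->", len(part), "rows x", len(part[0]), "features")
    for k, part in enumerate(vertical_split(DATA, [[0, 1], [2, 3]])):
        print("vertical client", k, "->", len(part), "rows x", len(part[0]), "features")
    grid = hybrid_split(DATA, [[0, 1], [2, 3]], [[0, 1], [2, 3]])
    for key, block in sorted(grid.items()):
        print("hybrid client", key, "->", len(block), "rows x", len(block[0]), "features")
```

In the vertical and hybrid cases, real deployments must also align samples across clients (for example, by a shared identifier). This sketch assumes the rows are already aligned by position.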