The paper proposes Quantum-SMOTE, a novel method that uses quantum computing techniques to address class imbalance, a prevalent problem in machine learning datasets. Inspired by the Synthetic Minority Oversampling Technique (SMOTE), Quantum-SMOTE generates synthetic data points using quantum processes such as swap tests and quantum rotation. Conventional SMOTE relies on K-Nearest Neighbors (KNN) and Euclidean distances. Quantum-SMOTE does not: it generates synthetic instances from minority class data points without relying on neighbor proximity. The algorithm offers greater control over synthetic data generation by introducing hyperparameters such as rotation angle, minority percentage, and splitting factor, which let it be tailored to specific dataset requirements. The approach is tested on the public TelecomChurn dataset and evaluated with two widely used classifiers, Random Forest and Logistic Regression, to measure its impact across varying proportions of synthetic data.
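The abstract names the swap test and quantum rotation as the core primitives but does not give the circuit. The following is a classical statevector simulation that illustrates those two primitives. It is not the paper's generator. It assumes that each minority point has two real features amplitude-encoded into a single qubit, and that the rotation is an RY gate. The helper `rotate_sample` and its norm-restoring step are hypothetical, and the minority-percentage and splitting-factor hyperparameters are not modeled. The sketch shows how a rotation angle controls the overlap between an original point and its synthetic counterpart. The swap test estimates that overlap through P(0) = (1 + |⟨a|b⟩|²)/2.

```python
import math


def normalize(v):
    """Amplitude-encode a 2-feature point as a real unit vector (single-qubit state)."""
    norm = math.hypot(v[0], v[1])
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return [v[0] / norm, v[1] / norm]


def _hadamard_on_ancilla(state):
    """Apply H to the ancilla (most significant qubit) of a 3-qubit statevector."""
    s = 1 / math.sqrt(2)
    out = [0.0] * 8
    for k in range(4):
        out[k] = s * (state[k] + state[k + 4])
        out[k + 4] = s * (state[k] - state[k + 4])
    return out


def swap_test_p0(a, b):
    """Simulate the swap test on real single-qubit states a and b.

    Returns P(ancilla = 0) = (1 + |<a|b>|^2) / 2.
    Basis index = 4*ancilla + 2*qubit_a + qubit_b.
    """
    state = [0.0] * 8
    for i in range(2):
        for j in range(2):
            state[2 * i + j] = a[i] * b[j]  # ancilla starts in |0>
    state = _hadamard_on_ancilla(state)
    # Controlled-SWAP: with ancilla = 1, exchange |1,0,1> (index 5) and |1,1,0> (index 6).
    state[5], state[6] = state[6], state[5]
    state = _hadamard_on_ancilla(state)
    return sum(amp * amp for amp in state[:4])


def rotate_sample(point, rotation_angle):
    """Hypothetical generator (assumption, not the paper's method): apply
    RY(rotation_angle) to the encoded point, then restore the original norm
    so the synthetic point lies back in feature space."""
    norm = math.hypot(point[0], point[1])
    a0, a1 = normalize(point)
    c, s = math.cos(rotation_angle / 2), math.sin(rotation_angle / 2)
    # RY(theta) = [[cos(theta/2), -sin(theta/2)], [sin(theta/2), cos(theta/2)]]
    return [norm * (c * a0 - s * a1), norm * (s * a0 + c * a1)]


if __name__ == "__main__":
    minority_point = [3.0, 4.0]
    synthetic = rotate_sample(minority_point, math.pi / 2)
    p0 = swap_test_p0(normalize(minority_point), normalize(synthetic))
    print(synthetic, p0)  # p0 is approximately 0.75, i.e. (1 + cos^2(pi/4)) / 2
```

With a rotation angle of π/2, the squared overlap between the original and synthetic states is cos²(π/4) = 0.5, so the swap test returns about 0.75. Smaller angles keep synthetic points closer to their source. That is one plausible reading of why a rotation-angle hyperparameter gives control over generation.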