Machine learning models thrive on vast datasets, continuously adapting to provide accurate predictions and recommendations. In an era dominated by privacy concerns, however, Machine Unlearning emerges as a transformative approach, enabling the selective removal of data from trained models. This paper examines methods such as naive retraining and exact unlearning via the SISA framework, evaluating their computational cost, consistency, and feasibility on the $\texttt{HSpam14}$ dataset. We explore the potential of integrating unlearning principles into Positive-Unlabeled (PU) Learning to address the challenges posed by partially labeled datasets. Our findings highlight the promise of unlearning frameworks such as $\textit{DaRE}$ for ensuring privacy compliance while maintaining model performance, albeit with significant computational trade-offs. This study underscores the importance of Machine Unlearning in achieving ethical AI and fostering trust in data-driven systems.
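The SISA framework referenced above shards the training set, trains an independent constituent model per shard, and aggregates their predictions by voting; forgetting a point then requires retraining only the shard that contained it, rather than the full model. A minimal sketch of this idea, assuming binary labels and using a trivial nearest-centroid base learner (the class and function names here are illustrative choices, not the paper's implementation):

```python
import numpy as np

def train_shard(X, y):
    """Trivial nearest-centroid learner standing in for any base model."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_shard(model, X):
    classes = sorted(model)
    centroids = np.stack([model[c] for c in classes])
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return np.array(classes)[dists.argmin(axis=1)]

class SISAEnsemble:
    """Minimal SISA-style sharded ensemble: unlearning retrains one shard."""

    def __init__(self, n_shards=4, seed=0):
        self.n_shards = n_shards
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        # Partition the training indices into disjoint shards.
        idx = self.rng.permutation(len(X))
        self.shards = [list(p) for p in np.array_split(idx, self.n_shards)]
        self.X, self.y = X, y
        self.models = [train_shard(X[np.array(p)], y[np.array(p)])
                       for p in self.shards]

    def unlearn(self, i):
        # Remove training point i and retrain only its shard.
        for s, part in enumerate(self.shards):
            if i in part:
                part.remove(i)
                p = np.array(part)
                self.models[s] = train_shard(self.X[p], self.y[p])
                return s  # only this shard was retrained

    def predict(self, X):
        # Majority vote across the shard models.
        votes = np.stack([predict_shard(m, X) for m in self.models])
        return (votes.mean(axis=0) >= 0.5).astype(int)
```

The cost saving is the point: an unlearning request touches roughly $1/S$ of the data for $S$ shards, at the price of an ensemble whose accuracy can lag a single model trained on all data, which is the computational trade-off the abstract refers to.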