Membership Inference Attacks (MIAs) serve as a fundamental auditing tool for evaluating training data leakage in machine learning models. However, existing methodologies predominantly rely on static, handcrafted heuristics that lack adaptability, often leading to suboptimal performance when transferred across different large models. In this work, we propose AutoMIA, an agentic framework that reformulates membership inference as an automated process of self-exploration and strategy evolution. Given high-level scenario specifications, AutoMIA self-explores the attack space by generating executable logits-level strategies and progressively refining them through closed-loop evaluation feedback. By decoupling abstract strategy reasoning from low-level execution, our framework enables a systematic, model-agnostic traversal of the attack search space. Extensive experiments demonstrate that AutoMIA consistently matches or outperforms state-of-the-art baselines while eliminating the need for manual feature engineering.
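The closed-loop idea described in the abstract (propose a logits-level scoring strategy, evaluate it, refine it from the feedback) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not AutoMIA's actual implementation: the strategy family (mean of the lowest `k` fraction of token log-probabilities, similar in spirit to Min-K%-style scores), the numeric hill-climbing loop that stands in for the agent's strategy reasoning, and the hypothetical helper names `min_k_score`, `auc`, `refine`, and `synthetic_samples`. The data is synthetic.

```python
import random

def min_k_score(token_logprobs, k):
    """Mean of the lowest k-fraction of token log-probs (k=1.0 is the plain mean)."""
    n = max(1, int(round(len(token_logprobs) * k)))
    return sum(sorted(token_logprobs)[:n]) / n

def auc(member_scores, nonmember_scores):
    """Chance that a random member outscores a random non-member (ties count half)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))

def evaluate(k, members, nonmembers):
    """Feedback signal: AUC of the strategy with parameter k on labeled samples."""
    return auc([min_k_score(s, k) for s in members],
               [min_k_score(s, k) for s in nonmembers])

def refine(members, nonmembers, rounds=5, step=0.2):
    """Hill-climb over k; a stand-in (assumption) for agent-driven strategy evolution."""
    best_k = 1.0
    best_auc = evaluate(best_k, members, nonmembers)
    history = [(best_k, best_auc)]
    for _ in range(rounds):
        # Propose neighbours of the current best strategy, clipped to [0.05, 1.0].
        candidates = [min(1.0, max(0.05, best_k + d)) for d in (-step, step)]
        for k in candidates:
            score = evaluate(k, members, nonmembers)
            history.append((k, score))
            if score > best_auc:
                best_k, best_auc = k, score
        step /= 2  # narrow the search as feedback accumulates
    return best_k, best_auc, history

def synthetic_samples(count, mean, seed, length=20):
    """Synthetic per-token log-probs; real ones would come from the target model."""
    rng = random.Random(seed)
    return [[rng.gauss(mean, 1.0) for _ in range(length)] for _ in range(count)]

if __name__ == "__main__":
    members = synthetic_samples(30, mean=-1.0, seed=0)
    nonmembers = synthetic_samples(30, mean=-2.0, seed=1)
    best_k, best_auc, history = refine(members, nonmembers)
    print(f"best k = {best_k:.3f}, AUC = {best_auc:.3f}, evaluations = {len(history)}")
```

Note that this sketch selects and scores the strategy on the same labeled samples, so the reported AUC is optimistic; a held-out split would be needed for an honest audit.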