Modern machine learning systems, such as generative models and recommendation systems, often evolve through a cycle of deployment, user interaction, and periodic model updates. This differs from standard supervised learning frameworks, which focus on loss or regret minimization over a fixed sequence of prediction tasks. Motivated by this setting, we revisit the classical model of learning from equivalence queries, introduced by Angluin (1988). In this model, a learner repeatedly proposes hypotheses and, when a deployed hypothesis is inadequate, receives a counterexample. Under fully adversarial counterexample generation, however, the model can be overly pessimistic. In addition, most prior work assumes a \emph{full-information} setting, where the learner also observes the correct label of the counterexample, an assumption that is not always natural. We address these issues by restricting the environment to a broad class of less adversarial counterexample generators, which we call \emph{symmetric}. Informally, such generators choose counterexamples based only on the symmetric difference between the hypothesis and the target. This class captures natural mechanisms such as random counterexamples (Angluin and Dohrn, 2017; Bhatia, 2021; Chase, Freitag, and Reyzin, 2024), as well as generators that return the simplest counterexample according to a prescribed complexity measure. Within this framework, we study learning from equivalence queries under both full-information and bandit feedback. We obtain tight bounds on the number of learning rounds in both settings and highlight directions for future work. Our analysis combines a game-theoretic view of symmetric adversaries with adaptive weighting methods and minimax arguments.
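The informal definition of a symmetric generator, and the difference between full-information and bandit feedback, can be illustrated with a minimal sketch. It assumes a finite domain and represents hypotheses as dictionaries from points to labels. All names (`symmetric_difference`, `random_generator`, `simplest_generator`, `equivalence_query`) are hypothetical and do not come from the paper. The paper's formal definition may be more general than this toy version. The key property shown is that a generator receives only the disagreement set, never the hypothesis or the target themselves. Labels are arbitrary because, with only two labels, the correct label could be inferred from the fact that h(x) is wrong.

```python
import random
from typing import Callable, Dict, FrozenSet, Hashable, Optional, Tuple

Point = Hashable
Label = Hashable
Hypothesis = Dict[Point, Label]  # hypothetical representation: point -> label

# A symmetric generator sees only the set of points where hypothesis and target disagree.
SymmetricGenerator = Callable[[FrozenSet[Point]], Point]


def symmetric_difference(h: Hypothesis, target: Hypothesis) -> FrozenSet[Point]:
    """Points on which h and target disagree (assumes both share the same finite domain)."""
    return frozenset(x for x in target if h[x] != target[x])


def random_generator(weights: Dict[Point, float], rng: random.Random) -> SymmetricGenerator:
    """Draws a counterexample from a fixed distribution restricted to the disagreement set."""
    def gen(diff: FrozenSet[Point]) -> Point:
        pts = sorted(diff, key=repr)  # fixed order so a seeded rng is reproducible
        return rng.choices(pts, weights=[weights[x] for x in pts], k=1)[0]
    return gen


def simplest_generator(complexity: Callable[[Point], float]) -> SymmetricGenerator:
    """Returns the least complex disagreement point, breaking ties by repr."""
    def gen(diff: FrozenSet[Point]) -> Point:
        return min(diff, key=lambda x: (complexity(x), repr(x)))
    return gen


def equivalence_query(
    h: Hypothesis, target: Hypothesis, gen: SymmetricGenerator, bandit: bool
) -> Optional[Tuple[Point, Optional[Label]]]:
    """None if h equals target; otherwise a counterexample, with its label only under full information."""
    diff = symmetric_difference(h, target)
    if not diff:
        return None
    x = gen(diff)
    return (x, None) if bandit else (x, target[x])
```

For example, two different hypothesis/target pairs that disagree on the same set `{1, 2}` receive the same counterexample from `simplest_generator`, since the generator cannot distinguish them.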