We study the theoretical properties of the interactive learning protocol Discriminative Feature Feedback (DFF) (Dasgupta et al., 2018). In DFF, the learner receives feedback in the form of discriminative feature explanations. We provide the first systematic study of DFF in a general framework comparable to that of classical protocols such as supervised learning and online learning. We study the optimal mistake bound of DFF in the realizable and non-realizable settings, obtaining novel structural results as well as insights into the differences between online learning and settings with richer feedback such as DFF. We characterize the mistake bound in the realizable setting using a new notion of dimension. In the non-realizable setting, we provide a mistake upper bound and show that it cannot be improved in general. Our results show that, unlike in online learning, in DFF the realizable dimension is insufficient to characterize the optimal non-realizable mistake bound or the existence of no-regret algorithms.
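To make the protocol concrete, the following is a minimal sketch of one interaction round of DFF, under the setup described in Dasgupta et al. (2018): on a mistake, the teacher supplies a feature that discriminates the current instance from the example the learner based its prediction on. The class and method names (`learner`, `teacher`, `discriminative_feature`, `update`) are illustrative assumptions for this sketch, not the paper's notation or any library API.

```python
# Hypothetical sketch of one round of Discriminative Feature Feedback (DFF),
# assuming the setting of Dasgupta et al. (2018): the learner predicts a label
# together with a representative example it considers similar to the instance;
# on a mistake, the teacher returns a feature that holds for the instance but
# not for the representative. All names here are illustrative assumptions.

def dff_round(learner, teacher, x):
    """Run one interaction round of DFF on instance x; return 1 on a mistake."""
    y_hat, representative = learner.predict(x)  # predicted label + similar example
    y = teacher.label(x)                        # ground-truth label of x
    if y_hat == y:
        return 0                                # no mistake this round
    # On a mistake, the teacher provides a discriminative feature: a feature
    # satisfied by x but not by the learner's chosen representative.
    phi = teacher.discriminative_feature(x, representative)
    learner.update(x, y, representative, phi)   # refine the hypothesis
    return 1                                    # one mistake incurred
```

The mistake bounds studied in the abstract count, over a worst-case instance sequence, how many rounds of such a loop return 1; the richer feedback `phi` is what distinguishes DFF from the label-only feedback of standard online learning.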