Several machine learning (ML) applications are characterized by the search for an optimal solution to a complex task. The search space is often very large, so large that the optimal solution frequently cannot be computed. Part of the problem is that many candidate solutions found via ML are infeasible and have to be discarded. Restricting the search space to the feasible candidates simplifies finding an optimal solution for a given task. Further, the set of feasible solutions can be reused across multiple problems characterized by different tasks. In particular, we observe that complex tasks can be decomposed into subtasks and corresponding skills. We propose to learn a reusable and transferable skill by training an actor to generate all feasible actions. The trained actor can then propose feasible actions, among which an optimal one can be chosen according to a specific task. The actor is trained by interpreting the feasibility of each action as a target distribution, and the training procedure minimizes a divergence between the actor's output distribution and this target. We derive the general optimization target for arbitrary f-divergences using a combination of kernel density estimates, resampling, and importance sampling. We further use an auxiliary critic to reduce the number of interactions with the environment. A preliminary comparison to related strategies shows that our approach learns to visit all the modes in the feasible action space, demonstrating the framework's potential for learning skills that can be used in various downstream tasks.
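To make the idea of treating feasibility as a target distribution concrete, the following is a minimal sketch and not the method derived in the paper. It assumes a hypothetical one-dimensional action space whose feasible set consists of two disjoint intervals, builds a Gaussian kernel density estimate from actor samples, and estimates the forward KL divergence (one particular f-divergence, with f(t) = t log t) using uniform proposals and self-normalized importance weights. The names `feasible`, `kde`, and `forward_kl`, the bandwidth, and the interval bounds are all illustrative assumptions.

```python
import numpy as np

def feasible(a):
    """Hypothetical feasibility check: 1.0 inside two disjoint intervals, else 0.0."""
    return (((a >= -2.0) & (a <= -1.0)) | ((a >= 1.0) & (a <= 2.0))).astype(float)

def kde(samples, bandwidth=0.1):
    """Return a Gaussian kernel density estimate built from 1-D samples."""
    norm = len(samples) * bandwidth * np.sqrt(2.0 * np.pi)

    def density(x):
        z = (x[:, None] - samples[None, :]) / bandwidth
        return np.exp(-0.5 * z**2).sum(axis=1) / norm

    return density

def forward_kl(actor_samples, rng, low=-3.0, high=3.0, n=2000):
    """Estimate KL(target || actor) from uniform proposals and importance weights."""
    proposals = rng.uniform(low, high, size=n)
    g = feasible(proposals)
    z_hat = g.mean() * (high - low)      # Monte Carlo estimate of the feasible volume
    keep = g > 0                         # infeasible proposals carry zero weight
    p = g[keep] / z_hat                  # normalized target density at kept proposals
    q = kde(actor_samples)(proposals[keep]) + 1e-12  # actor density, guarded against log(0)
    weights = g[keep] / g.sum()          # self-normalized importance weights
    return float(np.sum(weights * (np.log(p) - np.log(q))))

rng = np.random.default_rng(0)

# Assumed toy actors: one covers both feasible modes, one collapses onto a single mode.
signs = rng.choice([-1.0, 1.0], size=1000)
both_modes = signs * rng.uniform(1.0, 2.0, size=1000)
one_mode = rng.uniform(1.0, 2.0, size=1000)

kl_both = forward_kl(both_modes, rng)
kl_one = forward_kl(one_mode, rng)
print(f"KL(target || actor), both modes: {kl_both:.3f}")
print(f"KL(target || actor), one mode:   {kl_one:.3f}")
```

In this toy setting the actor that covers both intervals should obtain a much smaller divergence estimate than the collapsed one, which illustrates why a divergence to the feasibility target rewards visiting every mode. The sketch only evaluates a divergence for fixed samples; it does not train an actor and does not include the resampling step or the auxiliary critic mentioned in the abstract.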