Recent inductive logic programming (ILP) approaches learn optimal hypotheses, where an optimal hypothesis minimises a given cost function on the training data. There are many such cost functions, including the training error, textual complexity, and description length of a hypothesis. However, selecting an appropriate cost function remains an open question. To address this gap, we extend a constraint-based ILP system to learn optimal hypotheses for seven standard cost functions. We then empirically compare the generalisation error of the optimal hypotheses induced under each cost function. Our results on over 20 domains and 1,000 tasks, including game playing, program synthesis, and image reasoning, show that, while no cost function consistently outperforms the others, minimising training error or description length gives the best overall performance. Notably, our results indicate that minimising the size of hypotheses does not always reduce generalisation error.
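To make the learning objective precise, the notion of an optimal hypothesis can be stated as follows (a minimal sketch in our own notation; the symbols $\mathcal{H}$ for the hypothesis space, $E$ for the training examples, and $\mathit{cost}$ for the chosen cost function are illustrative, not taken from any particular system):

$$h^{*} \in \operatorname*{arg\,min}_{h \in \mathcal{H}} \; \mathit{cost}(h, E)$$

For instance, taking $\mathit{cost}(h, E)$ to be the number of training examples that $h$ misclassifies yields the training-error objective, taking it to be the number of literals in $h$ yields the size objective, and a description-length cost trades off the size of $h$ against how compactly $h$ explains $E$.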