A fitting algorithm for conjunctive queries (CQs) produces, given a set of positively and negatively labeled data examples, a CQ that fits these examples. In general, there may be many non-equivalent fitting CQs, and thus the algorithm has some freedom in producing its output. Additional desirable properties of the produced CQ are that it generalizes well to unseen examples in the sense of PAC learning and that it is most general or most specific in the set of all fitting CQs. In this research note, we show that these desiderata are incompatible when we require PAC-style generalization from a polynomial sample: we prove that any fitting algorithm that produces a most-specific fitting CQ cannot be a sample-efficient PAC learning algorithm, and the same is true for fitting algorithms that produce a most-general fitting CQ (when it exists). Our proofs rely on a polynomial construction of relativized homomorphism dualities for path-shaped structures.
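To make "fitting" and "most specific" concrete, here is a minimal sketch. It is restricted, as a simplifying assumption, to Boolean CQs over a single schema, each represented by its canonical structure. A Boolean CQ holds in a data example exactly when there is a homomorphism from the CQ's canonical structure to the example. So q fits the labeled examples iff it maps into every positive example and into no negative one. For the most-specific fitting CQ, the sketch uses the standard direct-product construction, which this note does not itself describe. The canonical CQ of the product of all positive examples is contained in every CQ that holds in all positives. If it also fails in every negative example, it is the most-specific fitting CQ; otherwise no CQ fits at all. The dict-based representation and the helper names (`has_hom`, `fits`, `direct_product`, `most_specific_fitting`) are hypothetical. The homomorphism test is brute force and exponential. None of this is the construction used in the note's proofs.

```python
from itertools import product as cartesian

# Hypothetical representation (an assumption for this sketch): a finite
# structure is a dict {"dom": set of elements, "rels": {name: set of tuples}}.
# A Boolean CQ is represented by its canonical structure.


def has_hom(a, b):
    """Brute-force test for a homomorphism from structure a to structure b.

    Exponential in |dom(a)|; only meant for tiny illustrative inputs.
    """
    dom_a = list(a["dom"])
    for image in cartesian(list(b["dom"]), repeat=len(dom_a)):
        h = dict(zip(dom_a, image))
        # Every tuple of every relation of a must be mapped into b.
        if all(
            tuple(h[x] for x in t) in b["rels"].get(r, set())
            for r, ts in a["rels"].items()
            for t in ts
        ):
            return True
    return False


def fits(q, positives, negatives):
    """q fits iff it holds in every positive and in no negative example."""
    return all(has_hom(q, e) for e in positives) and not any(
        has_hom(q, e) for e in negatives
    )


def direct_product(structs):
    """Direct product of a non-empty list of structures over a common schema."""
    dom = set(cartesian(*[s["dom"] for s in structs]))
    names = set().union(*[s["rels"].keys() for s in structs])
    rels = {}
    for r in names:
        per_factor = [s["rels"].get(r, set()) for s in structs]
        # Pick one r-tuple per factor and combine them position-wise.
        rels[r] = {tuple(zip(*combo)) for combo in cartesian(*per_factor)}
    return {"dom": dom, "rels": rels}


def most_specific_fitting(positives, negatives):
    """Canonical CQ of the product of the positives, or None if nothing fits."""
    p = direct_product(positives)
    if any(has_hom(p, e) for e in negatives):
        return None  # no CQ at all fits these examples
    return p


# Example: a directed 2-cycle and 3-cycle as positives, a single edge as negative.
C2 = {"dom": {0, 1}, "rels": {"E": {(0, 1), (1, 0)}}}
C3 = {"dom": {0, 1, 2}, "rels": {"E": {(0, 1), (1, 2), (2, 0)}}}
EDGE = {"dom": {0, 1}, "rels": {"E": {(0, 1)}}}
q = most_specific_fitting([C2, C3], [EDGE])
print(len(q["dom"]), len(q["rels"]["E"]))  # 6 6: a directed 6-cycle
```

The product has as many elements as the product of the positives' domain sizes, so the output can be large even when each example is small. The sketch makes no attempt to shrink it, for example by computing its core.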