If $A$ and $B$ are sets such that $A \subset B$, generalisation may be understood as the inference from $A$ of a hypothesis sufficient to construct $B$. One might infer any number of hypotheses from $A$, yet only some of them may generalise to $B$. How can one know which are likely to generalise? One strategy is to choose the shortest, equating the ability to compress information with the ability to generalise (a proxy for intelligence). We examine this strategy in the context of a mathematical formalism of enactive cognition. We show that compression is neither necessary nor sufficient to maximise performance, measured as the probability that a hypothesis generalises. We formulate a proxy unrelated to length or simplicity, called weakness. We show that if tasks are uniformly distributed, then no proxy performs at least as well as weakness maximisation in all tasks while performing strictly better in at least one. In other words, weakness is the Pareto-optimal choice of proxy. In experiments comparing maximum weakness and minimum description length in the context of binary arithmetic, the former generalised at between $1.1$ and $5$ times the rate of the latter. We argue this demonstrates that weakness is a far better proxy, and explains why DeepMind's Apperception Engine is able to generalise effectively.
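The two selection rules being compared, and the Pareto-optimality claim, can be stated compactly. The notation here is ours and is only a sketch: $H_A$ denotes the set of hypotheses one might infer from $A$, $w(h)$ the weakness of $h$, $\ell(h)$ its description length, $T$ the set of tasks, and $P_p(t)$ the probability that the hypothesis selected by proxy $p$ on task $t$ generalises. The relation $q \succ p$ reads "$q$ Pareto-dominates $p$".

$$
\begin{aligned}
h_{\text{weak}} &= \operatorname*{arg\,max}_{h \in H_A} w(h),
\qquad
h_{\text{MDL}} = \operatorname*{arg\,min}_{h \in H_A} \ell(h), \\[4pt]
q \succ p \;&\iff\; \bigl(\forall t \in T:\ P_q(t) \ge P_p(t)\bigr) \,\land\, \bigl(\exists t \in T:\ P_q(t) > P_p(t)\bigr).
\end{aligned}
$$

Read this way, the claim is that when tasks are uniformly distributed, there is no proxy $q$ with $q \succ w$.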