Code generation has progressed more reliably than reinforcement learning, largely because code has an information structure that makes it learnable. Code provides dense, local, verifiable feedback at every token, whereas most reinforcement learning problems do not. This difference in feedback quality is not binary but graded. We propose a five-level hierarchy of learnability based on information structure and argue that the ceiling on ML progress depends less on model size than on whether a task is learnable at all. The hierarchy rests on a formal distinction among three properties of computational problems (expressibility, computability, and learnability). We establish their pairwise relationships, including where implications hold and where they fail, and present a unified template that makes the structural differences explicit. The analysis suggests why supervised learning on code scales predictably while reinforcement learning does not, and why the common assumption that scaling alone will solve remaining ML challenges warrants scrutiny.