Learning Patterns and Abstractions from Perceptual Sequences

Cognition swiftly breaks high-dimensional sensory streams into familiar parts and uncovers their relations. Why do such structures emerge, and how do they enable learning, generalization, and prediction? What computational principles underlie this core aspect of perception and intelligence? In simplified form, a sensory stream is a one-dimensional sequence. In learning such sequences, we naturally segment them into parts -- a process known as chunking. In the first project, I investigated factors influencing chunking in a serial reaction time task and showed that humans adapt to underlying chunks while balancing speed and accuracy. Building on this, I developed models that learn chunks and parse sequences chunk by chunk. Normatively, I proposed chunking as a rational strategy for discovering recurring patterns and nested hierarchies, enabling efficient sequence factorization. Learned chunks serve as reusable primitives for transfer, composition, and mental simulation -- letting the model compose the new from the known. I demonstrated this model's ability to learn hierarchies in single- and multi-dimensional sequences and highlighted its utility for unsupervised pattern discovery. The second part moves from concrete to abstract sequences. I taxonomized abstract motifs and examined their role in sequence memory. Behavioral evidence suggests that humans exploit pattern redundancies for compression and transfer. I proposed a non-parametric hierarchical variable model that learns both chunks and abstract variables, uncovering invariant symbolic patterns. I showed its similarity to human learning and compared it to large language models. Taken together, this thesis suggests that chunking and abstraction, as simple computational principles, enable structured knowledge acquisition in hierarchically organized sequences, from simple to complex and from concrete to abstract.
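To make the idea of learning chunks and parsing a sequence chunk by chunk concrete, the following is a minimal illustrative sketch, not the model developed in the thesis. It assumes a simple frequency-based rule (merge the most frequent pair of adjacent chunks) and greedy longest-match parsing; the function names `parse` and `learn_chunks` and the parameters `n_rounds` and `min_count` are hypothetical and chosen only for this example.

```python
from collections import Counter


def parse(sequence, chunks):
    """Greedily parse a sequence into the longest known chunks, left to right.

    Single symbols are always accepted, so parsing never gets stuck.
    """
    parsed, i = [], 0
    max_len = max((len(c) for c in chunks), default=1)
    while i < len(sequence):
        # Try the longest candidate first, falling back to a single symbol.
        for size in range(min(max_len, len(sequence) - i), 0, -1):
            candidate = tuple(sequence[i:i + size])
            if candidate in chunks or size == 1:
                parsed.append(candidate)
                i += size
                break
    return parsed


def learn_chunks(sequence, n_rounds=2, min_count=2):
    """Assumed learning rule: repeatedly merge the most frequent adjacent pair.

    Each merge concatenates two existing chunks, so later chunks are built
    from earlier ones and form a nested hierarchy.
    """
    chunks = {(symbol,) for symbol in sequence}  # start from atomic symbols
    for _ in range(n_rounds):
        parsed = parse(sequence, chunks)
        pairs = Counter(zip(parsed, parsed[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < min_count:
            break  # no pair recurs often enough to justify a new chunk
        chunks.add(left + right)
    return chunks


sequence = list("abcabcabcabc")
chunks = learn_chunks(sequence, n_rounds=2)
print(parse(sequence, chunks))
```

Under these assumptions, the first round merges `a` and `b` into `ab`, the second merges `ab` and `c` into `abc`, and the sequence is then parsed as four repetitions of the chunk `abc`. The learned chunk `abc` is composed of the earlier chunk `ab`, which is a toy instance of the nested hierarchies described above.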
