Modern generative machine learning models demonstrate a surprising ability to create realistic outputs far beyond their training data, such as photorealistic artwork, accurate protein structures, or conversational text. These successes suggest that generative models learn to effectively parametrize and sample arbitrarily complex distributions. Beginning half a century ago, foundational works in nonlinear dynamics used tools from information theory to infer properties of chaotic attractors from time series, motivating the development of algorithms for parametrizing chaos in real datasets. In this perspective, we aim to connect these classical works to emerging themes in large-scale generative statistical learning. We first consider classical attractor reconstruction, which mirrors constraints on latent representations learned by state space models of time series. We next revisit early efforts to use symbolic approximations to compare minimal discrete generators underlying complex processes, a problem relevant to modern efforts to distill and interpret black-box statistical models. Emerging interdisciplinary works bridge nonlinear dynamics and learning theory, such as operator-theoretic methods for complex fluid flows and the detection of broken detailed balance in biological datasets. We anticipate that future machine learning techniques may revisit other classical concepts from nonlinear dynamics, such as transinformation decay and complexity-entropy tradeoffs.