Every automaton can be decomposed into a cascade of basic prime automata. This is the Prime Decomposition Theorem of Krohn and Rhodes. Guided by this theory, we propose automata cascades as a structured, modular way to describe automata as complex systems made of many components, each implementing a specific functionality. Any automaton can serve as a component; using specific components allows for fine-grained control of the expressivity of the resulting class of automata; using prime automata as components implies specific expressivity guarantees. Moreover, specifying automata as cascades allows for describing the sample complexity of automata in terms of their components. We show that the sample complexity is linear in the number of components and the maximum complexity of a single component, modulo logarithmic factors. This opens up the possibility of learning automata representing large dynamical systems consisting of many parts interacting with each other. It is in sharp contrast with the established understanding of the sample complexity of automata, described in terms of the overall number of states and input letters, which implies that it is only possible to learn automata where the number of states is linear in the amount of data available. Instead, our results show that one can learn automata with a number of states that is exponential in the amount of data available.
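To make the contrast between the number of components and the number of states concrete, the following is a minimal sketch of a cascade, not the construction used in the paper. It assumes that each component reads the external input letter together with the current states of the components before it. The names `Cascade`, `counter_bit`, and `reachable_states` are hypothetical, and the two-state counter bit is chosen only for illustration; it is not claimed to be one of the prime automata.

```python
from typing import Callable, Sequence, Set, Tuple

# Assumed interface: a component maps (own state, input letter, states of the
# preceding components) to its own next state.
Transition = Callable[[int, str, Tuple[int, ...]], int]


class Cascade:
    """A sequence of components updated synchronously on each input letter."""

    def __init__(self, transitions: Sequence[Transition], initial: Sequence[int]):
        self.transitions = list(transitions)
        self.state: Tuple[int, ...] = tuple(initial)

    def step(self, letter: str) -> Tuple[int, ...]:
        old = self.state
        # Component i sees the letter and the current states of components 0..i-1.
        self.state = tuple(
            delta(old[i], letter, old[:i])
            for i, delta in enumerate(self.transitions)
        )
        return self.state


def counter_bit(state: int, letter: str, preceding: Tuple[int, ...]) -> int:
    """Two-state component: flips on "tick" when every preceding bit is 1."""
    if letter == "tick" and all(preceding):
        return 1 - state
    return state


def reachable_states(d: int) -> Set[Tuple[int, ...]]:
    """States visited by a cascade of d counter bits driven by 2**d ticks."""
    cascade = Cascade([counter_bit] * d, [0] * d)
    seen = {cascade.state}
    for _ in range(2 ** d):
        seen.add(cascade.step("tick"))
    return seen


if __name__ == "__main__":
    for d in (1, 2, 4, 8):
        print(d, "components ->", len(reachable_states(d)), "states")
```

Here `d` two-state components behave as a binary counter, so the cascade as a whole visits `2**d` distinct states while its description grows only with `d`. This is the kind of gap between component count and overall state count that the sample complexity result exploits.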