Networked systems composed of interacting agents are often controlled by modeling the underlying interactions. Constructing accurate models of these interactions, however, can become prohibitive in applications. Data-driven control methods avoid this difficulty by synthesizing a controller directly from observed data. In this paper, we propose an algorithm, referred to as Data-driven Structured Policy Iteration (D2SPI), for synthesizing an efficient feedback mechanism that respects the sparsity pattern induced by the underlying interaction network. In particular, our algorithm uses temporary "auxiliary" communication links to enable the required information exchange on a (smaller) sub-network during the "learning phase"; these links are subsequently removed for the final distributed feedback synthesis. We then show that the learned policy yields a stabilizing structured policy for the entire network. Our analysis further establishes the stability and convergence of the proposed distributed policies throughout the learning phase, exploiting a construct referred to as the "Patterned monoid." The performance of D2SPI is then demonstrated using representative simulation scenarios.
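To make the phrase "respects the sparsity pattern induced by the underlying interaction network" concrete, the following is a minimal sketch of what such a structural constraint on a feedback gain can look like. It is not the D2SPI algorithm: the helper names `structure_mask` and `project_gain`, the three-agent path graph, and the per-agent dimensions `m` and `n` are all assumptions chosen for illustration. The sketch only zeroes the gain blocks that correspond to absent links, and a gain truncated this way is not guaranteed to be stabilizing.

```python
import numpy as np


def structure_mask(adjacency, m, n):
    """Boolean mask for a block-structured gain.

    Block (i, j), of size m x n, is allowed when i == j or when agents
    i and j are linked in the (hypothetical) interaction graph.
    """
    adjacency = np.asarray(adjacency, dtype=bool)
    num_agents = adjacency.shape[0]
    # Each agent may always use its own state, so add the diagonal.
    pattern = adjacency | np.eye(num_agents, dtype=bool)
    # Expand every scalar entry of the pattern into an m x n block.
    block = np.ones((m, n), dtype=int)
    return np.kron(pattern.astype(int), block).astype(bool)


def project_gain(K, mask):
    """Zero out the entries of K that fall outside the allowed pattern."""
    return np.where(mask, K, 0.0)


# Hypothetical path graph on three agents: 1 - 2 - 3.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

# Assumed dimensions: one input and two states per agent.
mask = structure_mask(A, m=1, n=2)

# An arbitrary dense gain, used only to show the effect of the mask.
K_dense = np.arange(1.0, 19.0).reshape(3, 6)
K_structured = project_gain(K_dense, mask)
print(K_structured)
```

In this example agents 1 and 3 are not linked, so the blocks coupling them are set to zero, while agent 2, which neighbors both, keeps its full row.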