Modern analytics systems are fundamentally reactive, requiring users to define queries over increasingly complex and continuously evolving data. In real-time streaming environments, this paradigm breaks down, as the space of potential insights becomes too large to enumerate manually. We present a multi-agent architecture for autonomous insight discovery over real-time data streams. The system implements a continuous discovery loop in which agents generate hypotheses, compile them into executable analytics, validate generated artifacts, and produce visualizations and deployable applications. The architecture leverages Apache Kafka for event-driven coordination, Apache Flink for stream processing, and large language models to implement specialized agents. A key contribution is a contract-driven design based on typed intermediate artifacts, enabling modularity, observability, lineage, and safer execution of dynamically generated analytics. Through use cases in retail, finance, and public data, we show how this architecture supports a shift from query-driven analytics to proactive, discovery-driven systems.
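To make the contract-driven design concrete, the following is a minimal in-memory sketch of one pass through the discovery loop, in which each stage consumes and emits a typed artifact that records its parent for lineage. It is an illustration under stated assumptions, not the system's implementation: the artifact types (`Hypothesis`, `CompiledQuery`, `ValidationReport`), their fields, the stage functions, and the keyword-based validation rule are all hypothetical. The LLM-backed agents are replaced by fixed stand-ins, the SQL string is illustrative only, and Kafka and Flink are omitted entirely.

```python
import re
import uuid
from dataclasses import dataclass
from typing import List, Tuple


def new_id() -> str:
    """Return a fresh artifact identifier."""
    return uuid.uuid4().hex


# Hypothetical typed artifacts; the field names are assumptions.
@dataclass(frozen=True)
class Hypothesis:
    artifact_id: str
    statement: str
    stream: str


@dataclass(frozen=True)
class CompiledQuery:
    artifact_id: str
    parent_id: str  # lineage link to the Hypothesis
    sql: str


@dataclass(frozen=True)
class ValidationReport:
    artifact_id: str
    parent_id: str  # lineage link to the CompiledQuery
    ok: bool
    reasons: Tuple[str, ...]


def generate_hypothesis(stream: str) -> Hypothesis:
    # Stand-in for an LLM-backed agent: returns a fixed hypothesis.
    return Hypothesis(new_id(), f"Event volume on '{stream}' varies by hour", stream)


def compile_hypothesis(h: Hypothesis) -> CompiledQuery:
    # Illustrative SQL string only; not checked against any engine.
    sql = f"SELECT COUNT(*) AS n FROM {h.stream} GROUP BY TUMBLE(ts, INTERVAL '1' HOUR)"
    return CompiledQuery(new_id(), h.artifact_id, sql)


FORBIDDEN = ("DROP", "DELETE", "INSERT", "UPDATE")


def validate(q: CompiledQuery) -> ValidationReport:
    # Assumed safety rule: read-only SELECT statements only.
    upper = q.sql.upper()
    tokens = set(re.findall(r"[A-Z_]+", upper))
    reasons = tuple(f"forbidden keyword: {kw}" for kw in FORBIDDEN if kw in tokens)
    if not upper.lstrip().startswith("SELECT"):
        reasons += ("only SELECT statements are allowed",)
    return ValidationReport(new_id(), q.artifact_id, not reasons, reasons)


def discovery_step(stream: str) -> list:
    """Run one pass: hypothesis, then compiled query, then validation report."""
    h = generate_hypothesis(stream)
    q = compile_hypothesis(h)
    r = validate(q)
    return [h, q, r]


def lineage(artifacts: list) -> List[str]:
    """Follow parent_id links from the last artifact back to its root."""
    by_id = {a.artifact_id: a for a in artifacts}
    chain = [artifacts[-1]]
    while getattr(chain[-1], "parent_id", None) in by_id:
        chain.append(by_id[chain[-1].parent_id])
    return [type(a).__name__ for a in chain]
```

Because every artifact is an immutable, typed record that names its parent, a downstream stage can reject malformed input at its boundary, and the provenance of any validated query can be reconstructed by following `parent_id` links. In a deployment along the lines described above, each stage would presumably exchange these records over Kafka topics rather than through direct function calls.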