Modern distributed systems demand low-latency, fault-tolerant event processing beyond the limits of traditional messaging architectures. Although frameworks such as Apache Kafka, RabbitMQ, Apache Pulsar, NATS JetStream, and serverless event buses have matured significantly, no unified comparative study evaluates them holistically under standardized conditions. This paper presents the first comprehensive benchmarking framework, evaluating 12 messaging systems across three representative workloads: e-commerce transactions, IoT telemetry ingestion, and AI inference pipelines. We introduce AIEO (AI-Enhanced Event Orchestration), which combines machine-learning-driven predictive scaling, reinforcement learning for dynamic resource allocation, and multi-objective optimization. Our evaluation reveals fundamental trade-offs: Apache Kafka achieves peak throughput (1.2M messages/sec, 18 ms p95 latency) but requires substantial operational expertise; Apache Pulsar provides balanced performance (950K messages/sec, 22 ms p95 latency) with superior multi-tenancy; and serverless solutions offer elastic scaling for variable workloads at the cost of higher baseline latency (80-120 ms p95). AIEO delivers a 34% average latency reduction, a 28% improvement in resource utilization, and 42% cost optimization across all platforms. We contribute standardized benchmarking methodologies, an open-source intelligent orchestration implementation, and evidence-based decision guidelines. The evaluation spans more than 2,400 experimental configurations with rigorous statistical analysis, providing comprehensive performance characterization and establishing foundations for next-generation distributed system design.