Distributed Kafka Clusters: A Novel Approach to Global Message Ordering

In contemporary distributed systems, logs are produced at an astounding rate, generating terabytes of data within mere seconds. These logs, containing pivotal details like system metrics, user actions, and diverse events, are foundational to the system's consistent and accurate operations. Precise log ordering becomes indispensable to avert potential ambiguities and discordances in system functionalities. Apache Kafka, a prevalent distributed message queue, offers significant solutions to various distributed log processing challenges. However, it presents an inherent limitation while Kafka ensures the in-order delivery of messages within a single partition to the consumer, it falls short in guaranteeing a global order for messages spanning multiple partitions. This research delves into innovative methodologies to achieve global ordering of messages within a Kafka topic, aiming to bolster the integrity and consistency of log processing in distributed systems. Our code is available on GitHub.

翻译：在当代分布式系统中，日志以惊人的速度产生，短短数秒内便可生成TB级数据。这些日志包含系统指标、用户行为及各类事件等关键信息，是系统稳定准确运行的基础。精确的日志排序对于避免系统功能潜在的不确定性与不一致性至关重要。Apache Kafka作为一种广泛使用的分布式消息队列，为多种分布式日志处理难题提供了有效解决方案，但存在固有局限：Kafka虽能保证单个分区内消息按序投递给消费者，却无法实现跨多分区消息的全局排序。本研究深入探索了在Kafka主题内实现消息全局排序的创新方法，旨在提升分布式系统中日志处理的完整性与一致性。我们的代码已在GitHub上开源。

相关内容

Kafka

关注 162

Kafka是一种高吞吐量的分布式发布订阅消息系统，它可以处理消费者规模的网站中的所有动作流数据。这种动作（网页浏览，搜索和其他用户的行动）是在现代网络上的许多社会功能的一个关键因素。这些数据通常是由于吞吐量的要求而通过处理日志和日志聚合来解决。对于像Hadoop的一样的日志数据和离线分析系统，但又要求实时处理的限制，这是一个可行的解决方案。Kafka的目的是通过Hadoop的并行加载机制来统一线上和离线的消息处理，也是为了通过集群来提供实时的消费。

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日