The volume of data generated and stored in contemporary global data centers is experiencing exponential growth. This rapid data growth necessitates efficient processing and analysis to extract valuable business insights. In distributed data processing systems, data undergoes exchanges between the compute servers that contribute significantly to the total data processing duration in adequately large clusters, necessitating efficient data transport protocols. Traditionally, data transport frameworks such as JDBC and ODBC have used TCP/IP-over-Ethernet as their underlying network protocol. Such frameworks require serializing the data into a single contiguous buffer before handing it off to the network card, primarily due to the requirement of contiguous data in TCP/IP. In OLAP use cases, this serialization process is costly for columnar data batches as it involves numerous memory copies that hurt data transport duration and overall data processing performance. We study the serialization overhead in the context of a widely-used columnar data format, Apache Arrow, and propose leveraging RDMA to transport Arrow data over Infiniband in a zero-copy manner. We design and implement Thallus, an RDMA-based columnar data transport protocol for Apache Arrow based on the Thallium framework from the Mochi ecosystem, compare it with a purely Thallium RPC-based implementation, and show substantial performance improvements can be achieved by using RDMA for columnar data transport.
翻译:当代全球数据中心生成和存储的数据量正经历指数级增长。这种数据的快速增长需要高效的处理与分析,以提取有价值的商业洞察。在分布式数据处理系统中,数据在计算服务器之间进行交换,这在足够大规模集群中会显著影响整体数据处理时长,因此需要高效的数据传输协议。传统上,诸如JDBC和ODBC等数据传输框架采用TCP/IP-over-Ethernet作为底层网络协议。此类框架需要先将数据序列化为单个连续缓冲区,再将其移交给网卡,这主要是由于TCP/IP对数据连续性的要求。在OLAP应用场景中,这种序列化过程对于列式数据批次而言成本高昂,因为它涉及大量内存拷贝,从而损害数据传输时长和整体数据处理性能。我们研究了广泛使用的列式数据格式Apache Arrow中的序列化开销,并提出利用RDMA以零拷贝方式在Infiniband上传输Arrow数据。我们基于Mochi生态系统中的Thallium框架,设计并实现了Thallus——一种用于Apache Arrow的基于RDMA的列式数据传输协议,将其与纯Thallium RPC实现进行了对比,并展示了使用RDMA进行列式数据传输可实现的显著性能提升。