From hardware offloads like RDMA to software ones like eBPF, offloads are everywhere, and their value lies in performance. However, there is evidence that fully offloading a task, even when feasible, does not always yield the expected speedups. Starting from the observation that this is due to the changes offloads make by moving tasks from the application/CPU closer to the network/link layer, we argue that to further accelerate offloads, we need to make them reversible by unloading them, that is, by moving part of the offloaded tasks back. Unloading comes with a set of challenges that we begin to address in this paper by focusing on (offloaded) RDMA writes: Which part of the write operation does it make sense to unload? How do we dynamically decide which writes to execute on the unload or offload path to improve performance? How do we maintain compatibility between the two paths? Our current prototype shows the potential of unloading by accelerating RDMA writes by up to 31%.