Privacy-preserving Chunk Scheduling in a BitTorrent Implementation of Federated Learning

From arXiv. This paper has been accepted to the 46th IEEE International Conference on Distributed Computing Systems (ICDCS 2026). Please cite the IEEE proceedings version once it becomes available.

Traditional federated learning (FL) relies on a central aggregation server, which can create performance bottlenecks and privacy risks. Decentralized mix-and-forward designs remove the server, but repeated local mixing can attenuate global information under heterogeneity and expose peer-to-peer neighborhoods as a privacy attack surface. To preserve FedAvg-style aggregation semantics over the updates that can be reconstructed by the round deadline while scaling dissemination, we present FLTorrent, a BitTorrent-based dissemination layer for serverless FL with a short warm-up phase. The warm-up hardens within-round source unlinkability, a dissemination-layer goal orthogonal to content protections such as DP or secure aggregation. It does so through pre-round obfuscation, randomized lags, and coordination-only non-owner-first scheduling with the tracker off the data path, and then switches to vanilla BitTorrent swarming. We upper-bound the per-transfer attribution posterior by the fraction of owner chunks in a sender's eligible cover set, and derive a tighter high-probability bound that improves with early non-owner mass. A simple heuristic, GreedyFastestFirst, attains about 92% of a bandwidth-optimal max-flow upper bound, while warm-up remains a stable share of about 12% of a round across 100-500 peers. Under an observation-only local adversary, FLTorrent drives attribution success close to neighborhood-level random guessing for typical nodes, improves with network size, and remains robust under collusion. In LLM-scale dissemination stress tests over 7-10 Gbps access links, FLTorrent adds only about 6-10% round-time overhead relative to BitTorrent-only dissemination. Overall, FLTorrent shows that within-round unlinkability and BitTorrent-level efficiency can coexist with predictable, low overheads at scale.
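The attribution bound stated in the abstract can be illustrated with a small sketch. The abstract does not give the paper's formal definitions, so the following is only an assumption-laden illustration: the function name `attribution_bound`, the representation of the eligible cover set as a list of owner IDs, and the peer labels are all hypothetical. It computes the fraction of chunks in a sender's eligible cover set that the sender itself owns, which is the quantity the abstract says upper-bounds the per-transfer attribution posterior.

```python
from fractions import Fraction


def attribution_bound(eligible_owner_ids, sender):
    """Fraction of the sender's eligible cover set that the sender owns.

    eligible_owner_ids: one owner ID per chunk the sender could forward
    at this moment (hypothetical representation of the cover set).
    sender: ID of the peer performing the transfer.
    """
    owners = list(eligible_owner_ids)
    if not owners:
        raise ValueError("eligible cover set is empty")
    own_chunks = sum(1 for owner in owners if owner == sender)
    return Fraction(own_chunks, len(owners))


# Hypothetical warm-up snapshot: peer "A" holds seven chunks owned by
# other peers and one of its own, so an observed transfer from "A" is
# attributable to "A" as the source with probability at most 1/8.
cover = ["B", "C", "D", "A", "C", "B", "D", "D"]
print(attribution_bound(cover, "A"))  # 1/8
```

Under this reading, non-owner-first scheduling helps because it grows the denominator with non-owner chunks before a peer starts sending its own, which is consistent with the abstract's remark that the tighter bound improves with early non-owner mass.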
