Data Parallelism Papers - Zhuanzhi

Data Parallelism

MuLoCo: Muon is a practical inner optimizer for DiLoCo

Arxiv

0+ reads · Feb 25

veScale-FSDP: Flexible and High-Performance FSDP at Scale

Arxiv

0+ reads · Feb 27

SENTINEL: Stagewise Integrity Verification for Pipeline Parallel Decentralized Training

Arxiv

0+ reads · Mar 3

FLYING SERVING: On-the-Fly Parallelism Switching for Large Language Model Serving

Arxiv

0+ reads · Feb 26

FCDP: Fully Cached Data Parallel for Communication-Avoiding Large-Scale Training

Arxiv

0+ reads · Feb 6

AsyncMesh: Fully Asynchronous Optimization for Data and Pipeline Parallelism

Arxiv

0+ reads · Jan 30

Training LLMs with Fault Tolerant HSDP on 100,000 GPUs

Arxiv

0+ reads · Jan 30

The Energy-Throughput Trade-off in Lossless-Compressed Source Code Storage

Arxiv

0+ reads · Jan 19

Revisiting Parameter Server in LLM Post-Training

Arxiv

0+ reads · Jan 27

Placement Semantics for Distributed Deep Learning: A Systematic Framework for Analyzing Parallelism Strategies

Arxiv

0+ reads · Jan 5

FastMPS: Revisit Data Parallel in Large-scale Matrix Product State Sampling

Arxiv

0+ reads · Dec 23, 2025

Staggered Batch Scheduling: Co-optimizing Time-to-First-Token and Throughput for High-Efficiency LLM Inference

Arxiv

0+ reads · Dec 18, 2025

BS-tree: A gapped data-parallel B-tree

Arxiv

0+ reads · Nov 13, 2025

ACCO: Accumulate While You Communicate for Communication-Overlapped Sharded LLM Training

Arxiv

0+ reads · Oct 14, 2025

Model Parallelism With Subnetwork Data Parallelism

Arxiv

0+ reads · Oct 2, 2025