Workload Papers - Zhuanzhi (专知)

CacheWise: Understanding Workloads and Optimizing KVCache Management for Efficiently Serving LLM Coding Agents

arXiv

0+ reads · June 15

RISE: Relay Inference and Online Scheduling for Efficient Edge-Device Collaborative Diffusion Model Services

arXiv

0+ reads · June 16

Compass: Co-Exploration of Mapping and Hardware for Heterogeneous Multi-Chiplet Accelerators Targeting LLM Inference Service Workloads

arXiv

0+ reads · June 15

SNAS: A Multi-Layer Defense-in-Depth Architecture for Secure Egress in Sandboxed Workloads

arXiv

0+ reads · June 16

AIA: A Customized Multi-core RISC-V SoC for Discrete Sampling Workloads in 16 nm

arXiv

0+ reads · June 15

MemBoost: A Memory-Boosted Framework for Cost-Aware LLM Inference

arXiv

0+ reads · June 15

Distributed Load Balancing with Workload-Dependent Service Rates

arXiv

0+ reads · June 15

EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms

arXiv

0+ reads · June 14

Stannic: Systolic STochAstic ONliNe SchedulIng AcCelerator

arXiv

0+ reads · June 15

Solyx AI Grid: Hardware-Telemetry-Aware Routing Across Geographically Distributed GPU Clusters

arXiv

0+ reads · June 13

From MWM to iSLIP: A Linear-Algebraic Tutorial on Input-Queued Switch Scheduling

arXiv

0+ reads · June 9

Coordinated Scheduling for MoE LLM Serving

arXiv

0+ reads · June 13

Nightjar: Dynamic Adaptive Speculative Decoding for Large Language Models Serving

arXiv

0+ reads · June 15

Improved Parallel Algorithms for EF1 Allocations

arXiv

0+ reads · June 14

Frontier: Towards Comprehensive and Accurate LLM Inference Simulation

arXiv

0+ reads · June 13
