GPU Papers - Zhuanzhi

GPU

ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression

arXiv

0+ reads · March 18

HierarchicalKV: A GPU Hash Table with Cache Semantics for Continuous Online Embedding Storage

arXiv

0+ reads · March 17

LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis

arXiv

0+ reads · March 18

The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency

arXiv

0+ reads · March 18

AI Application Benchmarking: Power-Aware Performance Analysis for Vision and Language Models

arXiv

0+ reads · March 17

Fluids You Can Trust: Property-Preserving Operator Learning for Incompressible Flows

arXiv

0+ reads · March 17

Towards heterogeneous parallelism for SPHinXsys

arXiv

0+ reads · March 17

ODIN-Based CPU-GPU Architecture with Replay-Driven Simulation and Emulation

arXiv

0+ reads · March 17

Dataflow-Oriented Classification and Performance Analysis of GPU-Accelerated Homomorphic Encryption

arXiv

0+ reads · March 17

FleetOpt: Analytical Fleet Provisioning for LLM Inference with Compress-and-Route as Implementation Mechanism

arXiv

0+ reads · March 17

inference-fleet-sim: A Queueing-Theory-Grounded Fleet Capacity Planner for LLM Inference

arXiv

0+ reads · March 17

Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking

arXiv

0+ reads · March 17

Guaranteeing Semantic and Performance Determinism in Flexible GPU Sharing

arXiv

0+ reads · March 17

An Efficient Heterogeneous Co-Design for Fine-Tuning on a Single GPU

arXiv

0+ reads · March 17

Guaranteeing Semantic and Performance Determinism in Flexible GPU Sharing

arXiv

0+ reads · March 16
