Computing Performance Papers - Zhuanzhi

Computing Performance

AI Application Benchmarking: Power-Aware Performance Analysis for Vision and Language Models

Arxiv

0+ reads · June 19

Accelerating Disaggregated RL for Visual Generative LLMs with Diffusion-Based Parallelism and Trainer-Assisted Generation

Arxiv

0+ reads · June 23

Node-Level Performance and Energy Characterization of Flagship Science Applications on SuperMUC-NG Phase 2

Arxiv

0+ reads · June 22

When Is a Columnar Scan Bandwidth-Bound? A Decode-Throughput Law and Its Cross-Hardware Validation

Arxiv

0+ reads · June 21

CrossPool: Efficient Multi-LLM Serving for Cold MoE Models through KV-Cache and Weight Disaggregation

Arxiv

0+ reads · June 23

FP8 is All You Need (Part 2): Efficient Ozaki-Bailey Style FFT Through Tensor-core Garner Reformulation and Kulisch Escape Route

Arxiv

0+ reads · May 28

Randomized Sketching is Robust to Low-Precision Rounding on GPUs

Arxiv

0+ reads · June 18

AI-PAVE-Br: Leveraging Large Language Models for Enhanced Product Attribute Value Extraction through a Golden Set Approach

Arxiv

0+ reads · June 23

Look Before You Leap: Checking In on Type Tag Checking

Arxiv

0+ reads · June 18

Memory Layouts for GPU-Data Transfer Buffering in SPH

Arxiv

0+ reads · June 22

Does Mixture-of-Experts Actually Help Inference on Consumer and Edge Hardware? An Empirical Study

Arxiv

0+ reads · June 23

LMS-AR: LMS Prediction-based Adaptive Regulator for Memory Bandwidth in Multicore Systems

Arxiv

0+ reads · June 22

The Serialized Bridge: Understanding and Recovering LLM Serving Performance under Blackwell GPU Confidential Computing

Arxiv

0+ reads · June 22

AI Tokenomics: The Economics of Tokens, Computation, and Pricing in Foundation Models

Arxiv

0+ reads · June 10

Apple Neural Engine: Architecture, Programming, and Performance

Arxiv

0+ reads · June 21
