CUDA Papers - 专知

CUDA

KineticSim: A Lightweight, High-Performance Execution Engine for Real-Time Market Simulators

Arxiv

0+ reads · June 19

DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels

Arxiv

0+ reads · June 16

From Tokens to Regions: CUDA-Sensitive Instruction Tuning for GPU Kernel Generation

Arxiv

0+ reads · June 15

WarpGuard: Protected-Site Control-Flow Integrity for CUDA SASS Binaries

Arxiv

0+ reads · June 10

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

Arxiv

0+ reads · June 7

Gerrymandering the Warp: Non-Control-Data Attacks on CUDA Collective Decision

Arxiv

0+ reads · June 10

The Model Parking Tax: Quantifying the Hidden Energy Cost of Always-On GPU Model Deployment

Arxiv

0+ reads · April 15

Cross-Platform Fused MoE Dispatch in Triton: Portable Expert Routing Without CUDA

Arxiv

0+ reads · April 7

Fast-Vollib: A Fast Implied Volatility Library for Python with PyTorch, JAX, and CUDA Fused-Kernel Backends

Arxiv

0+ reads · June 8

Caspar: CUDA Accelerator for Symbolic Programming with Adaptive Reordering

Arxiv

0+ reads · May 28

Computing statistical solutions of a Mach 2000 astrophysical jet

Arxiv

0+ reads · May 28

Computing weak-strong uniqueness of a Mach 2000 astrophysical jet

Arxiv

0+ reads · May 24

MultiPath Memory Access: Breaking Host-GPU Bandwidth Bottlenecks in LLM Services

Arxiv

0+ reads · May 13

SET: Stream-Event-Triggered Scheduling for Efficient CUDA Graph Pipelines

Arxiv

0+ reads · June 3

Characterizing Software Aging in GPU-Based LLM Serving Systems

Arxiv

0+ reads · June 10
