Cache papers - Zhuanzhi (专知)


cache

Amoeba: Runtime Tensor Parallel Transformation for LLM Inference Services

arXiv

April 22

Bottlenecked Transformers: Periodic KV Cache Consolidation for Generalised Reasoning

arXiv

March 25

VLN-Cache: Enabling Token Caching for VLN Models with Visual/Semantic Dynamics Awareness

arXiv

March 10

d²Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching

arXiv

February 16

Dual-Signal Adaptive KV-Cache Optimization for Long-Form Video Understanding in Vision-Language Models

arXiv

February 15

KV-CoRE: Benchmarking Data-Dependent Low-Rank Compressibility of KV-Caches in LLMs

arXiv

February 5

DSB: Dynamic Sliding Block Scheduling for Diffusion LLMs

arXiv

February 5

Q Cache: Visual Attention is Valuable in Less than Half of Decode Layers for Multimodal Large Language Model

arXiv

February 2

VidLaDA: Bidirectional Diffusion Large Language Models for Efficient Video Understanding

arXiv

January 29

VidLaDA: Bidirectional Diffusion Large Language Models for Efficient Video Understanding

arXiv

January 25

Joint Encoding of KV-Cache Blocks for Scalable LLM Serving

arXiv

January 6

Enhancing Reliability of STT-MRAM Caches by Eliminating Read Disturbance Accumulation

arXiv

January 1

NVM-in-Cache: Repurposing Commodity 6T SRAM Cache into NVM Analog Processing-in-Memory Engine using a Novel Compute-on-Powerline Scheme

arXiv

December 27, 2025

VNF-Cache: An In-Network Key-Value Store Cache Based on Network Function Virtualization

arXiv

December 23, 2025

TDC-Cache: A Trustworthy Decentralized Cooperative Caching Framework for Web3.0

arXiv

December 10, 2025
