Knowledge Distillation Papers - Zhuanzhi (专知)

Knowledge Distillation

ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models

Arxiv

Feb 18

Protecting Language Models Against Unauthorized Distillation through Trace Rewriting

Arxiv

Feb 16

DistillLens: Symmetric Knowledge Distillation Through Logit Lens

Arxiv

Feb 14

Agentic Knowledge Distillation: Autonomous Training of Small Language Models for SMS Threat Detection

Arxiv

Feb 11

Efficient Graph Knowledge Distillation from GNNs to Kolmogorov-Arnold Networks via Self-Attention Dynamic Sampling

Arxiv

Feb 9

A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?

Arxiv

Feb 9

DeepFusion: Accelerating MoE Training via Federated Knowledge Distillation from Heterogeneous Edge Devices

Arxiv

Feb 15

RMT-KD: Random Matrix Theoretic Causal Knowledge Distillation

Arxiv

Feb 6

Pedagogically-Inspired Data Synthesis for Language Model Knowledge Distillation

Arxiv

Feb 12

Life Cycle-Aware Evaluation of Knowledge Distillation for Machine Translation: Environmental Impact and Translation Quality Trade-offs

Arxiv

Feb 10

$\mathcal{X}$-KD: General Experiential Knowledge Distillation for Large Language Models

Arxiv

Feb 13

Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation

Arxiv

Feb 12

REDistill: Robust Estimator Distillation for Balancing Robustness and Efficiency

Arxiv

Feb 4

Knowledge Distillation for mmWave Beam Prediction Using Sub-6 GHz Channels

Arxiv

Feb 4

Enhancing Indoor Occupancy Prediction via Sparse Query-Based Multi-Level Consistent Knowledge Distillation

Arxiv

Feb 2