Learning Rate Papers - Zhuanzhi (专知)

Learning Rate

Minimax learning rates for estimating binary classifiers under margin conditions

Arxiv

0+ reads · Mar 13

When Learning Rates Go Wrong: Early Structural Signals in PPO Actor-Critic

Arxiv

0+ reads · Mar 10

Adaptive Methods Are Preferable in High Privacy Settings: An SDE Perspective

Arxiv

0+ reads · Mar 3

High-dimensional learning dynamics of multi-pass Stochastic Gradient Descent in multi-index models

Arxiv

0+ reads · Feb 17

Explaining Grokking in Transformers through the Lens of Inductive Bias

Arxiv

0+ reads · Feb 6

Functional Scaling Laws in Kernel Regression: Loss Dynamics and Learning Rate Schedules

Arxiv

0+ reads · Feb 15

Step by Step: Adaptive Gradient Descent for Training L-Lipschitz Neural Networks

Arxiv

0+ reads · Feb 6

Weight Decay may matter more than muP for Learning Rate Transfer in Practice

Arxiv

0+ reads · Feb 13

Learning Rate Annealing Improves Tuning Robustness in Stochastic Optimization

Arxiv

0+ reads · Feb 16

Optimal Learning-Rate Schedules under Functional Scaling Laws: Power Decay and Warmup-Stable-Decay

Arxiv

0+ reads · Feb 6

Optimal Learning-Rate Schedules under Functional Scaling Laws: Power Decay and Warmup-Stable-Decay

Arxiv

0+ reads · Feb 15

Unsupervised Layer-Wise Dynamic Test Time Adaptation for LLMs

Arxiv

0+ reads · Feb 10

Dueling over Multiple Pieces of Dessert

Arxiv

0+ reads · Feb 12

Where Does Warm-Up Come From? Adaptive Scheduling for Norm-Constrained Optimizers

Arxiv

0+ reads · Feb 5

Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model

Arxiv

0+ reads · Feb 4
