Softmax Papers - 专知

Softmax

Inverse classification with logistic and softmax classifiers: efficient optimization

Arxiv

0+ reads · March 19

Screening Is Enough

Arxiv

0+ reads · April 1

Screening Is Enough

Arxiv

0+ reads · April 6

Similarity-Distance-Magnitude Activations

Arxiv

0+ reads · April 16

Gradient Boosting within a Single Attention Layer

Arxiv

0+ reads · April 3

K-Way Energy Probes for Metacognition Reduce to Softmax in Discriminative Predictive Coding Networks

Arxiv

0+ reads · April 13

On Bayesian Softmax-Gated Mixture-of-Experts Models

Arxiv

0+ reads · April 22

Taming the Exponential: A Fast Softmax Surrogate for Integer-Native Edge Inference

Arxiv

0+ reads · April 2

Winner-Take-All Spiking Transformer for Language Modeling

Arxiv

0+ reads · April 13

Linearizing Vision Transformer with Test-Time Training

Arxiv

0+ reads · May 4

On the Expressive Power of Contextual Relations in Transformers

Arxiv

0+ reads · May 1

Why Softmax Attention Outperforms Linear Attention

Arxiv

0+ reads · March 13

Rethinking Attention: Polynomial Alternatives to Softmax in Transformers

Arxiv

0+ reads · March 13

The Counting Power of Transformers

Arxiv

0+ reads · March 2

HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation

Arxiv

0+ reads · February 20

