Probing Papers - 专知

Visuals Lie, Consistency Speaks: Disentangling Spatial Attention from Reliability in Vision-Language Models

Arxiv

0+ reads · June 16

The Faithfulness Gap: Certifying Semantic Equivalence Between Natural-Language and Formal Mathematical Statements

Arxiv

0+ reads · June 15

"OpenBloom": A Stigma-Sensitive LLM Design Probe for Reproductive Well-Being

Arxiv

0+ reads · June 14

Medical Heuristic Learning: An LLM-Driven Framework for Interpretable and Auditable Clinical Decision Rules

Arxiv

0+ reads · June 15

Do Safety Monitors Stay Reliable After an Update? Benchmarking and Predicting Activation-Monitor Staleness

Arxiv

0+ reads · June 14

Probing and graph coloring techniques for trace estimation in Lattice QCD

Arxiv

0+ reads · May 29

Coherent Swap Regret and Channel-Proof Learning

Arxiv

0+ reads · June 1

Statistical Guarantees for Reasoning Probes on Looped Boolean Circuits

Arxiv

0+ reads · May 31

Quality Is Not a Safety Proxy Under Quantization

Arxiv

0+ reads · June 8

Substrate Asymmetry in User-Side Memory: A Diagnostic Framework

Arxiv

0+ reads · June 10

Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions

Arxiv

0+ reads · June 11

When Does Routing Become Interpretable? Causal Probes on Block Attention Residuals

Arxiv

0+ reads · June 11

Mirage Probes: How Vision Models Fake Visual Understanding

Arxiv

0+ reads · June 11

Document-Authored Control-Signal Impersonation: A Low-Cost Indirect Prompt Attack on RAG Safety Boundaries

Arxiv

0+ reads · June 8

CUBE: Contrastive Understanding by Balanced Experiments

Arxiv

0+ reads · June 4
