Model Performance Papers - 专知

Model Performance

ViMedCSS: A Vietnamese Medical Code-Switching Speech Dataset & Benchmark

Arxiv

0+ reads · June 22

Legal Reasoning Is Not Lawyering: Rethinking Legal Benchmarks for Pro Se Access to Justice

Arxiv

0+ reads · June 16

Event-Grounded Question Answering over Long Audio via Structured Retrieval

Arxiv

0+ reads · June 23

Which Models Perform Better in Inheritance Reasoning?

Arxiv

0+ reads · June 19

The Watermark Shortcut: How Provenance Marking Sabotages Audio Deepfake Detection

Arxiv

0+ reads · June 22

Towards Engineering Scaling Laws with Pretraining Data Composition

Arxiv

0+ reads · June 18

MetaboNet-Bench: A Multi-modal Benchmark for Glucose Forecasting in Type 1 Diabetes

Arxiv

0+ reads · June 17

Recursive Learning Without Collapse: A Weighting-Based Stabilization Framework

Arxiv

0+ reads · June 16

TuneAhead: Predicting Fine-tuning Performance Before Full Training Begins

Arxiv

0+ reads · June 16

Conflict-Aware Federated Fine-Tuning of Large Language Models with Mixture-of-Experts

Arxiv

0+ reads · June 14

DCP-Prune: Ultra-Low Token Pruning with Distribution Consistency Preservation

Arxiv

0+ reads · June 15

Reflection in the Dark: Exposing and Escaping the Black Box in Reflective Prompt Optimization

Arxiv

0+ reads · June 8

SpectCount: Spectrotemporal Counting via Synthetic Signals Improves Large Audio Language Models

Arxiv

0+ reads · June 5

Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests

Arxiv

0+ reads · June 8

Logit Distillation on Manifolds: Mapping by Learning

Arxiv

0+ reads · May 30
