Automated Evaluation Papers - Zhuanzhi

Automated Evaluation

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Arxiv

0+ reads · June 5

LaQual: An Automated Framework for LLM App Quality Evaluation

Arxiv

0+ reads · June 10

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Arxiv

0+ reads · April 1

A CEFR-Inspired Classification Framework with Fuzzy C-Means To Automate Assessment of Programming Skills in Scratch

Arxiv

0+ reads · April 1

Toward LLM-Supported Automated Assessment of Critical Thinking Subskills

Arxiv

0+ reads · February 18

A Scalable Framework for Evaluating Health Language Models

Arxiv

0+ reads · February 18

Automated Assessment of Kidney Ureteroscopy Exploration for Training

Arxiv

0+ reads · February 17

Supporting Humans in Evaluating AI Summaries of Legal Depositions

Arxiv

0+ reads · January 21

SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks

Arxiv

0+ reads · January 22

DR-Arena: an Automated Evaluation Framework for Deep Research Agents

Arxiv

0+ reads · January 15

LegalRikai: Open Benchmark -- Benchmark for Complex Japanese Corporate Legal Tasks

Arxiv

0+ reads · December 15, 2025

LegalRikai: Open Benchmark -- A Benchmark for Complex Japanese Corporate Legal Tasks

Arxiv

0+ reads · December 12, 2025

AutoMedic: An Automated Evaluation Framework for Clinical Conversational Agents with Medical Dataset Grounding

Arxiv

0+ reads · December 11, 2025

Building Trust in Virtual Immunohistochemistry: Automated Assessment of Image Quality

Arxiv

0+ reads · November 6, 2025
