Large Language Model Benchmark Papers - 专知

Large Language Model Benchmarks

InterveneBench: Benchmarking LLMs for Intervention Reasoning and Causal Study Design in Real Social Systems

Arxiv

0+ reads · March 16

NC-Bench: An LLM Benchmark for Evaluating Conversational Competence

Arxiv

0+ reads · March 8

Beyond the Rubric: Cultural Misalignment in LLM Benchmarks for Sexual and Reproductive Health

Arxiv

0+ reads · November 26, 2025

SproutBench: A Benchmark for Safe and Ethical Large Language Models for Youth

Arxiv

0+ reads · November 24, 2025
