Benchmark · Quality control · Performance evaluation · Utility · Intelligent systems

April 20, 2023

Towards a Benchmark for Scientific Understanding in Humans and Machines


Kristian Gonzalez Barman, Sascha Caron, Tom Claassen, Henk de Regt

Scientific understanding is a fundamental goal of science, allowing us to explain the world. There is currently no good way to measure the scientific understanding of agents, whether these be humans or Artificial Intelligence systems. Without a clear benchmark, it is challenging to evaluate and compare different levels of and approaches to scientific understanding. In this Roadmap, we propose a framework to create a benchmark for scientific understanding, utilizing tools from philosophy of science. We adopt a behavioral notion according to which genuine understanding should be recognized as an ability to perform certain tasks. We extend this notion by considering a set of questions that can gauge different levels of scientific understanding, covering information retrieval, the capability to arrange information to produce an explanation, and the ability to infer how things would be different under different circumstances. The Scientific Understanding Benchmark (SUB), which is formed by a set of these tests, allows for the evaluation and comparison of different approaches. Benchmarking plays a crucial role in establishing trust, ensuring quality control, and providing a basis for performance evaluation. By aligning machine and human scientific understanding we can improve their utility, ultimately advancing scientific understanding and helping to discover new insights within machines.

