ANALOGICAL - A New Benchmark for Analogy of Long Text for Large Language Models - 专知论文

会员服务 ·

0

语言模型化 · 可辨认的 · Taxonomy · 词向量表示 · MoDELS ·

2023 年 5 月 14 日

ANALOGICAL - A New Benchmark for Analogy of Long Text for Large Language Models

翻译：ANALOGICAL——大语言模型长文本类比推理新基准

Thilini Wijesiriwardene,Ruwan Wickramarachchi,Bimal G. Gajera,Shreeyash Mukul Gowaikar,Chandan Gupta,Aman Chadha,Aishwarya Naresh Reganti,Amit Sheth,Amitava Das

from arxiv, Accepted as a long paper at Findings of ACL 2023

Over the past decade, analogies, in the form of word-level analogies, have played a significant role as an intrinsic measure of evaluating the quality of word embedding methods such as word2vec. Modern large language models (LLMs), however, are primarily evaluated on extrinsic measures based on benchmarks such as GLUE and SuperGLUE, and there are only a few investigations on whether LLMs can draw analogies between long texts. In this paper, we present ANALOGICAL, a new benchmark to intrinsically evaluate LLMs across a taxonomy of analogies of long text with six levels of complexity -- (i) word, (ii) word vs. sentence, (iii) syntactic, (iv) negation, (v) entailment, and (vi) metaphor. Using thirteen datasets and three different distance measures, we evaluate the abilities of eight LLMs in identifying analogical pairs in the semantic vector space. Our evaluation finds that it is increasingly challenging for LLMs to identify analogies when going up the analogy taxonomy.

翻译：过去十年间，词级类比作为词嵌入方法（如word2vec）质量的内在评估指标发挥了重要作用。然而，现代大语言模型主要基于GLUE和SuperGLUE等基准的外在指标进行评测，鲜有研究探究其能否在长文本间建立类比关系。本文提出ANALOGICAL——一个全新基准，通过六个复杂度层级——(i) 词汇、(ii) 词汇vs句子、(iii) 句法、(iv) 否定、(v) 蕴含、(vi) 隐喻——对长文本类比进行系统分类，实现LLM的内在评估。我们采用十三个数据集与三种不同距离度量，于语义向量空间中评测了八种LLM识别类比对的能力。研究发现，随着类比层级复杂度递增，LLM识别类比的难度呈显著上升趋势。

1

相关内容

语言模型化

语言模型化

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

GNN 新基准！Long Range Graph Benchmark

GNN 新基准！Long Range Graph Benchmark

图与推荐

0+阅读 · 2022年10月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【推荐】免费书(草稿)：数据科学的数学基础

【推荐】免费书(草稿)：数据科学的数学基础

机器学习研究会

20+阅读 · 2017年10月1日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

基于CAS-CA建模的山地城市适应性规划分析方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

列车振动荷载下海底盾构隧道交叉结构动力特性及车致损伤机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

玉米miR390a/b在雌穗发育及穗部性状杂种优势形成中的分子调控机理

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

甘肃金鳟生长性状候选基因的关联分析及功能标记开发

国家自然科学基金

0+阅读 · 2011年12月31日

面向产品知识的云知识建模研究

国家自然科学基金

3+阅读 · 2011年12月31日

专家检索资源获取与学习排序方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

桃果实中脱落酸受体ABAR的功能研究

国家自然科学基金

0+阅读 · 2009年12月31日

Provable Robust Watermarking for AI-Generated Text

Arxiv

0+阅读 · 2023年6月30日

Bring Your Own Data! Self-Supervised Evaluation for Large Language Models

Arxiv

0+阅读 · 2023年6月29日

An approach for benchmarking the numerical solutions of stochastic compartmental models

Arxiv

0+阅读 · 2023年6月29日

Benchmarking Large Language Model Capabilities for Conditional Generation

Arxiv

0+阅读 · 2023年6月29日

Relevant Entity Selection: Knowledge Graph Bootstrapping via Zero-Shot Analogical Pruning

Arxiv

0+阅读 · 2023年6月28日

CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models

Arxiv

0+阅读 · 2023年6月28日

A Survey on Large Language Models for Recommendation

Arxiv

12+阅读 · 2023年5月31日

Towards Reasoning in Large Language Models: A Survey

Arxiv

34+阅读 · 2022年12月20日

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing

Arxiv

24+阅读 · 2021年8月12日

A Survey of Knowledge-Enhanced Text Generation

Arxiv

18+阅读 · 2020年10月9日

VIP会员

文章信息

相关主题

语言模型化

词向量表示

最新内容

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

专知会员服务

1+阅读 · 今天13:56

多模态代码智能综述：从视觉输入到可执行代码系统

多模态代码智能综述：从视觉输入到可执行代码系统

专知会员服务

1+阅读 · 今天13:54

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

专知会员服务

3+阅读 · 今天8:18

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

专知会员服务

3+阅读 · 今天7:39

《通用大语言模型：无人机指挥与控制接口》最新40页

《通用大语言模型：无人机指挥与控制接口》最新40页

专知会员服务

9+阅读 · 今天7:33

《通过小型无人机系统将情报能力“作战化”》

《通过小型无人机系统将情报能力“作战化”》

专知会员服务

3+阅读 · 今天7:28

《神经安全型有人–无人协同：面向认知自适应作战能力的参考架构》

《神经安全型有人–无人协同：面向认知自适应作战能力的参考架构》

专知会员服务

6+阅读 · 今天7:14

《在指挥链中通过多准则决策分析传达指挥官意图：空战实验》

《在指挥链中通过多准则决策分析传达指挥官意图：空战实验》

专知会员服务

18+阅读 · 6月15日

消耗优势：美军的“精确规模化”概念

消耗优势：美军的“精确规模化”概念

专知会员服务

7+阅读 · 6月15日

五角大楼的AI优先战略及其对现代战争的启示：来自与伊朗冲突的经验教训

五角大楼的AI优先战略及其对现代战争的启示：来自与伊朗冲突的经验教训

专知会员服务

8+阅读 · 6月15日

《网络空间兵棋推演：挑战、局限性与混合路径》报告

《网络空间兵棋推演：挑战、局限性与混合路径》报告

专知会员服务

8+阅读 · 6月15日

《离线语言支持系统：面向空战战术决策》

《离线语言支持系统：面向空战战术决策》

专知会员服务

8+阅读 · 6月15日

《以通信为中心的6G–LLM架构：面向可扩展的战术自主防御车辆网络》

《以通信为中心的6G–LLM架构：面向可扩展的战术自主防御车辆网络》

专知会员服务

7+阅读 · 6月15日

ICML 2026｜ECA：面向开放式图文生成的高效持续对齐

ICML 2026｜ECA：面向开放式图文生成的高效持续对齐

专知会员服务

6+阅读 · 6月14日

可信智能体AI综述：安全、鲁棒性、隐私与系统安全

可信智能体AI综述：安全、鲁棒性、隐私与系统安全

专知会员服务

6+阅读 · 6月14日

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

多模态代码智能综述：从视觉输入到可执行代码系统

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

相关资讯

GNN 新基准！Long Range Graph Benchmark

GNN 新基准！Long Range Graph Benchmark

图与推荐

0+阅读 · 2022年10月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【推荐】免费书(草稿)：数据科学的数学基础

【推荐】免费书(草稿)：数据科学的数学基础

机器学习研究会

20+阅读 · 2017年10月1日

相关论文

Provable Robust Watermarking for AI-Generated Text

Arxiv

0+阅读 · 2023年6月30日

Bring Your Own Data! Self-Supervised Evaluation for Large Language Models

Arxiv

0+阅读 · 2023年6月29日

An approach for benchmarking the numerical solutions of stochastic compartmental models

Arxiv

0+阅读 · 2023年6月29日

Benchmarking Large Language Model Capabilities for Conditional Generation

Arxiv

0+阅读 · 2023年6月29日

Relevant Entity Selection: Knowledge Graph Bootstrapping via Zero-Shot Analogical Pruning

Arxiv

0+阅读 · 2023年6月28日

CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models

Arxiv

0+阅读 · 2023年6月28日

A Survey on Large Language Models for Recommendation

Arxiv

12+阅读 · 2023年5月31日

Towards Reasoning in Large Language Models: A Survey

Arxiv

34+阅读 · 2022年12月20日

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing

Arxiv

24+阅读 · 2021年8月12日

A Survey of Knowledge-Enhanced Text Generation

Arxiv

18+阅读 · 2020年10月9日

相关基金

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

基于CAS-CA建模的山地城市适应性规划分析方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

列车振动荷载下海底盾构隧道交叉结构动力特性及车致损伤机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

玉米miR390a/b在雌穗发育及穗部性状杂种优势形成中的分子调控机理

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

甘肃金鳟生长性状候选基因的关联分析及功能标记开发

国家自然科学基金

0+阅读 · 2011年12月31日

面向产品知识的云知识建模研究

国家自然科学基金

3+阅读 · 2011年12月31日

专家检索资源获取与学习排序方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

桃果实中脱落酸受体ABAR的功能研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员