低资源语言自动文本摘要方法比较研究 (Comparing Approaches to Automatic Summarization in Less-Resourced Languages) - 专知论文

会员服务 ·

0

低资源 · 自动文本摘要 · 文本摘要 · 方法比较 · 样本 ·

2025 年 12 月 30 日

Comparing Approaches to Automatic Summarization in Less-Resourced Languages

翻译：低资源语言自动文本摘要方法比较研究

Chester Palen-Michel,Constantine Lignos

from arxiv, Under review

Automatic text summarization has achieved high performance in high-resourced languages like English, but comparatively less attention has been given to summarization in less-resourced languages. This work compares a variety of different approaches to summarization from zero-shot prompting of LLMs large and small to fine-tuning smaller models like mT5 with and without three data augmentation approaches and multilingual transfer. We also explore an LLM translation pipeline approach, translating from the source language to English, summarizing and translating back. Evaluating with five different metrics, we find that there is variation across LLMs in their performance across similar parameter sizes, that our multilingual fine-tuned mT5 baseline outperforms most other approaches including zero-shot LLM performance for most metrics, and that LLM as judge may be less reliable on less-resourced languages.

翻译：自动文本摘要技术在高资源语言（如英语）中已取得优异性能，但对低资源语言摘要任务的关注相对不足。本研究系统比较了多种摘要方法：从大规模与小规模语言模型的零样本提示，到结合三种数据增强策略与多语言迁移的mT5等小型模型微调方法，并探索了基于LLM的翻译流水线方法（将源语言文本翻译为英语、执行摘要、再回译至源语言）。通过五项评估指标的综合分析，我们发现：相似参数量级的LLM在不同指标上存在性能差异；经多语言微调的mT5基线模型在多数指标上优于包括零样本LLM在内的大多数方法；针对低资源语言，基于LLM的自动评估机制可能存在可靠性不足的问题。

0

相关内容

低资源

【百度&北京大学】自然语言生成的保真性:分析、评价和优化方法的系统综述，Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods

【百度&北京大学】自然语言生成的保真性:分析、评价和优化方法的系统综述，Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods

专知会员服务

15+阅读 · 2022年3月11日

【AAAI 2022】用于文本摘要任务的序列级对比学习模型

【AAAI 2022】用于文本摘要任务的序列级对比学习模型

专知会员服务

25+阅读 · 2022年1月11日

自然语言处理中的文本表示研究

自然语言处理中的文本表示研究

专知会员服务

58+阅读 · 2022年1月10日

自动文本摘要研究综述

自动文本摘要研究综述

专知会员服务

68+阅读 · 2021年1月31日

稀缺资源语言神经网络机器翻译研究综述

稀缺资源语言神经网络机器翻译研究综述

专知会员服务

27+阅读 · 2020年12月2日

最新《低资源自然语言处理》综述论文，21页pdf

最新《低资源自然语言处理》综述论文，21页pdf

专知会员服务

61+阅读 · 2020年10月27日

最新《自然语言处理迁移学习》综述论文，A Survey on Transfer Learning in Natural Language Processing

最新《自然语言处理迁移学习》综述论文，A Survey on Transfer Learning in Natural Language Processing

专知会员服务

140+阅读 · 2020年7月10日

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

专知会员服务

23+阅读 · 2020年4月22日

深度学习自然语言处理综述论文，Natural Language Processing Advancements By Deep Learning: A Survey

深度学习自然语言处理综述论文，Natural Language Processing Advancements By Deep Learning: A Survey

专知会员服务

80+阅读 · 2020年3月5日

【图机器学习论文】图摘要方法与应用综述（Graph Summarization Methods and Applications: A Survey）

【图机器学习论文】图摘要方法与应用综述（Graph Summarization Methods and Applications: A Survey）

专知会员服务

42+阅读 · 2019年12月16日

【论文笔记】韩家炜团队AutoPhrase：从大量文本库中自动挖掘短语

【论文笔记】韩家炜团队AutoPhrase：从大量文本库中自动挖掘短语

专知

41+阅读 · 2019年11月2日

【ACL】文本摘要研究工作总结

【ACL】文本摘要研究工作总结

中国人工智能学会

30+阅读 · 2019年8月10日

面试题：文本摘要中的NLP技术

面试题：文本摘要中的NLP技术

七月在线实验室

15+阅读 · 2019年5月13日

《小样本学习(Few-shot learning)》最新41页综述论文，来自港科大和第四范式

《小样本学习(Few-shot learning)》最新41页综述论文，来自港科大和第四范式

专知

363+阅读 · 2019年4月12日

用深度学习做文本摘要

用深度学习做文本摘要

专知

24+阅读 · 2019年3月30日

微软最新论文解读 | 基于预训练自然语言生成的文本摘要方法

微软最新论文解读 | 基于预训练自然语言生成的文本摘要方法

PaperWeekly

14+阅读 · 2019年3月18日

独家 | 基于TextRank算法的文本摘要（附Python代码）

独家 | 基于TextRank算法的文本摘要（附Python代码）

数据派THU

14+阅读 · 2018年12月21日

自动文本摘要

自动文本摘要

AI研习社

21+阅读 · 2018年10月27日

干货｜当深度学习遇见自动文本摘要，seq2seq+attention

干货｜当深度学习遇见自动文本摘要，seq2seq+attention

机器学习算法与Python学习

10+阅读 · 2018年5月28日

TextInfoExp:自然语言处理相关实验（基于sougou数据集）

TextInfoExp:自然语言处理相关实验（基于sougou数据集）

全球人工智能

12+阅读 · 2017年11月12日

“自然语言-草图”耦合的地理场景查询方法研究

国家自然科学基金

3+阅读 · 2015年12月31日

多标记文本数据流分类方法研究

国家自然科学基金

3+阅读 · 2015年12月31日

基于复杂语义的个性化图像集摘要研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于生态演替的文本大数据特征学习研究

国家自然科学基金

1+阅读 · 2015年12月31日

面向异构数据库的查询语言设计及其基础理论研究

国家自然科学基金

1+阅读 · 2015年12月31日

中文句子语义概念图自动构建方法及应用研究

国家自然科学基金

3+阅读 · 2014年12月31日

基于概率图的文本检索模型及算法研究

国家自然科学基金

2+阅读 · 2014年12月31日

面向词汇功能的学术文本语义识别与知识图谱构建

国家自然科学基金

5+阅读 · 2014年12月31日

基于深度学习的机器译文质量估计方法研究

国家自然科学基金

3+阅读 · 2014年12月31日

面向汉语文本理解的语义计算方法

国家自然科学基金

8+阅读 · 2014年12月31日

Large Multimodal Models for Low-Resource Languages: A Survey

Arxiv

0+阅读 · 2月2日

From Prompt to Graph: Comparing LLM-Based Information Extraction Strategies in Domain-Specific Ontology Development

Arxiv

0+阅读 · 1月31日

Large Multimodal Models for Low-Resource Languages: A Survey

Arxiv

0+阅读 · 1月27日

StrucSum: Graph-Structured Reasoning for Long Document Extractive Summarization with LLMs

Arxiv

0+阅读 · 1月21日

Get away with less: Need of source side data curation to build parallel corpus for low resource Machine Translation

Arxiv

0+阅读 · 1月13日

Exploring Iterative Controllable Summarization with Large Language Models

Arxiv

0+阅读 · 1月7日

Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research

Arxiv

0+阅读 · 1月5日

Bridging the Data Gap: Creating a Hindi Text Summarization Dataset from the English XSUM

Arxiv

0+阅读 · 1月4日

Large Multimodal Models for Low-Resource Languages: A Survey

Arxiv

0+阅读 · 2025年12月31日

Complementary Learning Approach for Text Classification using Large Language Models

Arxiv

0+阅读 · 2025年12月28日

VIP会员

文章信息

相关主题

自动文本摘要

相关VIP内容

【百度&北京大学】自然语言生成的保真性:分析、评价和优化方法的系统综述，Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods

【百度&北京大学】自然语言生成的保真性:分析、评价和优化方法的系统综述，Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods

专知会员服务

15+阅读 · 2022年3月11日

【AAAI 2022】用于文本摘要任务的序列级对比学习模型

【AAAI 2022】用于文本摘要任务的序列级对比学习模型

专知会员服务

25+阅读 · 2022年1月11日

自然语言处理中的文本表示研究

自然语言处理中的文本表示研究

专知会员服务

58+阅读 · 2022年1月10日

自动文本摘要研究综述

自动文本摘要研究综述

专知会员服务

68+阅读 · 2021年1月31日

稀缺资源语言神经网络机器翻译研究综述

稀缺资源语言神经网络机器翻译研究综述

专知会员服务

27+阅读 · 2020年12月2日

最新《低资源自然语言处理》综述论文，21页pdf

最新《低资源自然语言处理》综述论文，21页pdf

专知会员服务

61+阅读 · 2020年10月27日

最新《自然语言处理迁移学习》综述论文，A Survey on Transfer Learning in Natural Language Processing

最新《自然语言处理迁移学习》综述论文，A Survey on Transfer Learning in Natural Language Processing

专知会员服务

140+阅读 · 2020年7月10日

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

专知会员服务

23+阅读 · 2020年4月22日

深度学习自然语言处理综述论文，Natural Language Processing Advancements By Deep Learning: A Survey

深度学习自然语言处理综述论文，Natural Language Processing Advancements By Deep Learning: A Survey

专知会员服务

80+阅读 · 2020年3月5日

【图机器学习论文】图摘要方法与应用综述（Graph Summarization Methods and Applications: A Survey）

【图机器学习论文】图摘要方法与应用综述（Graph Summarization Methods and Applications: A Survey）

专知会员服务

42+阅读 · 2019年12月16日

热门VIP内容

开通专知VIP会员享更多权益服务

论学习、公平性与复杂度

《整合杀伤链：一个用于边缘目标验证与战术推理的零样本框架》最新资料

2025中国人工智能学会系列白皮书⸺棋盘上的人工智能|附下载

通用智能体评估的逻辑架构

相关资讯

【论文笔记】韩家炜团队AutoPhrase：从大量文本库中自动挖掘短语

【论文笔记】韩家炜团队AutoPhrase：从大量文本库中自动挖掘短语

专知

41+阅读 · 2019年11月2日

【ACL】文本摘要研究工作总结

【ACL】文本摘要研究工作总结

中国人工智能学会

30+阅读 · 2019年8月10日

面试题：文本摘要中的NLP技术

面试题：文本摘要中的NLP技术

七月在线实验室

15+阅读 · 2019年5月13日

《小样本学习(Few-shot learning)》最新41页综述论文，来自港科大和第四范式

《小样本学习(Few-shot learning)》最新41页综述论文，来自港科大和第四范式

专知

363+阅读 · 2019年4月12日

用深度学习做文本摘要

用深度学习做文本摘要

专知

24+阅读 · 2019年3月30日

微软最新论文解读 | 基于预训练自然语言生成的文本摘要方法

微软最新论文解读 | 基于预训练自然语言生成的文本摘要方法

PaperWeekly

14+阅读 · 2019年3月18日

独家 | 基于TextRank算法的文本摘要（附Python代码）

独家 | 基于TextRank算法的文本摘要（附Python代码）

数据派THU

14+阅读 · 2018年12月21日

自动文本摘要

自动文本摘要

AI研习社

21+阅读 · 2018年10月27日

干货｜当深度学习遇见自动文本摘要，seq2seq+attention

干货｜当深度学习遇见自动文本摘要，seq2seq+attention

机器学习算法与Python学习

10+阅读 · 2018年5月28日

TextInfoExp:自然语言处理相关实验（基于sougou数据集）

TextInfoExp:自然语言处理相关实验（基于sougou数据集）

全球人工智能

12+阅读 · 2017年11月12日

相关论文

Large Multimodal Models for Low-Resource Languages: A Survey

Arxiv

0+阅读 · 2月2日

From Prompt to Graph: Comparing LLM-Based Information Extraction Strategies in Domain-Specific Ontology Development

Arxiv

0+阅读 · 1月31日

Large Multimodal Models for Low-Resource Languages: A Survey

Arxiv

0+阅读 · 1月27日

StrucSum: Graph-Structured Reasoning for Long Document Extractive Summarization with LLMs

Arxiv

0+阅读 · 1月21日

Get away with less: Need of source side data curation to build parallel corpus for low resource Machine Translation

Arxiv

0+阅读 · 1月13日

Exploring Iterative Controllable Summarization with Large Language Models

Arxiv

0+阅读 · 1月7日

Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research

Arxiv

0+阅读 · 1月5日

Bridging the Data Gap: Creating a Hindi Text Summarization Dataset from the English XSUM

Arxiv

0+阅读 · 1月4日

Large Multimodal Models for Low-Resource Languages: A Survey

Arxiv

0+阅读 · 2025年12月31日

Complementary Learning Approach for Text Classification using Large Language Models

Arxiv

0+阅读 · 2025年12月28日

相关基金

“自然语言-草图”耦合的地理场景查询方法研究

国家自然科学基金

3+阅读 · 2015年12月31日

多标记文本数据流分类方法研究

国家自然科学基金

3+阅读 · 2015年12月31日

基于复杂语义的个性化图像集摘要研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于生态演替的文本大数据特征学习研究

国家自然科学基金

1+阅读 · 2015年12月31日

面向异构数据库的查询语言设计及其基础理论研究

国家自然科学基金

1+阅读 · 2015年12月31日

中文句子语义概念图自动构建方法及应用研究

国家自然科学基金

3+阅读 · 2014年12月31日

基于概率图的文本检索模型及算法研究

国家自然科学基金

2+阅读 · 2014年12月31日

面向词汇功能的学术文本语义识别与知识图谱构建

国家自然科学基金

5+阅读 · 2014年12月31日

基于深度学习的机器译文质量估计方法研究

国家自然科学基金

3+阅读 · 2014年12月31日

面向汉语文本理解的语义计算方法

国家自然科学基金

8+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员