We introduce JP-TL-Bench, a lightweight, open benchmark designed to guide the iterative development of Japanese-English translation systems. During iterative development, the challenge is often "which of these two good translations is better?" rather than "is this translation acceptable?" This distinction matters for Japanese-English, where subtle choices in politeness, implicature, ellipsis, and register strongly affect perceived naturalness. JP-TL-Bench uses a protocol built to make LLM judging both reliable and affordable: it evaluates a candidate model via reference-free, pairwise LLM comparisons against a fixed, versioned anchor set. Pairwise results are aggregated with a Bradley-Terry model and reported as win rates plus a normalized 0-10 "LT" score derived from a logistic transform of the fitted log-strengths. Because each candidate is scored against the same frozen anchor set, scores are structurally stable given the same anchor set, judge, and aggregation code.
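The aggregation step can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: the gradient-ascent fit, the choice to center log-strengths on the anchor mean, and the exact form of the LT normalization (a logistic squashed to 0-10) are all assumptions for the sake of the example.

```python
import math

def fit_bradley_terry(wins, n_items, iters=2000, lr=0.1):
    """Fit Bradley-Terry log-strengths by gradient ascent on the
    pairwise log-likelihood. wins[i][j] = number of times item i
    beat item j in judged pairwise comparisons.
    Convention here: index 0 is the candidate, 1..n-1 are anchors."""
    theta = [0.0] * n_items
    for _ in range(iters):
        grad = [0.0] * n_items
        for i in range(n_items):
            for j in range(n_items):
                if i == j:
                    continue
                n_ij = wins[i][j] + wins[j][i]
                if n_ij == 0:
                    continue
                # P(i beats j) under Bradley-Terry
                p_ij = 1.0 / (1.0 + math.exp(theta[j] - theta[i]))
                grad[i] += wins[i][j] - n_ij * p_ij
        theta = [t + lr * g for t, g in zip(theta, grad)]
        # Strengths are identified only up to an additive constant;
        # fix the gauge by centering the anchors at zero, so the
        # candidate's theta is relative to the frozen anchor set.
        anchor_mean = sum(theta[1:]) / (n_items - 1)
        theta = [t - anchor_mean for t in theta]
    return theta

def lt_score(theta_candidate):
    """Illustrative LT score: logistic transform of the candidate's
    log-strength (relative to the anchor mean), scaled to 0-10."""
    return 10.0 / (1.0 + math.exp(-theta_candidate))
```

For example, with one candidate that beats each of three equally matched anchors 7 times out of 10, the fitted candidate strength is ln(7/3) relative to the anchors, and the LT score is 10 · σ(ln(7/3)) = 7.0. Because the anchors are frozen, re-running the fit with the same judge outcomes reproduces the same score, which is the structural-stability property the protocol relies on.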