ARHN: Answer-Centric Relabeling of Hard Negatives with Open-Source LLMs for Dense Retrieval - 专知论文

会员服务 ·

0

ARHN: Answer-Centric Relabeling of Hard Negatives with Open-Source LLMs for Dense Retrieval

翻译：暂无翻译

Hyewon Choi,Jooyoung Choi,Hansol Jang,Hyun Kim,Chulmin Yun,ChangWook Jun,Stanley Jungkyu Choi

from arxiv, Accepted to SIGIR 2026

Neural retrievers are often trained on large-scale triplet data comprising a query, a positive passage, and a set of hard negatives. In practice, hard-negative mining can introduce false negatives and other ambiguous negatives, including passages that are relevant or contain partial answers to the query. Such label noise yields inconsistent supervision and can degrade retrieval effectiveness. We propose ARHN (Answer-centric Relabeling of Hard Negatives), a two-stage framework that leverages open-source LLMs to refine hard negative samples using answer-centric relevance signals. In the first stage, for each query-passage pair, ARHN prompts the LLM to generate a passage-grounded answer snippet or to indicate that the passage does not support an answer. In the second stage, ARHN applies an LLM-based listwise ranking over the candidate set to order passages by direct answerability to the query. Passages ranked above the original positive are relabeled to additional positives. Among passages ranked below the positive, ARHN excludes any that contain an answer snippet from the negative set to avoid ambiguous supervision. We evaluated ARHN on the BEIR benchmark under three configurations: relabeling only, filtering only, and their combination. Across datasets, the combined strategy consistently improves over either step in isolation, indicating that jointly relabeling false negatives and filtering ambiguous negatives yields cleaner supervision for training neural retrieval models. By relying strictly on open-source models, ARHN establishes a cost-effective and scalable refinement pipeline suitable for large-scale training.

翻译：暂无翻译

0

相关内容

【NeurIPS22系列】几何视角下 GNN 的拓扑知识表示与迁移

【NeurIPS22系列】几何视角下 GNN 的拓扑知识表示与迁移

专知会员服务

20+阅读 · 2022年12月7日

【KDD2022】弱监督图神经网络：标签结构联合预测解决数据缺失问题

【KDD2022】弱监督图神经网络：标签结构联合预测解决数据缺失问题

专知会员服务

29+阅读 · 2022年8月28日

【NeurIPS 2021】BernNet: 通过Bernstein学习任意图谱滤波器

专知会员服务

10+阅读 · 2021年10月1日

【微软-ACL2020】TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER

【微软-ACL2020】TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER

专知会员服务

36+阅读 · 2020年4月14日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【ICML2020提交论文】Learning@home:众包与分散Mixture-of-Experts训练的神经网络（Learning@home: Crowdsourced Training of Large Neural Networks with Decentralized Mixture-of-Experts）

【ICML2020提交论文】Learning@home:众包与分散Mixture-of-Experts训练的神经网络（Learning@home: Crowdsourced Training of Large Neural Networks with Decentralized Mixture-of-Experts）

专知会员服务

10+阅读 · 2020年2月12日

【USC-Sean (Xiang) Ren】用解释和先验知识快速学习（Learning from Explanations with Neural Execution Tree），47页ppt

【USC-Sean (Xiang) Ren】用解释和先验知识快速学习（Learning from Explanations with Neural Execution Tree），47页ppt

专知会员服务

21+阅读 · 2020年2月11日

八篇NeurIPS 2019【图神经网络（GNN）】相关论文

八篇NeurIPS 2019【图神经网络（GNN）】相关论文

专知会员服务

44+阅读 · 2020年1月10日

【Nature论文】用于理解图像分类决策和改进神经网络鲁棒性的对抗性解释（Adversarial Explanations for Understanding Image Classiﬁcation Decisions and Improved Neural Network Robustness ）

【Nature论文】用于理解图像分类决策和改进神经网络鲁棒性的对抗性解释（Adversarial Explanations for Understanding Image Classiﬁcation Decisions and Improved Neural Network Robustness ）

专知会员服务

13+阅读 · 2019年11月25日

【斯坦福大学NeuralPS2019】GNN解释器，GNNExplainer: Generating Explanations for Graph Neural Networks，斯坦福大学|Jure Leskovec

【斯坦福大学NeuralPS2019】GNN解释器，GNNExplainer: Generating Explanations for Graph Neural Networks，斯坦福大学|Jure Leskovec

专知会员服务

89+阅读 · 2019年10月13日

基于深度神经网络的关键词提取，Keywords extraction with DNN

基于深度神经网络的关键词提取，Keywords extraction with DNN

专知

10+阅读 · 2020年5月7日

最新必读【预训练语言模型(BERT/XLNet等)】论文，Google/微软/华为ICLR2020提交论文

最新必读【预训练语言模型(BERT/XLNet等)】论文，Google/微软/华为ICLR2020提交论文

专知

36+阅读 · 2019年9月29日

赛尔原创 | EMNLP 2019 基于上下文感知的变分自编码器建模事件背景知识进行If-Then类型常识推理

赛尔原创 | EMNLP 2019 基于上下文感知的变分自编码器建模事件背景知识进行If-Then类型常识推理

哈工大SCIR

17+阅读 · 2019年9月23日

【NeurIPS 2019】7篇自动化神经网络搜索(NAS)论文简读

【NeurIPS 2019】7篇自动化神经网络搜索(NAS)论文简读

中国人工智能学会

15+阅读 · 2019年9月13日

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

赛尔原创 | ACL 2019 检索增强的对抗式回复生成

赛尔原创 | ACL 2019 检索增强的对抗式回复生成

哈工大SCIR

12+阅读 · 2019年7月4日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

OpenAI官方发布：强化学习中的关键论文

OpenAI官方发布：强化学习中的关键论文

专知

14+阅读 · 2018年12月12日

BERT 现已开源：最先进的 NLP 预训练技术，支持中文和更多语言

BERT 现已开源：最先进的 NLP 预训练技术，支持中文和更多语言

谷歌开发者

16+阅读 · 2018年11月6日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

CXCL12通过增强细胞间通讯调控神经血管单元促进脊髓损伤修复

国家自然科学基金

1+阅读 · 2017年12月31日

抗噪、抗假频叠前地震数据插值方法研究

国家自然科学基金

1+阅读 · 2016年12月31日

人Muse细胞诱导分化为神经前体细胞及功能性神经元并修复脊髓损伤

国家自然科学基金

0+阅读 · 2015年12月31日

RC框架-阶梯墙新型抗震结构关键问题研究

国家自然科学基金

0+阅读 · 2015年12月31日

力感应细胞源性外泌体及其microRNA货物介导的细胞间通讯在骨重建稳态中调控机制及干预策略的研究

国家自然科学基金

0+阅读 · 2015年12月31日

IL-10修饰的骨髓来源的神经干细胞对视网膜缺血再灌注损伤的作用及机制的研究

国家自然科学基金

0+阅读 · 2015年12月31日

微管稳定剂缓释体增强脊髓固有束接力促进脊髓损伤的功能恢复

国家自然科学基金

0+阅读 · 2014年12月31日

多尺度NED/DEM生成的数字综合理论和关键技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

Ghrelin整合调控神经血管单元网络抑制脑缺血再灌注损伤并促进神经修复

国家自然科学基金

0+阅读 · 2014年12月31日

可重构的环境自适应RS码软判决译码器研究

国家自然科学基金

0+阅读 · 2014年12月31日

Needle-in-RAG: Prompt-Conditioned Character-Level Traceback of Poisoned Spans in Retrieved Evidence

Arxiv

0+阅读 · 5月3日

Negative Data Mining for Contrastive Learning in Dense Retrieval at IKEA.com

Arxiv

0+阅读 · 5月1日

Aligning Dense Retrievers with LLM Utility via DistillationAligning Dense Retrievers with LLM Utility via Distillation

Arxiv

0+阅读 · 4月24日

Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval

Arxiv

0+阅读 · 4月6日

A Divide-and-Conquer Strategy for Hard-Label Extraction of Deep Neural Networks via Side-Channel Attacks

Arxiv

0+阅读 · 4月1日

How iteration order influences convergence and stability in deep learning

Arxiv

0+阅读 · 3月27日

On Neural Scaling Laws for Weather Emulation through Continual Training

Arxiv

0+阅读 · 3月26日

Deep Neural Regression Collapse

Arxiv

0+阅读 · 3月25日

ECI: Effective Contrastive Information to Evaluate Hard-Negatives

Arxiv

0+阅读 · 3月22日

rSDNet: Unified Robust Neural Learning against Label Noise and Adversarial Attacks

Arxiv

0+阅读 · 3月18日

VIP会员

文章信息

相关主题

最新内容

DeepSeek 版Claude Code，免费小白安装教程来了！

DeepSeek 版Claude Code，免费小白安装教程来了！

专知会员服务

9+阅读 · 5月5日

【ICML Spotlight 2026】 T²PO: 不确定性引导的探索控制框架，实现稳定多轮Agentic强化学习

【ICML Spotlight 2026】 T²PO: 不确定性引导的探索控制框架，实现稳定多轮Agentic强化学习

专知会员服务

5+阅读 · 5月5日

基础模型驱动的工业智能体：技术成熟度、能力变迁与未竟之挑战

基础模型驱动的工业智能体：技术成熟度、能力变迁与未竟之挑战

专知会员服务

6+阅读 · 5月5日

《机动炮兵的演进与未来：技术进步、历史沿革与炮兵作战前瞻》

《机动炮兵的演进与未来：技术进步、历史沿革与炮兵作战前瞻》

专知会员服务

7+阅读 · 5月5日

《火炮弹药快速效能建模：提升互操作性与技术优势》（报告）

《火炮弹药快速效能建模：提升互操作性与技术优势》（报告）

专知会员服务

9+阅读 · 5月5日

《美空军条令出版物 2-0：情报（2026版）》

《美空军条令出版物 2-0：情报（2026版）》

专知会员服务

14+阅读 · 5月5日

美陆军“飞蝇陷阱5.0”项目将新兴技术交到作战人员手中

美陆军“飞蝇陷阱5.0”项目将新兴技术交到作战人员手中

专知会员服务

6+阅读 · 5月5日

帕兰提尔 Gotham：一个游戏规则改变器

帕兰提尔 Gotham：一个游戏规则改变器

专知会员服务

9+阅读 · 5月5日

【ICML 2026】用测试时训练线性化视觉Transformer：T⁵ 实现 Softmax 注意力到线性复杂度的快速转换

【ICML 2026】用测试时训练线性化视觉Transformer：T⁵ 实现 Softmax 注意力到线性复杂度的快速转换

专知会员服务

3+阅读 · 5月5日

【AAAI 2026】大模型做知识蒸馏：CMM将LLM特征拆解给小模型协同学习

【AAAI 2026】大模型做知识蒸馏：CMM将LLM特征拆解给小模型协同学习

专知会员服务

3+阅读 · 5月5日

【ICML Spotlight 2026 】NonZero：交互引导探索的多智能体蒙特卡洛树搜索

【ICML Spotlight 2026 】NonZero：交互引导探索的多智能体蒙特卡洛树搜索

专知会员服务

8+阅读 · 5月4日

【综述】机器人学习中的世界模型：全面综述

【综述】机器人学习中的世界模型：全面综述

专知会员服务

13+阅读 · 5月4日

伊朗的导弹-无人机行动及其对美国威慑的影响

伊朗的导弹-无人机行动及其对美国威慑的影响

专知会员服务

9+阅读 · 5月4日

《未来战术无人机系统案例研究：量身定制采办策略方法》100页报告

《未来战术无人机系统案例研究：量身定制采办策略方法》100页报告

专知会员服务

10+阅读 · 5月4日

战争贩子：2026年第一季度美国对中东潜在军售激增

战争贩子：2026年第一季度美国对中东潜在军售激增

专知会员服务

7+阅读 · 5月4日

相关VIP内容

【NeurIPS22系列】几何视角下 GNN 的拓扑知识表示与迁移

【NeurIPS22系列】几何视角下 GNN 的拓扑知识表示与迁移

专知会员服务

20+阅读 · 2022年12月7日

【KDD2022】弱监督图神经网络：标签结构联合预测解决数据缺失问题

【KDD2022】弱监督图神经网络：标签结构联合预测解决数据缺失问题

专知会员服务

29+阅读 · 2022年8月28日

【NeurIPS 2021】BernNet: 通过Bernstein学习任意图谱滤波器

专知会员服务

10+阅读 · 2021年10月1日

【微软-ACL2020】TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER

【微软-ACL2020】TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER

专知会员服务

36+阅读 · 2020年4月14日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【ICML2020提交论文】Learning@home:众包与分散Mixture-of-Experts训练的神经网络（Learning@home: Crowdsourced Training of Large Neural Networks with Decentralized Mixture-of-Experts）

【ICML2020提交论文】Learning@home:众包与分散Mixture-of-Experts训练的神经网络（Learning@home: Crowdsourced Training of Large Neural Networks with Decentralized Mixture-of-Experts）

专知会员服务

10+阅读 · 2020年2月12日

【USC-Sean (Xiang) Ren】用解释和先验知识快速学习（Learning from Explanations with Neural Execution Tree），47页ppt

【USC-Sean (Xiang) Ren】用解释和先验知识快速学习（Learning from Explanations with Neural Execution Tree），47页ppt

专知会员服务

21+阅读 · 2020年2月11日

八篇NeurIPS 2019【图神经网络（GNN）】相关论文

八篇NeurIPS 2019【图神经网络（GNN）】相关论文

专知会员服务

44+阅读 · 2020年1月10日

【Nature论文】用于理解图像分类决策和改进神经网络鲁棒性的对抗性解释（Adversarial Explanations for Understanding Image Classiﬁcation Decisions and Improved Neural Network Robustness ）

【Nature论文】用于理解图像分类决策和改进神经网络鲁棒性的对抗性解释（Adversarial Explanations for Understanding Image Classiﬁcation Decisions and Improved Neural Network Robustness ）

专知会员服务

13+阅读 · 2019年11月25日

【斯坦福大学NeuralPS2019】GNN解释器，GNNExplainer: Generating Explanations for Graph Neural Networks，斯坦福大学|Jure Leskovec

【斯坦福大学NeuralPS2019】GNN解释器，GNNExplainer: Generating Explanations for Graph Neural Networks，斯坦福大学|Jure Leskovec

专知会员服务

89+阅读 · 2019年10月13日

热门VIP内容

开通专知VIP会员享更多权益服务

【ICML Spotlight 2026】 T²PO: 不确定性引导的探索控制框架，实现稳定多轮Agentic强化学习

《机动炮兵的演进与未来：技术进步、历史沿革与炮兵作战前瞻》

DeepSeek 版Claude Code，免费小白安装教程来了！

基础模型驱动的工业智能体：技术成熟度、能力变迁与未竟之挑战

相关资讯

基于深度神经网络的关键词提取，Keywords extraction with DNN

基于深度神经网络的关键词提取，Keywords extraction with DNN

专知

10+阅读 · 2020年5月7日

最新必读【预训练语言模型(BERT/XLNet等)】论文，Google/微软/华为ICLR2020提交论文

最新必读【预训练语言模型(BERT/XLNet等)】论文，Google/微软/华为ICLR2020提交论文

专知

36+阅读 · 2019年9月29日

赛尔原创 | EMNLP 2019 基于上下文感知的变分自编码器建模事件背景知识进行If-Then类型常识推理

赛尔原创 | EMNLP 2019 基于上下文感知的变分自编码器建模事件背景知识进行If-Then类型常识推理

哈工大SCIR

17+阅读 · 2019年9月23日

【NeurIPS 2019】7篇自动化神经网络搜索(NAS)论文简读

【NeurIPS 2019】7篇自动化神经网络搜索(NAS)论文简读

中国人工智能学会

15+阅读 · 2019年9月13日

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

赛尔原创 | ACL 2019 检索增强的对抗式回复生成

赛尔原创 | ACL 2019 检索增强的对抗式回复生成

哈工大SCIR

12+阅读 · 2019年7月4日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

OpenAI官方发布：强化学习中的关键论文

OpenAI官方发布：强化学习中的关键论文

专知

14+阅读 · 2018年12月12日

BERT 现已开源：最先进的 NLP 预训练技术，支持中文和更多语言

BERT 现已开源：最先进的 NLP 预训练技术，支持中文和更多语言

谷歌开发者

16+阅读 · 2018年11月6日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

相关论文

Needle-in-RAG: Prompt-Conditioned Character-Level Traceback of Poisoned Spans in Retrieved Evidence

Arxiv

0+阅读 · 5月3日

Negative Data Mining for Contrastive Learning in Dense Retrieval at IKEA.com

Arxiv

0+阅读 · 5月1日

Aligning Dense Retrievers with LLM Utility via DistillationAligning Dense Retrievers with LLM Utility via Distillation

Arxiv

0+阅读 · 4月24日

Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval

Arxiv

0+阅读 · 4月6日

A Divide-and-Conquer Strategy for Hard-Label Extraction of Deep Neural Networks via Side-Channel Attacks

Arxiv

0+阅读 · 4月1日

How iteration order influences convergence and stability in deep learning

Arxiv

0+阅读 · 3月27日

On Neural Scaling Laws for Weather Emulation through Continual Training

Arxiv

0+阅读 · 3月26日

Deep Neural Regression Collapse

Arxiv

0+阅读 · 3月25日

ECI: Effective Contrastive Information to Evaluate Hard-Negatives

Arxiv

0+阅读 · 3月22日

rSDNet: Unified Robust Neural Learning against Label Noise and Adversarial Attacks

Arxiv

0+阅读 · 3月18日

相关基金

CXCL12通过增强细胞间通讯调控神经血管单元促进脊髓损伤修复

国家自然科学基金

1+阅读 · 2017年12月31日

抗噪、抗假频叠前地震数据插值方法研究

国家自然科学基金

1+阅读 · 2016年12月31日

人Muse细胞诱导分化为神经前体细胞及功能性神经元并修复脊髓损伤

国家自然科学基金

0+阅读 · 2015年12月31日

RC框架-阶梯墙新型抗震结构关键问题研究

国家自然科学基金

0+阅读 · 2015年12月31日

力感应细胞源性外泌体及其microRNA货物介导的细胞间通讯在骨重建稳态中调控机制及干预策略的研究

国家自然科学基金

0+阅读 · 2015年12月31日

IL-10修饰的骨髓来源的神经干细胞对视网膜缺血再灌注损伤的作用及机制的研究

国家自然科学基金

0+阅读 · 2015年12月31日

微管稳定剂缓释体增强脊髓固有束接力促进脊髓损伤的功能恢复

国家自然科学基金

0+阅读 · 2014年12月31日

多尺度NED/DEM生成的数字综合理论和关键技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

Ghrelin整合调控神经血管单元网络抑制脑缺血再灌注损伤并促进神经修复

国家自然科学基金

0+阅读 · 2014年12月31日

可重构的环境自适应RS码软判决译码器研究

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员