We present H-RAG, our submission to SemEval-2026 Task 8 (MTRAGEval), addressing both Task A (Retrieval) and Task C (Generation with Retrieved Passages). Task A evaluates standalone retrieval quality, while Task C assesses end-to-end retrieval-augmented generation (RAG) in multi-turn conversational settings, requiring both accurate answer generation and faithful grounding in retrieved evidence. Our approach implements a hierarchical parent-child RAG pipeline that separates fine-grained child-level retrieval from parent-level context reconstruction during generation. Documents are segmented into overlapping sentence-based child chunks, while full documents are preserved as parent units to provide coherent context. Retrieval combines hybrid dense-sparse search, tunable weighting, and embedding-based similarity rescoring over child chunks. Retrieved evidence is aggregated at the parent level and supplied to an instruction-tuned language model for response generation. H-RAG achieves an nDCG@5 score of 0.4271 on Task A and a harmonic mean score of 0.3241 on Task C (RB_agg: 0.2488, RL_F: 0.2703, RB_llm: 0.6508), underscoring the importance of retrieval configuration and parent-level aggregation in multi-turn RAG performance.
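The parent-child retrieval flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the submitted system: the abstract does not name the embedding model, the sparse retriever, the weighting value, or the aggregation rule. The names `split_children`, `sparse_score`, `dense_score`, `retrieve_parents`, and the parameters `window`, `stride`, and `alpha` are all hypothetical. A character-trigram cosine stands in for embedding similarity, and each parent is scored by its best child.

```python
import math
import re
from collections import Counter, defaultdict


def split_children(doc_id, text, window=2, stride=1):
    """Split a parent document into overlapping windows of sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    last_start = max(len(sentences) - window + 1, 1)
    return [
        (doc_id, " ".join(sentences[start:start + window]))
        for start in range(0, last_start, stride)
    ]


def tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sparse_score(query, chunk):
    """Toy lexical score: fraction of query terms that occur in the chunk."""
    q = set(tokens(query))
    return len(q & set(tokens(chunk))) / len(q) if q else 0.0


def dense_score(query, chunk):
    """Stand-in for embedding similarity (assumption): trigram-count cosine."""
    def vec(t):
        t = " ".join(tokens(t))
        return Counter(t[i:i + 3] for i in range(len(t) - 2))

    a, b = vec(query), vec(chunk)
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_parents(query, docs, alpha=0.5, top_k=2):
    """Score child chunks with a weighted dense/sparse mix, then keep the
    best child score per parent document (max aggregation is an assumption)."""
    best = defaultdict(float)
    for doc_id, text in docs.items():
        for parent_id, chunk in split_children(doc_id, text):
            score = alpha * dense_score(query, chunk) + (1 - alpha) * sparse_score(
                query, chunk
            )
            best[parent_id] = max(best[parent_id], score)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical two-document corpus for illustration only.
docs = {
    "d1": "Paris is the capital of France. It lies on the Seine. The city hosts the Louvre.",
    "d2": "Python is a programming language. It was created by Guido van Rossum.",
}
ranked = retrieve_parents("Who created Python?", docs)
```

In this sketch the full text of each top-ranked parent (`docs[parent_id]`) would be what is passed to the generator, rather than the matched child chunk alone, which mirrors the separation of child-level retrieval from parent-level context described in the abstract.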
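As a quick consistency check on the reported Task C score, the snippet below recomputes the aggregate, assuming it is the unweighted harmonic mean of the three reported metrics (the abstract does not state the weighting explicitly). The variable names are illustrative.

```python
from statistics import harmonic_mean

# Task C component scores as reported in the abstract.
scores = {"RB_agg": 0.2488, "RL_F": 0.2703, "RB_llm": 0.6508}

# Unweighted harmonic mean: n / sum(1 / x_i).
task_c_score = harmonic_mean(list(scores.values()))
print(round(task_c_score, 4))  # 0.3241
```

Because the harmonic mean is pulled toward its smallest input, the aggregate sits much closer to RB_agg and RL_F than to the higher RB_llm value.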