We describe the approach of team ielab, from CSIRO and The University of Queensland, to the 2023 TREC Clinical Trials Track. Our approach uses neural rankers and relies on Large Language Models to overcome the lack of training data for such rankers. Specifically, we employ ChatGPT to generate relevant patient descriptions for clinical trials selected at random from the corpus. This synthetic dataset, combined with human-annotated training data from previous years, is used to train both dense and sparse retrievers based on PubmedBERT. Additionally, a cross-encoder re-ranker is integrated into the system. To further enhance the effectiveness of our approach, we prompt GPT-4 to act as a TREC annotator and provide judgments on our run files. These judgments are subsequently used to re-rank the results. This architecture tightly integrates strong PubmedBERT-based rankers with state-of-the-art Large Language Models, demonstrating a new approach to clinical trial retrieval.
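The final step, re-ranking a run with LLM-provided judgments, can be illustrated with a minimal sketch. The abstract does not specify how the judgments are combined with the ranker scores, so everything below is an assumption made for illustration: the function name `rerank_with_judgments`, the `depth` cut-off, the graded label scale (2 = eligible, 1 = excluded, 0 = not relevant), and the choice to break ties by the original rank.

```python
from typing import Dict, List, Tuple


def rerank_with_judgments(
    run: List[Tuple[str, float]],
    judgments: Dict[str, int],
    depth: int = 20,
) -> List[str]:
    """Re-rank the top `depth` documents of a run by judgment label.

    `run` is a list of (doc_id, score) pairs already sorted by the ranker.
    `judgments` maps doc_id to a graded label (assumed scale: 2 = eligible,
    1 = excluded, 0 = not relevant). Documents without a label are treated
    as 0. This is a hypothetical scheme, not the authors' exact method.
    """
    head, tail = run[:depth], run[depth:]
    # sorted() is stable, so documents with equal labels keep the
    # order given by the original ranker.
    reranked_head = sorted(head, key=lambda pair: -judgments.get(pair[0], 0))
    # Documents below the judged depth keep their original positions.
    return [doc_id for doc_id, _ in reranked_head + tail]


if __name__ == "__main__":
    example_run = [("NCT1", 0.9), ("NCT2", 0.8), ("NCT3", 0.7), ("NCT4", 0.6)]
    example_judgments = {"NCT1": 0, "NCT2": 2, "NCT3": 1}
    print(rerank_with_judgments(example_run, example_judgments, depth=3))
```

Only the judged head of the list is reordered here, which reflects the practical constraint that LLM judgments are usually collected for a limited number of top-ranked trials per topic.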