Learning study similarity to investigate heterogeneity in meta-analysis using LLMs and triplet loss

Meta-analyses of observational studies often show substantial between-study heterogeneity, which limits the interpretability of pooled estimates. Meta-regression can be used to explore heterogeneity, but it is often underpowered when several effect modifiers are considered at once. We propose a framework that integrates large language models (LLMs) with deep metric learning to infer study-level similarity prior to meta-analysis. Study-level clinical and methodological characteristics were processed by an LLM to generate study triplets (anchor, similar, dissimilar). The triplets were constructed by treating each study as an anchor and comparing it with pairs of other studies to identify, in each comparison, which study of the pair was more similar to the anchor. The triplets were then used to train an embedding model with triplet loss, a deep metric learning approach that learns an embedding space in which clinically and methodologically similar studies lie close together. We applied the framework to a meta-analysis dataset of 58 observational studies comparing cognitive outcomes between preterm- and term-born children. We then fitted meta-analysis models within the identified study clusters and compared the results with those of the overall analysis. The results suggested three clusters, two of which retained considerable between-study heterogeneity. The remaining cluster comprised the most homogeneous group of studies and exhibited a more extreme pooled effect estimate, together with a narrower prediction interval, than the overall analysis. This work presents a novel approach for exploring heterogeneity in meta-analysis by incorporating study characteristics prior to model fitting. By transforming study information into a similarity space, the framework identifies coherent subgroups and supports more precise inference in heterogeneous real-world evidence.
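To make the triplet-loss objective concrete, the following is a minimal sketch of the loss for a single (anchor, similar, dissimilar) triplet. The abstract does not specify the distance function, the margin, or the embedding architecture, so the Euclidean distance, the margin of 1.0, the function names, and the toy 2-D embeddings are all illustrative assumptions rather than the authors' implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, similar, dissimilar, margin=1.0):
    """Triplet margin loss for one triplet.

    The loss is zero once the dissimilar study is at least `margin`
    farther from the anchor than the similar study; otherwise it is
    the size of the violation. The margin value is an assumption.
    """
    d_pos = euclidean(anchor, similar)
    d_neg = euclidean(anchor, dissimilar)
    return max(d_pos - d_neg + margin, 0.0)


# Hypothetical 2-D embeddings of studies, for illustration only.
# Triplet already satisfied: similar study is near, dissimilar study is far.
satisfied = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])

# Triplet violated: the "dissimilar" study is closer than the similar one.
violated = triplet_loss([1.0, 1.0], [1.0, 2.0], [1.0, 1.5])
```

In training, a loss of this form would be averaged over all LLM-generated triplets and minimised with respect to the embedding model's parameters, pulling similar studies together and pushing dissimilar ones apart.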
