Quantitative evidence synthesis methods aim to combine data from multiple medical trials to infer the relative effects of different interventions. A challenge arises when trials report continuous outcomes on different measurement scales. To include all evidence in one coherent analysis, we require methods to "map" the outcomes onto a single scale. This is particularly challenging when trials report aggregate rather than individual data. We are motivated by a meta-analysis of interventions to prevent obesity in children. Trials report aggregate measurements of body mass index (BMI) expressed either as raw values or standardised for age and sex. We develop three methods for mapping between aggregate BMI data using known relationships between individual measurements on different scales. The first is an analytical method based on the mathematical definitions of z-scores and percentiles. The other two approaches involve sampling individual participant data on which to perform the conversions. One method is a straightforward sampling routine, while the other involves optimization with respect to the reported outcomes. Unlike the analytical approach, the two sampling-based methods can also be applied to any pair of measurement scales with known or estimable individual-level relationships. We verify and contrast our methods using trials from our data set that report outcomes on multiple scales. We find that all methods recreate mean values with reasonable accuracy, but for standard deviations, optimization outperforms the other methods. However, the optimization method is more likely to underestimate standard deviations and is vulnerable to non-convergence.
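The straightforward sampling routine described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the sampling distribution or the reference chart, so the sketch assumes (i) individual BMI values are log-normal with the reported mean and standard deviation, and (ii) BMI is standardised with the widely used LMS transformation, z = ((BMI/M)^L - 1)/(L S). The function names and the single set of L, M, S values are hypothetical; in practice these parameters vary with each child's age and sex.

```python
import numpy as np


def bmi_to_z(bmi, L, M, S):
    """LMS transformation of individual BMI values to z-scores.

    L, M, S are the skewness, median and coefficient-of-variation
    parameters of a reference chart (illustrative values only here).
    """
    bmi = np.asarray(bmi, dtype=float)
    if abs(L) < 1e-12:
        # Limiting case of the LMS formula as L -> 0
        return np.log(bmi / M) / S
    return ((bmi / M) ** L - 1.0) / (L * S)


def map_aggregate_bmi_to_z(mean_bmi, sd_bmi, L, M, S, n=100_000, seed=0):
    """Map a reported (mean, SD) of BMI to an approximate (mean, SD) of z-scores.

    ASSUMPTION: individual BMI is log-normal with the reported mean and SD.
    ASSUMPTION: one (L, M, S) triple applies to every sampled participant.
    """
    rng = np.random.default_rng(seed)
    # Moment-match the log-normal parameters to the reported mean and SD
    sigma2 = np.log(1.0 + (sd_bmi / mean_bmi) ** 2)
    mu = np.log(mean_bmi) - 0.5 * sigma2
    bmi = rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=n)
    # Convert each sampled individual, then re-aggregate on the new scale
    z = bmi_to_z(bmi, L, M, S)
    return z.mean(), z.std(ddof=1)


if __name__ == "__main__":
    # Hypothetical trial arm: mean BMI 17.0, SD 2.0; hypothetical LMS values
    mean_z, sd_z = map_aggregate_bmi_to_z(17.0, 2.0, L=-1.5, M=16.0, S=0.10)
    print(f"mean z = {mean_z:.3f}, SD z = {sd_z:.3f}")
```

The conversion is applied to each sampled individual before re-aggregating because the transformation is non-linear, so transforming the reported mean directly would generally not give the mean on the new scale. The optimization-based method in the abstract would go further by adjusting the sample to match the reported outcomes; that step is not shown here.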