Analysis of clinical data is a cornerstone of biomedical research, with applications in areas such as genomic testing and response characterization of therapeutic drugs. Maintaining strict privacy controls is essential because such data typically contains personally identifiable health information of patients. At the same time, regulatory compliance often requires study managers to demonstrate the integrity and authenticity of the participant data used in analyses. Balancing these competing requirements, privacy preservation and verifiable accountability, remains a critical challenge. In this paper, we present CoSMeTIC, a zero-knowledge computational framework that uses computational Sparse Merkle Trees (SMTs) to generate verifiable inclusion and exclusion proofs for individual participants' data in clinical studies. We formally analyze the zero-knowledge properties of CoSMeTIC and evaluate its computational efficiency through extensive experiments. Using the Kolmogorov-Smirnov and likelihood-ratio hypothesis tests, along with logistic-regression-based genomic analyses on real-world Huntington's disease datasets, we demonstrate that CoSMeTIC achieves strong privacy guarantees while maintaining statistical fidelity. Our results suggest that CoSMeTIC provides a scalable and practical alternative for achieving regulatory compliance with rigorous privacy protection in large-scale clinical research.
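To make the notion of inclusion and exclusion proofs concrete, the following is a minimal sketch of a plain sparse Merkle tree in Python. It is not the CoSMeTIC construction: it does not implement the "computational" SMT variant, and it offers no zero-knowledge guarantees. The tree depth, the SHA-256 hash, the "leaf" domain tag, and all names (SparseMerkleTree, key_index, verify_inclusion, verify_exclusion, the participant identifiers) are illustrative assumptions, not taken from the paper.

```python
import hashlib

# Illustrative assumptions: 64-level tree, SHA-256, all-zero empty leaf.
DEPTH = 64
EMPTY_LEAF = b"\x00" * 32


def h(*parts: bytes) -> bytes:
    """Hash the concatenation of the given byte strings with SHA-256."""
    return hashlib.sha256(b"".join(parts)).digest()


# DEFAULTS[k] is the root hash of a completely empty subtree of height k.
DEFAULTS = [EMPTY_LEAF]
for _ in range(DEPTH):
    DEFAULTS.append(h(DEFAULTS[-1], DEFAULTS[-1]))


def key_index(participant_id: str) -> int:
    """Map a participant identifier to a leaf position (first 64 hash bits)."""
    return int.from_bytes(h(participant_id.encode())[:8], "big")


class SparseMerkleTree:
    def __init__(self) -> None:
        # Only non-default nodes are stored, keyed by (height, index).
        self.nodes: dict[tuple[int, int], bytes] = {}

    def _get(self, height: int, index: int) -> bytes:
        return self.nodes.get((height, index), DEFAULTS[height])

    @property
    def root(self) -> bytes:
        return self._get(DEPTH, 0)

    def insert(self, participant_id: str, record: bytes) -> None:
        """Store a record at the participant's leaf and update the path to the root."""
        index = key_index(participant_id)
        node = h(b"leaf", record)
        self.nodes[(0, index)] = node
        for height in range(DEPTH):
            sibling = self._get(height, index ^ 1)
            node = h(node, sibling) if index % 2 == 0 else h(sibling, node)
            index >>= 1
            self.nodes[(height + 1, index)] = node

    def prove(self, participant_id: str) -> list[bytes]:
        """Return the sibling hashes along the path from the leaf to the root."""
        index = key_index(participant_id)
        siblings = []
        for height in range(DEPTH):
            siblings.append(self._get(height, index ^ 1))
            index >>= 1
        return siblings


def _verify(root: bytes, participant_id: str, leaf: bytes, siblings: list[bytes]) -> bool:
    """Recompute the root from a leaf value and its sibling path."""
    if len(siblings) != DEPTH:
        return False
    index = key_index(participant_id)
    node = leaf
    for sibling in siblings:
        node = h(node, sibling) if index % 2 == 0 else h(sibling, node)
        index >>= 1
    return node == root


def verify_inclusion(root: bytes, participant_id: str, record: bytes, siblings: list[bytes]) -> bool:
    """Check that the participant's leaf holds exactly this record."""
    return _verify(root, participant_id, h(b"leaf", record), siblings)


def verify_exclusion(root: bytes, participant_id: str, siblings: list[bytes]) -> bool:
    """Check that the participant's leaf is still empty."""
    return _verify(root, participant_id, EMPTY_LEAF, siblings)


if __name__ == "__main__":
    tree = SparseMerkleTree()
    tree.insert("participant-001", b"record-A")
    proof = tree.prove("participant-001")
    print(verify_inclusion(tree.root, "participant-001", b"record-A", proof))
    print(verify_exclusion(tree.root, "participant-999", tree.prove("participant-999")))
```

In this sketch, both proof types are the same object, a list of sibling hashes. Only the claimed leaf value differs: the hash of a record for inclusion, the empty leaf for exclusion. Note that the sibling hashes and unsalted leaf hashes here can leak information about other records, which is one reason a plain SMT alone does not provide the privacy properties the abstract describes.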