Respiratory viral infections pose a global health burden, yet the cellular immune mechanisms underlying protection and pathology remain unclear. Natural infection cohorts often lack pre-exposure baselines and time-controlled sampling, whereas inoculation and vaccination trials generate well-structured longitudinal transcriptomic data. However, these datasets are scattered across repositories and processed inconsistently, which hinders integrative and AI-driven analyses. To address these challenges, we developed the Human Respiratory Viral Immunization LongitudinAl Gene Expression (HR-VILAGE-3K3M) repository: an AI-ready resource integrating bulk and single-cell transcriptomic profiles from 3,178 subjects across 66 studies. The dataset spans vaccination, inoculation, and mixed exposures, covers blood and nasal swab samples, and draws on public repositories including GEO, ImmPort, and ArrayExpress. We curated and harmonized subject-level metadata, standardized outcome measures, and applied unified preprocessing with rigorous quality control. We further provide benchmark analyses that illustrate the resource's utility. The resource supports the discovery of biomarkers, the study of immune mechanisms, and methodological development. As one of the largest longitudinal transcriptomic resources for human respiratory viral immunization, HR-VILAGE-3K3M enables reproducible and scalable analyses to accelerate vaccine and antiviral research.