The effect of robust preprocessing is rarely quantified in deep-learning pipelines for low-dose CT (LDCT) lung cancer screening. We develop and validate Virtual-Eyes, a clinically motivated 16-bit CT quality-control (QC) pipeline, and measure its differential impact on generalist foundation models versus specialist models. Virtual-Eyes enforces strict 512x512 in-plane resolution, rejects short or non-diagnostic series, and extracts a contiguous lung block using Hounsfield-unit filtering and bilateral lung-coverage scoring while preserving the native 16-bit grid. Using 765 National Lung Screening Trial (NLST) patients (182 cancer, 583 non-cancer), we compute slice-level embeddings from RAD-DINO and Merlin with frozen encoders and train leakage-free patient-level MLP heads; we also evaluate Sybil and a 2D ResNet-18 baseline on Raw versus Virtual-Eyes inputs without backbone retraining. Virtual-Eyes improves RAD-DINO slice-level AUC from 0.576 to 0.610 and patient-level AUC from 0.646 to 0.683 (mean pooling) and from 0.619 to 0.735 (max pooling), with improved calibration (Brier score 0.188 to 0.112). In contrast, Sybil degrades under Virtual-Eyes (AUC 0.886 to 0.837) and ResNet-18 AUC shifts from 0.571 to 0.596, with evidence of context dependence and shortcut learning, and Merlin shows limited transferability (AUC approximately 0.507 to 0.567) regardless of preprocessing. These results indicate that anatomically targeted QC can stabilize and improve generalist foundation-model workflows but may disrupt specialist models adapted to raw clinical context.
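To make the patient-level evaluation concrete, the following is a minimal sketch of how slice-level probabilities might be pooled per patient (mean or max) and then scored with AUC and the Brier score. It is not the authors' implementation: the function names `pool_by_patient`, `auc`, and `brier` are hypothetical, and the toy probabilities are invented for illustration and unrelated to the reported results.

```python
from collections import defaultdict
from statistics import mean


def pool_by_patient(patient_ids, slice_probs, how="mean"):
    """Collapse slice-level probabilities into one score per patient.

    Grouping by patient ID keeps every slice of a patient together, which is
    the unit on which a leakage-free split would also have to operate.
    """
    if how not in ("mean", "max"):
        raise ValueError("how must be 'mean' or 'max'")
    grouped = defaultdict(list)
    for pid, prob in zip(patient_ids, slice_probs):
        grouped[pid].append(prob)
    reduce = mean if how == "mean" else max
    return {pid: reduce(probs) for pid, probs in grouped.items()}


def auc(labels, scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(labels, scores):
    """Brier score: mean squared error between probability and 0/1 label."""
    return mean((s - y) ** 2 for y, s in zip(labels, scores))


# Toy data (invented): patient A has cancer and one strongly suspicious slice.
patient_ids = ["A", "A", "A", "B", "B", "C", "C", "C"]
slice_probs = [0.1, 0.9, 0.1, 0.4, 0.4, 0.2, 0.3, 0.1]
patient_label = {"A": 1, "B": 0, "C": 0}

for how in ("mean", "max"):
    pooled = pool_by_patient(patient_ids, slice_probs, how=how)
    pids = sorted(pooled)
    y = [patient_label[p] for p in pids]
    s = [pooled[p] for p in pids]
    print(how, "AUC:", auc(y, s), "Brier:", round(brier(y, s), 4))
```

In this toy case max pooling ranks patient A above both negatives, whereas mean pooling dilutes the single suspicious slice. This illustrates one way the two pooling rules can diverge; it does not explain the differences reported above.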