MetaRanker: Human-in-the-loop Active Ranking for Metalens Image Quality

Image quality in modern imaging systems emerges from the coupled effects of the sensor, optics, and computational reconstruction. Ultra-thin metalenses offer a path toward substantial miniaturization of optical modules, but practical designs often exhibit pronounced chromatic and field-dependent aberrations that necessitate computational reconstruction. In current metalens pipelines, reconstruction models are commonly trained and selected using distortion-based fidelity objectives, such as PSNR, yet these proxies can be weakly correlated with human preference and downstream utility, reflecting the well-known perception-distortion trade-off. We introduce MetaRanker, a human-in-the-loop active ranking framework that formalizes metalens image quality in terms of semantic interpretability, defined as the degree to which humans can reliably recognize objects and structures in the presence of optical artifacts. MetaRanker combines a probabilistic preference model with uncertainty-aware query selection, and leverages vision-language models to provide lightweight semantic priors. Importantly, these priors are used only to guide the sampling of informative comparisons; human judgments remain the primary supervision signal throughout. Across real-world and synthetic metalens datasets with distinct degradation profiles, MetaRanker produces rankings that align most closely with human assessments, while reducing the number of pairwise annotations required by approximately 80% relative to exhaustive pairwise evaluation. Finally, we show that standard image quality assessment metrics exhibit limited alignment with human interpretability in the metalens domain, positioning MetaRanker as a practical step toward perceptually grounded metalens evaluation and co-design.
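The abstract does not specify the preference model or the query-selection rule, so the following is only a minimal sketch of how active pairwise ranking of this general kind can work, under stated assumptions. It assumes a Bradley-Terry preference model, a simple uncertainty score (the predicted outcome variance, discounted by how often a pair has already been shown), and a plain gradient update. The names `bt_prob`, `select_pair`, `update`, `active_rank`, `oracle`, and `prior` are hypothetical and are not taken from MetaRanker. In line with the abstract, the optional `prior` (standing in for a vision-language model's semantic score) only influences which pair is queried; the returned ranking is computed from scores learned from human answers alone.

```python
import math


def bt_prob(score_i, score_j):
    """Bradley-Terry probability that item i is preferred over item j."""
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))


def select_pair(scores, counts, prior):
    """Return the pair (i, j), i < j, with the most uncertain predicted outcome.

    Uncertainty is p * (1 - p), which peaks at p = 0.5, divided by
    (1 + times the pair was already queried). The prior shifts the
    scores used for selection only (an assumption of this sketch).
    """
    n = len(scores)
    best_pair, best_value = None, -1.0
    for i in range(n):
        for j in range(i + 1, n):
            p = bt_prob(scores[i] + prior[i], scores[j] + prior[j])
            value = p * (1.0 - p) / (1.0 + counts.get((i, j), 0))
            if value > best_value:
                best_pair, best_value = (i, j), value
    return best_pair


def update(scores, winner, loser, lr=0.5):
    """One gradient-ascent step on the Bradley-Terry log-likelihood."""
    p = bt_prob(scores[winner], scores[loser])
    scores[winner] += lr * (1.0 - p)
    scores[loser] -= lr * (1.0 - p)


def active_rank(n_items, oracle, budget, prior=None):
    """Rank items from at most `budget` pairwise human judgments.

    `oracle(i, j)` returns True if the annotator prefers item i to item j.
    Returns item indices ordered from most to least preferred.
    """
    prior = list(prior) if prior is not None else [0.0] * n_items
    scores = [0.0] * n_items  # learned from human answers only
    counts = {}               # how often each pair has been queried
    for _ in range(budget):
        i, j = select_pair(scores, counts, prior)
        counts[(i, j)] = counts.get((i, j), 0) + 1
        if oracle(i, j):
            update(scores, i, j)
        else:
            update(scores, j, i)
    return sorted(range(n_items), key=lambda k: -scores[k])
```

A call such as `active_rank(n_items, oracle, budget)` would be driven by a function that shows two reconstructions to an annotator and returns their choice. Exhaustive evaluation of `n_items` images needs `n_items * (n_items - 1) / 2` comparisons, so `budget` is where any annotation saving would show up; this sketch makes no claim to reproduce the reduction reported in the abstract.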
