The transmission electron microscope (TEM) achieves the highest-resolution imaging of any instrument yet built, and its limiting factor is no longer spatial resolution but dose efficiency. Low electron doses avoid sample damage but produce noisy images for which, unlike in classical computer vision, no ground truth exists. Autonomous materials experimentation poses a related problem: closed-loop instruments need representations grounded in the microscope state at the time of acquisition. Both settings therefore demand representations that encode how an image was acquired. We release 7,330 high-angle annular dark-field scanning-TEM (HAADF-STEM) images paired with their seven-dimensional acquisition metadata, and propose Contrastive Image-Metadata Pre-training (CIMP), a CLIP-style encoder that aligns the two modalities and reaches 84.4% Top-1 cross-modal retrieval accuracy on a held-out split. Each of the seven parameters is individually recoverable from the frozen visual embedding by a linear probe, and we use the embedding to condition a style-transfer model that re-renders experimental images under different acquisition parameters. Virtually scaling the dwell time and beam current of low-dose images turns this model into a physics-informed denoiser; in a blind user study, experimental microscopists preferred it over the current state-of-the-art STEM denoiser in 70.2% of trials.
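To make "CLIP-style alignment" and "Top-1 cross-modal retrieval" concrete, the sketch below shows the standard symmetric InfoNCE objective used by CLIP, applied to a batch of matched image and metadata embeddings, together with a Top-1 retrieval metric. It is a minimal illustration under stated assumptions, not CIMP's actual implementation: the encoders that produce the embeddings are not shown, and the temperature value (0.07) and the helper names `clip_loss` and `top1_retrieval` are hypothetical choices made for this example.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    # Project embeddings onto the unit sphere so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))).squeeze(axis)


def clip_loss(img_emb, meta_emb, temperature=0.07):
    """Symmetric InfoNCE (hypothetical helper; temperature is an assumption).

    Row i of img_emb and row i of meta_emb are a matched image/metadata pair;
    every other row in the batch acts as a negative.
    """
    img = l2_normalize(img_emb)
    meta = l2_normalize(meta_emb)
    logits = img @ meta.T / temperature  # (N, N) scaled cosine similarities
    idx = np.arange(len(logits))
    # Image -> metadata: cross-entropy with the diagonal entry as the target.
    loss_i2m = np.mean(logsumexp(logits, axis=1) - logits[idx, idx])
    # Metadata -> image: the same cross-entropy taken over columns.
    loss_m2i = np.mean(logsumexp(logits, axis=0) - logits[idx, idx])
    return 0.5 * (loss_i2m + loss_m2i)


def top1_retrieval(img_emb, meta_emb):
    """Fraction of images whose most similar metadata embedding is their own pair."""
    sims = l2_normalize(img_emb) @ l2_normalize(meta_emb).T
    return float(np.mean(np.argmax(sims, axis=1) == np.arange(len(sims))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(8, 16))                    # stand-in image embeddings
    meta = img + 0.01 * rng.normal(size=(8, 16))      # nearly aligned metadata embeddings
    print(top1_retrieval(img, meta))                  # well-aligned pairs retrieve each other
    print(clip_loss(img, meta) < clip_loss(img, meta[::-1]))  # mismatched pairs cost more
```

Minimizing this loss pulls each image toward its own acquisition metadata and pushes it away from the other metadata vectors in the batch; Top-1 retrieval then simply checks whether the nearest metadata embedding is the correct one.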