Over the past decades, automated high-content microscopy has demonstrated its ability to deliver large quantities of image-based data, powering the versatility of phenotypic drug screening and systems biology applications. However, as image-based datasets grew, it became infeasible for humans to control, avoid and overcome imaging and sample preparation artefacts in the images. While machine learning and deep learning techniques may address these shortcomings through generative image inpainting, applying them to sensitive research data may come at the cost of undesired image manipulation. Such manipulation may be caused by phenomena such as neural hallucinations, to which some artificial neural networks are prone. To address this, we evaluate state-of-the-art inpainting methods for image restoration in a high-content fluorescence microscopy dataset of cultured cells with labelled nuclei. We show that architectures like DeepFill V2 and Edge Connect can faithfully restore microscopy images upon fine-tuning with relatively little data. Our results demonstrate that the area of the region to be restored is of higher importance than its shape. Furthermore, to control the quality of restoration, we propose a novel phenotype-preserving metric design strategy. In this strategy, the size and count of restored biological phenotypes, such as cell nuclei, are quantified to penalise undesirable manipulation. We argue that the design principles of our approach may also generalise to other applications.
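To make the idea of a phenotype-preserving metric concrete, the following is a minimal sketch and not the metric defined in this work. It assumes that nuclei can be segmented by a simple intensity threshold, that objects are 4-connected, and that count and size deviations are weighted equally; the names `nuclei_areas`, `phenotype_penalty` and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
from collections import deque


def nuclei_areas(image, threshold):
    """Return the pixel area of each 4-connected object brighter than threshold.

    `image` is a 2D list of intensities. Thresholding is an assumed stand-in
    for a real nucleus segmentation step.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill over one connected object.
            seen[r][c] = True
            queue = deque([(r, c)])
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and image[ny][nx] > threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas


def phenotype_penalty(reference, restored, threshold=0.5):
    """Sum of relative errors in nucleus count and mean nucleus area.

    Returns 0.0 when the restored image preserves both quantities; the equal
    weighting of the two terms is an assumption of this sketch.
    """
    ref = nuclei_areas(reference, threshold)
    res = nuclei_areas(restored, threshold)
    count_err = abs(len(res) - len(ref)) / max(len(ref), 1)
    ref_mean = sum(ref) / len(ref) if ref else 0.0
    res_mean = sum(res) / len(res) if res else 0.0
    size_err = abs(res_mean - ref_mean) / max(ref_mean, 1.0)
    return count_err + size_err


# Toy example: two 2x2 "nuclei" in the reference image.
reference = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
# A restoration that hallucinates one extra single-pixel object.
hallucinated = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

print(phenotype_penalty(reference, reference))     # 0.0
print(phenotype_penalty(reference, hallucinated))  # 0.75
```

In the toy example, the hallucinated object raises the count from 2 to 3 (relative error 0.5) and lowers the mean area from 4 to 3 pixels (relative error 0.25), so the penalty is 0.75, whereas a pixel-wise error would register only a single differing pixel.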