Over the past decades, automated high-content microscopy has demonstrated its ability to deliver large quantities of image-based data, underpinning the versatility of phenotypic drug screening and systems biology applications. However, as image-based datasets grew in size, it became infeasible for humans to control, avoid, or correct imaging and sample-preparation artefacts in the images. Machine learning and deep learning techniques may address these shortcomings through generative image inpainting. When applied to sensitive research data, however, this may come at the cost of undesired image manipulation, caused for example by neural hallucinations, to which some artificial neural networks are prone. To address this, we evaluate state-of-the-art inpainting methods for image restoration on a high-content fluorescence microscopy dataset of cultured cells with labelled nuclei. We show that architectures such as DeepFill V2 and Edge Connect can faithfully restore microscopy images after fine-tuning on relatively little data. Our results demonstrate that the area of the region to be restored matters more than its shape. Furthermore, to control the quality of restoration, we propose a novel phenotype-preserving metric design strategy, in which the size and count of restored biological phenotypes such as cell nuclei are quantified to penalise undesirable manipulation. We argue that the design principles of our approach may also generalise to other applications.
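To illustrate the idea of penalising manipulation through phenotype statistics, the sketch below compares nucleus count and mean nucleus area between a reference image and a restored image. It is a hypothetical illustration, not the authors' metric. The intensity-threshold segmentation, 4-connectivity, and the sum of relative errors (`phenotype_penalty`) are all assumptions chosen for simplicity, and the whole image is compared rather than only the inpainted region.

```python
from collections import deque
from statistics import mean

Image = list[list[float]]


def segment(image: Image, threshold: float) -> list[list[bool]]:
    """Naive foreground mask: pixels brighter than `threshold` are treated as nuclei (assumption)."""
    return [[px > threshold for px in row] for row in image]


def object_sizes(mask: list[list[bool]]) -> list[int]:
    """Return the pixel area of each 4-connected foreground component, found by BFS."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    sizes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                size = 0
                while queue:
                    cy, cx = queue.popleft()
                    size += 1
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                sizes.append(size)
    return sizes


def phenotype_penalty(reference: Image, restored: Image, threshold: float = 0.5) -> float:
    """Hypothetical penalty: relative error in nucleus count plus relative error
    in mean nucleus area. 0.0 means both statistics are preserved."""
    ref = object_sizes(segment(reference, threshold))
    res = object_sizes(segment(restored, threshold))
    count_err = abs(len(ref) - len(res)) / max(len(ref), 1)
    ref_area = mean(ref) if ref else 0.0
    res_area = mean(res) if res else 0.0
    area_err = abs(ref_area - res_area) / max(ref_area, 1.0)
    return count_err + area_err


if __name__ == "__main__":
    reference = [
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 0],
        [0, 1, 1, 0, 1, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    # A "hallucinated" restoration that adds a spurious one-pixel nucleus.
    hallucinated = [row[:] for row in reference]
    hallucinated[0][5] = 1
    print(phenotype_penalty(reference, reference))
    print(phenotype_penalty(reference, hallucinated))
```

A faithful restoration scores 0, while the spurious nucleus raises both the count error and the area error. In practice a proper nucleus segmentation and a comparison restricted to the restored region would likely be needed.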