Breast cancer is one of the leading causes of death for women worldwide. Early screening is essential because the chance of survival declines as the cancer progresses to advanced stages. In this study, we used the recent BRACS dataset of hematoxylin and eosin (H\&E) stained histological images to classify breast cancer tumors. BRACS contains both whole-slide images (WSI) and region-of-interest (ROI) images; we considered only the ROI images. We experimented with several deep learning models pre-trained on ImageNet, namely Xception, EfficientNet, ResNet50, and InceptionResNet. We pre-processed the BRACS ROI images and applied image augmentation, upsampling, and different dataset split strategies. With the default dataset split, ResNet50 gave the best result, an F1-score of 66%. With the custom dataset split, upsampling combined with image augmentation gave the best result, an F1-score of 96.2%. This second approach also reduced the false positive and false negative classifications to less than 3% for each class. We believe that our study can contribute to the early diagnosis and identification of breast cancer tumors and their subtypes, especially atypical and malignant tumors, and thereby to improved patient outcomes and reduced mortality. Overall, this study focused on identifying seven breast cancer tumor subtypes, and we believe that the experimental models can be fine-tuned further to generalize to earlier breast cancer histology datasets as well.
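The abstract does not say how upsampling was carried out or how the F1-score was averaged over the seven classes. As a minimal sketch only, the following assumes random oversampling (duplicating minority-class samples until every class matches the largest one) and a macro-averaged F1-score; the function names `upsample` and `macro_f1` and the toy labels are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def upsample(samples, labels, seed=0):
    """Random oversampling: duplicate minority-class samples until every
    class has as many samples as the largest class.
    (Assumed strategy; the study does not specify its upsampling method.)"""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical file names and class labels, for illustration only.
    files = ["roi_a.png", "roi_b.png", "roi_c.png", "roi_d.png"]
    classes = ["benign", "benign", "benign", "malignant"]
    balanced_files, balanced_classes = upsample(files, classes)
    print(len(balanced_files), sorted(set(balanced_classes)))
    print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))
```

In a sketch like this, oversampling would normally be applied to the training split only, after the split is made, so that duplicated images cannot appear in both the training and evaluation sets.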