ClassifyViStA: WCE Classification with Visual understanding through Segmentation and Attention

Gastrointestinal (GI) bleeding is a serious medical condition that presents significant diagnostic challenges, particularly in settings with limited access to healthcare resources. Wireless Capsule Endoscopy (WCE) has emerged as a powerful diagnostic tool for visualizing the GI tract, but it requires time-consuming manual analysis by experienced gastroenterologists, a process that is prone to human error and scales poorly as the number of patients grows. To address this challenge, we propose ClassifyViStA, an AI-based framework for the automated detection and classification of bleeding and non-bleeding frames in WCE videos. The model consists of a standard classification path augmented by two specialized branches: an implicit attention branch and a segmentation branch. The attention branch focuses on the bleeding regions, while the segmentation branch generates segmentation masks that are used for both classification and interpretability. The model is built on an ensemble of ResNet18 and VGG16 architectures to enhance classification performance. For bleeding region detection, we apply Soft Non-Maximum Suppression (Soft NMS) with YOLOv8, which improves the handling of overlapping bounding boxes and yields more accurate and nuanced detections. The segmentation masks also serve to explain the classification results, offering insight into the decision-making process in a way that resembles how a gastroenterologist identifies bleeding regions. Our approach not only automates the detection of GI bleeding but also provides an interpretable solution that can ease the burden on healthcare professionals and improve diagnostic efficiency. Our code is available at ClassifyViStA.
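To make the Soft NMS step concrete, the following is a minimal sketch of the commonly used Gaussian variant, in which a box that overlaps a higher-scoring box has its confidence decayed rather than being discarded outright. The abstract does not say which Soft NMS variant or which hyperparameters are used with YOLOv8, so the function names (`iou`, `soft_nms`), the Gaussian decay, and the values of `sigma` and `score_threshold` are all illustrative assumptions, not the authors' implementation.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft NMS (hypothetical helper, assumed hyperparameters).

    Returns a list of (box_index, rescored_confidence) in selection order.
    """
    remaining = [(i, float(s)) for i, s in enumerate(scores)]
    kept = []
    while remaining:
        # Select the box with the highest current score.
        best_pos = max(range(len(remaining)), key=lambda k: remaining[k][1])
        best_idx, best_score = remaining.pop(best_pos)
        if best_score < score_threshold:
            break
        kept.append((best_idx, best_score))
        # Decay the scores of the other boxes according to their overlap
        # with the selected box: s <- s * exp(-IoU^2 / sigma).
        decayed = []
        for i, s in remaining:
            overlap = iou(boxes[best_idx], boxes[i])
            s = s * math.exp(-(overlap ** 2) / sigma)
            if s >= score_threshold:  # drop boxes whose score became negligible
                decayed.append((i, s))
        remaining = decayed
    return kept


if __name__ == "__main__":
    # Two heavily overlapping boxes and one isolated box (made-up values).
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    for index, score in soft_nms(boxes, scores):
        print(index, round(score, 3))
```

In this toy input, the second box overlaps the top-scoring one and is kept with a reduced confidence instead of being removed, which is the behavior that distinguishes Soft NMS from hard NMS when two bleeding regions lie close together.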
