KAYRA: A Microservice Architecture for AI-Assisted Karyotyping with Cloud and On-Premise Deployment

We present KAYRA, an end-to-end karyotyping system that operates inside the operational constraints of a clinical cytogenetic laboratory. KAYRA is architected as a containerized microservice pipeline whose ML stack combines an EfficientNet-B5 + U-Net semantic segmenter, a Mask R-CNN (ResNet-50 + FPN) instance detector, and a ResNet-18 classifier, orchestrated through a cascaded ROI-narrowing strategy that focuses each downstream model on the chromosome-bearing region. The same container images are deployed both as a cloud service and as an on-premise installation, supporting clinical environments where patient-data egress is not permitted as well as those where it is. A pilot clinical evaluation against two commercial reference karyotyping systems (a modern AI-supported system and an older density-thresholding system, reported in that order) on 459 chromosomes from 10 metaphase spreads shows segmentation accuracy of 98.91 % (vs. 78.21 % / 40.52 %), classification accuracy of 89.1 % (vs. 86.9 % / 54.5 %), and rotation accuracy of 89.76 % (vs. 94.55 % / 78.43 %). KAYRA improves over the older density-thresholding reference on all three axes (p < 0.0001 for segmentation and classification by Fisher's exact test on chromosome-level counts). Against the modern AI-supported reference it improves on segmentation (p < 0.0001); on classification the difference is not statistically significant at the present test-set size (p = 0.34), and on rotation the modern reference scores higher (94.55 % vs. 89.76 %). The system reaches TRL 6 maturity and integrates the human-in-the-loop expert-review workflow that diagnostic cytogenetic practice requires. The thesis of this paper is that a multi-model cytogenetic AI service can be packaged as a microservice architecture supporting flexible deployment (cloud-hosted or on-premise) while delivering strong empirical performance on a pilot clinical evaluation.
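The significance statements above rest on Fisher's exact test applied to chromosome-level counts. The following is a minimal sketch of that kind of comparison, not the authors' analysis code. It assumes that every reported percentage uses all 459 chromosomes as its denominator, so the correct-chromosome counts are back-calculated from the percentages (for example, 454/459 ≈ 98.91 %). It also assumes that the first reference figure in each pair belongs to the modern AI-supported system and the second to the density-thresholding system, and it implements a two-sided test in pure Python. The names `fisher_exact_two_sided`, `correct`, and `p_values` are illustrative.

```python
from math import comb

N = 459  # chromosomes in the pilot test set (from the abstract)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "correct" in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# ASSUMPTION: counts back-calculated from the reported percentages with a
# denominator of 459; the paper's own contingency tables may differ.
correct = {
    "segmentation": {"kayra": 454, "modern_ai": 359, "density": 186},
    "classification": {"kayra": 409, "modern_ai": 399, "density": 250},
}

p_values = {}
for task, counts in correct.items():
    k = counts["kayra"]
    for ref in ("modern_ai", "density"):
        r = counts[ref]
        # Rows: system (KAYRA, reference); columns: correct, incorrect.
        p_values[(task, ref)] = fisher_exact_two_sided(k, N - k, r, N - r)

for (task, ref), p in p_values.items():
    print(f"{task:>14} vs {ref:<9} p = {p:.3g}")
```

Under these assumptions, the two segmentation comparisons and the classification comparison against the density-thresholding reference fall well below 0.0001, while the classification comparison against the modern AI-supported reference stays above the conventional 0.05 threshold. The exact p-value depends on the two-sided convention and on the true counts, so small deviations from the reported p = 0.34 would not be surprising.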
