Industrial fruit inspection systems must operate reliably under dense multi-object interactions and continuous motion, yet most existing work evaluates detection or classification at the image level without ensuring temporal stability across video frames. We present a two-stage detection-tracking framework for stable multi-apple quality inspection on conveyor belts. A YOLOv8 model trained on orchard imagery localizes apples, and ByteTrack multi-object tracking then maintains a persistent identity for each apple across frames. A ResNet18 defect classifier, fine-tuned on a dataset of healthy and defective fruit, is applied to each cropped apple region. Track-level aggregation combines the per-frame predictions of each track to enforce temporal consistency and suppress prediction oscillation across frames. To evaluate robustness under realistic processing conditions, we define video-level industrial metrics, including the track-level defect ratio and temporal consistency. Results show improved stability compared with frame-wise inference, suggesting that integrating tracking is essential for practical automated fruit grading systems.
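The abstract does not specify the aggregation rule or the exact metric definitions. The sketch below illustrates one plausible reading under stated assumptions. Each frame yields a record of `(frame, track_id, p_defect)` from the detector, tracker, and classifier chain. Track-level aggregation averages the per-frame defect probabilities and thresholds the mean at 0.5. The track-level defect ratio is the fraction of tracks labeled defective. Temporal consistency is the fraction of consecutive observations within a track whose frame-wise labels agree. All names (`FrameObservation`, `aggregate_tracks`, `track_defect_ratio`, `temporal_consistency`) and the averaging rule are hypothetical. The YOLOv8 and ByteTrack stages are stubbed out with hand-written records.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameObservation:
    """One classifier output for one tracked apple in one frame (hypothetical record format)."""
    frame: int
    track_id: int
    p_defect: float  # assumed: classifier probability of the "defective" class


def aggregate_tracks(observations, threshold=0.5):
    """Assumed aggregation rule: mean defect probability per track, thresholded."""
    probs = defaultdict(list)
    for obs in observations:
        probs[obs.track_id].append(obs.p_defect)
    # One label per track: True means the whole track is graded as defective.
    return {tid: sum(ps) / len(ps) >= threshold for tid, ps in probs.items()}


def track_defect_ratio(track_labels):
    """Fraction of tracks whose aggregated label is defective."""
    if not track_labels:
        return 0.0
    return sum(track_labels.values()) / len(track_labels)


def temporal_consistency(observations, threshold=0.5):
    """Fraction of consecutive frame pairs within each track whose frame-wise labels agree."""
    by_track = defaultdict(list)
    for obs in observations:
        by_track[obs.track_id].append(obs)
    agree = total = 0
    for obs_list in by_track.values():
        obs_list.sort(key=lambda o: o.frame)  # order each track's observations in time
        labels = [o.p_defect >= threshold for o in obs_list]
        for prev, cur in zip(labels, labels[1:]):
            agree += prev == cur
            total += 1
    # With no consecutive pairs there is nothing to oscillate, so report full consistency.
    return agree / total if total else 1.0


# Hand-written records standing in for detector/tracker/classifier output.
obs = [
    FrameObservation(0, 1, 0.9), FrameObservation(1, 1, 0.4),
    FrameObservation(2, 1, 0.8), FrameObservation(3, 1, 0.7),
    FrameObservation(0, 2, 0.1), FrameObservation(1, 2, 0.2),
    FrameObservation(2, 2, 0.6),
]
labels = aggregate_tracks(obs)
print(labels)                         # {1: True, 2: False}
print(track_defect_ratio(labels))     # 0.5
print(temporal_consistency(obs))      # 0.4
```

Because every frame of a track inherits the same aggregated label, the track-level output cannot oscillate by construction. In this sketch, `temporal_consistency` instead quantifies how much frame-wise inference flips: track 1 switches labels twice across four frames even though its mean probability clearly marks it as defective.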