In recirculating aquaculture systems, accurate and effective assessment of fish feeding intensity is crucial for reducing feed costs and calculating optimal feeding times. However, current studies have limitations in modality selection, feature extraction and fusion, and collaborative inference for decision-making, which restrict further improvement in the accuracy, applicability and reliability of multimodal fusion models. To address this problem, this study proposes a Multi-stage Augmented Multimodal Interaction Network (MAINet) for quantifying fish feeding intensity. First, a general feature extraction framework is proposed to efficiently extract feature information from the input image, audio and water wave data. Second, an Auxiliary-modality Reinforcement Primary-modality Mechanism (ARPM) is designed to perform inter-modal interaction and generate enhanced features; it consists of a Channel Attention Fusion Network (CAFN) and a Dual-mode Attention Fusion Network (DAFN). Finally, an Evidence Reasoning (ER) rule is introduced to fuse the output results of each modality and make the final decision, thereby completing the quantification of fish feeding intensity. The experimental results show that MAINet reaches 96.76%, 96.78%, 96.79% and 96.79% in accuracy, precision, recall and F1-Score respectively, significantly outperforming the comparison models. It also shows clear advantages over single-modality models, dual-modality fusion models, and models that use other decision-fusion methods. Ablation experiments further verify that the proposed improvement strategies play a key role in improving the model's robustness and feature utilization efficiency, which in turn improves the accuracy of the quantified fish feeding intensity. The dataset is available at: https://huggingface.co/datasets/ShulongZhang/Multimodal_Fish_Feeding_Intensity.
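To make the decision-level fusion step more concrete, the sketch below combines per-modality class probabilities into a single decision. It is not the ER rule used in MAINet, whose exact formulation is not given in the abstract. It is a simplified stand-in based on Dempster's rule of combination with reliability discounting. The function names, the four intensity classes, the probability vectors and the reliability values are all hypothetical and chosen only for illustration.

```python
import numpy as np

def discount(probs, reliability):
    """Scale class beliefs by a source reliability in [0, 1].

    The belief removed by discounting is kept as 'unassigned' mass,
    i.e. belief in the whole set of classes (ignorance).
    """
    probs = np.asarray(probs, dtype=float)
    masses = reliability * probs / probs.sum()
    return masses, 1.0 - masses.sum()

def combine(m1, u1, m2, u2):
    """Dempster's rule for masses on single classes plus unassigned mass."""
    # A class is supported when both sources agree on it, or when one
    # source supports it and the other is undecided.
    joint = m1 * m2 + m1 * u2 + u1 * m2
    unknown = u1 * u2
    total = joint.sum() + unknown  # equals 1 minus the conflicting mass
    if total <= 0.0:
        raise ValueError("sources are in total conflict; cannot normalise")
    return joint / total, unknown / total

def fuse(prob_list, reliabilities):
    """Sequentially combine the discounted outputs of several modalities."""
    masses, unknown = discount(prob_list[0], reliabilities[0])
    for probs, rel in zip(prob_list[1:], reliabilities[1:]):
        m2, u2 = discount(probs, rel)
        masses, unknown = combine(masses, unknown, m2, u2)
    return masses, unknown

# Hypothetical softmax outputs over four assumed intensity classes.
classes = ["none", "weak", "medium", "strong"]
image_probs = [0.10, 0.20, 0.60, 0.10]
audio_probs = [0.05, 0.15, 0.70, 0.10]
wave_probs = [0.20, 0.50, 0.20, 0.10]

# Hypothetical reliabilities: here the image branch is trusted most.
fused, unknown = fuse([image_probs, audio_probs, wave_probs], [0.9, 0.8, 0.6])
predicted = classes[int(np.argmax(fused))]
print(predicted, fused.round(3), round(float(unknown), 3))
```

Discounting keeps a less reliable modality (here the water wave branch, by assumption) from overriding two more reliable modalities that agree, which is the general motivation for evidence-based fusion over simple probability averaging.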