Accurate fruit counting in real-world agricultural environments is a longstanding challenge because of visual occlusion, semantic ambiguity, and the high computational cost of 3D reconstruction. Existing methods based on neural radiance fields suffer from low inference speed and limited generalization, and they lack support for open-set semantic control. This paper presents FruitLangGS, a real-time 3D fruit counting framework that addresses these limitations through spatial reconstruction, semantic embedding, and language-guided instance estimation. FruitLangGS first reconstructs orchard-scale scenes using an adaptive Gaussian splatting pipeline with radius-aware pruning and tile-based rasterization for efficient rendering. To enable semantic control, each Gaussian encodes a compressed CLIP-aligned language embedding, forming a compact and queryable 3D representation. At inference time, prompt-based semantic filtering is applied directly in 3D space, without relying on image-space segmentation or view-level fusion. The selected Gaussians are then converted into dense point clouds via distribution-aware sampling and clustered to estimate fruit counts. Experimental results on real orchard data show that FruitLangGS renders faster, supports more flexible semantic queries, and counts fruit more accurately than prior approaches, offering a new perspective for language-driven, real-time neural rendering across open-world scenarios.
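The inference stage described above (filter Gaussians by prompt, sample points from the selected Gaussians, cluster the points, count the clusters) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. All function names, thresholds, and the toy scene are hypothetical. The sketch assumes the per-Gaussian embeddings have already been decompressed into the same space as the prompt embedding, and it uses a simple radius-graph connected-component count as a stand-in for whatever clustering method the authors actually use.

```python
import numpy as np


def filter_by_prompt(embeddings, prompt_embedding, threshold=0.8):
    """Boolean mask of Gaussians whose embedding is cosine-similar to the prompt."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    p = prompt_embedding / np.linalg.norm(prompt_embedding)
    return e @ p >= threshold


def sample_points(means, covariances, n_per_gaussian=20, rng=None):
    """Draw points from each Gaussian's own distribution (mean and 3x3 covariance)."""
    rng = np.random.default_rng(0) if rng is None else rng
    samples = [
        rng.multivariate_normal(m, c, size=n_per_gaussian)
        for m, c in zip(means, covariances)
    ]
    return np.concatenate(samples, axis=0) if samples else np.empty((0, 3))


def count_clusters(points, radius=0.05, min_points=10):
    """Count connected components of the radius graph that have >= min_points."""
    n = len(points)
    parent = np.arange(n)

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise distances; fine for a small sketch, too slow for real scenes.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    for i, j in zip(*np.where(np.triu(dists <= radius, k=1))):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    roots = np.array([find(i) for i in range(n)])
    _, sizes = np.unique(roots, return_counts=True)
    return int(np.sum(sizes >= min_points))


def demo():
    """Toy scene: two 'fruits' of 5 Gaussians each, plus 5 background Gaussians."""
    rng = np.random.default_rng(42)
    centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    fruit_means = np.concatenate(
        [c + 0.01 * rng.standard_normal((5, 3)) for c in centers]
    )
    background_means = rng.uniform(-2.0, 2.0, size=(5, 3))
    means = np.concatenate([fruit_means, background_means])
    covariances = np.tile(1e-4 * np.eye(3), (len(means), 1, 1))

    # Hypothetical 4-D embeddings: fruit along axis 0, background along axis 1.
    embeddings = np.zeros((len(means), 4))
    embeddings[:10, 0] = 1.0
    embeddings[10:, 1] = 1.0
    prompt = np.array([1.0, 0.0, 0.0, 0.0])

    mask = filter_by_prompt(embeddings, prompt)
    points = sample_points(means[mask], covariances[mask], rng=rng)
    return count_clusters(points)
```

Calling `demo()` runs the three steps on the toy scene. Because the filter operates on the Gaussians themselves, no rendered image or 2D segmentation mask is involved at any point, which is the property the abstract emphasizes. The all-pairs distance matrix limits this sketch to a few thousand points; an orchard-scale scene would need a spatial index.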