Most robotic grasping systems first convert sensor data into explicit 3D point clouds, a computational step with no counterpart in biological intelligence. This paper explores a fundamentally different, neuro-inspired paradigm for 6-DoF grasp detection. We introduce SpikeGrasp, a framework that mimics the biological visuomotor pathway: it processes raw, asynchronous events from stereo spike cameras, which play a role similar to that of retinas, and infers grasp poses directly. The model fuses the stereo spike streams and uses a recurrent spiking neural network, analogous to high-level visual processing, to iteratively refine grasp hypotheses without ever reconstructing a point cloud. To validate this approach, we built a large-scale synthetic benchmark dataset. Experiments show that SpikeGrasp surpasses traditional point-cloud-based baselines, especially in cluttered and textureless scenes, and demonstrates remarkable data efficiency. By establishing the viability of this end-to-end, neuro-inspired approach, SpikeGrasp paves the way for future systems capable of the fluid and efficient manipulation seen in nature, particularly for dynamic objects.