This article is a sequel to "GPU implementation of a ray-surface intersection algorithm in CUDA" (arXiv:2209.02878) [1]. Its main focus is PyCUDA, a Python scripting approach to GPU run-time code generation in the Compute Unified Device Architecture (CUDA) framework. It accompanies the open-source code distributed on GitHub, which provides a PyCUDA implementation of a GPU-based line-segment, surface-triangle intersection test. The objective is to share a PyCUDA learning experience with people who are new to PyCUDA. Using the existing CUDA code and foundation from [1] as the starting point, we document the key changes made to facilitate the transition to PyCUDA. Because the CUDA source for the ray-surface intersection test contains both host and device code and uses multiple kernel functions, these notes offer a substantive example and a real-world perspective on what it is like to use PyCUDA. They delve into custom data structures such as a binary radix tree and highlight some possible pitfalls. The case studies present a debugging strategy that can be used to examine complex C structures in device memory with standard Python tools, without the CUDA-GDB debugger.
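As a rough illustration of the debugging idea mentioned above, the sketch below decodes a raw byte buffer into Python objects that mirror a C struct, using only the standard-library `ctypes` module. The `Node` layout, its field names, and the `decode_nodes` helper are hypothetical and chosen for illustration; they are not taken from the accompanying code. In a PyCUDA session the bytes would typically be copied from device memory into a host buffer (for example with `pycuda.driver.memcpy_dtoh`); here the buffer is built on the host so that the example runs without a GPU.

```python
import ctypes


class Node(ctypes.Structure):
    """Hypothetical tree-node layout; the real struct may differ."""
    _fields_ = [
        ("left", ctypes.c_int32),      # index of left child, -1 if none
        ("right", ctypes.c_int32),     # index of right child, -1 if none
        ("parent", ctypes.c_int32),    # index of parent, -1 for the root
        ("bmin", ctypes.c_float * 3),  # bounding-box minimum corner
        ("bmax", ctypes.c_float * 3),  # bounding-box maximum corner
    ]


def decode_nodes(raw: bytes) -> list:
    """Interpret a raw byte buffer as a packed array of Node structs."""
    size = ctypes.sizeof(Node)
    if len(raw) % size != 0:
        raise ValueError("buffer length is not a multiple of sizeof(Node)")
    count = len(raw) // size
    return list((Node * count).from_buffer_copy(raw))


# Stand-in for a device-to-host copy: build two nodes on the host
# and take their raw bytes.
host = (Node * 2)()
host[0].left, host[0].right, host[0].parent = 1, -1, -1
host[1].left, host[1].right, host[1].parent = -1, -1, 0
for i, value in enumerate((1.0, 2.0, 3.0)):
    host[1].bmax[i] = value
raw = bytes(host)

# Inspect the decoded structs with ordinary Python tools.
decoded = decode_nodes(raw)
for index, node in enumerate(decoded):
    print(index, node.left, node.right, node.parent, list(node.bmax))
```

The Python-side field order and types must match the C declaration exactly, including any padding the compiler inserts; a mismatch in layout is one of the pitfalls this kind of inspection tends to expose.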