This article is a sequel to "GPU implementation of a ray-surface intersection algorithm in CUDA" (arXiv:2209.02878) [1]. Its main focus is PyCUDA, a Python scripting approach to GPU run-time code generation within the Compute Unified Device Architecture (CUDA) framework. It accompanies the open-source code distributed on GitHub, which provides a PyCUDA implementation of a GPU-based line-segment, surface-triangle intersection test. The objective is to share a PyCUDA learning experience with people who are new to PyCUDA. Using the existing CUDA code and foundation from [1] as the starting point, we document the key changes made to facilitate the transition to PyCUDA. Because the CUDA source for the ray-surface intersection test contains both host and device code and uses multiple kernel functions, these notes offer a substantive, real-world example of what it is like to use PyCUDA. They delve into custom data structures such as a binary radix tree and highlight some possible pitfalls. The case studies present a debugging strategy for examining complex C structures in device memory with standard Python tools, without the CUDA-GDB debugger.
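As a rough illustration of the debugging idea mentioned above, the following minimal sketch decodes a raw byte buffer into C-style structures using only the Python standard library. It is not the article's code: the `TreeNode` layout, its field names, and the `decode_nodes` helper are hypothetical, and the byte buffer is built by hand so that the example runs without a GPU. In an actual PyCUDA session the bytes would presumably be copied back from the device first, for example with `pycuda.driver.memcpy_dtoh`.

```python
import ctypes
import struct


class TreeNode(ctypes.Structure):
    # Hypothetical layout, assumed to mirror a device-side C struct:
    #   struct TreeNode { int left; int right; int parent; float bounds[2]; };
    _fields_ = [
        ("left", ctypes.c_int32),
        ("right", ctypes.c_int32),
        ("parent", ctypes.c_int32),
        ("bounds", ctypes.c_float * 2),
    ]


def decode_nodes(raw, count):
    """Interpret raw bytes (e.g. copied back from device memory) as `count` nodes."""
    array_type = TreeNode * count
    if len(raw) != ctypes.sizeof(array_type):
        # A size mismatch usually means the host-side layout does not match
        # the device-side struct (padding, field order, or element count).
        raise ValueError("buffer size does not match the assumed struct layout")
    return array_type.from_buffer_copy(raw)


# Stand-in for a device-to-host copy: two nodes packed in native byte order.
raw = struct.pack("=3i2f", 1, 2, -1, 0.0, 4.0) + struct.pack("=3i2f", -1, -1, 0, 0.0, 2.0)

nodes = decode_nodes(raw, 2)
for index, node in enumerate(nodes):
    # Each field can now be inspected with ordinary Python tools.
    print(index, node.left, node.right, node.parent, list(node.bounds))
```

The size check is the useful part of this sketch: if the Python-side layout disagrees with the C struct, the decode fails with an error instead of quietly producing wrong field values.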