Lectures on Parallel Computing

These lecture notes are designed to accompany an imaginary, virtual, one- or two-semester undergraduate course on the fundamentals of Parallel Computing. They also serve as background and reference for graduate courses on High-Performance Computing, parallel algorithms, and shared-memory multiprocessor programming. They introduce theoretical concepts and tools for expressing, analyzing, and judging parallel algorithms. They also cover in detail the two most widely used concrete frameworks, OpenMP and MPI, together with the pthreads threading interface. These are used for writing parallel programs for shared-memory (OpenMP, pthreads) and distributed-memory (MPI) parallel computers, with the emphasis on general concepts and principles. Code examples are given in a C-like style, and many are actual, correct C code.

The lecture notes deliberately do not cover GPU architectures and GPU programming. However, the general concerns, guidelines, and principles (time, work, cost, efficiency, scalability, memory structure, and bandwidth) will be just as relevant for efficiently utilizing the various GPU architectures. Likewise, the notes focus on deterministic algorithms only and do not use randomization. The student of this material will find it instructive to take the time to understand concepts and algorithms visually.

The exercises can be used for self-study and as inspiration for small implementation projects in OpenMP and MPI, which can and should accompany any serious course on Parallel Computing. The student will benefit from actually implementing and carefully benchmarking the suggested algorithms on the parallel computing system that may, or should, be made available as part of such a course. In class, the exercises can serve as the basis for hand-ins and small programming projects, for which the instructor should provide sufficient additional detail and precision.
