Unstructured meshes are characterized by data points irregularly distributed in Euclidean space. Because of this irregularity, computing connectivity information between mesh elements requires much more time and memory than it does for uniformly distributed data. To lower storage costs, dynamic data structures have been proposed. These data structures compute connectivity information on the fly and discard it when it is no longer needed. However, on-the-fly computation slows down the algorithms that rely on it. To address this issue, we propose a new task-parallel approach that proactively computes mesh connectivity. Unlike previous approaches implementing data-parallel models, where all threads run the same type of instructions, our task-parallel approach allows threads to run different functions. Specifically, some threads run the algorithm of choice while other threads compute connectivity information before it is actually needed. The approach is implemented in the new Accelerated Clustered TOPOlogical (ACTOPO) data structure, which can support any processing algorithm requiring mesh connectivity information. Our experiments show that ACTOPO combines the benefits of state-of-the-art memory-efficient (TTK CompactTriangulation) and time-efficient (TTK ExplicitTriangulation) topological data structures. It occupies about as much memory as TTK CompactTriangulation while providing up to 5x speedup. Moreover, it achieves time performance comparable to that of TTK ExplicitTriangulation while using only half the memory.
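The producer/consumer split described above can be illustrated with a minimal sketch. It is not the ACTOPO implementation or the TTK API: the class `PrefetchingRelations`, the function names, the contiguous-id clustering, and the choice of the vertex-neighbor relation are all assumptions made for illustration. One thread runs a toy algorithm (vertex degrees) cluster by cluster, while a second thread computes the connectivity of upcoming clusters ahead of time; relations are discarded once a cluster has been processed. In standard CPython the global interpreter lock limits CPU parallelism, so the sketch shows the structure of the approach rather than its speedup.

```python
import threading

def cluster_neighbors(triangles, lo, hi):
    """Vertex-neighbor relation for the vertices with ids in [lo, hi)."""
    rel = {v: set() for v in range(lo, hi)}
    for tri in triangles:
        for v in tri:
            if lo <= v < hi:
                rel[v].update(u for u in tri if u != v)
    return rel

class PrefetchingRelations:
    """Hypothetical cluster cache shared by a producer and a consumer thread."""

    def __init__(self, triangles, num_vertices, cluster_size):
        self.triangles = triangles
        # Assumption: clusters are contiguous ranges of vertex ids.
        self.bounds = [(lo, min(lo + cluster_size, num_vertices))
                       for lo in range(0, num_vertices, cluster_size)]
        self.cache = {}    # cluster id -> relation, kept only while needed
        self.done = set()  # clusters the consumer has finished with
        self.lock = threading.Lock()

    def _compute(self, cid):
        lo, hi = self.bounds[cid]
        return cluster_neighbors(self.triangles, lo, hi)

    def prefetch(self, order):
        """Producer task: compute relations before the consumer asks for them."""
        for cid in order:
            with self.lock:
                if cid in self.done or cid in self.cache:
                    continue
            rel = self._compute(cid)
            with self.lock:
                if cid not in self.done:  # do not resurrect a released cluster
                    self.cache[cid] = rel

    def get(self, cid):
        """Consumer side: use the prefetched relation, else compute on the fly."""
        with self.lock:
            rel = self.cache.get(cid)
        return rel if rel is not None else self._compute(cid)

    def release(self, cid):
        """Discard a cluster's relation once it is no longer needed."""
        with self.lock:
            self.done.add(cid)
            self.cache.pop(cid, None)

def vertex_degrees(relations, order):
    """Toy 'algorithm of choice': number of neighbors of every vertex."""
    degrees = {}
    for cid in order:
        for v, nbrs in relations.get(cid).items():
            degrees[v] = len(nbrs)
        relations.release(cid)
    return degrees

def run(triangles, num_vertices, cluster_size=2):
    relations = PrefetchingRelations(triangles, num_vertices, cluster_size)
    order = list(range(len(relations.bounds)))
    producer = threading.Thread(target=relations.prefetch, args=(order,))
    producer.start()                            # connectivity task
    degrees = vertex_degrees(relations, order)  # algorithm task
    producer.join()
    return degrees, relations

# Two triangles sharing the edge (1, 2).
triangles = [(0, 1, 2), (1, 2, 3)]
degrees, relations = run(triangles, num_vertices=4, cluster_size=2)
print(degrees)
```

The consumer never blocks on the producer: if a cluster has not been prefetched yet, `get` falls back to on-the-fly computation, so the result is the same regardless of thread timing, and the cache is empty again once both threads finish.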