We present SPILDL, a Scalable and Parallel Inductive Learner in Description Logic (DL). SPILDL is based on the DL-Learner, the state of the art in DL-based ILP learning. As a DL-based ILP learner, SPILDL targets the $\mathcal{ALCQI}^{\mathcal{(D)}}$ DL language and can learn DL hypotheses expressed as disjunctions of conjunctions (using the $\sqcup$ operator). Moreover, SPILDL's hypothesis language incorporates string concrete roles (also known as string data properties in the Web Ontology Language, OWL); the inclusion of these expressive DL constructs enables SPILDL to learn hypotheses that describe many complex real-world concepts. SPILDL employs a hybrid parallel approach that combines shared-memory and distributed-memory parallelism to accelerate ILP learning, for both hypothesis search and hypothesis evaluation. In our experiments, SPILDL's parallel search improved performance by up to $\sim$27.3 fold (best case). For hypothesis evaluation, SPILDL improved performance through HT-HEDL, our multi-core CPU + multi-GPU hypothesis evaluation engine, by up to 38 fold (best case). By combining parallel search and parallel evaluation, SPILDL improved performance by up to $\sim$560 fold (best case). In the worst case, however, SPILDL's parallel search does not deliver consistent speedups across all datasets; its gains depend strongly on the nature of each ILP dataset's search space. For some datasets, increasing the number of parallel search threads yields performance similar to, or worse than, the baseline: some ILP datasets benefit from parallel search, while others do not (or gain only negligibly). Likewise, on small datasets, parallel evaluation performs similarly to, or worse than, the baseline.