An efficient algorithm, Apriori_Goal, is proposed for constructing association rules for a relational database with a given classification. Its features follow from the specifics of the database and from the method used to encode its records. Five criteria are proposed to characterize the quality of the constructed rules, along with several criteria for filtering the sets used during rule construction. The proposed record encoding allows an efficient implementation of the basic operation underlying the computation of rule characteristics. The algorithm works with a relational database whose columns may be of different types, both continuous and discrete. One discrete column is designated as the target, and it defines the classification of the records. This allows the original database to be divided into $n$ subsets, where $n$ is the number of categories of the target parameter. Medical databases are a classical example, with the diagnosis established by doctors serving as the target parameter. A preprocessor, which is an important part of the algorithm, converts the object properties represented by the columns of the original database into binary properties and encodes each record as a single integer. In addition to saving memory, this format fully preserves the information about the binary properties that represent the original record. More importantly, the computationally intensive operations on records required for calculating rule characteristics reduce, in this format, to a pair of logical operations on integers and are therefore performed almost instantly.
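The integer encoding of records can be illustrated with a small sketch. The abstract does not say which two logical operations are used, so the code below assumes the common bitmask test: a record contains an itemset when a bitwise AND of the two integers equals the itemset. The property names, the sample records, and the helper functions `encode_record`, `contains`, and `count_by_class` are all hypothetical, and the records are assumed to be already converted to binary properties.

```python
from typing import Dict, List, Sequence

# Hypothetical binary properties; bit i of a record's code stores properties[i].
PROPERTIES: Sequence[str] = ("fever", "cough", "age_over_60")


def encode_record(record: Dict[str, bool], properties: Sequence[str]) -> int:
    """Pack the binary properties of one record into a single integer."""
    mask = 0
    for bit, name in enumerate(properties):
        if record.get(name, False):
            mask |= 1 << bit
    return mask


def contains(record_mask: int, itemset_mask: int) -> bool:
    """Assumed containment test: one AND and one comparison on integers."""
    return (record_mask & itemset_mask) == itemset_mask


def count_by_class(masks: List[int], labels: List[str], itemset_mask: int) -> Dict[str, int]:
    """Count, for each target category, the records that contain the itemset."""
    counts = {label: 0 for label in labels}
    for mask, label in zip(masks, labels):
        if contains(mask, itemset_mask):
            counts[label] += 1
    return counts


# Hypothetical sample data: binary properties plus a target (diagnosis) column.
records = [
    {"fever": True, "cough": True, "age_over_60": False},
    {"fever": True, "cough": False, "age_over_60": True},
    {"fever": False, "cough": True, "age_over_60": True},
    {"fever": True, "cough": True, "age_over_60": True},
]
labels = ["flu", "flu", "cold", "flu"]

encoded = [encode_record(r, PROPERTIES) for r in records]
itemset = encode_record({"fever": True, "cough": True}, PROPERTIES)
counts = count_by_class(encoded, labels, itemset)

print(encoded)  # [3, 5, 6, 7]
print(counts)   # {'flu': 2, 'cold': 0}
```

Per-category counts of this kind are the raw material from which rule characteristics such as support within a class could be derived. How Apriori_Goal defines its five quality criteria is not stated in the abstract, so they are not reproduced here.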