Machine learning (ML) has employed various discretization methods to partition numerical attributes into intervals. However, an effective discretization technique remains elusive in many ML applications, such as association rule mining. Moreover, existing discretization techniques do not best reflect the impact of an independent numerical factor on the dependent numerical target. This research aims to establish a benchmark approach for partitioning numerical attributes. We conduct an extensive analysis of how humans perceive the partitioning of a numerical attribute and compare these perceptions with the results obtained from our two proposed measures. We also examine the perceptions of experts in data science, statistics, and engineering, elicited through numerical data visualization techniques. Analysis of the collected responses reveals that $68.7\%$ of human responses align closely with the values generated by our proposed measures. Based on these findings, our proposed measures may serve as one method for discretizing numerical attributes.