AutoClustering methods aim to automate unsupervised learning tasks, including algorithm selection (AS), hyperparameter optimization (HPO), and pipeline synthesis (PS), often by leveraging meta-learning over dataset meta-features. While these systems frequently achieve strong performance, their recommendations are difficult to justify: the influence of dataset meta-features on algorithm and hyperparameter choices is typically not exposed, which limits reliability, bias diagnostics, and efficient meta-feature engineering. In this work, we investigate the explainability of meta-models in AutoClustering. We first review 22 existing methods and organize their meta-features into a structured taxonomy. We then apply a global explainability technique, Decision Predicate Graphs (DPG), to assess feature importance within meta-models from selected frameworks. Finally, we use local explainability tools such as SHAP (SHapley Additive exPlanations) to analyze specific clustering decisions. Our findings highlight consistent patterns in meta-feature relevance, identify structural weaknesses in current meta-learning strategies that can distort recommendations, and provide actionable guidance for more interpretable Automated Machine Learning (AutoML) design. This study therefore offers a practical foundation for increasing decision transparency in unsupervised learning automation.