In this study, we aim to identify the most effective machine learning model for accurately classifying Myers-Briggs Type Indicator (MBTI) types from Reddit posts and a Kaggle data set. We apply multi-label classification using the Binary Relevance method. We use an Explainable Artificial Intelligence (XAI) approach to emphasise the transparency and understandability of both the process and the results. To this end, we experiment with glass-box learning models, i.e., models designed for simplicity, transparency, and interpretability. As glass-box models, we selected k-Nearest Neighbour, Multinomial Naive Bayes, and Logistic Regression. We show that Multinomial Naive Bayes and k-Nearest Neighbour perform better when classes with Observer (S) traits are excluded, whereas Logistic Regression obtains its best results when all classes have more than 550 entries.
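The Binary Relevance method mentioned in the abstract treats a multi-label problem as a set of independent binary problems. The following is a minimal sketch of that label decomposition for MBTI types, not the pipeline used in the study: the helper names `to_binary_labels` and `from_binary_labels` and the choice of which letter maps to 1 are assumptions made for illustration.

```python
# Illustrative sketch only: helper names and the 1/0 encoding are assumptions,
# not taken from the study. The four axes follow the standard MBTI letter code.
AXES = [("I", "E"), ("N", "S"), ("T", "F"), ("J", "P")]


def to_binary_labels(mbti_type):
    """Split a four-letter MBTI type into four independent 0/1 labels."""
    mbti_type = mbti_type.upper()
    if len(mbti_type) != 4:
        raise ValueError("expected a four-letter MBTI type")
    labels = []
    for letter, (first, second) in zip(mbti_type, AXES):
        if letter not in (first, second):
            raise ValueError(f"unexpected letter {letter!r}")
        # 1 marks the first letter of the axis, 0 the second.
        labels.append(1 if letter == first else 0)
    return labels


def from_binary_labels(labels):
    """Recombine four 0/1 predictions into a four-letter MBTI type."""
    if len(labels) != len(AXES):
        raise ValueError("expected one label per axis")
    return "".join(
        first if bit else second for bit, (first, second) in zip(labels, AXES)
    )
```

Under Binary Relevance, one binary classifier (for example k-Nearest Neighbour, Multinomial Naive Bayes, or Logistic Regression) would then be trained per label column, and the four predictions recombined with `from_binary_labels`.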