The widespread adoption of social media has heightened interest in its psychological effects, particularly on mental health indicators such as anxiety, depression, loneliness, and sleep quality, as these platforms increasingly shape social interaction and well-being. Although previous research has examined correlations between social media use and mental health, few studies have used unsupervised machine learning to segment users by behavioral and psychological patterns, leaving a gap in identifying distinct risk profiles across diverse groups. This study addresses that gap by segmenting individuals according to their social media usage and psychological well-being, using clustering to reveal hidden patterns and evaluate their mental health implications. Data from 551 participants, collected via an online survey, were preprocessed with KNN imputation for missing values, one-hot encoding for categorical variables such as Gender (5 unique values), and outlier detection using the IQR and Z-score methods. K-Means clustering was then applied with 6 clusters, a number selected using the Elbow Method and associated with a Silhouette Score of 0.32. PCA reduced the 22 feature dimensions for visualization, and a correlation heatmap highlighted relationships among variables, such as a correlation of 0.28 between daily social media hours and anxiety.
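The two outlier checks named in the preprocessing step can be illustrated with a minimal sketch that uses only the Python standard library. The abstract does not report the cutoffs used, so the 1.5 × IQR fence and the |z| > 3 threshold below are conventional defaults assumed for illustration, and the `hours` list is invented example data, not survey data.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional fence; it is an assumption, not a value
    reported in the study.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return indices of values whose absolute z-score exceeds threshold.

    threshold=3.0 is a common default, assumed here for illustration.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return []  # all values identical, so nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


# Hypothetical daily social media hours; the last entry is implausible.
hours = [2, 3, 3, 4, 4, 5, 5, 6, 3, 4, 2, 5, 4, 3, 6, 4, 5, 3, 4, 24]

print(iqr_outliers(hours))     # [19]
print(zscore_outliers(hours))  # [19]
```

Both rules flag the same point here, but they need not agree in general: the IQR fence depends only on the quartiles, while the Z-score is computed from a mean and standard deviation that the outlier itself inflates.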