Code style is an aesthetic choice exhibited in source code that reflects programmers' individual coding habits. This study is the first to investigate whether code style can be used as an indicator to identify good programmers. The study uses data from Google Code Jam. A cluster analysis was performed to determine whether a particular coding style could be associated with good programmers. In addition, supervised machine learning models were trained on stylistic features to predict good programmers and were evaluated using recall, macro-F1, AUC-ROC, and balanced accuracy. The results demonstrate that good programmers may be identified using supervised machine learning models, even though no particular style group could be identified as a good style.
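The four evaluation metrics named above can be illustrated with a minimal sketch. This is not the study's implementation: the labels and scores are made up, the convention that label 1 means "good programmer" is an assumption, and the helper names (`recall`, `macro_f1`, `balanced_accuracy`, `auc_roc`) are hypothetical. The metrics are written out by hand only to make their definitions explicit.

```python
from typing import List, Sequence


def _counts(y_true: Sequence[int], y_pred: Sequence[int], label: int):
    """Return (tp, fp, fn), treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def recall(y_true: Sequence[int], y_pred: Sequence[int], label: int = 1) -> float:
    """Fraction of true `label` items that were predicted as `label`."""
    tp, _, fn = _counts(y_true, y_pred, label)
    return tp / (tp + fn) if tp + fn else 0.0


def f1(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """Harmonic mean of precision and recall for one class."""
    tp, fp, fn = _counts(y_true, y_pred, label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1, so each class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1(y_true, y_pred, l) for l in labels) / len(labels)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class recall."""
    labels = sorted(set(y_true))
    return sum(recall(y_true, y_pred, l) for l in labels) / len(labels)


def auc_roc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = "good programmer" (assumed label convention), 0 = other.
y_true: List[int] = [1, 1, 1, 0, 0, 0, 0, 0]
scores: List[float] = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred: List[int] = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(recall(y_true, y_pred), macro_f1(y_true, y_pred),
      balanced_accuracy(y_true, y_pred), auc_roc(y_true, scores))
```

Macro-F1 and balanced accuracy both average over classes without weighting by class size, which makes them less sensitive to class imbalance than plain accuracy; that is a plausible reason to report them when good programmers are a minority of the data, although the abstract does not state the class distribution.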