Automatically evaluating the correctness of programming assignments is rather straightforward using unit and integration tests. However, programming tasks can be solved in multiple ways, many of which, although correct, are inelegant. For instance, excessive branching, poor naming, or repetitiveness make the code hard to understand and maintain. These subjective qualities of code are hard to assess automatically with current techniques. In this work, we investigate the use of CodeBERT to automatically assign quality scores to Java code. We experiment with different models and training paradigms and evaluate the accuracy of the models on a novel dataset for code quality assessment. Finally, we assess the quality of the predictions using saliency maps. We find that code quality is, to some extent, predictable, and that transformer-based models using task-adapted pre-training can solve the task more efficiently than other techniques.