Many data science students and practitioners don't see the value in making time to learn and adopt good coding practices as long as the code "works". However, code standards are an important part of modern data science practice, and they play an essential role in the development of data acumen. Good coding practices lead to more reliable code and save more time than they cost, making them important even for beginners. We believe that principled coding is vital for quality data science practice. To effectively instill these practices within academic programs, instructors and programs need to begin establishing these practices early, to reinforce them often, and to hold themselves to a higher standard while guiding students. We describe key aspects of good coding practices for data science, illustrating with examples in R and in Python, though similar standards are applicable to other software environments. Practical coding guidelines are organized into a top ten list.
翻译:许多数据科学学生和从业者认为,只要代码“能运行”,就无需花时间学习并采纳良好的编程实践。然而,代码标准是现代数据科学实践的重要组成部分,在培养数据素养方面发挥着关键作用。良好的编程实践能带来更可靠的代码,且节省的时间远多于投入的成本,因此即使对初学者也至关重要。我们认为,遵循原则的编程对于高质量的数据科学实践不可或缺。为了在学术项目中有效灌输这些实践,教师和项目需要尽早确立这些规范、经常强化它们,并在指导学生时以更高标准要求自身。我们描述了数据科学中良好编程实践的关键方面,并以R和Python为例进行说明,尽管类似标准也适用于其他软件环境。实用编码指南被归纳为十大要点清单。