Still Manual? Automated Linter Configuration via DSL-Based LLM Compilation of Coding Standards

Coding standards are essential for maintaining consistent, high-quality code across teams and projects. Linters help developers enforce these standards by detecting violations in code. However, configuring a linter by hand is complex and requires expertise, and the diversity and continual evolution of programming languages, coding standards, and linters make the work repetitive and costly to maintain. To reduce this manual effort, we propose LintCFG, a domain-specific language (DSL)-driven, LLM-based compilation approach that automatically generates linter configurations for coding standards, independent of the programming language, coding standard, and linter involved. Inspired by compiler design, we first design a DSL that expresses coding rules in a tool-agnostic, structured, readable, and precise manner. We then represent linter configurations as DSL configuration instructions. Given a natural language coding standard, the compilation process parses it into DSL coding standards, matches them against the DSL configuration instructions to set configuration names, option names, and option values, verifies consistency between the standards and the configurations, and finally generates linter-specific configurations. Experiments with Checkstyle on a Java coding standard show that our approach achieves over 90% precision and recall in DSL representation, with accuracy, precision, recall, and F1-scores close to 70% (some exceeding 70%) in fine-grained linter configuration generation. Notably, our approach outperforms baselines by over 100% in precision. A user study further shows that our approach improves developers' efficiency in configuring linters for coding standards. Finally, we demonstrate the generality of the approach by generating ESLint configurations for JavaScript coding standards, indicating that it applies to other programming languages, coding standards, and linters.
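The match, verify, and generate steps described above can be illustrated with a minimal Python sketch. It is not LintCFG's implementation: the `DslRule` and `ConfigInstruction` shapes, the instruction table, and `compile_rules` are hypothetical names introduced here, and the LLM-based parsing step is assumed to have already produced the DSL rules. Only the Checkstyle module and property names (`LineLength`/`max`, `Indentation`/`basicOffset`, `Checker`, `TreeWalker`) are taken from Checkstyle itself.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class DslRule:
    """Hypothetical tool-agnostic rule, assumed to come from the parsing step."""
    subject: str     # e.g. "line_length"
    constraint: str  # e.g. "max"
    value: str       # e.g. "100"


@dataclass(frozen=True)
class ConfigInstruction:
    """Hypothetical DSL-side description of one Checkstyle option."""
    subject: str
    constraint: str
    module: str             # Checkstyle module name
    option: str             # Checkstyle property name
    under_treewalker: bool  # whether the module is nested under TreeWalker


# Illustrative instruction table; a real one would cover the whole linter.
INSTRUCTIONS = [
    ConfigInstruction("line_length", "max", "LineLength", "max", False),
    ConfigInstruction("indentation", "size", "Indentation", "basicOffset", True),
]


def match(rule):
    """Return the instruction whose subject and constraint equal the rule's."""
    for ins in INSTRUCTIONS:
        if (ins.subject, ins.constraint) == (rule.subject, rule.constraint):
            return ins
    return None


def compile_rules(rules):
    """Emit Checkstyle XML for matched rules and report unmatched ones."""
    checker = ET.Element("module", name="Checker")
    walker = None
    unmatched = []
    for rule in rules:
        ins = match(rule)
        if ins is None:
            # Simplified consistency check: flag rules with no configuration.
            unmatched.append(rule)
            continue
        if ins.under_treewalker:
            if walker is None:
                walker = ET.SubElement(checker, "module", name="TreeWalker")
            parent = walker
        else:
            parent = checker
        module = ET.SubElement(parent, "module", name=ins.module)
        ET.SubElement(module, "property", name=ins.option, value=rule.value)
    return ET.tostring(checker, encoding="unicode"), unmatched


if __name__ == "__main__":
    xml_text, missing = compile_rules([
        DslRule("line_length", "max", "100"),
        DslRule("indentation", "size", "4"),
        DslRule("naming", "style", "camelCase"),  # no instruction in the table
    ])
    print(xml_text)
    print("unmatched:", missing)
```

Keeping the rules and the instruction table as plain data is what would let the same rules target another linter: under this sketch's assumptions, only the table and the final emitter would change, for example to produce an ESLint configuration instead of Checkstyle XML.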
