While Large Language Models (LLMs) have demonstrated remarkable capabilities, research shows that their effectiveness depends not only on explicit prompts but also on the broader context provided. This requirement is especially pronounced in software engineering, where the goals, architecture, and collaborative conventions of an existing project play critical roles in response quality. To support this, many AI coding assistants have introduced ways for developers to author persistent, machine-readable directives that encode a project's unique constraints. Although this practice is growing, the content of these directives remains unstudied. This paper presents a large-scale empirical study to characterize this emerging form of developer-provided context. Through a qualitative analysis of 401 open-source repositories containing Cursor rules, we developed a comprehensive taxonomy of the project context that developers consider essential, organized into five high-level themes: Conventions, Guidelines, Project Information, LLM Directives, and Examples. Our study also explores how this context varies across project types and programming languages, offering implications for the next generation of context-aware AI developer tools.