This paper investigates effective methods for code generation in domain-specific applications, including the use of Large Language Models (LLMs) to segment and update data, and prompt adjustments that elicit deeper reasoning from LLMs. Using a real company product as a case study, we supply its user manuals, API documentation, and other data. The methods discussed here segment this data and convert each segment into a semantic vector that more accurately reflects its meaning. User requirements are then converted into vectors and used to retrieve the most relevant content; combined with various prompting techniques, this achieves about 70% accuracy on tasks of simple to medium complexity. To our knowledge, this paper is the first to improve domain-specific code generation from this perspective. Additionally, we experiment with Llama 2-based fine-tuning to generate more scripts from a limited number of examples, and test its effectiveness for code generation in professional domains. This is a challenging and promising field: success would not only lead to breakthroughs in LLM development across multiple industries but also help LLMs understand and learn new knowledge effectively.
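The segment, embed, and retrieve steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`segment`, `embed`, `retrieve`, `build_prompt`) are hypothetical, and `embed` is a toy bag-of-words vector standing in for a real semantic embedding model, which the abstract does not name.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def segment(text: str, max_words: int = 40) -> list[str]:
    """Split documentation into chunks: by blank-line paragraph, then by word count."""
    chunks = []
    for para in text.split("\n\n"):
        words = para.split()
        for i in range(0, len(words), max_words):
            chunks.append(" ".join(words[i:i + max_words]))
    return chunks


def embed(text: str) -> Counter:
    """ASSUMPTION: toy term-count vector, a stand-in for a real semantic embedding."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a code-generation prompt from retrieved documentation chunks."""
    docs = "\n".join(f"- {c}" for c in context)
    return f"Relevant documentation:\n{docs}\n\nTask: {query}\nWrite the code."
```

In practice, the retrieved chunks would be passed to an LLM through the assembled prompt; swapping `embed` for a learned embedding model changes only that one function.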