Recent work shows that Large Language Models (LLMs) struggle to understand natural language constraints for various text generation tasks in zero- and few-shot settings. In the code domain, constraints expressed as code are widely used to maintain the integrity of code written in Domain-Specific Languages (DSLs), yet no work has evaluated LLMs on such constraints. We propose two novel tasks to assess the controllability of LLMs using hard and soft constraints represented as code across five representations. Our findings suggest that LLMs struggle to comprehend constraints in all representations, irrespective of how much of the pre-training data each representation accounts for. Models are better at comprehending constraints in JSON, YAML, and natural language representations, but they struggle with constraints represented in XML and in Python, a resource-rich language.
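To make "constraints represented as code" concrete, the following is a minimal sketch of one hard constraint written in two of the representations named above (JSON and XML) and checked against a candidate output. The constraint itself (an integer field `age` in the range 0 to 120), the attribute names, and the helper functions `load_json_constraint`, `load_xml_constraint`, and `satisfies` are illustrative assumptions, not the tasks or schemas used in this work.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical constraint (not from the paper): "age" must be an integer in [0, 120].
JSON_CONSTRAINT = '{"field": "age", "type": "integer", "minimum": 0, "maximum": 120}'
XML_CONSTRAINT = '<constraint field="age" type="integer" minimum="0" maximum="120"/>'


def load_json_constraint(text):
    """Parse the JSON form into a plain dict."""
    return json.loads(text)


def load_xml_constraint(text):
    """Parse the XML form into the same dict shape as the JSON form."""
    attrs = ET.fromstring(text).attrib
    return {
        "field": attrs["field"],
        "type": attrs["type"],
        # XML attributes are always strings, so the bounds are converted here.
        "minimum": int(attrs["minimum"]),
        "maximum": int(attrs["maximum"]),
    }


def satisfies(sample, constraint):
    """Return True if the sample dict meets the hard constraint."""
    value = sample.get(constraint["field"])
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(value, int) or isinstance(value, bool):
        return False
    return constraint["minimum"] <= value <= constraint["maximum"]
```

Both representations carry the same information, so a checker built on either one should accept or reject a given sample identically; the evaluation question is whether a model reads them equally well.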