Space agencies execute complex satellite operations that must be supported by the technical knowledge contained in their extensive information systems. Knowledge bases (KBs) are an effective way of storing and accessing such information at scale. In this work we present a system, developed for the European Space Agency (ESA), that answers complex natural language queries to support engineers in accessing the information contained in a KB that models the orbital space debris environment. Our system is based on a pipeline that first generates a sequence of basic database operations, called a program sketch, from a natural language question, then specializes the sketch into a concrete query program with mentions of entities, attributes and relations, and finally executes the program against the database. This pipeline decomposition enables us to train the system by leveraging out-of-domain data and semi-synthetic data generated by GPT-3, thus reducing overfitting and shortcut learning even with a limited amount of in-domain training data. Our code can be found at \url{https://github.com/PaulDrm/DISCOSQA}.
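To make the three pipeline stages concrete, the following is a minimal toy sketch, not the implementation in the linked repository. The KB contents, the operation names `Find` and `QueryAttr`, and all function names are hypothetical. A keyword rule and simple string matching stand in for the trained models that would generate the sketch and fill in its arguments.

```python
from dataclasses import dataclass

# Toy in-memory KB; entity names and values are invented for illustration.
KB = {
    "Sat-A": {"type": "satellite", "mass_kg": 1200.0},
    "Sat-B": {"type": "satellite", "mass_kg": 300.0},
    "Stage-C": {"type": "rocket body", "mass_kg": 2500.0},
}


@dataclass
class Step:
    op: str           # name of a basic operation (hypothetical: "Find", "QueryAttr")
    args: tuple = ()  # empty in a sketch, filled in a concrete program


def generate_sketch(question):
    """Stage 1: map a question to a sequence of argument-free operations.
    A keyword rule stands in for a trained sequence model (assumption)."""
    if "mass" in question.lower():
        return [Step("Find"), Step("QueryAttr")]
    return [Step("Find")]


def specialize(sketch, question):
    """Stage 2: fill each operation with the entity or attribute it refers to.
    Substring matching stands in for a learned argument selector (assumption)."""
    lowered = question.lower()
    entity = next((name for name in KB if name.lower() in lowered), None)
    if entity is None:
        raise ValueError("no known entity mentioned in the question")
    program = []
    for step in sketch:
        if step.op == "Find":
            program.append(Step("Find", (entity,)))
        elif step.op == "QueryAttr":
            program.append(Step("QueryAttr", ("mass_kg",)))
    return program


def execute(program):
    """Stage 3: run the concrete program step by step against the KB."""
    current = None
    for step in program:
        if step.op == "Find":
            current = KB[step.args[0]]
        elif step.op == "QueryAttr":
            current = current[step.args[0]]
    return current


def answer(question):
    sketch = generate_sketch(question)
    return execute(specialize(sketch, question))
```

In this toy version, `answer("What is the mass of Sat-A?")` first yields the sketch `Find, QueryAttr`, which is then specialized to `Find("Sat-A"), QueryAttr("mass_kg")` and executed. Because the sketch contains no domain-specific entities or attributes, the first stage is the part that could plausibly be trained on out-of-domain data.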