In IT system operations, site reliability engineers (SREs) routinely use shell commands for daily tasks such as system configuration, package deployment, and performance optimization. How efficiently these commands are executed has a direct business impact, because they often carry out critical operations such as resolving system faults. However, many shell commands take long parameters that make them hard to remember and type. Moreover, the experience and knowledge that SREs accumulate in using these commands are rarely preserved. In this work, we propose SHREC, an SRE behaviour knowledge graph model for shell command recommendation. We model SRE shell behaviour as a knowledge graph and propose a strategy to extract this knowledge directly from historical SRE shell operations. The knowledge graph is then used to recommend shell commands in real time, improving SRE operational efficiency. Our empirical study, based on real shell commands executed in our company, demonstrates that SHREC improves SRE operational efficiency and allows SRE knowledge to be shared and reused.
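To make the idea of mining recommendations from shell history concrete, the following is a minimal sketch and not the SHREC model itself. It assumes a deliberately simplified stand-in for the knowledge graph: a directed graph whose nodes are full command strings and whose edge weights count how often one command directly followed another within a session. The function names `build_command_graph` and `recommend`, and the sample sessions, are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_command_graph(sessions):
    """Build a directed graph from command histories.

    Assumption: each session is a list of command strings in execution order.
    graph[prev][nxt] counts how often `nxt` directly followed `prev`.
    """
    graph = defaultdict(Counter)
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            graph[prev][nxt] += 1
    return graph


def recommend(graph, last_command, k=3):
    """Return up to k commands most often observed right after `last_command`."""
    followers = graph.get(last_command, Counter())
    return [cmd for cmd, _ in followers.most_common(k)]


if __name__ == "__main__":
    # Hypothetical histories; real input would come from recorded SRE sessions.
    sessions = [
        ["systemctl status nginx", "journalctl -u nginx --since today", "systemctl restart nginx"],
        ["systemctl status nginx", "journalctl -u nginx --since today", "nginx -t"],
        ["systemctl status nginx", "systemctl restart nginx"],
    ]
    graph = build_command_graph(sessions)
    print(recommend(graph, "systemctl status nginx", k=2))
```

A frequency-weighted successor graph is only one possible reading of "knowledge graph" here; a fuller model would likely separate commands from their parameters and take more context into account than the single preceding command.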