We introduce Modelizer - a novel framework that, given a black-box program, learns a model of its input/output behavior using neural machine translation algorithms. The resulting model mocks the original program: given an input, the model predicts the output the program would have produced. The model is also reversible - that is, it can predict the input that would have produced a given output. Finally, the model is differentiable and can be efficiently restricted to predict only a particular aspect of the program behavior. Modelizer uses grammars to synthesize inputs and unsupervised tokenizers to decompose the resulting outputs, allowing it to learn sequence-to-sequence associations between token streams. Apart from input grammars, Modelizer only requires the ability to execute the program. The resulting models are small, requiring fewer than 6.3 million parameters for languages such as Markdown or HTML; and they are accurate, achieving up to 95.4% accuracy and a BLEU score of 0.98 with standard error 0.04 in mocking real-world applications. As it learns from and predicts executions rather than code, Modelizer departs from the LLM-centric research trend, opening new opportunities for program-specific models that are fully tuned to individual programs. Indeed, we foresee several applications of these models, especially since the output of the program can capture any aspect of program behavior. Beyond mocking and predicting program behavior, the models can also synthesize inputs that are likely to produce a particular behavior, such as failures or coverage, thus assisting in program understanding and maintenance.