In managed languages, objects are typically serialized to binary formats such as Protobuf or to markup languages such as XML or JSON. The major limitation of these formats is readability: developers cannot read binary data and, in most cases, struggle with the syntax of XML or JSON. This is a major issue when objects are meant to be embedded and read in source code, such as in test cases. To address this problem, we propose plain-code serialization. Our core idea is to serialize objects observed at runtime in the native syntax of a programming language. We realize this vision in the context of Java and demonstrate a prototype that serializes Java objects to Java source code. The resulting source code faithfully reconstructs the objects seen at runtime. Our prototype, called ProDJ, is publicly available. We use ProDJ to successfully plain-code serialize 174,699 objects observed during the execution of 4 open-source Java applications. Our performance measurements show that the performance impact is not noticeable.
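To make the core idea concrete, the following is a minimal sketch of what serializing an object to Java source code could look like. It is not ProDJ's implementation: the class `PlainCodeSketch`, the example type `Point`, and the helpers `literal` and `toJavaSource` are hypothetical names introduced here for illustration. The sketch assumes the object has a public no-argument constructor and only public, non-static fields of type `int`, `long`, `boolean`, or `String`.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class PlainCodeSketch {

    // Hypothetical example type: public fields and an implicit no-arg constructor.
    public static class Point {
        public int x;
        public int y;
        public String label;
    }

    // Render a supported value as a Java literal (illustrative subset of types only).
    public static String literal(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof String) {
            String escaped = ((String) value)
                    .replace("\\", "\\\\")   // escape backslashes first
                    .replace("\"", "\\\"")   // then double quotes
                    .replace("\n", "\\n");   // then newlines
            return "\"" + escaped + "\"";
        }
        if (value instanceof Long) {
            return value + "L";
        }
        if (value instanceof Integer || value instanceof Boolean) {
            return value.toString();
        }
        throw new IllegalArgumentException("unsupported type: " + value.getClass());
    }

    // Emit Java statements that rebuild `obj` in a local variable named `var`.
    public static String toJavaSource(Object obj, String var) {
        Class<?> type = obj.getClass();
        String typeName = type.getCanonicalName();
        StringBuilder src = new StringBuilder();
        src.append(typeName).append(" ").append(var)
           .append(" = new ").append(typeName).append("();\n");
        for (Field field : type.getFields()) {   // public fields only
            if (Modifier.isStatic(field.getModifiers())) {
                continue;                          // static state is not part of the object
            }
            try {
                src.append(var).append(".").append(field.getName())
                   .append(" = ").append(literal(field.get(obj))).append(";\n");
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return src.toString();
    }

    public static void main(String[] args) {
        Point p = new Point();
        p.x = 3;
        p.y = -4;
        p.label = "corner";
        // Prints one constructor call followed by one assignment per public field.
        System.out.print(toJavaSource(p, "point"));
    }
}
```

The emitted text is ordinary Java that can be pasted into a test case, which is the readability benefit the abstract describes. A realistic tool would also need to handle private fields, constructors with arguments, collections, nested objects, and cyclic references; this sketch deliberately ignores all of these. The order of the emitted assignments follows the order returned by reflection, which the Java API does not guarantee.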