We introduce NeuralOS, a neural framework that simulates graphical user interfaces (GUIs) of operating systems by directly predicting screen frames in response to user inputs such as mouse movements, clicks, and keyboard events. NeuralOS combines a recurrent neural network (RNN), which tracks computer state, with a diffusion-based neural renderer that generates screen images. The model is trained on a dataset of Ubuntu XFCE recordings, which include both randomly generated interactions and realistic interactions produced by AI agents. Experiments show that NeuralOS successfully renders realistic GUI sequences, accurately captures mouse interactions, and reliably predicts state transitions like application launches. Beyond reproducing existing systems, NeuralOS shows that synthesized training data can teach the model to simulate applications that were never installed, as illustrated by a Doom application, and suggests a path toward learning user interfaces purely from synthetic demonstrations.
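The two-stage design described above (a recurrent state tracker feeding a diffusion-based renderer) can be sketched as a minimal toy loop. All names, dimensions, and the linear "denoiser" below are hypothetical illustrations with untrained random weights, not the authors' architecture; they only show how user-input events update a hidden state that conditions iterative frame denoising.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
STATE_DIM, INPUT_DIM, FRAME_PIXELS = 16, 4, 8 * 8

# Recurrent state tracker: a minimal (untrained) RNN cell that folds
# each user-input event (mouse/keyboard features) into a hidden state.
W_h = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM))
W_x = rng.normal(scale=0.1, size=(STATE_DIM, INPUT_DIM))

def rnn_step(h, x):
    """Fold one input event x into the running computer-state vector h."""
    return np.tanh(W_h @ h + W_x @ x)

# Diffusion-style renderer: starting from Gaussian noise, repeatedly
# refine toward a frame, conditioned on the RNN state. A random linear
# map stands in for the trained denoising network.
W_d = rng.normal(scale=0.1, size=(FRAME_PIXELS, FRAME_PIXELS + STATE_DIM))

def render_frame(h, steps=10):
    frame = rng.normal(size=FRAME_PIXELS)          # start from pure noise
    for _ in range(steps):
        pred = W_d @ np.concatenate([frame, h])    # state-conditioned denoise
        frame = 0.9 * frame + 0.1 * pred           # small refinement per step
    return frame

# Simulate three input events (e.g. mouse move, click, key press):
# each event advances the state, and each state yields one screen frame.
h = np.zeros(STATE_DIM)
for event in rng.normal(size=(3, INPUT_DIM)):
    h = rnn_step(h, event)
    frame = render_frame(h)

print(frame.shape)
```

The key design point mirrored here is that the renderer never sees the raw input events, only the compact recurrent state, which is what lets the state tracker and the image generator be reasoned about separately.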