The ability to connect visual patterns with the processes that form them represents one of the deepest forms of visual understanding. Textures of clouds and waves, the growth of cities and forests, and the formation of materials and landscapes are all examples of patterns emerging from underlying mechanisms. We present the SciTextures dataset, a large-scale collection of textures and visual patterns from all domains of science, technology, and art, along with the models and code that generate these images. Covering over 1,270 different models and 100,000 images of patterns and textures from physics, chemistry, biology, sociology, technology, mathematics, and art, this dataset offers a way to explore the deep connection between the visual patterns that shape our world and the mechanisms that produce them. The dataset was built through an agentic AI pipeline that autonomously collects, implements, and standardizes scientific and generative models; the same pipeline is also used to autonomously invent and implement novel methods for generating visual patterns and textures. SciTextures enables systematic evaluation of vision-language models' (VLMs') ability to link visual patterns to the models and code that generate them, and to identify different patterns that emerge from the same underlying process. We also test VLMs' ability to infer and recreate the mechanisms behind visual patterns: given a natural image of a real-world phenomenon, the AI must identify and code a model of the process that formed it; the code is then run to generate a simulated image that is compared to the reference image. These benchmarks reveal that VLMs can understand and simulate physical systems beyond visual patterns at multiple levels of abstraction. The dataset and code are available at: https://zenodo.org/records/17485502
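The image-to-mechanism benchmark described above can be sketched as a small loop: a pattern-generating program (here a toy diffusion-plus-growth model standing in for VLM-written code, not any model from the dataset) produces a simulated texture, which is scored against a reference image. The `histogram_distance` metric below is an illustrative assumption; the paper's actual comparison metric is not specified in this abstract.

```python
import numpy as np

def histogram_distance(img_a, img_b, bins=32):
    """Chi-squared-style distance between intensity histograms.
    One simple (assumed, illustrative) way to score how closely a
    simulated texture matches a reference image; lower is closer."""
    ha, _ = np.histogram(img_a, bins=bins, range=(0.0, 1.0), density=True)
    hb, _ = np.histogram(img_b, bins=bins, range=(0.0, 1.0), density=True)
    eps = 1e-9
    return 0.5 * float(np.sum((ha - hb) ** 2 / (ha + hb + eps)))

def toy_pattern_model(seed, size=64, steps=50):
    """Toy generator standing in for VLM-written model code: repeated
    local averaging (diffusion) plus a logistic growth term yields a
    smooth emergent texture from random initial conditions."""
    rng = np.random.default_rng(seed)
    u = rng.random((size, size))
    for _ in range(steps):
        # 4-neighbour Laplacian via array rolls (periodic boundary)
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0)
               + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
        u = np.clip(u + 0.2 * lap + 0.05 * u * (1 - u), 0.0, 1.0)
    return u

reference = toy_pattern_model(seed=0)   # stands in for the real-world image
simulated = toy_pattern_model(seed=1)   # stands in for the VLM's simulation
unrelated = np.random.default_rng(2).random((64, 64))  # control texture

d_match = histogram_distance(reference, simulated)
d_control = histogram_distance(reference, unrelated)
# a simulation driven by the right mechanism should score lower
print(d_match, d_control)
```

Because both the reference and the matched simulation are produced by the same mechanism, their intensity statistics converge to similar distributions, so `d_match` comes out well below `d_control` for the uncorrelated noise image.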