Understanding how machine learning models respond to distributional shifts is a key research challenge. Mazes are a well-suited testbed: the variety of maze generation algorithms makes it possible to simulate both subtle and pronounced distributional shifts. To enable systematic investigation of model behavior on out-of-distribution data, we present $\texttt{maze-dataset}$, a comprehensive library for generating, processing, and visualizing datasets of maze-solving tasks. With this library, researchers can easily create datasets with extensive control over the generation algorithm used, the parameters passed to that algorithm, and the filters that generated mazes must satisfy. The library also supports multiple output formats, including rasterized and text-based representations, suited respectively to convolutional neural networks and autoregressive transformer models. Together with tools for visualizing and converting between these formats, this makes the library adaptable to a wide range of research applications.
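To make the pipeline above concrete (choose a generation algorithm and its parameters, filter the results, then emit rasterized and text outputs), here is a minimal pure-Python sketch. It is not the $\texttt{maze-dataset}$ API: every function name (`gen_dfs`, `solve`, `rasterize`, `to_tokens`, `generate_dataset`), the token vocabulary, and the choice of fixed corner start and end cells are hypothetical illustrations. Generation uses randomized depth-first search, and the filter keeps only mazes whose solution path has at least `min_path_len` cells.

```python
import random
from collections import deque

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def gen_dfs(grid_n, rng):
    """Randomized depth-first search: carve a spanning tree over a grid_n x grid_n lattice.

    Returns the set of open connections, each a frozenset of two adjacent (row, col) cells.
    """
    start = (0, 0)
    visited, stack, edges = {start}, [start], set()
    while stack:
        r, c = stack[-1]
        unvisited = [
            (r + dr, c + dc)
            for dr, dc in STEPS
            if 0 <= r + dr < grid_n and 0 <= c + dc < grid_n and (r + dr, c + dc) not in visited
        ]
        if unvisited:
            nxt = rng.choice(unvisited)
            edges.add(frozenset(((r, c), nxt)))
            visited.add(nxt)
            stack.append(nxt)
        else:
            stack.pop()  # dead end: backtrack
    return edges


def solve(edges, start, end):
    """Breadth-first search along open connections; returns the cell path from start to end."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == end:
            break
        for dr, dc in STEPS:
            nb = (cur[0] + dr, cur[1] + dc)
            if nb not in prev and frozenset((cur, nb)) in edges:
                prev[nb] = cur
                queue.append(nb)
    path, node = [], end
    while node is not None:  # walk back through predecessors
        path.append(node)
        node = prev[node]
    return path[::-1]


def rasterize(edges, grid_n):
    """Pixel grid of side 2*grid_n + 1: 1 = wall, 0 = open (cells and carved connections)."""
    size = 2 * grid_n + 1
    grid = [[1] * size for _ in range(size)]
    for r in range(grid_n):
        for c in range(grid_n):
            grid[2 * r + 1][2 * c + 1] = 0
    for edge in edges:
        (r1, c1), (r2, c2) = sorted(edge)
        grid[r1 + r2 + 1][c1 + c2 + 1] = 0  # pixel between the two connected cells
    return grid


def fmt(cell):
    return f"({cell[0]},{cell[1]})"


def to_tokens(edges, path):
    """Illustrative text serialization; the token names are hypothetical."""
    tokens = ["<ADJLIST_START>"]
    for a, b in sorted(tuple(sorted(e)) for e in edges):
        tokens += [fmt(a), "<-->", fmt(b), ";"]
    tokens += ["<ADJLIST_END>", "<ORIGIN>", fmt(path[0]), "<TARGET>", fmt(path[-1])]
    return tokens + ["<PATH_START>"] + [fmt(p) for p in path] + ["<PATH_END>"]


def generate_dataset(n_mazes, grid_n, min_path_len, seed=0, max_tries=10_000):
    """Generate mazes with gen_dfs, keeping those whose solution has >= min_path_len cells."""
    rng = random.Random(seed)
    start, end = (0, 0), (grid_n - 1, grid_n - 1)  # assumed fixed corner endpoints
    dataset = []
    for _ in range(max_tries):
        if len(dataset) == n_mazes:
            break
        edges = gen_dfs(grid_n, rng)
        path = solve(edges, start, end)
        if len(path) >= min_path_len:  # the filter step
            dataset.append({"raster": rasterize(edges, grid_n), "tokens": to_tokens(edges, path)})
    return dataset
```

For example, `generate_dataset(n_mazes=3, grid_n=4, min_path_len=7, seed=42)` returns three mazes, each with a 9 x 9 raster suitable for a convolutional model and a token list suitable for an autoregressive model. Swapping `gen_dfs` for another generator, or tightening `min_path_len`, changes the training distribution, which is the kind of controlled shift the abstract describes.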