Understanding how machine learning models respond to distributional shifts is a key research challenge. Mazes serve as an excellent testbed: their varied generation algorithms offer a nuanced platform for simulating both subtle and pronounced distributional shifts. To enable systematic investigations of model behavior on out-of-distribution data, we present $\texttt{maze-dataset}$, a comprehensive library for generating, processing, and visualizing datasets of maze-solving tasks. With this library, researchers can easily create datasets with extensive control over the generation algorithm used, the parameters passed to that algorithm, and the filters that generated mazes must satisfy. Furthermore, it supports multiple output formats, including rasterized and text-based formats suited to convolutional neural networks and autoregressive transformer models, respectively. These formats, together with tools for visualizing them and converting between them, make the library versatile and adaptable across research applications.
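To make the generate, filter, and format pipeline concrete, here is a minimal self-contained sketch. It is not the $\texttt{maze-dataset}$ API: every name below (`gen_dfs`, `solve`, `make_dataset`, `rasterize`, `to_text`) and the token names in the text format are hypothetical and were chosen for illustration. It assumes randomized depth-first search as the generation algorithm (one common choice among several), grid size as its parameter, and a minimum-solution-length filter. Each accepted maze is then emitted as a pixel grid for a CNN and as a token string for a transformer.

```python
import random
from collections import deque

Coord = tuple[int, int]
Edge = frozenset  # hypothetical: an undirected connection between two adjacent cells


def gen_dfs(grid_n: int, seed: int = 0) -> set[Edge]:
    """Randomized depth-first search over a grid_n x grid_n lattice.

    The result is a spanning tree, so every pair of cells is joined by exactly one path.
    """
    rng = random.Random(seed)
    visited = {(0, 0)}
    stack = [(0, 0)]
    edges: set[Edge] = set()
    while stack:
        r, c = stack[-1]
        unvisited = [
            (r + dr, c + dc)
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < grid_n and 0 <= c + dc < grid_n
            and (r + dr, c + dc) not in visited
        ]
        if not unvisited:
            stack.pop()  # dead end: backtrack
            continue
        nxt = rng.choice(unvisited)
        visited.add(nxt)
        edges.add(frozenset(((r, c), nxt)))
        stack.append(nxt)
    return edges


def solve(edges: set[Edge], grid_n: int, start: Coord, end: Coord) -> list[Coord]:
    """Breadth-first search for the path from start to end (always reachable in a spanning tree)."""
    adj: dict[Coord, list[Coord]] = {(r, c): [] for r in range(grid_n) for c in range(grid_n)}
    for edge in edges:
        a, b = tuple(edge)
        adj[a].append(b)
        adj[b].append(a)
    prev = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == end:
            break
        for nb in adj[cur]:
            if nb not in prev:
                prev[nb] = cur
                queue.append(nb)
    path, node = [], end
    while node is not None:  # walk predecessors back to start
        path.append(node)
        node = prev[node]
    return path[::-1]


def make_dataset(n_mazes, grid_n, maze_ctor=gen_dfs, min_path_len=1, seed=0, max_attempts=10_000):
    """Generate mazes with maze_ctor and keep only those whose solution passes the filter."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(max_attempts):
        if len(dataset) == n_mazes:
            break
        edges = maze_ctor(grid_n, seed=rng.randrange(2**32))
        start = (rng.randrange(grid_n), rng.randrange(grid_n))
        end = (rng.randrange(grid_n), rng.randrange(grid_n))
        path = solve(edges, grid_n, start, end)
        if len(path) >= min_path_len:  # filter: reject mazes with short solutions
            dataset.append((edges, path))
    if len(dataset) < n_mazes:
        raise RuntimeError("filter rejected too many mazes")
    return dataset


def rasterize(edges: set[Edge], grid_n: int) -> list[list[int]]:
    """Pixel grid for CNNs: 1 = open, 0 = wall; cell (r, c) sits at pixel (2r+1, 2c+1)."""
    size = 2 * grid_n + 1
    img = [[0] * size for _ in range(size)]
    for r in range(grid_n):
        for c in range(grid_n):
            img[2 * r + 1][2 * c + 1] = 1
    for edge in edges:
        (r1, c1), (r2, c2) = tuple(edge)
        img[r1 + r2 + 1][c1 + c2 + 1] = 1  # open the wall pixel between the two cells
    return img


def _fmt(p: Coord) -> str:
    return f"({p[0]},{p[1]})"


def to_text(edges: set[Edge], path: list[Coord]) -> str:
    """Token string for autoregressive transformers (token names are hypothetical)."""
    adj = " ; ".join(f"{_fmt(a)} <--> {_fmt(b)}" for a, b in sorted(tuple(sorted(e)) for e in edges))
    return (f"<ADJLIST> {adj} </ADJLIST> <ORIGIN> {_fmt(path[0])} </ORIGIN> "
            f"<TARGET> {_fmt(path[-1])} </TARGET> <PATH> {' '.join(map(_fmt, path))} </PATH>")


if __name__ == "__main__":
    edges, path = make_dataset(n_mazes=3, grid_n=4, min_path_len=4)[0]
    print(to_text(edges, path))
```

Passing the generator as a parameter (`maze_ctor`) lets you swap algorithms, for example a different spanning-tree method, to induce a distributional shift while keeping the filter and output formats fixed.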