TreeMind: Automatically Reproducing Android Bug Reports via LLM-empowered Monte Carlo Tree Search

Automatically reproducing Android app crashes from textual bug reports is challenging, particularly when the reports are incomplete and modern UIs exhibit high combinatorial complexity. Existing approaches based solely on reinforcement learning or large language models (LLMs) fall short in such scenarios: because their goal-directed reasoning and planning are limited, they struggle to infer unobserved steps and to reconstruct the underlying user action sequences needed to navigate the vast UI interaction space. We present TreeMind, a novel technique that integrates LLMs with an adapted Monte Carlo Tree Search (MCTS) algorithm to achieve strategic UI exploration in bug reproduction. To the best of our knowledge, this is the first work to combine external decision-making with LLM semantic reasoning for reliable and accurate bug reproduction. We formulate the reproduction task as a target-driven search problem, leveraging MCTS as the core planning mechanism to iteratively refine action sequences. To enhance MCTS with semantic reasoning, we introduce two LLM-guided agents with distinct roles: Expander generates the top-k promising actions based on the current UI state and exploration history, while Simulator estimates the likelihood that each candidate action leads toward successful reproduction by additionally leveraging dynamic environment feedback. By incorporating multi-modal UI inputs and tailored prompting strategies, TreeMind performs feedback-aware navigation that identifies essential user actions and incrementally reconstructs reproduction paths. We evaluate TreeMind on a dataset of 93 real-world Android bug reports from three widely used benchmarks. Experimental results show that it significantly outperforms four state-of-the-art baselines (ReBL, ReActDroid, AdbGPT, and ReproBot) in reproduction success rate.
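To make the search formulation concrete, the following is a minimal sketch of an MCTS loop with Expander and Simulator roles. It is not TreeMind's implementation: the toy app model `TOY_APP`, the keyword-overlap heuristics standing in for the two LLM agents, the reward values, and the UCT constant are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass, field

# Hypothetical toy app: UI state -> {action: next state}. "CRASH" is the target.
TOY_APP = {
    "home": {"open_settings": "settings", "open_about": "about", "open_list": "list"},
    "settings": {"toggle_sync": "settings_sync", "back": "home"},
    "settings_sync": {"tap_export": "CRASH", "back": "settings"},
    "about": {"back": "home"},
    "list": {"back": "home"},
}

@dataclass
class Node:
    state: str
    parent: "Node | None" = None
    action: "str | None" = None
    children: list = field(default_factory=list)
    visits: int = 0
    value: float = 0.0
    expanded: bool = False

def overlap(action, report):
    # Number of action-name tokens that also occur in the bug report.
    return len(set(report.lower().split()) & set(action.split("_")))

def expander(state, history, report, k=2):
    # Stand-in for the LLM Expander: rank available actions, keep the top k.
    # A real agent would also condition on `history`; this stub ignores it.
    actions = TOY_APP.get(state, {})
    return sorted(actions, key=lambda a: -overlap(a, report))[:k]

def simulator(action, next_state, report):
    # Stand-in for the LLM Simulator: score a candidate in [0, 1] using the
    # state observed after executing it (the "environment feedback").
    if next_state == "CRASH":
        return 1.0
    return 0.5 if overlap(action, report) else 0.1

def uct(child, parent_visits, c=1.4):
    if child.visits == 0:
        return float("inf")
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return child.value / child.visits + explore

def path_to(node):
    actions = []
    while node.parent is not None:
        actions.append(node.action)
        node = node.parent
    return actions[::-1]

def backpropagate(node, reward):
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent

def reproduce(report, root_state="home", iterations=50, k=2):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: follow the highest-UCT child through expanded nodes.
        while node.expanded and node.children:
            node = max(node.children, key=lambda ch: uct(ch, node.visits))
        if node.expanded:  # dead end with no available actions
            backpropagate(node, 0.0)
            continue
        # Expansion: execute each of the Expander's top-k actions once.
        for action in expander(node.state, path_to(node), report, k):
            next_state = TOY_APP[node.state][action]
            child = Node(next_state, parent=node, action=action)
            node.children.append(child)
            if next_state == "CRASH":
                return path_to(child)  # reproduction path found
            # Simulation and backpropagation of the estimated likelihood.
            backpropagate(child, simulator(action, next_state, report))
        node.expanded = True
    return None

if __name__ == "__main__":
    print(reproduce("App crashes after enabling sync in settings and tapping export"))
```

In this sketch the Simulator's score replaces the random rollout of classical MCTS, so the quality of the search depends entirely on how well that estimate tracks real progress toward the crash; the stubs above only mimic that behavior on a five-state toy app.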
