Persistent semantic monitoring of indoor spaces such as warehouses, hospitals, and offices requires a robot to revisit an environment repeatedly and track how objects change over time. Running full simultaneous localization and mapping (SLAM) with dense semantic reconstruction from scratch on every visit is redundant when the environment's geometry stays the same and only the objects move. We present a modular two-stage system that separates geometric mapping from semantic updating. In the first stage, a frontier-based exploration method with a dynamic search window builds a 2D occupancy grid. In the second stage, the robot relocalizes in this map and builds a semantic object graph using an open-vocabulary object detector and a promptable segmentation model. Because only the lightweight semantic stage is repeated on later visits, the system scales well to frequent revisits. The object graph is updated with a category- and distance-based association rule. This rule lets the map reflect both intra-session changes (changes within a single traversal) and inter-session changes (changes across revisits), such as objects being moved, removed, or added. We validate the system on a Fetch robot in two real indoor environments of about 8,500 m² and 117 m², and report precision, recall, and F1 scores across multiple update iterations.
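The category- and distance-based association rule is only named here, not specified. The Python sketch below shows one plausible reading of such a rule, not the authors' implementation. Each detection is matched to the nearest not-yet-matched map object with the same category label within a radius. A match that shifted by more than a small tolerance counts as moved, an unmatched detection counts as added, and an unmatched map object counts as removed. The class `MapObject`, the function `associate`, and the parameters `match_radius` (1.0 m) and `move_tol` (0.2 m) are hypothetical. Objects are reduced to 2D points, and the sketch assumes every map object was in view during the pass. A real system would likely only mark an object as removed if the robot actually observed its location.

```python
import math
from dataclasses import dataclass


@dataclass
class MapObject:
    # Hypothetical object-graph node: category label plus 2D map position (m).
    obj_id: int
    label: str
    x: float
    y: float


def associate(graph, detections, match_radius=1.0, move_tol=0.2):
    """Greedy category-and-distance association (illustrative sketch only).

    graph: dict mapping obj_id -> MapObject, updated in place.
    detections: list of (label, x, y) tuples from the current pass.
    Returns (moved_ids, removed_ids, added_ids).
    """
    unmatched = set(graph)  # map objects not yet claimed by a detection
    moved, added = [], []
    next_id = max(graph, default=-1) + 1
    for label, x, y in detections:
        # Nearest unmatched object of the same category within match_radius.
        best_id, best_d = None, match_radius
        for oid in unmatched:
            obj = graph[oid]
            if obj.label != label:
                continue
            d = math.hypot(obj.x - x, obj.y - y)
            if d <= best_d:
                best_id, best_d = oid, d
        if best_id is None:
            # No compatible object nearby: treat the detection as a new object.
            graph[next_id] = MapObject(next_id, label, x, y)
            added.append(next_id)
            next_id += 1
        else:
            unmatched.discard(best_id)
            obj = graph[best_id]
            if best_d > move_tol:
                moved.append(best_id)
            obj.x, obj.y = x, y  # keep the latest observed position
    # Assumes every map object was in view; unmatched ones are dropped as removed.
    removed = sorted(unmatched)
    for oid in removed:
        del graph[oid]
    return moved, removed, added


if __name__ == "__main__":
    graph = {
        0: MapObject(0, "chair", 0.0, 0.0),
        1: MapObject(1, "cup", 2.0, 0.0),
        2: MapObject(2, "box", 5.0, 5.0),
    }
    detections = [("chair", 0.5, 0.0), ("cup", 2.05, 0.0), ("plant", 3.0, 3.0)]
    print(associate(graph, detections))  # ([0], [2], [3])
```

Under this greedy, same-category rule, an object carried farther than `match_radius` appears as one removal plus one addition rather than a move. A different matching strategy, such as a global assignment, could resolve those cases differently.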