Motion planning in an unknown environment requires synthesizing an optimal control policy that balances exploration and exploitation. In this paper, we model the environment as a labeled graph where the labels of states are initially unknown, and consider a motion planning objective of fulfilling, in minimum time, a generalized reach-avoid specification given on these labels. By describing the record of visited labels as an automaton, we translate our problem to a Canadian traveler problem on an adapted state space. We propose a strategy that enables the agent to perform its task by exploiting any available a priori knowledge about the labels and the environment while incrementally revealing the environment online. Namely, the agent plans, follows, and replans the optimal path by assigning edge weights that balance exploration and exploitation, given the current knowledge of the environment. We illustrate our strategy in the setting of an agent operating in a two-dimensional grid environment.
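The plan, follow, and replan loop described above can be illustrated with a much simplified sketch. It is not the paper's algorithm: it drops the automaton and the Canadian traveler formulation and handles only a plain reach-avoid task (reach one goal cell, avoid obstacle cells) on a 4-connected grid. The names `plan` and `navigate`, the sensing model (the agent observes the labels of adjacent cells), and the `prior` and `penalty` parameters are all assumptions made for illustration.

```python
import heapq

def neighbors(cell, rows, cols):
    """Yield the 4-connected neighbors of a cell that lie inside the grid."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield (nr, nc)

def plan(start, goal, known, rows, cols, prior=0.3, penalty=5.0):
    """Dijkstra under the current belief.

    Entering a known-free cell costs 1. Entering an unknown cell costs
    1 + penalty * prior (hypothetical weighting: `prior` is an assumed
    probability that an unknown cell is an obstacle). Known obstacles
    are never entered.
    """
    dist = {start: 0.0}
    parent = {}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale heap entry
        for nxt in neighbors(cell, rows, cols):
            label = known.get(nxt)  # None means the label is still unknown
            if label == "obstacle":
                continue
            weight = 1.0 if label == "free" else 1.0 + penalty * prior
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                parent[nxt] = cell
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]

def navigate(true_grid, start, goal):
    """Sense adjacent labels, replan, take one step, and repeat.

    `true_grid` is a list of strings where '#' marks an obstacle.
    Returns the list of visited cells, or None if the goal turns out
    to be unreachable.
    """
    rows, cols = len(true_grid), len(true_grid[0])
    known = {start: "free"}
    pos, trace = start, [start]
    while pos != goal:
        # Reveal the labels of all adjacent cells (assumed sensing model).
        for n in neighbors(pos, rows, cols):
            known[n] = "obstacle" if true_grid[n[0]][n[1]] == "#" else "free"
        path = plan(pos, goal, known, rows, cols)
        if path is None:
            return None
        pos = path[1]  # follow the current plan for one step, then replan
        trace.append(pos)
    return trace

if __name__ == "__main__":
    grid = ["..#.",
            ".##.",
            "...."]
    print(navigate(grid, (0, 0), (0, 3)))
```

Raising `penalty` makes the agent favor cells it already knows to be free (exploitation), while lowering it makes shortcuts through unknown cells more attractive (exploration).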
Full paper: [ Link ]