This paper proposes a spatial-temporal recurrent neural network architecture
for deep $Q$-networks that can be used to steer an autonomous ship. The network
design makes it possible to handle an arbitrary number of surrounding target
ships while offering robustness to partial observability. Furthermore, a
state-of-the-art collision risk metric is proposed to enable an easier
assessment of different situations by the agent. The COLREG rules of maritime
traffic are explicitly considered in the design of the reward function. The
final policy is validated on a custom set of newly created single-ship
encounters called `Around the Clock' problems and the commonly used Imazu
(1987) problems, which include 18 multi-ship scenarios. Performance comparisons
with artificial potential field and velocity obstacle methods demonstrate the
potential of the proposed approach for maritime path planning. The new
architecture also remains robust when deployed in multi-agent scenarios, and
it is compatible with other deep reinforcement learning algorithms, including
actor-critic frameworks.
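
To illustrate how a recurrent component can accept an arbitrary number of target ships, the following is a minimal sketch and not the architecture of the paper. It assumes a plain Elman-style recurrent cell with random, untrained weights; the function name `encode_targets`, the feature size, and the hidden size are all hypothetical. The cell is applied once per target ship, so the result has the same fixed size no matter how many ships are present, and a Q-network head could consume it alongside the own-ship state.

```python
import numpy as np


def encode_targets(targets, w_in, w_rec, bias):
    """Fold a variable-length sequence of target-ship feature vectors
    into one fixed-size hidden vector with a simple recurrent cell."""
    hidden = np.zeros(w_rec.shape[0])
    for features in targets:
        # One recurrent step per target ship.
        hidden = np.tanh(w_in @ features + w_rec @ hidden + bias)
    return hidden


# Hypothetical sizes: 4 features per target ship, 8 hidden units.
n_features, n_hidden = 4, 8
rng = np.random.default_rng(0)
w_in = rng.normal(scale=0.5, size=(n_hidden, n_features))
w_rec = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
bias = np.zeros(n_hidden)

# Random placeholder observations for scenes with 2, 5, and 0 target ships.
two_ships = rng.normal(size=(2, n_features))
five_ships = rng.normal(size=(5, n_features))
no_ships = np.empty((0, n_features))

emb_two = encode_targets(two_ships, w_in, w_rec, bias)
emb_five = encode_targets(five_ships, w_in, w_rec, bias)
emb_none = encode_targets(no_ships, w_in, w_rec, bias)

print(emb_two.shape, emb_five.shape, emb_none.shape)  # (8,) (8,) (8,)
```

In this sketch the order in which the target ships are fed to the cell affects the result, so a real system would need a consistent ordering rule; which rule to use is left open here.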