Cooperative motion planning and control of a group of autonomous underwater vehicles using twin-delayed deep deterministic policy gradient

Behnaz Hadi, Alireza Khosravi, Pouria Sarhadi

Research output: Contribution to journal › Article › peer-review

Abstract

The cooperative execution of complex tasks can lead to desirable outcomes and increase the likelihood of mission success. Nevertheless, coordinating the movements of multiple autonomous underwater vehicles (AUVs) in a collaborative manner is challenging due to nonlinear dynamics and environmental disturbances. This paper presents a decentralized deep reinforcement learning algorithm for AUVs that enables cooperative motion planning and obstacle avoidance. The goal is to formulate control policies for AUVs, empowering each vehicle to create its optimal collision-free path through adjustments in speed and heading. COLlision AVoidance (COLAV) plays a crucial role in the safe navigation of multiple AUVs. Therefore, a multi-layer region control strategy is implemented to enhance the AUVs’ responsiveness to nearby obstacles, leading to improved COLAV. Furthermore, a reward function is formulated to consider four criteria: path planning, obstacle- and self-COLAV, as well as feasible control signals, with the aim of strengthening the proposed strategy. Notably, the devised scheme demonstrates robustness against disturbances. A comparative study is conducted with the well-established Artificial Potential Field (APF) planning method. The simulation results indicate that the proposed system effectively and safely guides the AUVs to their goals and exhibits desirable generalizability.
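
As a rough illustration of how a reward built from the four criteria named in the abstract (path planning, obstacle COLAV, self-COLAV, feasible control signals) might be combined with a layered region around each vehicle, consider the minimal sketch below. It is not the authors' formulation: the two-layer radii, the linear penalty shape, the weights, and all function and parameter names are hypothetical choices made only for this example.

```python
import math

# All constants are illustrative assumptions, not values from the paper.
SAFE_RADIUS = 5.0    # inner layer: collision treated as imminent (m)
WARN_RADIUS = 15.0   # outer layer: neighbour starts to affect the reward (m)
W_GOAL, W_OBS, W_SELF, W_CTRL = 1.0, 2.0, 2.0, 0.1  # hypothetical weights


def region_penalty(distance):
    """Layered penalty: 0 outside WARN_RADIUS, linear growth inside it,
    saturating at 1 inside SAFE_RADIUS."""
    if distance >= WARN_RADIUS:
        return 0.0
    if distance <= SAFE_RADIUS:
        return 1.0
    return (WARN_RADIUS - distance) / (WARN_RADIUS - SAFE_RADIUS)


def reward(pos, prev_pos, goal, obstacles, teammates, d_speed, d_heading):
    """Weighted sum of the four criteria for a single vehicle."""
    # Path planning: reward progress made toward the goal in this step.
    progress = math.dist(prev_pos, goal) - math.dist(pos, goal)
    # Obstacle COLAV: penalise the closest obstacle.
    obs_pen = max((region_penalty(math.dist(pos, o)) for o in obstacles),
                  default=0.0)
    # Self-COLAV: penalise the closest teammate vehicle.
    self_pen = max((region_penalty(math.dist(pos, t)) for t in teammates),
                   default=0.0)
    # Feasible control: penalise large speed and heading adjustments.
    ctrl_pen = abs(d_speed) + abs(d_heading)
    return (W_GOAL * progress - W_OBS * obs_pen
            - W_SELF * self_pen - W_CTRL * ctrl_pen)
```

In a decentralized setup each vehicle would evaluate such a function from its own observations. The reward actually used in the paper may differ in every term.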
Original language: English
Article number: 103977
Pages (from-to): 1-12
Number of pages: 12
Journal: Applied Ocean Research
Volume: 147
Early online date: 10 Apr 2024
Publication status: Published - 30 Jun 2024

Keywords

  • Multi-AUVs
  • Motion planning
  • Obstacle avoidance
  • Ocean current
  • Deep reinforcement learning
