TY - JOUR
T1 - Actor–critic learning based PID control for robotic manipulators
AU - Nohooji, Hamed Rahimi
AU - Zaraki, Abolfazl
AU - Voos, Holger
N1 - © 2023 Elsevier B.V. All rights reserved. This is the accepted manuscript version of an article which has been published in final form at https://doi.org/10.1016/j.asoc.2023.111153
PY - 2024/1/30
Y1 - 2024/1/30
AB - In this paper, we propose a reinforcement learning structure for auto-tuning PID gains by solving an optimal tracking control problem for robot manipulators. Capitalizing on the actor–critic framework implemented by neural networks, we achieve optimal tracking performance while simultaneously estimating the unknown system dynamics. The critic network approximates the cost function, which serves as an indicator of control performance. Guided by feedback from the critic, the actor network learns time-varying PID gains that optimize the control input, thereby steering the system toward optimal performance. Furthermore, we use Lyapunov's direct method to prove the stability of the closed-loop system. This approach yields an analytical procedure for systematically adjusting the PID gains of a stable robot manipulator system, bypassing the ad-hoc and painstaking manual tuning process. The resulting actor–critic PID-like controller exhibits stable adaptive and learning capabilities while maintaining a simple structure and a low online computational burden. Numerical simulations underscore the effectiveness and advantages of the proposed actor–critic neural network PID control.
KW - Actor–critic
KW - Neural network
KW - PID control
KW - Reinforcement learning
KW - Robot manipulators
UR - http://www.scopus.com/inward/record.url?scp=85180368308&partnerID=8YFLogxK
DO - 10.1016/j.asoc.2023.111153
M3 - Article
AN - SCOPUS:85180368308
SN - 1568-4946
VL - 151
SP - 1
EP - 11
JO - Applied Soft Computing
JF - Applied Soft Computing
M1 - 111153
ER -