1. School of Electrical Engineering, Guangxi University, Nanning 530003, China
2. School of Physics and Electronic Information, Guangxi Minzu University, Nanning 530006, China
3. Guangxi Colleges and Universities Engineering Research Center for Multimodal Information Intelligent Perception, Processing and Application, Nanning 530006, China
4. Guangxi Zhiyu Key Laboratory of Humanoid Robots, Nanning 530006, China
5. Guangxi Engineering Research Center for Intelligent Vision Collaborative Robots, Nanning 530006, China
6. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
ZHAO Jiannan (1994- ), male, Ph.D., assistant professor at the School of Electrical Engineering, Guangxi University. His research interests include autonomous UAVs, bio-inspired vision algorithms, and artificial intelligence.
QIN Qiqi (1999- ), female, master's student at the School of Electrical Engineering, Guangxi University. Her research interests include reinforcement learning and underwater sensor networks.
LI Yun (1978- ), female, Ph.D., professor at the School of Physics and Electronic Information, Guangxi Minzu University, and core member of the Guangxi Colleges and Universities Engineering Research Center for Multimodal Information Intelligent Perception, Processing and Application, the Guangxi Zhiyu Key Laboratory of Humanoid Robots, and the Guangxi Engineering Research Center for Intelligent Vision Collaborative Robots. Her research interests include underwater sensor networks, big data analytics, and artificial intelligence.
SU Yishan (1978- ), male, Ph.D., associate professor and doctoral supervisor at the School of Electrical and Information Engineering, Tianjin University. His research interests include underwater acoustic communication and integrated ocean information networks, computer networks, sensor networks, and the Internet of things.
Received: 2025-03-27
Revised: 2025-04-30
Accepted: 2025-06-10
Published in print: 2025-10-20
ZHAO Jiannan, QIN Qiqi, LI Yun, et al. Distributed reinforcement learning-based AUV 3D underwater current target tracking control algorithm[J]. Telecommunications Science, 2025, 41(10): 88-101. DOI: 10.11959/j.issn.1000-0801.2025209.
To address the challenges of high dimensionality, dynamic disturbances, and sparse rewards in autonomous underwater vehicle (AUV) target tracking within complex three-dimensional ocean current environments, a distributed reinforcement learning-based AUV 3D underwater current target tracking control algorithm was proposed. Firstly, realistic 3D ocean current data was incorporated to design dynamic target tracking scenarios that accurately model the AUV's motion. Secondly, a distributional reinforcement learning framework was constructed by integrating the dueling deep Q-network (Dueling DQN) architecture with quantile regression; by quantifying the uncertainty of Q-values, the framework mitigates the Q-value overestimation that 3D ocean current environments can induce and enhances the policy's adaptability to dynamic disturbances. Finally, a prioritized experience replay mechanism was introduced, together with a reward function designed under constraints, to optimize the data sampling strategy and accelerate model convergence. Experimental results demonstrate that, compared with deep Q-network (DQN), double deep Q-network (DDQN), and Dueling DQN, the proposed algorithm performs better in complex current environments, achieving significant improvements in convergence speed, target tracking accuracy, and robustness.
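The quantile-regression component of the framework can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`huber`, `quantile_huber_loss`) and the parameter `kappa` are illustrative, and a real QR-style distributional agent would apply this loss to neural-network quantile outputs rather than plain lists. The idea it shows is that each action's value is represented by N quantile estimates instead of a single Q-value, trained with an asymmetrically weighted Huber loss.

```python
def huber(u, kappa=1.0):
    """Huber loss: quadratic near zero, linear in the tails."""
    if abs(u) <= kappa:
        return 0.5 * u * u
    return kappa * (abs(u) - 0.5 * kappa)

def quantile_huber_loss(pred_quantiles, target_quantiles, kappa=1.0):
    """Quantile-weighted Huber loss between two sets of quantile estimates.

    pred_quantiles:   N predicted quantile values for one action
    target_quantiles: M target quantile values (Bellman targets)
    """
    n = len(pred_quantiles)
    # Midpoint quantile fractions tau_i = (2i + 1) / (2N)
    taus = [(2 * i + 1) / (2 * n) for i in range(n)]
    loss = 0.0
    for tau, theta in zip(taus, pred_quantiles):
        for target in target_quantiles:
            u = target - theta  # TD error for this quantile pair
            # |tau - 1{u<0}| penalizes under- and over-estimation
            # asymmetrically, which is what fits a quantile rather
            # than the mean and helps curb Q-value overestimation.
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            loss += weight * huber(u, kappa)
    return loss / len(target_quantiles)
```

At the median (a single quantile, tau = 0.5) the loss is symmetric; with more quantiles, low-tau estimates are pushed toward the lower tail of the return distribution and high-tau estimates toward the upper tail, giving the uncertainty-aware value estimate described in the abstract.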