Research on optimization of data transmission strategies based on deep reinforcement learning

JIANG Shouhua; FENG Jun; SHU Hui; LI Jiayi

doi:10.11959/j.issn.1000-0801.2025188

Research and Development | Updated: 2025-09-04

- Telecommunications Science, 2025, Vol. 41, No. 8, pp. 148-162
- Affiliations:
  1. Modern Educational Technology Center, Chengdu Medical College, Chengdu 610599, Sichuan
  2. Beijing Normal University, Beijing 100875
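- Author biographies:
  JIANG Shouhua (1988- ), female, engineer at Chengdu Medical College. Her main research interests include edge computing, artificial intelligence, the Internet of Things, and big data.
  FENG Jun (1969- ), male, professor at Chengdu Medical College. His main research interests include 5G dual-domain networks, medical equipment management, applications of educational information technology, and artificial intelligence.
  SHU Hui (1972- ), female, senior engineer at Chengdu Medical College. Her main research interests include applications of educational information technology, 5G dual-domain networks, and the Internet of Things.
  LI Jiayi (2004- ), female, currently a student at Beijing Normal University. Her main research interests include international and comparative education, data transmission, and new media communication.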
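- Funding:
  2024 Project of the Sichuan Research Center for Educational Informatization Application and Development (JYXX2410); Project of the Sichuan Education Informatization and Big Data Center (DSJZXKT256); 2025 Project of the Sichuan Key Laboratory of Educational Digitalization Development and Evaluation (JYSZH202514)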
- Chinese Library Classification (CLC) number: TP393
- Received: 2025-02-25; revised: 2025-04-27; accepted: 2025-07-07; published in print: 2025-08-20
JIANG Shouhua,FENG Jun,SHU Hui,et al.Research on optimization of data transmission strategies based on deep reinforcement learning[J].Telecommunications Science,2025,41(08):148-162.


Abstract

Based on the theoretical framework of deep reinforcement learning, a hierarchical, progressive solution was proposed. First, a heterogeneous data transmission architecture integrating edge computing nodes was constructed, and a Markov decision process with a multi-dimensional, time-varying state space was established. Second, an entropy regularization term was embedded in the conventional deep Q-network (DQN) algorithm and combined with a same-policy experience replay mechanism, yielding an enhanced optimizer, ESERDQN (improved DQN algorithm based on entropy and same-strategy experience replay). Finally, a five-dimensional evaluation metric system (convergence rate, cumulative reward, energy consumption, end-to-end delay, and transmission cost) was designed, and comparative experiments across multiple algorithms were carried out. Simulation results show that ESERDQN converges stably within 1 500 training cycles and improves convergence speed by 49.2%, 41.7%, 30.1%, and 13.3% compared with the baseline greedy, random, DDPG, and PPO algorithms, respectively. In terms of overall service metrics, it reduces the unit energy cost by 27.8% and keeps the delay of critical tasks within 12.3 ms, verifying the technical advantages of the proposed method in complex transmission scenarios in smart cities.
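The abstract does not give the ESERDQN loss or replay rule, so the following is only a minimal Python sketch of the two ingredients it names, under stated assumptions. It assumes that "entropy regularization" takes the common maximum-entropy (soft Q-learning) form, in which the hard max in the DQN target is replaced by the soft value α·logsumexp(Q/α) with a hypothetical temperature α, and that "same-policy experience replay" means sampling only transitions collected by the current (or nearly current) policy version, controlled by a hypothetical `max_lag` parameter. All function and class names are illustrative and are not taken from the paper.

```python
import math
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

# (state, action, reward, next_state, done); this state encoding is an assumption.
Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def soft_value(q_values: Sequence[float], alpha: float) -> float:
    """Entropy-regularized state value: alpha * logsumexp(Q(s, .) / alpha).

    Equals the max over policies of E_pi[Q] + alpha * H(pi) and tends to max(Q)
    as alpha -> 0. alpha is a hypothetical temperature, not a value from the paper.
    """
    return alpha * logsumexp([q / alpha for q in q_values])


def boltzmann_policy(q_values: Sequence[float], alpha: float) -> List[float]:
    """Softmax policy pi(a|s) proportional to exp(Q(s, a) / alpha)."""
    v = soft_value(q_values, alpha)  # v >= max(Q), so every exponent is <= 0
    return [math.exp((q - v) / alpha) for q in q_values]


def soft_td_target(reward: float, next_q: Sequence[float], gamma: float,
                   alpha: float, done: bool) -> float:
    """Soft Bellman target r + gamma * V_soft(s'); no bootstrapping at terminal states."""
    if done:
        return reward
    return reward + gamma * soft_value(next_q, alpha)


class OnPolicyReplay:
    """Hypothetical "same-policy" replay buffer.

    Each transition is tagged with the policy version that collected it, and
    sampling is restricted to versions at most `max_lag` updates old.
    """

    def __init__(self, capacity: int, max_lag: int = 0) -> None:
        self.buffer: Deque[Tuple[int, Transition]] = deque(maxlen=capacity)
        self.max_lag = max_lag
        self.version = 0

    def push(self, transition: Transition) -> None:
        self.buffer.append((self.version, transition))

    def update_policy(self) -> None:
        """Call after each network update so that older transitions become stale."""
        self.version += 1

    def sample(self, batch_size: int) -> List[Transition]:
        fresh = [t for v, t in self.buffer if self.version - v <= self.max_lag]
        return random.sample(fresh, min(batch_size, len(fresh)))
```

As α approaches 0 the soft value tends to max Q, so the target reduces to the standard DQN target; a larger α flattens the Boltzmann policy and gives more weight to entropy, which encourages exploration. Setting `max_lag=0` keeps replay strictly on-policy, while a larger value trades sample freshness for more reuse of collected data.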

