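1. Modern Educational Technology Center, Chengdu Medical College, Chengdu, Sichuan 610599
2. Beijing Normal University, Beijing 100875
JIANG Shouhua (蒋守花; 1988- ), female, engineer at Chengdu Medical College. Her main research interests are edge computing, artificial intelligence, the Internet of things, and big data.
FENG Jun (冯军; 1969- ), male, professor at Chengdu Medical College. His main research interests are 5G dual-domain networks, medical equipment management, applications of educational information technology, and artificial intelligence.
SHU Hui (舒晖; 1972- ), female, senior engineer at Chengdu Medical College. Her main research interests are applications of educational information technology, 5G dual-domain networks, and the Internet of things.
LI Jiayi (黎佳宜; 2004- ), female, student at Beijing Normal University. Her main research interests are international and comparative education, data transmission, and new media communication.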
Received: 2025-02-25
Revised: 2025-04-27
Accepted: 2025-07-07
Published in print: 2025-08-20
JIANG Shouhua, FENG Jun, SHU Hui, et al. Research on optimization of data transmission strategies based on deep reinforcement learning[J]. Telecommunications Science, 2025, 41(08): 148-162. DOI: 10.11959/j.issn.1000-0801.2025188.
Based on the theoretical framework of deep reinforcement learning, a hierarchical and progressive solution was proposed. First, a heterogeneous data transmission architecture integrating edge computing nodes was constructed, and a Markov decision process with a time-varying, multi-dimensional state space was established. Second, an entropy regularization constraint term was embedded in the traditional deep Q-learning network (DQN) algorithm and combined with a same-strategy experience replay mechanism, forming the enhanced ESERDQN (improved DQN algorithm based on entropy and same-strategy experience replay) optimizer. Finally, a five-dimensional evaluation index system (convergence rate, cumulative reward value, energy consumption, transmission delay, and transmission cost) was designed, and multi-algorithm comparison experiments were carried out. The simulation results show that ESERDQN reaches stable convergence within 1 500 training cycles, improving convergence speed by 49.2%, 41.7%, 30.1%, and 13.3% compared with the benchmark greedy algorithm, the random algorithm, DDPG, and PPO, respectively. In terms of comprehensive service indicators, the unit energy cost was reduced by 27.8% and the delay of key tasks was kept within 12.3 ms, which verifies the technical superiority of the proposed method in complex transmission scenarios of smart cities.
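The abstract does not give the exact form of the entropy term or of the same-strategy replay, so the following is only a minimal sketch under stated assumptions, not the ESERDQN implementation. It assumes the entropy regularization takes the common maximum-entropy (soft Q-learning) form, in which the `max` in the DQN target is replaced by a temperature-scaled log-sum-exp, and it reads "same-strategy experience replay" as sampling only transitions collected under the current policy version. The names `soft_value`, `soft_policy`, `soft_td_target`, `PolicyTaggedReplay`, and the parameter `temperature` are hypothetical.

```python
import math
import random
from collections import deque


def soft_value(q_values, temperature):
    """Entropy-regularized state value: tau * log(sum(exp(Q / tau)))."""
    m = max(q_values)  # subtract the max for numerical stability
    return m + temperature * math.log(
        sum(math.exp((q - m) / temperature) for q in q_values)
    )


def soft_policy(q_values, temperature):
    """Boltzmann policy that maximizes expected Q plus tau times entropy."""
    v = soft_value(q_values, temperature)
    return [math.exp((q - v) / temperature) for q in q_values]


def soft_td_target(reward, next_q_values, gamma, temperature, done):
    """TD target using the soft value in place of max_a Q(s', a)."""
    if done:
        return reward
    return reward + gamma * soft_value(next_q_values, temperature)


class PolicyTaggedReplay:
    """Replay buffer that samples only transitions from the current policy version.

    Assumption: this is one possible reading of "same-strategy" replay,
    not the mechanism defined in the paper.
    """

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)
        self.version = 0

    def advance_policy(self):
        # Call after each policy update so older transitions stop being sampled.
        self.version += 1

    def add(self, transition):
        self.buffer.append((self.version, transition))

    def sample(self, batch_size, rng=random):
        current = [t for v, t in self.buffer if v == self.version]
        return rng.sample(current, min(batch_size, len(current)))
```

As `temperature` approaches zero, `soft_value` approaches the ordinary `max` and the target reduces to the standard DQN target. Larger values spread probability across actions, which is the usual motivation for adding an entropy term to encourage exploration.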