Research on constrained policy reinforcement learning based multi-objective optimization of computing power network

SHEN Linjiang; CAO Chang; CUI Chao; ZHANG Yan

doi:10.11959/j.issn.1000-0801.2023165

Column: Computing Power Network | Updated: 2024-06-05

- Research on constrained policy reinforcement learning based multi-objective optimization of computing power network
- Telecommunications Science, 2023, Vol. 39, No. 8, pp. 136-148
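- Author affiliations:
  
  1. Inspur Communication Information System Co., Ltd., Jinan 250100, Shandong, China
  2. Research Institute of China United Network Communications Co., Ltd., Beijing 100048, China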
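- Author biographies:
  
  SHEN Linjiang (1981- ), male, is vice general manager of Inspur Communication Information System Co., Ltd. and president of the Computing Power Network Research Institute. He mainly works on frontier theoretical analysis, technology research, and product design related to computing power networks.
  CAO Chang (1984- ), male, Ph.D., is director of the Future Network Research Department at the Research Institute of China United Network Communications Co., Ltd. and a senior engineer. His research focuses on computing power networks, new IPv6+ network technologies, and future network architecture.
  CUI Chao (1993- ), male, works at Inspur Communication Information System Co., Ltd. His research focuses on computing power networks and AI algorithms.
  ZHANG Yan (1983- ), male, Ph.D., is a principal researcher in the Future Network Research Department at the Research Institute of China United Network Communications Co., Ltd. and a senior engineer. His research focuses on computing power networks, cloud-network convergence and cloud computing, and future network architecture.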
- CLC number: TP393
- Online publication date: 2023-08
  
  Print publication date: 2023-08-20
Linjiang SHEN, Chang CAO, Chao CUI, et al. Research on constrained policy reinforcement learning based multi-objective optimization of computing power network[J]. Telecommunications Science, 2023, 39(8): 136-148. DOI: 10.11959/j.issn.1000-0801.2023165.

Abstract

A computing power network needs to maximize system performance indicators while meeting users' service requirements. Existing methods mainly convert and solve the problem through multi-objective weighting, which suffers from hyperparameters that are difficult to determine and poor applicability across scenarios. Based on an analysis of the characteristics of the objectives in a computing power network, and building on constrained policy reinforcement learning, user service requirements are taken as policy constraints and the system performance indicators as the optimization objective. Through a multi-level value-policy-hyperparameter iteration, the computing power network guarantees user service requirements in expectation while optimizing system performance. In addition, a multi-scale step length (MSL) method for hyperparameter optimization is studied, which further improves the stability and accuracy of the system. Simulation results show that the proposed method has good convergence and stability in single-terminal single-edge-server and multi-terminal multi-edge-server architectures and under changes in system load.
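The contrast between fixed multi-objective weights and a constraint-based formulation can be illustrated with a small primal-dual sketch. This is not the paper's algorithm: it is a toy under stated assumptions. The scalar policy parameter `p` (an offloading probability), the functions `utility` and `latency`, and the helper `solve` are all hypothetical, and the sketch assumes the tuned "hyperparameter" plays the role of a Lagrange multiplier that is updated with a smaller step than the policy. The two fixed step sizes are illustrative only and do not reproduce the MSL method, which the abstract does not detail.

```python
"""Toy primal-dual sketch: a requirement enforced through a learned multiplier.

Hypothetical setup (not from the paper): p in [0, 1] is the probability of
offloading a task, utility(p) stands in for the system performance objective,
and latency(p) stands in for the user requirement that must stay below a limit.
"""


def utility(p):
    # Hypothetical concave performance objective, largest at p = 1.
    return 1.0 - (1.0 - p) ** 2


def utility_grad(p):
    return 2.0 * (1.0 - p)


def latency(p):
    # Hypothetical expected constraint cost, increasing in p.
    return p


def latency_grad(p):
    return 1.0


def solve(limit, policy_step=0.1, multiplier_step=0.01, iterations=5000):
    """Maximize utility(p) subject to latency(p) <= limit."""
    p, lam = 0.0, 0.0
    for _ in range(iterations):
        # Evaluation level: constraint violation under the current policy.
        violation = latency(p) - limit
        # Policy level (larger step): gradient ascent on the Lagrangian
        # utility(p) - lam * (latency(p) - limit), then clip p to [0, 1].
        p += policy_step * (utility_grad(p) - lam * latency_grad(p))
        p = min(1.0, max(0.0, p))
        # Multiplier level (smaller step): increase lam while the constraint
        # is violated, and keep it non-negative.
        lam = max(0.0, lam + multiplier_step * violation)
    return p, lam


p, lam = solve(limit=0.6)
print(f"offload probability = {p:.3f}, multiplier = {lam:.3f}")
```

For this toy, a fixed weight `w` on latency gives the maximizer `p = 1 - w / 2`, so each latency limit needs its own hand-tuned weight. The multiplier update removes that tuning: the stationary point satisfies `p = limit` and `lam = 2 * (1 - limit)`, and changing `limit` requires no other change to the code.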
