School of Electronic and Information Engineering, Soochow University, Suzhou 215006, Jiangsu, China
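HUANG Tingting (2000- ), female, is a master's student at the School of Electronic and Information Engineering, Soochow University. Her main research interest is optical networks for intelligent computing centers.
YUAN Zhilin (2000- ), male, is a master's student at the School of Electronic and Information Engineering, Soochow University. His main research interest is optical networks for intelligent computing centers.
ZHAI Dewei (1999- ), male, is a master's student at the School of Electronic and Information Engineering, Soochow University. His main research interest is optical networks for intelligent computing centers.
LI Yongcheng (1989- ), male, Ph.D., is an associate researcher at the School of Electronic and Information Engineering, Soochow University. His main research interests are backbone optical networks and optical networks for intelligent computing centers.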
Received: 2025-02-21
Revised: 2025-03-25
Published in print: 2025-04-20
HUANG Tingting, YUAN Zhilin, ZHAI Dewei, et al. Integrated communication and computing deployment algorithms for Ring Allreduce in optical networks of intelligent computing centers[J]. Telecommunications Science, 2025, 41(04): 44-52. DOI: 10.11959/j.issn.1000-0801.2025115.
The rapid development of large artificial intelligence (AI) model training and inference has posed significant challenges to intelligent computing centers, especially in the collaborative scheduling of computing and network resources. To optimize the deployment of Ring Allreduce services for distributed intelligent computing, a computing power-wavelength plane (CWP) technique was first proposed, which extends the traditional wavelength plane to enable integrated virtual management of computing and network resources. Based on the CWP, an efficient routing, wavelength, computing power, and time slot assignment (RWCTA) algorithm was then proposed for Ring Allreduce service deployment. Simulation results demonstrate that, compared with conventional wavelength-plane-based deployment algorithms, the CWP-based RWCTA algorithm reduces the total task completion time by 62.4% and the average task computation time by 54.5%.
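To make the communication pattern of a Ring Allreduce service concrete, the sketch below simulates the textbook sum-allreduce on a logical ring: a reduce-scatter phase followed by an all-gather phase. It is an illustrative sketch under stated assumptions, not the paper's method; the function name `ring_allreduce`, the in-memory list representation of each worker's vector, and the requirement that the vector length be divisible by the ring size are all hypothetical choices made here. It shows that n workers finish in 2(n - 1) communication steps and that in every step each worker sends one 1/n-sized chunk to its ring neighbor, so all n ring hops carry traffic at the same time. How RWCTA maps these steps onto routes, wavelengths, computing power, and time slots is not described in the abstract, and the sketch does not model it.

```python
from typing import List, Tuple


def ring_allreduce(data: List[List[float]]) -> Tuple[List[List[float]], int]:
    """Simulate a sum-allreduce over a logical ring of n workers.

    Hypothetical illustration: data[i] is worker i's local vector, and its
    length must be a multiple of n so it splits into n equal chunks.
    Returns (final vector held by each worker, number of ring steps).
    """
    n = len(data)
    length = len(data[0])
    assert all(len(v) == length for v in data), "all vectors must match"
    assert length % n == 0, "vector length must be a multiple of the ring size"
    size = length // n
    buf = [list(v) for v in data]  # each worker's working copy

    def chunk(c: int) -> slice:
        return slice(c * size, (c + 1) * size)

    steps = 0
    # Phase 1, reduce-scatter: in step s, worker i sends chunk (i - s) mod n
    # to worker (i + 1) mod n, which adds it into its own copy of that chunk.
    for s in range(n - 1):
        # Snapshot all payloads first so every send in a step is simultaneous.
        sends = [(i, (i - s) % n, buf[i][chunk((i - s) % n)]) for i in range(n)]
        for i, c, payload in sends:
            dst = (i + 1) % n
            sl = chunk(c)
            buf[dst][sl] = [a + b for a, b in zip(buf[dst][sl], payload)]
        steps += 1

    # Now worker i holds the fully reduced chunk (i + 1) mod n.
    # Phase 2, all-gather: in step s, worker i forwards chunk (i + 1 - s) mod n
    # to its neighbor, which overwrites its copy with the reduced values.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n, buf[i][chunk((i + 1 - s) % n)]) for i in range(n)]
        for i, c, payload in sends:
            buf[(i + 1) % n][chunk(c)] = list(payload)
        steps += 1

    return buf, steps
```

For example, with three workers holding `[1, 2, 3, 4, 5, 6]`, `[10, 20, 30, 40, 50, 60]`, and `[100, 200, 300, 400, 500, 600]`, every worker ends with `[111, 222, 333, 444, 555, 666]` after 4 steps. Because each step keeps every hop of the ring busy in parallel, a deployment has to secure network resources on the whole ring rather than on a single path, which is one way to see why routing, wavelength, and time slot decisions are coupled for this kind of service.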