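1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
2. State Key Laboratory of Information Photonics and Optical Communications, Beijing University of Posts and Telecommunications, Beijing 100876, China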
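LI Yunxuan (1998- ), female, is a Ph.D. candidate at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. Her research interests include IP/optical networks and optical networks for artificial intelligence data centers.
YANG Yaping (2002- ), female, is a master's student at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. Her research interests include optical networks for artificial intelligence data centers.
TU Jiayi (2002- ), female, is a master's student at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. Her research interests include optical networks for artificial intelligence data centers.
GU Rentao (1983- ), male, is a professor and doctoral supervisor at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. His research interests include intelligent information networks and optical networks for intelligent computing.
JI Yuefeng (1960- ), male, is a professor and doctoral supervisor at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, and deputy director of the State Key Laboratory of Information Photonics and Optical Communications, Beijing University of Posts and Telecommunications. His research interests include broadband communication networks and optical communication technologies.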
Received: 2025-02-10
Revised: 2025-04-08
Published in print: 2025-04-20
LI Yunxuan, YANG Yaping, TU Jiayi, et al. Research on key technologies of optical networks for interconnection between artificial intelligent data centers[J]. Telecommunications Science, 2025, 41(04): 3-19. DOI: 10.11959/j.issn.1000-0801.2025106.
As critical computing infrastructure for large-model applications, artificial intelligence data centers rely on a high-performance optical transport network to operate efficiently. However, the optical networks interconnecting artificial intelligence data centers face numerous technical challenges arising from three interconnection demands: high real-time performance, high burstiness, and high reliability. To address these challenges, three techniques are needed. High real-time resource allocation helps resource allocation in these networks move beyond localized scheduling, addressing service transmission and scheduling delay. Adaptive collaborative optimization steers the network from passive adjustment toward active collaboration, addressing highly bursty dynamic traffic. Proactive failure recovery moves the network from passive restoration toward active intervention, addressing the high reliability requirement. Looking ahead, large-scale high real-time scheduling, deep computing-network collaboration, and digital twins for intelligent computing will offer new opportunities for the further development of interconnection between artificial intelligence data centers.