China Mobile Research Institute, Beijing 100053, China
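ZHANG Dechao (1979- ), male, Ph.D., is deputy director of the Department of Fundamental Network Technology at China Mobile Research Institute and a professor-level senior engineer. His work covers technical research and standardization in optical transport and access networks, including SPN/PTN, OTN/WDM, fronthaul open-WDM, PON, and synchronization.
SUN Jiang (1987- ), male, Ph.D., is an engineer at China Mobile Research Institute. His research focuses on OTN/WDM and fronthaul open-WDM.
CAO Shan (1990- ), female, Ph.D., is a project manager at China Mobile Research Institute. Her research focuses on high-speed, large-capacity optical communication and B1T electrical-layer technologies.
ZUO Mingqing (1995- ), male, Ph.D., is an engineer at China Mobile Research Institute. His research focuses on high-speed, large-capacity optical communication.
WANG Dong (1986- ), male, Ph.D., is a technical manager, principal researcher, and senior engineer at China Mobile Research Institute. His work covers research and standardization of 400 Gbit/s and beyond-400 Gbit/s OTN/WDM, hitless OTN for intelligent computing, and fronthaul open-WDM.
LI Han (1975- ), male, Ph.D., is director of the Department of Fundamental Network Technology at China Mobile Research Institute and a professor-level senior engineer. His research covers optical communication, including SPN/PTN, fronthaul open-WDM, OTN/WDM, PON, and synchronization.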
Received: 2025-03-04
Revised: 2025-04-15
Print publication date: 2025-04-20
ZHANG Dechao, SUN Jiang, CAO Shan, et al. Novel HIC-OTN for interconnection of cross-intelligent computing clusters[J]. Telecommunications Science, 2025, 41(04): 53-60. DOI: 10.11959/j.issn.1000-0801.2025117.
With the rapid development of the global AI industry, the computing power demanded by large models continues to grow, prompting major technology companies worldwide to build clusters exceeding 10 000 or even 100 000 GPUs. Because the growth of clusters beyond 100 000 GPUs is constrained by water and power supply, construction investment, and other factors, interconnecting multiple clusters over a high-speed all-optical network to enable efficient collaborative training across clusters is an important potential solution. To meet the ultra-large bandwidth, ultra-low latency, and ultra-high reliability requirements of cross-cluster interconnection, the architecture and key technologies of a hitless intelligent computing optical transport network (HIC-OTN) were proposed. Based on HIC-OTN, the first field trial of 104 km cross-cluster pipeline parallelism (PP) training was completed, exploring and verifying the feasibility of 100 km-class cross-cluster PP training. With 800 Gbit/s HIC-OTN interconnection between two clusters at distances of 52 km and 104 km, collaborative training reached over 98% of single-node training efficiency, and optical network protection switching was hitless and imperceptible to training efficiency.