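1. China Academy of Information and Communications Technology, Beijing 100191, China
2. Jiangsu Viscore Technology Co., Ltd. (江苏为是科技有限公司), Suzhou 215000, Jiangsu, China
3. Information and Communication Branch, State Grid Jiangsu Electric Power Co., Ltd., Nanjing 210024, Jiangsu, China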
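Junfeng ZHAO (1979- ), male, is a senior engineer at the China Academy of Information and Communications Technology. His main research interests include packet transport, 4G/5G mobile bearer networks, deterministic networking, and future networks.
Fang LI (1974- ), female, is a professor-level senior engineer at the China Academy of Information and Communications Technology. Her main research interests include high-speed optical communication, packet transport, 4G/5G mobile bearer networks, and software-defined optical networks.
Xiaofeng YE (1974- ), male, is a product manager at Jiangsu Viscore Technology Co., Ltd. (江苏为是科技有限公司). His main research interests include high-performance networking, RDMA network interface cards, and SmartNIC chips.
Song JIANG (1983- ), male, is a senior engineer at the Information and Communication Branch of State Grid Jiangsu Electric Power Co., Ltd. His main research interests include electric power optical communication networks and voice switching networks.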
Online publication date: 2023-11
Print publication date: 2023-11-20
Junfeng ZHAO, Fang LI, Xiaofeng YE, et al. Research on deterministic networking requirements and technologies for RDMA-WAN[J]. Telecommunications Science, 2023, 39(11): 39-51. DOI: 10.11959/j.issn.1000-0801.2023248.
With the rapid development of AI-generated content (AIGC) large models and intelligent computing applications, deterministic networking technology for wide-area remote direct memory access (RDMA) has recently become a research hotspot for interconnecting intelligent computing centers. The development path, implementation solutions, and technical challenges of wide-area RDMA (RDMA-WAN) were first analyzed. The network problems faced when deploying RDMA-WAN were then studied, and a series of tests and verifications was carried out in laboratory and field test environments. On this basis, the deterministic requirement characteristics and boundary performance indicators of RDMA-WAN were proposed. Finally, the deterministic networking technologies for RDMA-WAN were summarized and their prospects discussed.