
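1. Tencent Technology (Shenzhen) Co., Ltd., Shenzhen, Guangdong 518000, China
2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210000, China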
Received: 07 January 2026
Revised: 29 January 2026
Accepted: 09 April 2026
WANG Yachen, XIA Yinben, WANG Zibo, et al. Astral 3.0: Design and Practice of High-Performance Network Infrastructure for Large-Scale MoE Training and Inference[J/OL]. Telecommunications Science, 2026. DOI: 10.11959/j.issn.1000-0801.DXKX260015.
With the evolution of large model architectures towards sparse Mixture-of-Experts (MoE), the proportion of communication overhead in end-to-end latency rises significantly in both training and inference, and communication performance has become a critical factor constraining system performance. To address the challenges of large-scale MoE training and inference, namely heavy All-to-All communication pressure, sensitivity to both bandwidth and latency, and surging operational complexity, a high-performance network infrastructure solution based on hardware-software co-design was proposed in this paper. First, at the architecture level, the Astral 3.0 network architecture was designed, using Optical Shuffle technology to construct a flattened two-tier single-rail network. This architecture is matched to the All-to-All traffic characteristics of MoE, significantly improving communication performance while reducing networking costs. Second, at the communication software level, All-to-All communication kernels were optimized separately for the distinct traffic characteristics of each stage of training and inference. Using GPU-centric task dispatch and expert-granularity load balancing, high-bandwidth kernels for the training and Prefill stages and low-latency kernels for the Decode stage were implemented, substantially reducing end-to-end latency. Finally, at the operations level, network operation workflows were comprehensively optimized using AI Agents, enabling proactive fault warning and intelligent interactive diagnosis, which ensures the continuity of long-running training jobs and the high availability of online services. Experimental results demonstrated that the solution effectively breaks the communication wall of MoE models, providing a unified, high-performance, and highly reliable system foundation for the large-scale training and online serving of trillion-parameter models.
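The abstract motivates separate high-bandwidth kernels (training and Prefill) and low-latency kernels (Decode) for MoE All-to-All. The following is a generic sketch under stated assumptions, not the Astral 3.0 kernels. It first derives how many token copies one rank sends to each destination rank from a router's top-k expert choices, assuming contiguous expert placement and one copy per destination rank per token. It then applies the textbook alpha-beta cost model with hypothetical parameters (10 us fixed startup cost, 50 GB/s link bandwidth, hidden size 7168, 2-byte BF16 activations) to show why a small Decode batch is dominated by fixed latency while a large Prefill or training batch is dominated by bandwidth. The function names are hypothetical.

```python
from typing import List, Sequence


def all_to_all_send_counts(topk_experts: Sequence[Sequence[int]],
                           experts_per_rank: int,
                           num_ranks: int) -> List[int]:
    """Token copies one source rank sends to each destination rank (hypothetical helper).

    Assumes contiguous expert placement: rank r hosts experts
    [r * experts_per_rank, (r + 1) * experts_per_rank). A token routed to
    several experts on the same rank is sent to that rank only once.
    """
    counts = [0] * num_ranks
    for experts in topk_experts:
        # Deduplicate destination ranks for this token before counting.
        for rank in {e // experts_per_rank for e in experts}:
            counts[rank] += 1
    return counts


def alpha_beta_time_us(num_bytes: int, alpha_us: float, bandwidth_gb_s: float) -> float:
    """Alpha-beta cost model: fixed startup latency plus bytes / bandwidth.

    1 GB/s equals 1e3 bytes per microsecond.
    """
    return alpha_us + num_bytes / (bandwidth_gb_s * 1e3)


def startup_fraction(num_tokens: int, hidden: int = 7168, bytes_per_elem: int = 2,
                     alpha_us: float = 10.0, bandwidth_gb_s: float = 50.0) -> float:
    """Share of the modeled transfer time spent on the fixed startup cost.

    All default values are hypothetical illustration parameters, not measurements.
    """
    num_bytes = num_tokens * hidden * bytes_per_elem
    return alpha_us / alpha_beta_time_us(num_bytes, alpha_us, bandwidth_gb_s)


if __name__ == "__main__":
    # 4 ranks x 2 experts each = 8 experts; each token selects its top-2 experts.
    routing = [[0, 1], [0, 5], [3, 6], [7, 2]]
    print(all_to_all_send_counts(routing, experts_per_rank=2, num_ranks=4))  # [2, 2, 1, 2]
    # Decode-like message (few tokens) vs Prefill/training-like message (many tokens).
    print(f"decode  startup share: {startup_fraction(16):.1%}")
    print(f"prefill startup share: {startup_fraction(4096):.1%}")
```

Under these assumed parameters the fixed startup cost accounts for most of the modeled time at 16 tokens and under 1% at 4096 tokens, which is the kind of asymmetry that makes a single kernel design a poor fit for both regimes. The uneven send counts also show how skewed routing concentrates traffic on some ranks, the problem that expert-granularity load balancing targets.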