Zhejiang Gongshang University, Hangzhou 310018, Zhejiang, China
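ZHUGE Bin (诸葛斌, 1976-), male, Ph.D., professor at the School of Information and Electronic Engineering (Sussex Artificial Intelligence Institute), Zhejiang Gongshang University. His main research interests are network and communication technology, Internet technology, and network security.
HONG Shiyu (洪仕玉, 2002-), female, master's student at the School of Information and Electronic Engineering (Sussex Artificial Intelligence Institute), Zhejiang Gongshang University. Her main research interests are data resource scheduling and smart networks.
XU Yunhan (许云汉, 2000-), male, master's student at the School of Information and Electronic Engineering (Sussex Artificial Intelligence Institute), Zhejiang Gongshang University. His main research interests are data resource scheduling and smart networks.
ZHANG Zitian (张子天, 1988-), male, Ph.D., lecturer at Zhejiang Gongshang University. His main research interests are AI-based network traffic prediction and resource management.
DONG Ligang (董黎刚, 1973-), male, Ph.D., dean, professor, and master's supervisor at the School of Information and Electronic Engineering, Zhejiang Gongshang University; senior member of the Chinese Institute of Electronics and director at the Zhejiang Computer Society. His main research interests are intelligent networks and intelligent education based on big data and deep learning.
JIANG Xian (蒋献, 1988-), male, born in Lanxi, Zhejiang, lecturer at Zhejiang Gongshang University. His main research interests are smart networks and smart education.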
Received: 2026-01-02
Revised: 2026-04-21
Accepted: 2026-05-18
ZHUGE Bin, HONG Shiyu, XU Yunhan, et al. An Optimization Framework for Cloud-Edge Collaborative Resource Scheduling Based on LLM Semantic Guidance and Active Inference[J/OL]. Telecommunications Science, 2026. DOI: 10.11959/j.issn.1000-0801.DXKX260002.
To address the real-time and dynamic-uncertainty challenges facing Large Language Model (LLM) inference in cloud-edge collaborative environments, this paper proposes AIF-LLM, a decision-making framework that combines LLMs with Active Inference. The framework uses a Sentence Transformer to quantify macroscopic semantic guidance into policy preference vectors, which are then integrated into the calculation of Expected Free Energy to achieve precise decision-making under high-level guidance. A meta-learning module dynamically adjusts the semantic preference weight $\lambda$ according to environmental uncertainty, balancing macroscopic guidance against the precise cost model of Active Inference. The framework also leverages the semantic understanding of the LLM to generate initial belief priors, significantly improving cold-start performance and sample efficiency. Simulation results indicate that AIF-LLM reaches a Quality of Service (QoS) satisfaction rate of 92.57%, an absolute improvement of 2.90, 4.90, 8.00, and 10.00 percentage points over the mainstream SAC, PPO, DQN, and A2C algorithms, respectively. In the system's limit-load range, the framework reduces the QoS violation risk and the long-tail failure rate by 28.07% and 47.67%, respectively, validating its robustness and adaptability in complex environments.
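The abstract does not give the formulas behind the $\lambda$-weighted decision rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a policy's semantic preference is the cosine similarity between a guidance embedding and a policy embedding, that the decision score is the expected free energy minus $\lambda$ times that preference, and that $\lambda$ grows linearly with a normalized uncertainty value. All function names, the hand-written two-dimensional vectors (standing in for Sentence Transformer embeddings), the cost values, and the direction of the $\lambda$ adjustment are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_preference(guidance_vec, policy_vecs):
    """Assumed preference vector: similarity of each policy to the guidance."""
    return [cosine(guidance_vec, p) for p in policy_vecs]


def adapt_lambda(uncertainty, lam_min=0.1, lam_max=0.9):
    """Hypothetical stand-in for the meta-learning module.

    Maps an uncertainty value clipped to [0, 1] linearly onto
    [lam_min, lam_max]; the real module and its direction are not
    described in the abstract.
    """
    u = min(max(uncertainty, 0.0), 1.0)
    return lam_min + (lam_max - lam_min) * u


def select_policy(efe_costs, preferences, lam):
    """Return the index minimizing G(pi) - lam * preference(pi), plus all scores."""
    scores = [g - lam * p for g, p in zip(efe_costs, preferences)]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy setup: policy 0 = run at the edge, policy 1 = offload to the cloud.
guidance = [1.0, 0.0]                 # pretend embedding of "prefer the edge"
policies = [[1.0, 0.0], [0.0, 1.0]]   # pretend policy embeddings
efe = [1.0, 0.8]                      # pretend expected free energy per policy

prefs = semantic_preference(guidance, policies)
choice_calm, _ = select_policy(efe, prefs, adapt_lambda(0.0))
choice_noisy, _ = select_policy(efe, prefs, adapt_lambda(1.0))
print(choice_calm, choice_noisy)
```

In this toy setup the cost model alone favors the cloud policy, and the semantic guidance overrides it only once the assumed uncertainty pushes $\lambda$ high enough, which is the kind of trade-off the weight is described as balancing.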