Research on an automated penetration testing framework based on LLM and RAG

JIANG Jie; CAI Chenxu; LI Mingda; ZHU Tiantian

doi:10.11959/j.issn.1000-0801.2025159

Research and Development | Updated: 2025-11-16

- Research on an automated penetration testing framework based on LLM and RAG
- Telecommunications Science, 2025, vol. 41, no. 9, pp. 119-132
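- Author affiliations:
  1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang 310023
  2. Taizhou Research Institute, Zhejiang University of Technology, Taizhou, Zhejiang 318001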
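- About the authors:
  JIANG Jie (1972- ), female, Ph.D., professor at the College of Computer Science and Technology, Zhejiang University of Technology. Her main research interest is network security.
  CAI Chenxu (2000- ), male, master's student at the College of Computer Science and Technology, Zhejiang University of Technology. His main research interests are network security and large language models.
  LI Mingda (1998- ), male, Ph.D. student at the College of Computer Science and Technology, Zhejiang University of Technology. His main research interest is network security.
  ZHU Tiantian (1992- ), male, associate professor at the College of Computer Science and Technology, Zhejiang University of Technology. His main research interests are network security and network attack and defense.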
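- Funding:
  National Natural Science Foundation of China, Young Scientists Fund (62002324); National Natural Science Foundation of China, Key Program (U22B2028); Fundamental Research Funds for the Provincial Universities of Zhejiang (RF-A2023009)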
- DOI: 10.11959/j.issn.1000-0801.2025159
  Chinese Library Classification number: TP393
- Received: 2025-01-06
  Revised: 2025-06-16
  Accepted: 2025-06-17
  Published in print: 2025-09-20

JIANG Jie, CAI Chenxu, LI Mingda, et al. Research on an automated penetration testing framework based on LLM and RAG[J]. Telecommunications Science, 2025, 41(09): 119-132. DOI: 10.11959/j.issn.1000-0801.2025159.

Abstract

As network threats grow more severe, automated penetration testing has become a research focus in cybersecurity. Existing studies have preliminarily explored the feasibility of using large language models (LLMs) for automated penetration testing, but they still fall short in process continuity and generation relevance. To address these issues, this paper proposes Pentest-Chain, an automated penetration testing framework based on multi-agent collaboration, in which multiple agents with divided responsibilities cooperate to complete the tasks of each penetration testing phase. To improve generation relevance, retrieval-augmented generation (RAG) is introduced, drawing on an external knowledge base and an internal experience base to improve the accuracy and reliability of the agents' outputs. Experimental results show that, compared with a single agent, the multi-agent Pentest-Chain framework improves the overall task success rate by 17.0%. Further ablation experiments show that introducing the RAG module into the multi-agent framework plays a key role in raising the task success rate and significantly improves the relevance and accuracy of generated content during task execution.
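The abstract does not say how Pentest-Chain retrieves context, so the following is only a minimal sketch of the general idea: an agent's task is matched against an external knowledge base and an internal experience base, and the best match from each is placed in the prompt. The function names, the toy entries, and the bag-of-words cosine similarity are all assumptions made for illustration; a real system would more plausibly use dense embeddings and a vector index.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, store, top_k=1):
    """Return up to top_k entries of `store` ranked by similarity to `query`.

    Entries with zero similarity are dropped rather than padded in.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc))), doc) for doc in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(task, knowledge_base, experience_base):
    """Assemble an agent prompt from the task plus retrieved context."""
    lines = ["Task: " + task, "Reference knowledge:"]
    lines += ["- " + entry for entry in retrieve(task, knowledge_base)]
    lines.append("Past experience:")
    lines += ["- " + entry for entry in retrieve(task, experience_base)]
    return "\n".join(lines)


# Hypothetical toy data, not taken from the paper.
KNOWLEDGE_BASE = [
    "Port scanning identifies open ports and running services on a host.",
    "SQL injection abuses unsanitized input in database queries.",
]
EXPERIENCE_BASE = [
    "Scanning all ports on the host avoided missing services on non-default ports.",
    "Testing login forms with injection payloads revealed a database error.",
]

TASK = "Scan the host for open ports and services"
prompt = build_prompt(TASK, KNOWLEDGE_BASE, EXPERIENCE_BASE)
print(prompt)
```

Keeping the two stores separate in the prompt lets the agent distinguish general reference material from lessons recorded in earlier runs, which is one plausible reading of the external/internal split described above.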
