1. School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
2. The 63rd Research Institute, National University of Defense Technology, Nanjing 210007, China
HAN Bo (2000- ), male, master's student at the School of Electronic and Information Engineering, Nanjing University of Information Science and Technology. His research interests include deep learning and mobile intelligent computing.
ZHOU Shun (1983- ), male, Ph.D., assistant research fellow at the 63rd Research Institute, National University of Defense Technology. His research interests include electromagnetic-space channel modeling and simulation, and inference acceleration for intelligent models.
FAN Jianhua (1971- ), male, research fellow and doctoral supervisor at the 63rd Research Institute, National University of Defense Technology. His research interests include software-defined radio and intelligent spectrum computing.
WEI Xianglin (1985- ), male, Ph.D., associate research fellow at the 63rd Research Institute, National University of Defense Technology. His research interests include edge computing, deep learning, and wireless network security.
ZHU Yanping (1980- ), female, Ph.D., associate professor at the School of Electronic and Information Engineering, Nanjing University of Information Science and Technology. Her research interests include array signal processing, MIMO radar/communication spectrum sharing, and intelligent signal processing.
Received: 2024-01-01
Revised: 2024-09-01
Published in print: 2024-09-20
HAN Bo, ZHOU Shun, FAN Jianhua, et al. Swin Transformer lightweight: an efficient strategy that combines weight sharing, distillation and pruning[J]. Telecommunications Science, 2024, 40(9): 66-74. DOI: 10.11959/j.issn.1000-0801.2024209.
Swin Transformer, a hierarchical vision transformer using shifted windows, has attracted extensive attention in the field of computer vision due to its exceptional modeling capability. However, its high computational complexity limits its applicability on devices with constrained computational resources. To address this issue, a pruning compression method integrating weight sharing and distillation was proposed. First, weight sharing was implemented across layers, and transformation layers were added to transform the shared weights and thereby enhance diversity. Next, a parameter dependency mapping graph of the transformation blocks was constructed and analyzed, and a grouping matrix F was built to record the dependency relationships among all parameters and to identify the parameters that must be pruned simultaneously. Finally, distillation was employed to restore the model's performance. Experiments on the public ImageNet-Tiny-200 dataset show that, with a 32% reduction in computational complexity, the method causes as little as about a 3% performance drop, offering a practical solution for deploying high-performance artificial intelligence models in environments with limited computational resources.
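To make the cross-layer weight-sharing idea concrete, the following is a minimal PyTorch sketch of several layers reusing one set of block weights, each with its own lightweight transformation layer to diversify the shared weights. The class name SharedBlockGroup, the use of plain multi-head attention in place of a real shifted-window Swin block, and the choice of a single linear layer as the transformation are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class SharedBlockGroup(nn.Module):
    """A stack of layers that reuse one shared attention/MLP block.

    Each layer applies its own lightweight linear transformation on top of the
    shared MLP output, so the shared weights do not force identical behaviour
    at every depth. (Illustrative stand-in for a Swin stage, not the paper's code.)
    """

    def __init__(self, dim: int, num_layers: int, num_heads: int = 4):
        super().__init__()
        # Weights shared by every layer in the group.
        self.shared_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.shared_mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # Per-layer transformation layers that diversify the shared weights.
        self.transforms = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for transform in self.transforms:
            h = self.norm1(x)
            attn_out, _ = self.shared_attn(h, h, h)
            x = x + attn_out
            x = x + transform(self.shared_mlp(self.norm2(x)))
        return x


# Usage: token embeddings of shape (batch, tokens, dim), e.g. a 7x7 window of 96-d tokens.
tokens = torch.randn(2, 49, 96)
out = SharedBlockGroup(dim=96, num_layers=4)(tokens)
print(out.shape)  # torch.Size([2, 49, 96])
```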
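The grouping matrix F records which parameter slices are coupled and therefore must be pruned together. The toy sketch below illustrates that idea on a two-layer MLP standing in for coupled projections inside a block: the all-ones matrix F, the L2-norm importance score, and the 50% pruning ratio are simplifying assumptions, and the real dependency mapping graph over a full Swin Transformer is considerably more involved.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden = 8
mlp = nn.Sequential(nn.Linear(16, hidden), nn.ReLU(), nn.Linear(hidden, 16))
fc1, fc2 = mlp[0], mlp[2]

# Grouping matrix F: rows = prunable groups (hidden channels), columns = parameter
# tensors; a 1 means "this tensor has a slice tied to the group and must be cut too".
# In this toy case every channel couples all three tensors, so F is all ones; the
# paper builds F from a parameter-dependency graph over the whole network.
param_names = ["fc1.weight", "fc1.bias", "fc2.weight"]
F = torch.ones(hidden, len(param_names))

# Importance of each group: L2 norm accumulated over every coupled slice.
importance = (fc1.weight.norm(dim=1)      # rows of fc1.weight
              + fc1.bias.abs()            # matching bias entries
              + fc2.weight.norm(dim=0))   # columns of fc2.weight

# Keep the most important half of the channels (arbitrary ratio for the sketch).
keep = importance.argsort(descending=True)[: hidden // 2].sort().values

# Prune every coupled slice simultaneously, as dictated by F.
pruned = nn.Sequential(nn.Linear(16, len(keep)), nn.ReLU(), nn.Linear(len(keep), 16))
with torch.no_grad():
    pruned[0].weight.copy_(fc1.weight[keep])
    pruned[0].bias.copy_(fc1.bias[keep])
    pruned[2].weight.copy_(fc2.weight[:, keep])
    pruned[2].bias.copy_(fc2.bias)

print(sum(p.numel() for p in pruned.parameters()), "parameters after pruning")
```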
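Distillation is used only to recover accuracy after pruning. The abstract does not spell out the loss, so the sketch below assumes the standard temperature-scaled logit distillation objective; the temperature T and mixing weight alpha are arbitrary placeholder values.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend soft-target KL distillation with the usual hard-label cross entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients keep a comparable magnitude across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard


# Usage with dummy logits for a 200-class task (ImageNet-Tiny-200 has 200 classes).
student = torch.randn(8, 200, requires_grad=True)
teacher = torch.randn(8, 200)
labels = torch.randint(0, 200, (8,))
print(distillation_loss(student, teacher, labels).item())
```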