1. School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
2. The 63rd Research Institute, National University of Defense Technology, Nanjing 210007, China
HAN Bo (2000- ), male, master's student at the School of Electronic and Information Engineering, Nanjing University of Information Science and Technology. His research interests include deep learning and mobile intelligent computing.
ZHOU Shun (1983- ), male, Ph.D., assistant research fellow at the 63rd Research Institute, National University of Defense Technology. His research interests include electromagnetic-space channel modeling and simulation, and inference acceleration for intelligent models.
FAN Jianhua (1971- ), male, research fellow and doctoral supervisor at the 63rd Research Institute, National University of Defense Technology. His research interests include software-defined radio and intelligent spectrum computing.
WEI Xianglin (1985- ), male, Ph.D., associate research fellow at the 63rd Research Institute, National University of Defense Technology. His research interests include edge computing, deep learning, and wireless network security.
ZHU Yanping (1980- ), female, Ph.D., associate professor at the School of Electronic and Information Engineering, Nanjing University of Information Science and Technology. Her research interests include array signal processing, MIMO radar/communication spectrum sharing, and intelligent signal processing.
Received: 2024-01-01
Revised: 2024-09-01
Published in print: 2024-09-20
HAN Bo, ZHOU Shun, FAN Jianhua, et al. Swin Transformer lightweight: an efficient strategy that combines weight sharing, distillation and pruning[J]. Telecommunications Science, 2024, 40(9): 66-74. DOI: 10.11959/j.issn.1000-0801.2024209.
Swin Transformer, a hierarchical vision transformer using shifted windows, has attracted extensive attention in the field of computer vision due to its exceptional modeling capability. However, its high computational complexity limits its applicability on devices with constrained computational resources. To address this issue, a pruning compression method integrating weight sharing and distillation was proposed. First, weight sharing was implemented across layers, and transformation layers were added to transform the shared weights and thereby enhance diversity. Next, a parameter dependency mapping graph of the transformation blocks was constructed and analyzed, and a grouping matrix F was built to record the dependency relationships among all parameters and to identify the parameters that must be pruned simultaneously. Finally, distillation was employed to restore the model's performance. Experiments on the public ImageNet-Tiny-200 dataset show that, with a 32% reduction in computational complexity, the method causes as little as about a 3% performance drop, offering a practical solution for deploying high-performance artificial intelligence models in environments with limited computational resources.
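To make the cross-layer weight-sharing idea concrete, the following is a minimal PyTorch sketch of several layers reusing one set of block weights, each with its own lightweight transformation layer to diversify the shared weights. The class name SharedBlockGroup, the use of plain multi-head attention in place of a real shifted-window Swin block, and the choice of a single linear layer as the transformation are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class SharedBlockGroup(nn.Module):
    """A stack of layers that reuse one shared attention/MLP block.

    Each layer applies its own lightweight linear transformation on top of the
    shared MLP output, so the shared weights do not force identical behaviour
    at every depth. (Illustrative stand-in for a Swin stage, not the paper's code.)
    """

    def __init__(self, dim: int, num_layers: int, num_heads: int = 4):
        super().__init__()
        # Weights shared by every layer in the group.
        self.shared_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.shared_mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # Per-layer transformation layers that diversify the shared weights.
        self.transforms = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for transform in self.transforms:
            h = self.norm1(x)
            attn_out, _ = self.shared_attn(h, h, h)
            x = x + attn_out
            x = x + transform(self.shared_mlp(self.norm2(x)))
        return x


# Usage: token embeddings of shape (batch, tokens, dim), e.g. a 7x7 window of 96-d tokens.
tokens = torch.randn(2, 49, 96)
out = SharedBlockGroup(dim=96, num_layers=4)(tokens)
print(out.shape)  # torch.Size([2, 49, 96])
```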
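The grouping matrix F records which parameter slices are coupled and therefore must be pruned together. The toy sketch below illustrates that idea on a two-layer MLP standing in for coupled projections inside a block: the all-ones matrix F, the L2-norm importance score, and the 50% pruning ratio are simplifying assumptions, and the real dependency mapping graph over a full Swin Transformer is considerably more involved.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden = 8
mlp = nn.Sequential(nn.Linear(16, hidden), nn.ReLU(), nn.Linear(hidden, 16))
fc1, fc2 = mlp[0], mlp[2]

# Grouping matrix F: rows = prunable groups (hidden channels), columns = parameter
# tensors; a 1 means "this tensor has a slice tied to the group and must be cut too".
# In this toy case every channel couples all three tensors, so F is all ones; the
# paper builds F from a parameter-dependency graph over the whole network.
param_names = ["fc1.weight", "fc1.bias", "fc2.weight"]
F = torch.ones(hidden, len(param_names))

# Importance of each group: L2 norm accumulated over every coupled slice.
importance = (fc1.weight.norm(dim=1)      # rows of fc1.weight
              + fc1.bias.abs()            # matching bias entries
              + fc2.weight.norm(dim=0))   # columns of fc2.weight

# Keep the most important half of the channels (arbitrary ratio for the sketch).
keep = importance.argsort(descending=True)[: hidden // 2].sort().values

# Prune every coupled slice simultaneously, as dictated by F.
pruned = nn.Sequential(nn.Linear(16, len(keep)), nn.ReLU(), nn.Linear(len(keep), 16))
with torch.no_grad():
    pruned[0].weight.copy_(fc1.weight[keep])
    pruned[0].bias.copy_(fc1.bias[keep])
    pruned[2].weight.copy_(fc2.weight[:, keep])
    pruned[2].bias.copy_(fc2.bias)

print(sum(p.numel() for p in pruned.parameters()), "parameters after pruning")
```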
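Distillation is used only to recover accuracy after pruning. The abstract does not spell out the loss, so the sketch below assumes the standard temperature-scaled logit distillation objective; the temperature T and mixing weight alpha are arbitrary placeholder values.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend soft-target KL distillation with the usual hard-label cross entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients keep a comparable magnitude across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard


# Usage with dummy logits for a 200-class task (ImageNet-Tiny-200 has 200 classes).
student = torch.randn(8, 200, requires_grad=True)
teacher = torch.randn(8, 200)
labels = torch.randint(0, 200, (8,))
print(distillation_loss(student, teacher, labels).item())
```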