A Simulation Accelerating Method Based on CUDA with Kepler GPU

Bingjun Han; Shiming Huang; Ying Du

doi:10.11959/j.issn.1000-0801.2015248

Research and Development | Updated: 2024-06-05

- A Simulation Accelerating Method Based on CUDA with Kepler GPU
- Telecommunications Science, 2015, Vol. 31, No. 10, pp. 82-88
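- Author affiliations:
  
  1. China Academy of Information and Communications Technology, Beijing 100191
  2. Beijing University of Posts and Telecommunications, Beijing 100876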
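- About the authors:
  
  Bingjun Han, male, Ph.D., is a standards engineer at the China Academy of Information and Communications Technology. His main research interest is the construction of system-level simulation platforms for mobile communications, and he has extensive experience in interference coexistence and in acceleration techniques for system-level simulation platforms.
  Shiming Huang, male, is a master's student at Beijing University of Posts and Telecommunications. His main research interest is the construction of communication system-level simulation platforms, and he has extensive experience in GPU acceleration and the parallelization of communication models.
  Ying Du, female, is a senior engineer at the China Academy of Information and Communications Technology, working mainly on wireless communication technology research, standardization, and evaluation. She has taken part as a key member in 3GPP LTE and LTE-Advanced technology research and international standardization, and is currently responsible for LTE R13 international standardization and for pre-research on 5G international standardization.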
- Funding:
  
  Supported by the National Science and Technology Major Project (2014ZX03003011-003)
- DOI: 10.11959/j.issn.1000-0801.2015248
- Published online: 2015-10
  
  Published in print: 2015-10-20

Bingjun Han, Shiming Huang, Ying Du. A Simulation Accelerating Method Based on CUDA with Kepler GPU[J]. Telecommunications Science, 2015, 31(10): 82-88. DOI: 10.11959/j.issn.1000-0801.2015248.

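Abstract (translated from the Chinese original)

This paper proposes a method that uses CUDA (compute unified device architecture) on Kepler-architecture GPUs (graphics processing units) to accelerate DFT (discrete Fourier transform) processing in communication simulation. The core idea is to use thread-level parallelism to accelerate the DFT computation inside a single transmit-receive link, and to use dynamic parallelism and Hyper-Q to process the links of different transmit-receive user pairs in parallel, thereby accelerating the DFT processing of the simulation as a whole. Experimental results show that, compared with a single-core, single-threaded CPU program and with a program for the previous-generation Fermi-architecture GPU, the method increases DFT processing speed by factors of 300 and 3, respectively, which is a good acceleration result.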

Abstract

An accelerating method based on CUDA (compute unified device architecture) with Kepler GPU (graphics processing unit) was proposed to speed up the DFT (discrete Fourier transform) processing in the communication simulation platform. In this method, the whole DFT processing was split into subtasks named molecular-subtasks, each corresponding to one communication link, and each molecular-subtask was further split into smaller parallel subtasks named atomic-subtasks, which correspond to the DFT processing within a link. The atomic-subtasks were then processed in parallel by the threads of a GPU kernel function, while the molecular-subtasks were processed in parallel by several GPU kernel functions, to shorten the simulation time. Simulation results show that this method can speed up the DFT processing by more than 300 times compared with a single-threaded CPU program and by 3 times compared with a traditional GPU program.
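
The two-level split described in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' CUDA implementation: the function names `dft_bin`, `dft_link`, and `dft_all_links` are hypothetical, the DFT is the naive O(N²) definition, and a Python thread pool merely stands in for launching one GPU kernel per link. It only shows why the work decomposes cleanly: each output bin (an atomic-subtask) depends on nothing but the input samples, and each link (a molecular-subtask) is independent of every other link.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def dft_bin(samples, k):
    """Atomic-subtask (assumed granularity): one output bin X[k] of an N-point DFT."""
    n_points = len(samples)
    return sum(
        x * cmath.exp(-2j * cmath.pi * k * n / n_points)
        for n, x in enumerate(samples)
    )


def dft_link(samples):
    """Molecular-subtask: the full DFT of one link's sample block.

    No bin reads another bin's result, so on a GPU each bin could be
    handled by its own thread inside one kernel.
    """
    return [dft_bin(samples, k) for k in range(len(samples))]


def dft_all_links(links, max_workers=4):
    """Process all links concurrently, one task per link.

    The thread pool is only a CPU stand-in for running one kernel per
    link at the same time; it is not expected to give a real speedup.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(dft_link, links))


# Two toy "links" with known spectra.
links = [
    [1.0, 0.0, 0.0, 0.0],  # unit impulse: every bin equals 1
    [1.0, 1.0, 1.0, 1.0],  # constant signal: all energy in bin 0
]
spectra = dft_all_links(links)
```

In a CUDA version, the inner loop over `k` would map to thread indices and the outer loop over links to separate kernel launches; how those launches overlap on the device depends on the hardware features named in the abstract.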
