SU‐E‐J‐02: GPU‐Accelerated Polyenergetic DRR Generation Based On Data Parallelism and Task Parallelism with a Dispatcher Using OpenCL: Effect of the Numbers of Tasks and Energies




Author(s) - Zhou L, Chao K, Chang J

Publication year - 2013

Publication title - Medical Physics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.473

H-Index - 180

eISSN - 2473-4209

pISSN - 0094-2405

DOI - 10.1118/1.4814214

Subject(s) - computer science, parallel computing, task parallelism, data parallelism, graphics processing unit, general purpose computing on graphics processing units, partition (number theory), speedup, multi core processor, central processing unit, energy consumption, computational science, graphics, parallelism (grammar), computer hardware, computer graphics (images), mathematics, ecology, combinatorics, biology

Purpose: To improve the performance of parallel multi‐energetic digitally reconstructed radiograph (DRR) generation on heterogeneous platforms using a task‐overlap strategy.

Methods: A segmented 512 × 512 × 223 head‐and‐neck phantom was used to generate 512 × 512 polyenergetic DRRs based on the Mohan4 and Mohan6 spectra, which contain 16 and 24 energy bins, respectively. DRR formation for each energy bin comprises three steps: (1) phantom conversion, (2) line integration, and (3) exponentiation and weighting of the projection. The parallel computing platform consisted of one 8‐core CPU and one general‐purpose graphics processing unit (GPGPU). We used Open Computing Language (OpenCL) to decompose the low‐degree parallel and serial workloads into multiple tasks on the CPU using task parallelism, and to partition the high‐degree parallel workloads on the GPGPU using data parallelism. Two sequential task partitions for the first DRR formation were tested: (A) Step 1 as Task 1 on the CPU, Step 2 as Task 2 on the GPU, and Step 3 as Task 3 on the CPU; and (B) Step 1 as Task 1 on the CPU, and Steps 2 and 3 together as Task 2 on the GPU. Subsequent DRR generation does not need Step 1, so Task 1 was excluded from each partition. A task‐overlap method driven by a dispatcher was also implemented using a regular single‐threaded host program to further improve performance.

Results: For the first DRR formation with Partition A, the task‐overlap strategy was 5.8 and 6.5 times faster than the sequential method for the 16‐energy‐bin Mohan4 and 24‐energy‐bin Mohan6 spectra, respectively. With Partition B, the speedup of the task‐overlap strategy was 5.2 and 5.5 times for the Mohan4 and Mohan6 spectra, respectively. For subsequent DRR formation, the speedups were 1.16 and 1.165 times in the two‐task scenario for the Mohan4 and Mohan6 spectra, respectively.

Conclusion: The task‐overlap strategy significantly improves the performance of parallel multi‐energetic DRR generation. Parallelism increases when more energies and tasks are driven by the dispatcher.
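
The three per‐energy steps named in the Methods can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' OpenCL implementation: the abstract gives no formulas, so the sketch assumes the usual Beer‐Lambert model, in which each energy bin contributes its spectral weight times exp(−line integral of attenuation), and it assumes parallel rays along one phantom axis rather than a realistic divergent beam. The function names, the `spectrum` layout, and all attenuation values are hypothetical.

```python
import math

def convert_phantom(labels, mu_by_material):
    """Step 1 (phantom conversion): map each segmented material label to
    a linear attenuation coefficient for one energy bin."""
    return [[[mu_by_material[m] for m in row] for row in plane]
            for plane in labels]

def line_integrals(mu, voxel_size):
    """Step 2 (line integral): sum attenuation along the depth axis.
    Parallel rays are an assumption made to keep the sketch short."""
    depth, rows, cols = len(mu), len(mu[0]), len(mu[0][0])
    return [[sum(mu[k][i][j] for k in range(depth)) * voxel_size
             for j in range(cols)]
            for i in range(rows)]

def weighted_exponential(path, weight):
    """Step 3 (exponential and weighting): Beer-Lambert attenuation
    scaled by the spectral weight of the energy bin."""
    return [[weight * math.exp(-p) for p in row] for row in path]

def polyenergetic_drr(labels, spectrum, voxel_size):
    """Accumulate the contribution of every energy bin into one image.
    `spectrum` is a list of (weight, mu_by_material) pairs (hypothetical layout)."""
    rows, cols = len(labels[0]), len(labels[0][0])
    drr = [[0.0] * cols for _ in range(rows)]
    for weight, mu_by_material in spectrum:
        mu = convert_phantom(labels, mu_by_material)    # Step 1
        path = line_integrals(mu, voxel_size)           # Step 2
        contrib = weighted_exponential(path, weight)    # Step 3
        for i in range(rows):
            for j in range(cols):
                drr[i][j] += contrib[i][j]
    return drr

if __name__ == "__main__":
    # Tiny phantom indexed as labels[depth][row][col]; 0 = air, 1 = water.
    labels = [[[1, 0]], [[1, 0]]]
    # Two energy bins with made-up weights and attenuation values.
    spectrum = [(0.25, {0: 0.0, 1: 0.2}), (0.75, {0: 0.0, 1: 0.1})]
    print(polyenergetic_drr(labels, spectrum, voxel_size=1.0))
```

In this sketch the energy bins are independent of one another until the final accumulation, which is the property that lets per‐bin tasks be overlapped across devices in the way the abstract describes.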
