Ultrafast convolution/superposition using tabulated and exponential kernels on GPU
Author(s) -
Chen Quan,
Chen Mingli,
Lu Weiguo
Publication year - 2011
Publication title -
Medical Physics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.473
H-Index - 180
eISSN - 2473-4209
pISSN - 0094-2405
DOI - 10.1118/1.3551996
Subject(s) - kernel (algebra) , speedup , computer science , convolution (computer science) , exponential function , parallel computing , computational science , computation , cuda , dimension (graph theory) , algorithm , superposition principle , mathematics , artificial intelligence , discrete mathematics , mathematical analysis , artificial neural network , pure mathematics
Purpose: Collapsed-cone convolution/superposition (CCCS) dose calculation is the workhorse for IMRT dose calculation. The authors present a novel algorithm for computing CCCS dose on the modern graphics processing unit (GPU).
Methods: The GPU algorithm includes a novel TERMA calculation that has no write conflicts and has linear computational complexity. The CCCS algorithm uses either tabulated or exponential cumulative-cumulative kernels (CCKs), as reported in the literature. The authors demonstrate that using exponential kernels can reduce the computational complexity by an order of a dimension while achieving excellent accuracy. Special attention is paid to the unique architecture of the GPU, especially the memory access pattern, which increases performance by more than tenfold.
Results: The tabulated kernel implementation on the GPU is two to three times faster than other GPU implementations reported in the literature. The CCCS implementation showed significant speedup on the GPU over a single-core CPU. With the tabulated CCK, speedups as high as 70 are observed; with the exponential CCK, speedups as high as 90 are observed.
Conclusions: Overall, the GPU algorithm using the exponential CCK is 1000–3000 times faster than a highly optimized single-threaded CPU implementation using the tabulated CCK, while the dose differences are within 0.5% and 0.5 mm. This ultrafast CCCS algorithm will allow many time-sensitive applications to use accurate dose calculation.
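
The claim that exponential kernels lower the complexity by an order of a dimension can be illustrated with a minimal one-dimensional sketch. It is not the authors' GPU implementation or their CCK formulation: it assumes a single ray through a homogeneous medium, uniform voxel spacing, and a plain kernel of the form `amplitude * exp(-decay * r)`. The function names and parameters (`line_dose_direct`, `line_dose_recursive`, `amplitude`, `decay`, `dx`) are hypothetical and chosen only for illustration. Under those assumptions, the direct sum over all upstream voxels costs O(N^2) per ray, while the exponential form lets a running sum be attenuated and carried forward in O(N).

```python
import math


def line_dose_direct(terma, amplitude, decay, dx):
    """O(N^2): each voxel sums contributions from itself and all upstream voxels."""
    n = len(terma)
    dose = [0.0] * n
    for i in range(n):
        for j in range(i + 1):
            distance = (i - j) * dx
            dose[i] += terma[j] * amplitude * math.exp(-decay * distance) * dx
    return dose


def line_dose_recursive(terma, amplitude, decay, dx):
    """O(N): attenuate the running sum by one voxel step, then add the local term."""
    step = math.exp(-decay * dx)  # attenuation across one voxel (assumed uniform)
    dose = []
    carried = 0.0
    for t in terma:
        carried = carried * step + t * amplitude * dx
        dose.append(carried)
    return dose


# Illustrative TERMA values along one ray (made-up numbers, not from the paper).
terma = [1.0, 0.8, 0.6, 0.5, 0.3, 0.2]
direct = line_dose_direct(terma, amplitude=2.0, decay=0.5, dx=0.25)
recursive = line_dose_recursive(terma, amplitude=2.0, decay=0.5, dx=0.25)

# Both formulations agree to floating-point precision.
assert all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(direct, recursive))
```

A tabulated kernel has no such closed-form attenuation step, so each voxel must look up a separate kernel value for every upstream voxel; this is the extra loop that the exponential form removes in this simplified setting.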