Premium
Accelerating iterative coordinate descent using a stored system matrix
Author(s) -
Hsieh Scott S.,
Hoffman John M.,
Noo Frederic
Publication year - 2019
Publication title -
medical physics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.473
H-Index - 180
eISSN - 2473-4209
pISSN - 0094-2405
DOI - 10.1002/mp.13543
Subject(s) - computer science , initialization , coordinate descent , algorithm , convergence (economics) , voxel , iterative reconstruction , iterative method , computational science , parallel computing , computer vision , economics , programming language , economic growth
Purpose The computational burden associated with model‐based iterative reconstruction ( MBIR ) is still a practical limitation. Iterative coordinate descent ( ICD ) is an optimization approach for MBIR that has sometimes been thought to be incompatible with modern computing architectures, especially graphics processing units ( GPU s). The purpose of this work is to accelerate the previously released open‐source Free CT _ ICD to include GPU acceleration and to demonstrate computational performance with ICD that is comparable with simultaneous update approaches. Methods Free CT _ ICD uses a stored system matrix ( SSM ), which precalculates the forward projector in the form of a sparse matrix and then reconstructs on a rotating coordinate grid to exploit helical symmetry. In our GPU ICD implementation, we shuffle the sinogram memory ordering such that data access in the sinogram coalesce into fewer transactions. We also update N S voxels in the xy‐plane simultaneously to improve occupancy. Conventional ICD updates voxels sequentially ( N S = 1). Using N S > 1 eliminates existing convergence guarantees. Convergence behavior in a clinical dataset was therefore studied empirically. Results On a pediatric dataset with sinogram size of 736 × 16 × 13860 reconstructed to a matrix size of 512 × 512 × 128, our code requires about 20 s per iteration on a single GPU compared to 2300 s per iteration for a 6‐core CPU using Free CT _ ICD . After 400 iterations, the proposed and reference codes converge within 2 HU RMS difference ( RMSD ). Using a wFBP initialization, convergence within 10 HU RMSD is achieved within 4 min. Convergence is similar with N S values between 1 and 256, and N S = 16 was sufficient to achieve maximum performance. Divergence was not observed until N S > 1024. Conclusions With appropriate modifications, ICD may be able to achieve computational performance competitive with simultaneous update algorithms currently used for MBIR .