Open Access
Efficient implementation of iterative multi‐input–multi‐output orthogonal frequency‐division multiplexing receiver using minimum‐mean‐square error interference cancellation
Author(s) -
Han Bing,
Yang Zengli,
Zheng Yahong Rosa
Publication year - 2014
Publication title -
IET Communications
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.355
H-Index - 62
eISSN - 1751-8636
pISSN - 1751-8628
DOI - 10.1049/iet-com.2013.0694
Subject(s) - minimum mean square error , cordic , computer science , single antenna interference cancellation , qr decomposition , algorithm , estimator , multiplexing , low density parity check code , decoding methods , field programmable gate array , computer hardware , mathematics , telecommunications , statistics , eigenvalues and eigenvectors , physics , quantum mechanics
An efficient hardware implementation scheme is proposed for an iterative multi‐input–multi‐output orthogonal frequency‐division multiplexing receiver that includes a minimum‐mean‐square error interference cancellation (MMSE‐IC) detector, a channel estimator, a low‐density parity‐check (LDPC) decoder and other supporting modules. The proposed implementation solves the MMSE‐IC equations using the QR decomposition (QRD) of complex‐valued matrices with four coordinate rotation digital computer (CORDIC) cores followed by a back substitution, whereas existing systolic array architectures require 15–38 CORDIC cores to achieve a similar throughput. The proposed 4‐CORDIC QRD architecture can be configured for 16‐matrix or 64‐matrix pipelining by using a different number of multipliers combined with a one‐dimensional (1D) or 2D back‐substitution array, respectively. The channel estimator implements the commonly used frequency‐domain least squares channel estimation with the canonic‐signed‐digit method, which is made possible by the properties of the Zadoff‐Chu sequence used as the pilot. In the LDPC decoder, the min‐sum algorithm is implemented for quasi‐cyclic LDPC decoding. Both MMSE‐IC detector schemes, which differ in throughput and resource usage, have been implemented in a field‐programmable gate array (FPGA) as part of a complete baseband turbo receiver. Their resource usage, throughput and latency are compared with those of the classic systolic array architectures; the comparison demonstrates that the proposed receiver architecture achieves the best tradeoff between throughput and resource usage.
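The QRD‐plus‐back‐substitution approach named in the abstract can be illustrated with a small floating‐point sketch. It is not the authors' hardware design. It assumes the textbook linear MMSE system (H^H H + σ²I)x = H^H y as a stand‐in for the MMSE‐IC equations, which the abstract does not give, and it omits the soft interference cancellation step. The Givens rotations are computed directly in floating point, whereas a CORDIC core would realise them with fixed‐point shift‐and‐add iterations. The names `givens_qr_solve` and `mmse_detect` are hypothetical.

```python
import numpy as np


def givens_qr_solve(A, b):
    """Solve A x = b via Givens-rotation QR followed by back substitution."""
    n = A.shape[0]
    # Augment A with b so every rotation is also applied to the right-hand side.
    M = np.hstack([A.astype(complex), b.astype(complex).reshape(n, 1)])

    # Triangularisation: zero each sub-diagonal entry with a complex rotation.
    for j in range(n):
        for i in range(j + 1, n):
            a, t = M[j, j], M[i, j]
            r = np.sqrt(abs(a) ** 2 + abs(t) ** 2)
            if r == 0.0:
                continue  # nothing to rotate
            c, s = a / r, t / r
            row_j, row_i = M[j].copy(), M[i].copy()
            # Unitary 2x2 rotation: the pivot becomes r, the target becomes 0.
            M[j] = np.conj(c) * row_j + np.conj(s) * row_i
            M[i] = -s * row_j + c * row_i

    # Back substitution on the upper-triangular system R x = Q^H b.
    x = np.zeros(n, dtype=complex)
    for k in range(n - 1, -1, -1):
        x[k] = (M[k, n] - M[k, k + 1:n] @ x[k + 1:]) / M[k, k]
    return x


def mmse_detect(H, y, noise_var):
    """Linear MMSE estimate (assumed form, without interference cancellation)."""
    n_tx = H.shape[1]
    A = H.conj().T @ H + noise_var * np.eye(n_tx)
    rhs = H.conj().T @ y
    return givens_qr_solve(A, rhs)
```

Calling `mmse_detect(H, y, noise_var)` with a complex channel matrix `H`, a received vector `y` and a noise variance returns the estimated transmit vector. In this form each matrix is processed one after another; the pipelining across 16 or 64 matrices described in the abstract is a scheduling choice in hardware that this sketch does not model.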