Coarse Grain Parallelization of H.264 Video Decoder and Memory Bottleneck in Multi-Core Architectures
Author(s) -
Ahmet Gürhanlı,
Charlie ChungPing Chen,
ShihHao Hung
Publication year - 2011
Publication title - International Journal of Computer Theory and Engineering
Language(s) - English
Resource type - Journals
ISSN - 1793-8201
DOI - 10.7763/ijcte.2011.v3.335
Subject(s) - computer science , bottleneck , parallel computing , multi core processor , video decoder , many core , decoding methods , algorithm , embedded system
Fine-grain methods for parallelizing the H.264 decoder offer good latency and low memory usage. However, they cannot match the scalability of coarse-grain approaches, even assuming a well-designed entropy decoder that can feed the increasing number of cores working in parallel. We introduce a GOP (Group of Pictures) level approach, chosen for its high scalability, and discuss solutions to its well-known memory issues. Our design removes the need for a scanner for GOP start codes, which earlier methods used, so that all cores can work on the decoding task. Our experiments showed that memory initialization operations may substantially degrade the scalability of parallel applications. The multi-core cache architecture proved to be critical for achieving the desired speedup. We observed a speedup of 7.63 with 8 processors having separate caches, and a speedup of 13.35 with 16 processors when a cache is shared by 2 processors.
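To put the two reported speedups in context, they can be converted to parallel efficiency (speedup divided by processor count). The short sketch below is illustrative only: the function name `parallel_efficiency` and the `reported` dictionary are assumptions introduced here, and only the figures 7.63 and 13.35 come from the abstract.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup per core (1.0 would mean ideal linear scaling)."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return speedup / cores


# Speedups reported in the abstract, keyed by processor count.
# 8 processors: separate caches; 16 processors: a cache shared by 2 processors.
reported = {8: 7.63, 16: 13.35}

for cores, speedup in reported.items():
    eff = parallel_efficiency(speedup, cores)
    print(f"{cores} processors: speedup {speedup:.2f}, efficiency {eff:.1%}")
```

Under this measure, the 8-processor configuration runs at roughly 95% efficiency and the 16-processor shared-cache configuration at roughly 83%, which is consistent with the abstract's point that cache architecture affects the achievable speedup.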