An Aggressive Implementation Method of Branch Instruction Prefetch
Author(s) -
Yan Sun,
Ye Yuan,
Weili Li,
Dongyan Zhao,
Liang Liu,
Rui Tian
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1769/1/012062
Subject(s) - instruction prefetch, branch predictor, computer science, pipeline (software), parallel computing, clock rate, real time computing, embedded system, computer hardware, operating system, telecommunications, chip, cache
To meet demand in industrial control, we developed a CPU core named TS400, based on the Armv8-M architecture [1], with performance comparable to the Arm Cortex-M33 [2]. The TS400 has a four-stage pipeline and therefore reaches a higher clock frequency than the three-stage Cortex-M33. A deeper pipeline, however, introduces extra idle clock cycles whenever the pipeline is flushed, which raises the clock cycles per instruction (CPI) and lowers the CoreMark score. Accurate instruction fetch and branch prediction can effectively reduce the impact of pipeline flushes, at the cost of extra logic resources. The TS400 instead uses an aggressive branch instruction prefetch method. Unlike branch prediction, this method needs no complex prediction logic, which makes it suitable for embedded CPU design. The method consists of two parts: 1) minimizing the time cost of fetching a conditional branch target by taking the branch first and confirming the branch outcome afterwards; 2) optimizing the bus control signal timing of the prefetch so that the target-address prefetch is answered in time. The aggressive branch prefetch method reduces the pipeline stalls caused by conditional branch instructions as far as possible, so the TS400 achieves better performance than the Cortex-M33 at the same clock frequency, in addition to supporting a higher clock frequency.
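
The link the abstract draws between pipeline stalls and CPI can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than something taken from the paper: the function name `effective_cpi`, the workload mix, and the per-branch stall cycles are all hypothetical, and in particular the one-cycle cost charged when an early-taken branch turns out not to be taken is a guess, not a TS400 figure.

```python
def effective_cpi(base_cpi, branch_fraction, taken_rate,
                  taken_penalty, not_taken_penalty):
    """Average CPI once conditional-branch stall cycles are included.

    All arguments are hypothetical model inputs, not measured TS400 data:
      base_cpi          -- CPI with no branch stalls at all
      branch_fraction   -- share of instructions that are conditional branches
      taken_rate        -- share of those branches that end up taken
      taken_penalty     -- stall cycles paid when the branch is taken
      not_taken_penalty -- stall cycles paid when the branch is not taken
    """
    stall_per_branch = (taken_rate * taken_penalty
                        + (1.0 - taken_rate) * not_taken_penalty)
    return base_cpi + branch_fraction * stall_per_branch


# Hypothetical workload: 20% conditional branches, 60% of them taken.
WORKLOAD = dict(base_cpi=1.0, branch_fraction=0.2, taken_rate=0.6)

# Assumed baseline: the target fetch starts only after the branch resolves,
# so every taken branch stalls the pipeline for two cycles.
wait_then_fetch = effective_cpi(**WORKLOAD, taken_penalty=2, not_taken_penalty=0)

# Assumed "take first, confirm later" scheme: the target fetch is issued
# early, so taken branches cost nothing, while a branch that turns out
# not taken is charged one cycle to return to the fall-through path.
fetch_then_confirm = effective_cpi(**WORKLOAD, taken_penalty=0, not_taken_penalty=1)

print(f"wait-then-fetch CPI:    {wait_then_fetch:.2f}")
print(f"fetch-then-confirm CPI: {fetch_then_confirm:.2f}")
```

Under these assumed numbers the early fetch wins because taken branches dominate the workload. With a low `taken_rate` or a larger `not_taken_penalty` the comparison could go the other way, so the sketch shows only the shape of the trade-off, not the paper's measured result.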
