A fault‐tolerant routing strategy for  k ‐ary  n ‐direct  s ‐indirect topologies based on intermediate nodes | Zendy

Peñaranda Roberto | Zendy; Gómez María Engracia | Zendy; López Pedro | Zendy; Gunnar Gran Ernst | Zendy; Skeie Tor | Zendy

AI Assistant Blog Pricing

Home ZAIA Blog

Premium

A fault‐tolerant routing strategy for k ‐ary n ‐direct s ‐indirect topologies based on intermediate nodes

Author(s) -

Peñaranda Roberto,

Gómez María Engracia,

López Pedro,

Gunnar Gran Ernst,

Skeie Tor

Publication year - 2017

Publication title -

concurrency and computation: practice and experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.4065

Subject(s) - fault tolerance , computer science , network topology , node (physics) , interconnection , network packet , distributed computing , routing (electronic design automation) , computer network , topology (electrical circuits) , engineering , electrical engineering , structural engineering

Summary Exascale computing systems are being built with thousands of nodes. The high number of components of these systems significantly increases the probability of failure. A key component for them is the interconnection network. If failures occur in the interconnection network, they may isolate a large fraction of the machine. For this reason, an efficient fault‐tolerant mechanism is needed to keep the system interconnected, even in the presence of faults. A recently proposed topology for these large systems is the hybrid k ‐ary n ‐direct s ‐indirect family that provides optimal performance and connectivity at a reduced hardware cost. This paper presents a fault‐tolerant routing methodology for the k ‐ary n ‐direct s ‐indirect topology that degrades performance gracefully in presence of faults and tolerates a large number of faults without disabling any healthy computing node. In order to tolerate network failures, the methodology uses a simple mechanism. For any source‐destination pair, if necessary, packets are forwarded to the destination node through a set of intermediate nodes (without being ejected from the network) with the aim of circumventing faults. The evaluation results shows that the proposed methodology tolerates a large number of faults. For instance, it is able to tolerate more than 99.5% of fault combinations when there are 10 faults in a 3‐D network with 1000 nodes using only 1 intermediate node and more than 99.98% if 2 intermediate nodes are used. Furthermore, the methodology offers a gracious performance degradation. As an example, performance degrades only by 1% for a 2‐D network with 1024 nodes and 1% faulty links.

This content is not available in your region!

Continue researching here.

Having issues? You can contact us here

Accelerating Research