Open Access
Cause-aware failure detection using an interpretable XGBoost for optical networks
Author(s) -
Chunyu Zhang,
Danshi Wang,
Lingling Wang,
Le Guan,
Hui Yang,
Zhiguo Zhang,
Xue Chen,
Min Zhang
Publication year - 2021
Publication title -
Optics Express
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.394
H-Index - 271
ISSN - 1094-4087
DOI - 10.1364/oe.436293
Subject(s) - computer science , feature (linguistics) , feature engineering , ranking (information retrieval) , artificial intelligence , artificial neural network , data mining , scheme (mathematics) , machine learning , boosting (machine learning) , pattern recognition (psychology) , deep learning , mathematics , mathematical analysis , philosophy , linguistics
Failure detection is an important part of failure management, because network operators face serious consequences when operating under failure conditions. Machine learning (ML) is widely applied to failure management in optical networks, and neural networks (NNs) have attracted particular attention, becoming the most extensively applied of all ML algorithms. However, the black-box nature of NNs makes it difficult to interpret or analyze why and how they work during execution. In this paper, we propose a cause-aware failure detection scheme for optical transport network (OTN) boards that adopts the interpretable extreme gradient boosting (XGBoost) algorithm. The feature importance ranking produced by XGBoost identifies the features most relevant to equipment failure. SHapley Additive exPlanations (SHAP) is then applied to resolve the inconsistency in feature attribution among the three common global feature importance measures of XGBoost; it yields a consistent attribution by calculating the contribution (SHAP value) of each input feature to the detection result of XGBoost. Based on the feature importance ranking of SHAP values, the features most related to two types of OTN board failures are confirmed, enabling the identification of failure causes. Moreover, we evaluate the failure detection performance for two types of OTN boards, for which the practical data are balanced and unbalanced, respectively. Experimental results show that the F1 score of the proposed scheme is higher than 98% for both types of OTN boards, and that the features most relevant to the two types of board failures, as confirmed by SHAP values, are the average and the maximum of the environment temperature, respectively.
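To make the notion of a SHAP value concrete, the following is a minimal sketch of exact Shapley attribution in plain Python. It is not the authors' pipeline: `toy_detector` is a hypothetical stand-in for a trained XGBoost model, the feature names `temp_avg`, `temp_max`, and `input_power` are illustrative assumptions, and "absent" features are replaced by a single baseline sample rather than by the expectation over background data that SHAP tooling typically uses.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names; the abstract only mentions temperature statistics.
FEATURES = ["temp_avg", "temp_max", "input_power"]


def toy_detector(x):
    """Hypothetical stand-in for a trained model: returns a failure score."""
    score = 0.02 * x["temp_avg"] + 0.01 * x["temp_max"]
    # Interaction term: high peak temperature together with low optical power.
    if x["temp_max"] > 60 and x["input_power"] < -20:
        score += 0.5
    return score


def value(subset, x, baseline):
    """Model output when only the features in `subset` take the sample's
    values and all other features are held at the baseline (a simplification)."""
    mixed = {f: (x[f] if f in subset else baseline[f]) for f in FEATURES}
    return toy_detector(mixed)


def shapley_values(x, baseline):
    """Exact Shapley value of each feature for one sample.

    Enumerates every subset of the other features, so the cost grows as 2^n;
    this is only practical for a handful of features.
    """
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}, x, baseline) - value(s, x, baseline))
        phi[f] = total
    return phi


def global_importance(samples, baseline):
    """Rank features by mean absolute Shapley value across samples."""
    sums = {f: 0.0 for f in FEATURES}
    for x in samples:
        for f, v in shapley_values(x, baseline).items():
            sums[f] += abs(v)
    means = {f: s / len(samples) for f, s in sums.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    baseline = {"temp_avg": 40.0, "temp_max": 50.0, "input_power": -10.0}
    sample = {"temp_avg": 55.0, "temp_max": 70.0, "input_power": -25.0}
    for name, contribution in shapley_values(sample, baseline).items():
        print(name, round(contribution, 3))
```

For this sample the attributions come out to approximately 0.30, 0.45, and 0.25, which sum to the difference between the sample's score and the baseline's score. The 0.5 interaction bonus is split evenly between `temp_max` and `input_power`. This additivity, together with the consistency guarantee of Shapley values, is what makes the resulting ranking a more reliable basis for naming a failure cause than heuristic importance measures that can disagree with each other.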
