A hybrid knowledge and ensemble classification approach for prediction of venous thromboembolism
Author(s) -
Sabra Susan,
Malik Khalid Mahmood,
Afzal Muhammad,
Sabeeh Vian,
Charaf Eddine Ahmad
Publication year - 2020
Publication title -
Expert Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.365
H-Index - 38
eISSN - 1468-0394
pISSN - 0266-4720
DOI - 10.1111/exsy.12388
Subject(s) - computer science, artificial intelligence, ensemble learning, classifier (uml), machine learning, ontology, feature selection, natural language processing, data mining, philosophy, epistemology
Abstract - Clinical narratives such as progress summaries, lab reports, surgical reports, and other narrative texts contain key biomarkers about a patient's health. Evidence‐based preventive medicine needs accurate semantic and sentiment analysis to extract and classify medical features as the input to appropriate machine learning classifiers. However, the traditional approach of using a single classifier is limited by its need for dimensionality reduction techniques, statistical feature correlation, and a faster learning rate, and by its failure to consider the semantic relations among features. Extracting semantic and sentiment‐based features from clinical text and combining multiple classifiers into an ensemble intelligent system overcomes many of these limitations and provides a more robust prediction outcome. The selection of an appropriate approach and its interparameter dependency is key to the success of the ensemble method. This paper proposes a hybrid knowledge and ensemble learning framework for predicting a diagnosis of venous thromboembolism (VTE). The framework consists of the following components: a VTE ontology, a framework for semantic extraction and sentiment assessment of risk factors, and an ensemble classifier. A component‐based analysis approach was adopted for evaluation, using a data set of 250 clinical narratives. With and without semantic extraction and sentiment assessment of risk factors, respectively, the knowledge and ensemble framework achieved a precision of 81.8% and 62.9%, a recall of 81.8% and 57.6%, an F measure of 81.8% and 53.8%, and a receiver operating characteristic of 80.1% and 58.5% in identifying cases of VTE.
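The abstract does not say how the ensemble combines its base classifiers or how the reported metrics were averaged. As a rough illustration only, the sketch below assumes a simple majority vote over three hypothetical binary classifiers and computes precision, recall, and F measure for the positive (VTE) class. The function names, the toy labels, and the voting rule are all assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine several classifiers' label lists into one label per sample.

    `predictions` is a list of equal-length label lists, one per classifier.
    With an odd number of binary classifiers there are no ties.
    """
    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns [(label, count)] for the most frequent label
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F measure for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (hypothetical): 1 = VTE, 0 = no VTE
y_true = [1, 0, 1, 1, 0, 0]
clf_a = [1, 0, 0, 1, 0, 1]
clf_b = [1, 1, 1, 1, 0, 0]
clf_c = [0, 0, 1, 1, 0, 0]

ensemble = majority_vote([clf_a, clf_b, clf_c])

print("single classifier:", precision_recall_f1(y_true, clf_a))
print("ensemble:         ", precision_recall_f1(y_true, ensemble))
```

In this toy case each base classifier makes different mistakes, so the vote cancels them out. That is the general intuition behind combining classifiers, not a reproduction of the paper's results.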