
Data Pre-Processing and Customized Onto-Graph Construction for Knowledge Extraction in Healthcare Domain of Semantic Web
Author(s) -
P Monika,
G. T. Raju
Publication year - 2019
Publication title -
international journal of innovative technology and exploring engineering
Language(s) - English
Resource type - Journals
ISSN - 2278-3075
DOI - 10.35940/ijitee.k1423.0981119
Subject(s) - computer science , raw data , missing data , imputation (statistics) , graph , data mining , knowledge extraction , semantic web , information retrieval , external data representation , artificial intelligence , natural language processing , machine learning , theoretical computer science , programming language
Present electronic world produces enormous amount of data every second in various formats, especially in healthcare units. To efficiently utilize the available data by representing it in the machine readable form, the concept of Semantic web stepped in progressing towards automated knowledge discovery process. In this paper, comprehensive pre-processing techniques have been proposed for preparing the raw data to be presentable in structured format so as to construct the onto-graph for selected features in a health care domain. Cluster based Missing Value Imputation Algorithm (CMVI) has been proposed to enhance the quality of the imputed data which is the most important step during data pre-processing. Missing values were randomly induced into the Pima Indian Diabetic dataset with the missing ratio of 1%, 3% and 5% for each attribute up to 50% of the attributes in the original diabetic dataset. The experimental observations reveal that the quality of the pre-processed data is better compared to raw, unprocessed data in terms of imputation accuracy measured against coefficient of determination (R2 ), Index of agreement (d2 ) and Root Mean Square Error (RMSE).Documented results proved that the proposed techniques are comparatively superior than the traditional approaches with increased R2 & d2 and decreased RMSE scores. Further, importance of knowledge graph and various ontological representation types are discussed in short as construction of .owl file is the first step towards automation in semantic web.