
Understanding Clinical Data using Exploratory Analysis
Author(s) - Owk Mrudula, A. Mary Sowjanya
Publication year - 2020
Publication title - International Journal of Recent Technology and Engineering
Language(s) - English
Resource type - Journals
ISSN - 2277-3878
DOI - 10.35940/ijrte.e6827.018520
Subject(s) - computer science, exploratory data analysis, naive bayes classifier, data preprocessing, decision tree, outlier, data mining, data science, machine learning, logistic regression, missing data, preprocessor, support vector machine, random forest, artificial intelligence, data discovery, metadata, operating system
In today's world, data plays an indispensable role. Properly understanding and interpreting data lays the foundation for the growth and success of a company or organization. As in domains such as business, finance and banking, the health sector also produces huge amounts of data. This data needs to be properly analyzed and summarized before it is modeled for a specific purpose. Clinical data typically involves stakeholders such as doctors, technicians, lab analysts, hospital managers, care providers and insurance agents. Exploratory Data Analysis (EDA) plays an important role in providing a complete picture of the dataset and in identifying new insights and hidden patterns in the data. As such, it is the most significant step before actually preprocessing the data. In our paper, we implemented EDA on the Statlog heart disease dataset to identify the important variables, correlations between variables, missing values and outliers, and we also applied principal component analysis (PCA). To verify whether EDA actually improves performance, we used machine learning algorithms such as Naïve Bayes, Logistic Regression, Decision Tree, Support Vector Machine and Random Forest. Results indicate that the performance of the prediction model increases considerably after performing EDA, regardless of the prediction algorithm used. The graphical analysis of the dataset also helps stakeholders make better decisions regarding their patients and their treatments. Understanding clinical data before modeling helps prevent erroneous models later, and exploratory analysis helps achieve this.
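As a rough illustration of three of the EDA checks named above (missing values, outliers and pairwise correlation), the following is a minimal standard-library Python sketch. It is not the authors' implementation: the column names (`age`, `chol`, `max_hr`) and all values are hypothetical toy data, not records from the Statlog dataset, and the outlier rule shown is Tukey's 1.5 × IQR fences, which the abstract does not specify. PCA and the model comparison are left out.

```python
import math
import statistics

# Hypothetical toy records, NOT taken from the Statlog heart dataset.
# None marks a missing value.
data = {
    "age":    [54, 63, 41, None, 58, 60, 45],
    "chol":   [239, 250, 212, 236, 564, 230, None],
    "max_hr": [160, 150, 172, 174, 160, 145, 175],
}


def missing_counts(cols):
    """Count missing (None) entries in each column."""
    return {name: sum(v is None for v in vals) for name, vals in cols.items()}


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    Missing values are ignored; quartiles use the 'inclusive' method.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


print(missing_counts(data))        # {'age': 1, 'chol': 1, 'max_hr': 0}
print(iqr_outliers(data["chol"]))  # [564]
print(pearson(data["age"], data["max_hr"]))
```

On real data, each finding would then drive a preprocessing decision (for example, imputing or dropping missing rows, or inspecting flagged outliers) before any of the listed classifiers are trained.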