
SELECTION OF OPTIMAL SUMMARY STATISTICS FOR DATA ANALYSIS
Author(s) - Vanshika Lamba
Publication year - 2021
Publication title - International Journal of Engineering Applied Science and Technology
Language(s) - English
Resource type - Journals
ISSN - 2455-2143
DOI - 10.33564/ijeast.2021.v06i01.041
Subject(s) - measure (data warehouse), computer science, selection (genetic algorithm), data mining, set (abstract data type), data set, summary statistics, path (computing), data science, statistics, machine learning, mathematics, artificial intelligence, programming language
Data analysis is the core activity performed on data to gather its characteristics for further specification and estimation. Extracting the maximum number of useful characteristics, however, is a major barrier for any organization. Data analysis plays an important role in an organization's success because it supports proper decision making, and the best decisions come from analyzing past information, the present scenario, and future impacts. Most of the information extracted from a collected data set is numerical. Therefore, we need to select appropriate summary statistics for data exploration, e.g., mean, median, and mode. This paper focuses on the classes of summary statistics to be used for data analysis and on their importance in data exploration. It concentrates mainly on measures of location, with a brief treatment of measures of dispersion and of how measures of location relate to measures of spread.
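As a rough illustration of the summary statistics named in the abstract, the sketch below computes three measures of location (mean, median, mode) and two measures of dispersion (range, sample standard deviation) with Python's standard `statistics` module. The helper name `summarize` and the sample values are hypothetical and are not taken from the paper.

```python
import statistics


def summarize(values):
    """Return common measures of location and dispersion for a numeric sample.

    Hypothetical helper for illustration; not the paper's own method.
    """
    return {
        # Measures of location
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Measures of dispersion
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


# Made-up data, chosen only to keep the arithmetic easy to follow.
sample = [2, 3, 3, 5, 7, 10]
summary = summarize(sample)

for name, value in summary.items():
    print(f"{name}: {value}")
```

For this sample the mean (5) lies above the median (4.0) and the mode (3), which is one simple way to see that different measures of location can disagree when the data are spread unevenly.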