
Data analysis methods in astronomic objects classification (Sloan Digital Sky Survey DR14)
Author(s) -
В. А. Голов,
D A Petrusevich
Publication year - 2021
Publication title -
Rossijskij tehnologičeskij žurnal / Russian Technological Journal
Language(s) - English
Resource type - Journals
eISSN - 2782-3210
pISSN - 2500-316X
DOI - 10.32362/2500-316x-2021-9-3-66-77
Subject(s) - sky , computer science , naive bayes classifier , classifier (uml) , boosting (machine learning) , artificial intelligence , decision tree , precision and recall , random forest , gradient boosting , machine learning , pattern recognition (psychology) , support vector machine , data mining , astrophysics , physics
This paper investigates the Sloan Digital Sky Survey DR14 dataset, which contains statistical information about many astronomical objects obtained within the framework of the Sloan Digital Sky Survey project. Telescopes operate on the Earth's surface, in Earth orbit, and at the Lagrange points of some systems (Earth–Moon, Sun–Earth), and they collect data in different frequency ranges. The large volume of such data creates demand for analytical algorithms and systems capable of classification. The data is labeled well enough to build machine learning classification systems. The paper presents the results of several classifiers. The data analyzed contains measurements of three types of astronomical objects in the Sloan Digital Sky Survey DR14 dataset (stars, quasars, galaxies). A CART decision tree, logistic regression, naïve Bayes classifiers, and ensembles of classifiers (random forest, gradient boosting) were implemented. Conclusions about the specific features of each machine learning classifier trained for this task are given at the end of the paper. In some cases, the structure of a classifier can be explained physically. The accuracy of the classifiers built in this research exceeds 90% (the F1, precision, and recall metrics are also used because the classes are imbalanced). Given these values, the classification task can be considered successfully solved. At the same time, the structure of the classifiers and the importance of features can be used as a physical explanation of the solution.
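The abstract's point about using F1, precision, and recall alongside accuracy on imbalanced classes can be illustrated with a minimal Python sketch. The label counts below are made-up toy values, not figures from the SDSS DR14 dataset, and the helper names (`per_class_metrics`, `accuracy`, `macro_f1`) are hypothetical. The code is not the authors' implementation.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)}, computed one-vs-rest."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of objects whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(metrics):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1 for _, _, f1 in metrics.values()) / len(metrics)


# Toy, made-up labels (assumption): quasars are the minority class.
y_true = ["galaxy"] * 50 + ["star"] * 40 + ["quasar"] * 10
# A hypothetical classifier that never predicts "quasar".
y_pred = ["galaxy"] * 50 + ["star"] * 40 + ["galaxy"] * 10

scores = per_class_metrics(y_true, y_pred)
print("accuracy:", accuracy(y_true, y_pred))
for label, (precision, recall, f1) in scores.items():
    print(label, round(precision, 3), round(recall, 3), round(f1, 3))
print("macro F1:", round(macro_f1(scores), 3))
```

In this toy case the accuracy is 0.90 even though no quasar is recognized (recall 0 for that class), which is why per-class precision, recall, and F1 are more informative than accuracy alone when the classes are imbalanced.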