
Web Based Application for Controlling Data Quality in Phenotype Prediction of Indonesian Rice Genomes
Author(s) - Erna Budhiarti Nababan, Rossy Nurhasanah, Ade Sarah Huzaifah
Publication year - 2020
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1566/1/012102
Subject(s) - SNP, single nucleotide polymorphism, genetics, phenotype, biology, computer science, computational biology, data mining, gene, genotype
Single Nucleotide Polymorphism (SNP) is a form of Deoxyribonucleic Acid (DNA) variation that can be used to predict phenotypes. Data quality control is a crucial stage in detecting phenotypes from SNP data. In this study, we built a web-based application that performs SNP data quality control. Raw SNP data, stored as strings, are filtered by calculating the missing rate, the minor allele frequency, and the Hardy-Weinberg Equilibrium values. The output is filtered SNP data in numeric form, where 1 represents homozygous dominant, 2 represents heterozygous, and 3 represents homozygous recessive. Encoding SNPs numerically allows the data to be processed by machine learning models in the subsequent phenotype prediction step.
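The filtering and encoding steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Genotypes are assumed to be two-character strings such as "AG" for biallelic SNPs, with "--" or "NN" marking missing calls. The thresholds (missing rate at most 0.1, minor allele frequency at least 0.05, HWE p-value at least 1e-6) are common defaults rather than values taken from the paper. Hardy-Weinberg Equilibrium is checked here with a 1-degree-of-freedom chi-square test (an exact test is a common alternative). The more frequent (major) allele is treated as "dominant", and missing calls are coded as 0. All function names are hypothetical.

```python
import math
from collections import Counter

# Assumption: missing calls are written as "--" or "NN"; the paper does not state its raw format.
MISSING = {"--", "NN"}


def allele_counts(genotypes):
    """Count alleles over non-missing two-character genotypes such as 'AG'."""
    counts = Counter()
    for g in genotypes:
        if g not in MISSING:
            counts.update(g)  # 'AG' adds one 'A' and one 'G'
    return counts


def missing_rate(genotypes):
    """Fraction of samples with no genotype call for this SNP."""
    return sum(g in MISSING for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele, assuming a biallelic SNP."""
    counts = allele_counts(genotypes)
    if len(counts) < 2:
        return 0.0  # monomorphic or entirely missing
    return min(counts.values()) / sum(counts.values())


def hwe_pvalue(genotypes):
    """P-value of a 1-df chi-square test against Hardy-Weinberg proportions."""
    calls = [g for g in genotypes if g not in MISSING]
    counts = allele_counts(calls)
    if len(counts) < 2:
        return 1.0  # nothing to test on a monomorphic SNP
    (a, _), (b, _) = counts.most_common(2)
    n = len(calls)
    n_aa = sum(g == a + a for g in calls)
    n_bb = sum(g == b + b for g in calls)
    n_ab = n - n_aa - n_bb  # for a biallelic SNP every remaining call is heterozygous
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    observed = [n_aa, n_ab, n_bb]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def encode(genotypes):
    """1 = homozygous major ('dominant'), 2 = heterozygous, 3 = homozygous minor, 0 = missing."""
    major = allele_counts(genotypes).most_common(1)[0][0]
    codes = []
    for g in genotypes:
        if g in MISSING:
            codes.append(0)  # assumption: missing calls are coded as 0
        elif g[0] != g[1]:
            codes.append(2)
        elif g[0] == major:
            codes.append(1)
        else:
            codes.append(3)
    return codes


def quality_control(snps, max_missing=0.1, min_maf=0.05, min_hwe_p=1e-6):
    """Filter {snp_id: [genotype per sample]} and return numeric codes for passing SNPs."""
    kept = {}
    for snp_id, genotypes in snps.items():
        if missing_rate(genotypes) > max_missing:
            continue
        if minor_allele_frequency(genotypes) < min_maf:
            continue
        if hwe_pvalue(genotypes) < min_hwe_p:
            continue
        kept[snp_id] = encode(genotypes)
    return kept


raw = {
    "snp1": ["AA", "AG", "GG", "AA", "AG"],  # passes all three filters
    "snp2": ["CC", "CC", "CC", "CC", "CC"],  # monomorphic, MAF = 0
    "snp3": ["TT", "--", "--", "TC", "CC"],  # 40% missing
}
print(quality_control(raw))  # {'snp1': [1, 2, 3, 1, 2]}
```

The filters run in order of cost: the cheap missing-rate check discards SNPs before the allele-based statistics are computed, and only SNPs that pass all three filters are converted to the 1/2/3 codes that a machine learning model can consume.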