Systematization of the Protein Sequence Diversity in Enzymes Related to Secondary Metabolic Pathways in Plants, in the Context of Big Data Biology Inspired by the KNApSAcK Motorcycle Database

Shun Ikeda, Takashi Abe, Yukiko Nakamura, Nelson Kibinge, Aki Morita, Atsushi Nakatani, Naoaki Ono, Toshimichi Ikemura, Kensuke Nakamura, Md. AltafUlAmin, Shigehiko Kanaya

Open Access

Publication year - 2013

Publication title - Plant and Cell Physiology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.975

H-Index - 152

eISSN - 1471-9053

pISSN - 0032-0781

DOI - 10.1093/pcp/pct041

Subject(s) - knapsack problem, metabolomics, proteomics, biology, metabolic pathway, computational biology, diversity (politics), systems biology, big data, context (archaeology), genomics, database, bioinformatics, computer science, enzyme, genetics, data mining, biochemistry, genome, gene, paleontology, algorithm, sociology, anthropology

Biology is increasingly becoming a data-intensive science with the recent progress of the omics fields, e.g. genomics, transcriptomics, proteomics and metabolomics. The species-metabolite relationship database, KNApSAcK Core, has been widely utilized and cited in metabolomics research, and chronological analysis of that research work has helped to reveal recent trends in metabolomics research. To meet the needs of these trends, the KNApSAcK database has been extended by incorporating a secondary metabolic pathway database called Motorcycle DB. We examined the enzyme sequence diversity related to secondary metabolism by means of batch-learning self-organizing maps (BL-SOMs). Initially, we constructed a map by using a big data matrix consisting of the frequencies of all possible dipeptides in the protein sequence segments of plants and bacteria. The enzyme sequence diversity of the secondary metabolic pathways was examined by identifying clusters of segments associated with certain enzyme groups in the resulting map. The extent of diversity of 15 secondary metabolic enzyme groups is discussed. Data-intensive approaches such as BL-SOM applied to big data matrices are needed for systematizing protein sequences. Handling big data has become an inevitable part of biology.
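
The data matrix described in the abstract (frequencies of all possible dipeptides in protein sequence segments) can be illustrated with a minimal sketch. This is not the authors' code: the segment length of 200 residues, the non-overlapping windowing, the normalization to relative frequencies, and the function names `split_segments`, `dipeptide_frequencies` and `build_matrix` are all assumptions made for illustration. The abstract states only that dipeptide frequencies of sequence segments were used as input to the BL-SOM.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 20 x 20 = 400 possible dipeptides, one matrix column each.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def split_segments(sequence, length):
    """Cut a sequence into non-overlapping windows; a short tail is dropped."""
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, length)]


def dipeptide_frequencies(segment):
    """Return a 400-element vector of relative dipeptide frequencies."""
    counts = [0] * len(DIPEPTIDES)
    for first, second in zip(segment, segment[1:]):
        idx = INDEX.get(first + second)
        if idx is not None:  # pairs with non-standard residues are skipped
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def build_matrix(sequences, length=200):
    """One row per segment; `length` is an assumed value, not from the source."""
    return [dipeptide_frequencies(segment)
            for sequence in sequences
            for segment in split_segments(sequence, length)]
```

Each row of the resulting matrix is one segment described by 400 features. In a workflow like the one the abstract outlines, such rows would be the input vectors to a self-organizing map, and segments from the same enzyme group would then be looked up on the trained map.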
