Data Mining Scientific Literature Demonstrates Use of Biological and Medical Data Across Scientific Disciplines
Author(s) -
Verdiguel Natalie,
Feng Zukang,
Westbrook John,
Zardecki Christine
Publication year - 2019
Publication title -
The FASEB Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.709
H-Index - 277
eISSN - 1530-6860
pISSN - 0892-6638
DOI - 10.1096/fasebj.2019.33.1_supplement.493.10
Subject(s) - protein data bank (rcsb pdb) , protein data bank , biomedicine , computer science , subject (documents) , data science , analytics , information retrieval , world wide web , bioinformatics , chemistry , biology , protein structure , biochemistry , stereochemistry
The Protein Data Bank (PDB) was established as the first open access digital data resource in biology and medicine. Today, the PDB archive contains over 140,000 atomic‐level biomolecular structures determined by crystallography, NMR spectroscopy, and 3D electron microscopy. About 20 new structures are deposited daily. PDB data are used in basic and applied research, patent applications, discovery of lifesaving drugs, product innovation, and education. About 80% of all deposited PDB structures have a “primary citation,” which is the first paper to describe the molecule and its structure and function. In turn, these primary citations have contributed to more than 1 million publications. An initial study by Clarivate Analytics on the Nucleic Acids Research publication demonstrated that PDB data drive high‐impact research in diverse scientific fields. To build on this study, we used Python to perform several data mining studies of the scientific literature to understand the impact of these structures on fundamental biology, biomedicine, and energy. The scientific literature database Web of Science tracks article metrics such as the number of times cited, subject category, year published, funding source, and keywords. Web of Science data for PDB primary citations were compiled and used as the data set. Two main investigations were performed. In the first, we examined the number of citations per article (overall and by individual subject category) to develop “top 10 cited structures” lists, and analyzed these lists for trends. In the second, we extended our data mining analysis to consider, for each main subject category, the papers with the most citations, the structures they cite, and the most frequently occurring keywords. We have observed different aspects of biological data utility in various subject areas, at times in ways unanticipated from the structures' primary citation.
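The two investigations described above can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout, the field names (`pdb_id`, `times_cited`, `categories`, `keywords`), the helper names, and the sample values are all hypothetical stand-ins for an export of Web of Science data on PDB primary citations.

```python
from collections import Counter, defaultdict

# Hypothetical records: IDs, counts, categories, and keywords are made up
# for illustration and are not real Web of Science or PDB data.
records = [
    {"pdb_id": "0AAA", "times_cited": 50,
     "categories": ["Biochemistry"], "keywords": ["Protein", "binding"]},
    {"pdb_id": "0BBB", "times_cited": 200,
     "categories": ["Chemistry", "Biochemistry"], "keywords": ["catalysis", "protein"]},
    {"pdb_id": "0CCC", "times_cited": 10,
     "categories": ["Energy"], "keywords": ["photosynthesis"]},
]


def top_cited(records, n=10, category=None):
    """Return the n most-cited records, optionally limited to one subject category."""
    pool = [r for r in records
            if category is None or category in r["categories"]]
    return sorted(pool, key=lambda r: r["times_cited"], reverse=True)[:n]


def keyword_counts_by_category(records):
    """Count, per subject category, how many records carry each keyword."""
    counts = defaultdict(Counter)
    for r in records:
        # Normalize case and whitespace so "Protein" and "protein" are merged;
        # a set avoids counting a keyword twice within one record.
        keywords = {k.strip().lower() for k in r["keywords"]}
        for category in r["categories"]:
            counts[category].update(keywords)
    return counts


if __name__ == "__main__":
    # First investigation: "top 10" lists, overall and per category.
    print([r["pdb_id"] for r in top_cited(records)])
    print([r["pdb_id"] for r in top_cited(records, category="Biochemistry")])

    # Second investigation: most frequent keywords in each category.
    for category, counter in keyword_counts_by_category(records).items():
        print(category, counter.most_common(3))
```

A record with several subject categories is counted once in each of them here. That is one possible design choice, and the abstract does not say how multi-category articles were handled.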
This research demonstrates the value of biological data from the PDB by showing the diverse ways in which the data are being used.
Support or Funding Information
RCSB PDB is funded by a grant (DBI‐1338415) from the National Science Foundation, the National Institutes of Health, and the US Department of Energy.
This abstract is from the Experimental Biology 2019 Meeting. There is no full text article associated with this abstract published in The FASEB Journal.
