Browsing large‐scale cheminformatics data with dimension reduction
Author(s) - Choi Jong Youl, Bae SeungHee, Qiu Judy, Chen Bin, Wild David
Publication year - 2011
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.1781
Subject(s) - cheminformatics, computer science, visualization, data science, dimensionality reduction, data visualization, scientific visualization, data discovery, chemical space, dimension (graph theory), data mining, information retrieval, drug discovery, world wide web, bioinformatics, metadata, artificial intelligence, biology, mathematics, pure mathematics
SUMMARY Visualization of large‐scale, high‐dimensional data is valuable for data analysis and can facilitate scientific discovery in many fields. We present PubChemBrowse, a customized visualization tool for cheminformatics research. It provides a novel 3D data point browser that displays complex properties of massive data on commodity clients. As in Geographic Information System browsers for Earth and environmental data, chemical compounds with similar properties appear near one another in the browser. PubChemBrowse is built around in‐house, high‐performance parallel multidimensional scaling (MDS) and generative topographic mapping (GTM) services and supports fast interaction with an external property database. These properties can be overlaid on the 3D mapped compound space or queried for individual points. We prototype the integration with the Chem2Bio2RDF system, using a SPARQL endpoint to access over 20 publicly accessible bioinformatics databases. We describe the design and implementation of the integrated PubChemBrowse application and outline its use in drug discovery. The same core technologies are generally applicable to developing high‐performance scientific data browsing systems for other applications. Copyright © 2011 John Wiley & Sons, Ltd.
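To make the dimension reduction step mentioned in the summary more concrete, the following is a minimal, single-threaded sketch of multidimensional scaling that maps items into 3D so that pairwise distances are preserved as well as possible. It uses the standard SMACOF iteration (repeated Guttman transforms) and is not the authors' parallel implementation. The function names, the toy feature vectors, and the parameter defaults are all assumptions chosen for illustration.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length coordinate lists."""
    return [[math.dist(p, q) for q in points] for p in points]


def stress(delta, coords):
    """Raw stress: squared mismatch between target and embedded distances."""
    d = pairwise_distances(coords)
    n = len(delta)
    return sum(
        (d[i][j] - delta[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)
    )


def smacof(delta, dim=3, iterations=200, seed=0):
    """Embed n items into `dim` dimensions by iterating the Guttman transform.

    `delta` is a symmetric n x n matrix of target dissimilarities.
    """
    n = len(delta)
    rng = random.Random(seed)
    # Random starting configuration (an illustrative choice, not a tuned one).
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(iterations):
        d = pairwise_distances(coords)
        new_coords = []
        for i in range(n):
            row = [0.0] * dim
            for j in range(n):
                if i == j or d[i][j] == 0.0:
                    continue  # coincident points contribute nothing
                ratio = delta[i][j] / d[i][j]
                for k in range(dim):
                    row[k] += ratio * (coords[i][k] - coords[j][k])
            new_coords.append([value / n for value in row])
        coords = new_coords
    return coords


if __name__ == "__main__":
    # Hypothetical 4-dimensional descriptors standing in for compound features.
    features = [
        [0.0, 0.0, 0.0, 0.0],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [1.0, 1.0, 1.0, 1.0],
    ]
    delta = pairwise_distances(features)
    embedding = smacof(delta, dim=3)
    print("final stress:", stress(delta, embedding))
    for point in embedding:
        print([round(value, 3) for value in point])
```

Each iteration of this sketch costs on the order of n² distance evaluations, which is one reason a browser over a very large compound collection would plausibly rely on parallel services rather than a loop like this one.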