Visualization and Analysis of the REACH‐chemical Space with Generative Topographic Mapping
Author(s) -
Lunghini Filippo,
Gilles Marcou,
Azam Philippe,
Enrici Marie‐Hélène,
Van Miert Erik,
Varnek Alexandre
Publication year - 2021
Publication title - Molecular Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.481
H-Index - 68
eISSN - 1868-1751
pISSN - 1868-1743
DOI - 10.1002/minf.202000232
Subject(s) - property (philosophy) , computer science , chemical space , dimension (graph theory) , visualization , point (geometry) , authorization , data mining , space (punctuation) , mathematics , biology , bioinformatics , philosophy , computer security , epistemology , pure mathematics , drug discovery , geometry , operating system
In the framework of the REACH (Registration, Evaluation, Authorization and Restriction of Chemicals) regulation, industries have generated and reported a huge amount of (eco)toxicological data on substances produced in or imported into Europe. The registration procedure initiated the creation of a large REACH database of well‐defined (eco)toxicological properties. Here, the data distribution in the REACH chemical space was analyzed with the help of the Generative Topographic Mapping (GTM) approach. GTM generates 2‐dimensional maps on which each compound is represented as a data point. The 3rd dimension can be used to display the distribution of a given (eco)toxicological property, which can in turn be used to assess that property for new compounds projected onto the map. We report the “Universal REACH map”, which accommodates 11 endpoints covering environmental fate and (eco)toxicological properties. This map demonstrates acceptable predictive performance: in cross‐validation, balanced accuracy ranges from 0.60 to 0.78. The 11‐endpoint profile has been computed for each REACH‐registered substance. Some concerns related to acute aquatic toxicity have been identified, whereas for environmental fate and human health endpoints the number of compounds predicted to be of concern was much smaller. It has been demonstrated that superposing several class landscapes makes it possible to select the zones of the chemical space populated by compounds with a given (eco)toxicological profile.
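The two quantitative ideas in the abstract, balanced accuracy and the superposition of class landscapes, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the endpoint labels, the 0.5 threshold, and the four-node toy "map" are all hypothetical, and each class landscape is assumed to be a list holding, for every map node, the probability of the "of concern" class.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = of concern)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


def superpose(landscapes, profile, threshold=0.5):
    """Return indices of map nodes whose predicted class matches `profile`
    for every endpoint listed in it.

    landscapes: dict endpoint -> per-node probability of the 'of concern' class
    profile:    dict endpoint -> desired class (1 = of concern, 0 = not)
    threshold:  assumed cut-off turning a probability into a class
    """
    n_nodes = len(next(iter(landscapes.values())))
    selected = []
    for k in range(n_nodes):
        matches = all(
            (landscapes[endpoint][k] >= threshold) == bool(wanted)
            for endpoint, wanted in profile.items()
        )
        if matches:
            selected.append(k)
    return selected


# Hypothetical toy data: two endpoints on a map with four nodes.
landscapes = {
    "aquatic_toxicity": [0.9, 0.8, 0.2, 0.1],
    "biodegradability": [0.1, 0.7, 0.3, 0.6],
}
# Profile of interest: of concern for the first endpoint, not for the second.
profile = {"aquatic_toxicity": 1, "biodegradability": 0}
zone = superpose(landscapes, profile)

# Toy cross-validation outcome for one endpoint.
ba = balanced_accuracy([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

Here `zone` holds the map nodes that satisfy the whole profile at once, which is the sense in which superposed landscapes delimit a region of chemical space. Balanced accuracy is used rather than plain accuracy because it weights both classes equally even when one is much rarer.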