t-SMILES: A Scalable Fragment-based Molecular Representation Framework for De Novo Molecule Generation
Open Access
Author(s)
Juan-Ni Wu,
Tong Wang,
Yue Chen,
Li-Juan Tang,
Hai-Long Wu,
Ru-Qin Yu
Publication year: 2024
Effective representation of molecules is a crucial factor affecting the performance of artificial intelligence models. This study introduces a flexible, fragment-based, multiscale molecular representation framework called t-SMILES (tree-based SMILES) with three code algorithms: TSSA (t-SMILES with Shared Atom), TSDY (t-SMILES with Dummy Atom) and TSID (t-SMILES with ID). It describes molecules using SMILES-type strings obtained by performing a breadth-first search on a full binary tree formed from a fragmented molecular graph. Systematic evaluations using JTVAE, BRICS, MMPA, and Scaffold show the feasibility of constructing a multi-code molecular description system, in which the various descriptions complement each other and enhance overall performance. Additionally, it exhibits impressive performance on low-resource datasets, whether the model is original, data augmented, or pre-trained and fine-tuned. It significantly outperforms classical SMILES, DeepSMILES, SELFIES and baseline models in goal-directed tasks. Furthermore, it surpasses state-of-the-art fragment-, graph- and SMILES-based approaches on ChEMBL, Zinc, and QM9.
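
The breadth-first step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the `bfs_encode` function, and the placeholder symbols `&` (empty child) and `^` (fragment separator) are assumptions chosen for illustration, and the fragmentation of the molecule into a tree is taken as already done.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node holding one fragment as a SMILES string."""
    fragment: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


EMPTY = "&"  # assumed placeholder for a missing child
SEP = "^"    # assumed separator between fragments


def bfs_encode(root: Node) -> str:
    """Visit the tree level by level and join fragment strings.

    Every real node contributes two children (real or empty), so the
    traversal behaves as if the tree were a full binary tree.
    """
    tokens = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node is None:
            tokens.append(EMPTY)  # record the gap, do not expand it
            continue
        tokens.append(node.fragment)
        queue.append(node.left)
        queue.append(node.right)
    return SEP.join(tokens)


# Toy tree: a benzene ring fragment with an ethyl-like child,
# which in turn has an oxygen fragment as its left child.
example = Node("c1ccccc1", left=Node("CC", left=Node("O")))
print(bfs_encode(example))
```

Because empty children are written out explicitly, the string alone is enough to rebuild the tree shape, which is the property a tree-based linear encoding needs.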
Language(s): English
