How Consistent are Publicly Reported Cytotoxicity Data? Large‐Scale Statistical Analysis of the Concordance of Public Independent Cytotoxicity Measurements
Author(s) - Cortés-Ciriano Isidro, Bender Andreas
Publication year - 2016
Publication title - ChemMedChem
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.817
H-Index - 100
eISSN - 1860-7187
pISSN - 1860-7179
DOI - 10.1002/cmdc.201500424
Subject(s) - cytotoxicity, data set, statistics, mathematics, chemistry, biochemistry, in vitro
While increasing attention is being paid to the impact of data quality on cell-line sensitivity and toxicology modeling, no systematic study to date has evaluated the comparability of independent cytotoxicity measurements on a large scale. Here, we estimate the experimental uncertainty of public cytotoxicity data from ChEMBL version 19. We applied stringent filtering criteria to assemble a curated data set comprising pIC50 data for compound–cell line systems measured in independent laboratories. The estimated experimental uncertainty corresponded to a mean unsigned error (MUE) of 0.61–0.76, a median unsigned error (MedUE) of 0.51–0.58, and a standard deviation of 0.76–1.00 pIC50 units. The experimental uncertainty (σ_E) estimated from all pairs of cytotoxicity measurements with a ΔpIC50 value lower than 2.5 was 0.59–0.77 pIC50 units, and thus 21–60% and 21–26% higher than that of pKi and pIC50 data for ligand–protein systems (σ_E = 0.47–0.48 pKi units and σ_E = 0.57–0.61 pIC50 units, respectively). The σ_E value estimated from the pairs of pIC50 values measured with metabolic assays was 0.98, whereas it was 0.69 for the 1388 pIC50 pairs measured using exactly the same experimental setup. The maximum achievable squared Pearson correlation coefficient (R²_Pearson,max) of in silico models trained on cytotoxicity data from different laboratories was estimated to be 0.51–0.85, which is considerably lower than the value of 1 corresponding to perfect predictions and indicates the maximum performance one can expect from computational cytotoxicity predictions. The lowest concordance between pairs of measurements was found for the drugs paclitaxel, methotrexate, zidovudine, and docetaxel, and for the cell lines HepG2, NCI-H460, L1210, and CCRF-CEM, hinting at a particular sensitivity of those systems to the experimental setup. The highest concordance was estimated for the compound–cell line system HL-60–etoposide (σ_E = 0.70), whereas the lowest was for L1210–methotrexate (σ_E = 1.68). We found that annotation errors are responsible for the high discordance observed for some pairs of measurements, which underlines the importance of data curation when automatically extracting cytotoxicity data from public databases. Likewise, these results highlight the importance of estimating compound cytotoxicity with assays that provide complementary biological information (i.e., metabolic assays, clonogenic assays, and assays based on cell membrane integrity), especially when the mechanism of action of the test compounds is unknown. From this analysis, guidelines can be created on the reliability of cytotoxicity data from public databases, which could ultimately prove valuable for modeling purposes and for guiding the reporting of data in the literature.
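The pairwise statistics named in the abstract (MUE, MedUE, standard deviation, and σ_E) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so two conventions are assumed here: the standard deviation is taken about zero, because the order of the two measurements in a pair is arbitrary, and σ_E is taken as that standard deviation divided by √2, which holds when both measurements carry independent errors of equal variance. The function name `pair_statistics`, its parameters, and the example values are hypothetical and are not taken from the paper.

```python
import math
import statistics


def pair_statistics(pairs, max_delta=None):
    """Summarise the disagreement between paired pIC50 measurements.

    pairs     -- iterable of (pIC50_lab_a, pIC50_lab_b) for the same
                 compound-cell line system (hypothetical input format)
    max_delta -- if given, keep only pairs with |delta pIC50| < max_delta
                 (the abstract uses 2.5 for its sigma_E estimate)
    """
    deltas = [a - b for a, b in pairs]
    if max_delta is not None:
        deltas = [d for d in deltas if abs(d) < max_delta]
    if not deltas:
        raise ValueError("no measurement pairs left after filtering")

    abs_deltas = [abs(d) for d in deltas]
    # Assumption: spread is measured about zero, since pair order is arbitrary.
    sd = math.sqrt(sum(d * d for d in deltas) / len(deltas))
    return {
        "n": len(deltas),
        "mue": statistics.fmean(abs_deltas),     # mean unsigned error
        "medue": statistics.median(abs_deltas),  # median unsigned error
        "sd": sd,
        # Assumption: equal, independent error on both measurements,
        # so the difference has variance 2 * sigma_E**2.
        "sigma_e": sd / math.sqrt(2),
    }


if __name__ == "__main__":
    # Made-up pIC50 pairs, for illustration only.
    example = [(6.0, 6.5), (7.0, 6.0), (5.0, 5.0), (8.0, 5.0)]
    print(pair_statistics(example))
    print(pair_statistics(example, max_delta=2.5))
```

Passing `max_delta=2.5` mirrors the ΔpIC50 cutoff mentioned in the abstract. Without the cutoff, a single strongly discordant pair (for example, one caused by an annotation error) can dominate the standard deviation and therefore the σ_E estimate.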
