Assessment of Modeling Techniques and Feature Selection for Predicting Drug Response from Gene Expression Data for Cytotoxic Anticancer Agents
Author(s) -
Mannheimer Joshua D.,
Prasad Ashok,
Duval Dawn L.,
Gustafson Daniel L.
Publication year - 2018
Publication title -
The FASEB Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.709
H-Index - 277
eISSN - 1530-6860
pISSN - 0892-6638
DOI - 10.1096/fasebj.2018.32.1_supplement.566.5
Subject(s) - feature selection, linear regression, linear model, regression analysis, regression, support vector machine, elastic net regularization, correlation, computer science, artificial intelligence, feature (linguistics), data set, machine learning, computational biology, statistics, mathematics, biology, linguistics, geometry, philosophy
The large amounts of genomics data being generated in vitro, in vivo, and clinically have led to statistical and mathematical modeling efforts to better understand and predict the outcomes of drug treatments. However, because statistical learning offers such a diverse set of techniques, different modeling approaches can lead to drastically different results, and considerations unique to the problem must be taken into account for optimal model performance. The high-dimensional nature of gene expression data, model complexity, and the selection of predictor variables (features) are important factors that affect model performance. The Genomics of Drug Sensitivity in Cancer (GDSC) cell line panel consists of >1000 human cancer cell lines with drug response data for 265 compounds. Using drug response data for 15 cytotoxic agents, the relationship between modeling method and model performance was explored. In particular, linear and non-linear regression techniques, as well as strategies for feature selection, were systematically compared. Two non-linear techniques, non-linear Support Vector Regression (NLSVR) and Artificial Neural Network (ANN), and two linear techniques, Principal Components Regression (PCR) and linear Support Vector Regression (LSVR), were evaluated. By both Spearman correlation and mean absolute difference between predicted and measured drug response, the techniques ranked NLSVR > PCR > LSVR > ANN. However, the performance of NLSVR, PCR, and LSVR did not differ significantly (P > 0.05). Additionally, using correlation-based methods to select the genes used in model development improved model performance. Further investigation, however, showed that an equal number of randomly selected genes yielded comparable predictions, indicating that in this case reducing the number of features played a more prominent role than the choice of which genes to keep.
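The evaluation metrics and the feature-selection comparison described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say which correlation measure was used to rank genes, so absolute Pearson correlation with drug response is assumed here, and all function names (`top_k_correlated`, `random_k`, and so on) as well as the toy data are hypothetical.

```python
import math
import random


def rank(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def spearman(predicted, measured):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(rank(predicted), rank(measured))


def mean_abs_diff(predicted, measured):
    """Mean of |predicted - measured| over all cell lines."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)


def top_k_correlated(expr, response, k):
    """Indices of the k genes with the largest |Pearson r| against response.

    expr is a list of cell lines (rows), each a list of per-gene expression values.
    Assumption: |Pearson r| is the ranking criterion (not stated in the abstract).
    """
    scored = []
    for g in range(len(expr[0])):
        column = [row[g] for row in expr]
        scored.append((abs(pearson(column, response)), g))
    scored.sort(reverse=True)  # highest |r| first
    return sorted(g for _, g in scored[:k])


def random_k(n_genes, k, seed=0):
    """An equal-size random gene subset: the control for the selection step."""
    return sorted(random.Random(seed).sample(range(n_genes), k))


if __name__ == "__main__":
    # Toy data (4 cell lines x 3 genes), not GDSC data.
    expr = [[1, 5, 2], [2, 3, 2], [3, 4, 9], [4, 1, 1]]
    response = [1.0, 2.0, 3.0, 4.0]
    print(top_k_correlated(expr, response, 2))  # [0, 1]
    print(random_k(3, 2, seed=0))
    print(spearman([1.1, 1.9, 3.2, 3.8], response))  # 1.0
    print(mean_abs_diff([1.5, 2.0, 3.0, 4.5], response))  # 0.25
```

Comparing a model trained on `top_k_correlated(...)` genes against one trained on `random_k(...)` genes of the same size is the kind of control that separates the effect of reducing the feature count from the effect of choosing particular genes.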
Analysis of the correlation between drug response and histotype showed that the two were extensively linked. Cluster analysis showed that a subset of 1000 genes could accurately distinguish histotypes, suggesting that model performance was dominated by histotype identification. This was verified by constructing a predictive model using only histotype-specific variables. Although this is not an exhaustive analysis of all modeling techniques, it provides a strong baseline demonstrating how the choice of model (particularly linear versus non-linear) and feature selection influence model performance. It also reveals an interesting insight about pan-cancer models: strong links between histotype and drug response dominate model performance. Future work should be directed at identifying additional sources of data and modeling approaches that favor the selection of genes that capture response on a cell-by-cell basis while also conserving the histotype information critical to drug response. This abstract is from the Experimental Biology 2018 Meeting. There is no full text article associated with this abstract published in The FASEB Journal.
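The histotype-only check described above can be illustrated with a deliberately simple baseline. This is a hypothetical sketch, not the model constructed in the study (the abstract does not describe it): each cell line's response is predicted as the mean response of the other cell lines sharing its histotype (leave-one-out), and the predictions are scored with Spearman correlation. The function name `histotype_loo_predictions` and the synthetic data are illustrative assumptions.

```python
import math
from collections import defaultdict


def rank(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def spearman(predicted, measured):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(rank(predicted), rank(measured))


def histotype_loo_predictions(histotypes, responses):
    """Hypothetical histotype-only baseline (needs at least two cell lines).

    Each cell line is predicted as the mean response of the *other* cell lines
    with the same histotype (leave-one-out); a histotype with a single member
    falls back to the mean of all other cell lines.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for h, r in zip(histotypes, responses):
        totals[h] += r
        counts[h] += 1
    grand_total, n = sum(responses), len(responses)
    preds = []
    for h, r in zip(histotypes, responses):
        if counts[h] > 1:
            preds.append((totals[h] - r) / (counts[h] - 1))
        else:
            preds.append((grand_total - r) / (n - 1))
    return preds


if __name__ == "__main__":
    # Synthetic responses: histotypes differ strongly, members differ slightly.
    histotypes = ["lung", "lung", "blood", "blood", "breast", "breast"]
    responses = [1.0, 1.2, 3.0, 3.4, 2.0, 2.2]
    preds = histotype_loo_predictions(histotypes, responses)
    print(round(spearman(preds, responses), 3))  # 0.829
```

On this synthetic data, the ordering within each histotype is predicted backwards, yet the Spearman correlation is still about 0.83 because differences between histotypes dominate the ranking. In miniature, this is the effect the abstract describes for pan-cancer models.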