TY - JOUR
T1 - Finding Needles in a Haystack
T2 - Determining Key Molecular Descriptors Associated with the Blood-brain Barrier Entry of Chemical Compounds Using Machine Learning
AU - Majumdar, Subhabrata
AU - Basak, Subhash C.
AU - Lungu, Claudiu N.
AU - Diudea, Mircea V.
AU - Grunwald, Gregory D.
N1 - Publisher Copyright: © 2019 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim
PY - 2019/8/1
Y1 - 2019/8/1
N2 - In this paper, we used two sets of calculated molecular descriptors to predict the blood-brain barrier (BBB) entry of a collection of 415 chemicals. The first set, of 579 descriptors, was calculated using Schrodinger and TopoCluj software; the second set, of 198 descriptors, was calculated using Polly and Triplet software. Models were then built and assessed with a two-deep, repeated external validation method for QSAR formulation. Results show that both sets of descriptors, individually and in combination, give models of reasonable prediction accuracy. We also demonstrate the effectiveness of a variable selection approach: for one of our descriptor sets, the top 5% of predictors ranked by random forest variable importance yield a better-performing model than the model with all predictors. The most influential descriptors indicate important molecular structural features that govern BBB entry of chemicals.
KW - blood-brain barrier
KW - machine learning
KW - molecular descriptors
KW - quantitative structure-activity relationship (QSAR)
KW - random forest
KW - two-deep cross validation
KW - variable selection
UR - http://www.scopus.com/inward/record.url?scp=85069918286&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85069918286&partnerID=8YFLogxK
U2 - 10.1002/minf.201800164
DO - 10.1002/minf.201800164
M3 - Article
C2 - 31322827
AN - SCOPUS:85069918286
SN - 1868-1743
VL - 38
JO - Molecular Informatics
JF - Molecular Informatics
IS - 8-9
M1 - 1800164
ER -