TY - JOUR
T1 - Predicting nicotine metabolism across ancestries using genotypes
AU - Baurley, James W.
AU - Bergen, Andrew W.
AU - Ervin, Carolyn M.
AU - Park, Sung shim Lani
AU - Murphy, Sharon E.
AU - McMahan, Christopher S.
N1 - Publisher Copyright: © 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
N2 - Background: There is a need to match characteristics of tobacco users with cessation treatments and risks of tobacco-attributable diseases such as lung cancer. The rate at which the body metabolizes nicotine has proven an important predictor of these outcomes. Nicotine metabolism is primarily catalyzed by the enzyme cytochrome P450 (CYP2A6), and CYP2A6 activity can be measured as the ratio of two nicotine metabolites, trans-3’-hydroxycotinine to cotinine (the nicotine metabolite ratio, NMR). Measurements of these metabolites are only possible in current tobacco users and vary by biofluid source, timing of collection, and protocols; unfortunately, this has limited their use in clinical practice. The NMR depends highly on genetic variation near CYP2A6 on chromosome 19 as well as ancestry, environmental, and other genetic factors. Thus, we aimed to develop prediction models of nicotine metabolism using genotypes and basic individual characteristics (age, gender, height, and weight). Results: We identified four multiethnic studies with nicotine metabolites and DNA samples. We constructed a 263-marker panel by filtering genome-wide association scans of the NMR in each study. We then applied seven machine learning techniques to train models of nicotine metabolism on the largest and most ancestrally diverse dataset (N=2239). The models were then validated using the other three studies (total N=1415). Using cross-validation, we found that the correlations between the observed and predicted NMR ranged from 0.69 to 0.97 depending on the model. When predictions were averaged in an ensemble model, the correlation was 0.81. The ensemble model generalizes well in the validation studies across ancestries, despite differences in the measurements of NMR between studies, with correlations of 0.52 for African ancestry, 0.61 for Asian ancestry, and 0.46 for European ancestry. The most influential predictors of NMR identified in more than two models were rs56113850, rs11878604, and 21 other genetic variants near CYP2A6 as well as age and ancestry. Conclusions: We have developed an ensemble of seven models for predicting the NMR across ancestries from genotypes and age, gender and BMI. These models were validated using three datasets and are associated with nicotine dosages. Knowledge of how an individual metabolizes nicotine could be used to help select the optimal path to reducing or quitting tobacco use, as well as to evaluate the risks of tobacco use.
AB - Background: There is a need to match characteristics of tobacco users with cessation treatments and risks of tobacco-attributable diseases such as lung cancer. The rate at which the body metabolizes nicotine has proven an important predictor of these outcomes. Nicotine metabolism is primarily catalyzed by the enzyme cytochrome P450 (CYP2A6), and CYP2A6 activity can be measured as the ratio of two nicotine metabolites, trans-3’-hydroxycotinine to cotinine (the nicotine metabolite ratio, NMR). Measurements of these metabolites are only possible in current tobacco users and vary by biofluid source, timing of collection, and protocols; unfortunately, this has limited their use in clinical practice. The NMR depends highly on genetic variation near CYP2A6 on chromosome 19 as well as ancestry, environmental, and other genetic factors. Thus, we aimed to develop prediction models of nicotine metabolism using genotypes and basic individual characteristics (age, gender, height, and weight). Results: We identified four multiethnic studies with nicotine metabolites and DNA samples. We constructed a 263-marker panel by filtering genome-wide association scans of the NMR in each study. We then applied seven machine learning techniques to train models of nicotine metabolism on the largest and most ancestrally diverse dataset (N=2239). The models were then validated using the other three studies (total N=1415). Using cross-validation, we found that the correlations between the observed and predicted NMR ranged from 0.69 to 0.97 depending on the model. When predictions were averaged in an ensemble model, the correlation was 0.81. The ensemble model generalizes well in the validation studies across ancestries, despite differences in the measurements of NMR between studies, with correlations of 0.52 for African ancestry, 0.61 for Asian ancestry, and 0.46 for European ancestry. The most influential predictors of NMR identified in more than two models were rs56113850, rs11878604, and 21 other genetic variants near CYP2A6 as well as age and ancestry. Conclusions: We have developed an ensemble of seven models for predicting the NMR across ancestries from genotypes and age, gender and BMI. These models were validated using three datasets and are associated with nicotine dosages. Knowledge of how an individual metabolizes nicotine could be used to help select the optimal path to reducing or quitting tobacco use, as well as to evaluate the risks of tobacco use.
KW - Machine learning
KW - Nicotine biomarkers
KW - Nicotine metabolism
KW - Polygenic risk score
KW - Smoking cessation
KW - Statistical learning
UR - http://www.scopus.com/inward/record.url?scp=85138291381&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85138291381&partnerID=8YFLogxK
U2 - 10.1186/s12864-022-08884-z
DO - 10.1186/s12864-022-08884-z
M3 - Article
C2 - 36131240
AN - SCOPUS:85138291381
SN - 1471-2164
VL - 23
JO - BMC Genomics
JF - BMC Genomics
IS - 1
M1 - 663
ER -