Benchmarking Accuracy and Generalizability of Four Graph Neural Networks Using Large In Vitro ADME Datasets from Different Chemical Spaces

Fabio Broccatelli; Richard Trager; Michael Reutlinger; George Karypis; Mufei Li

doi:10.1002/minf.202100321

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

In this work, we benchmark a variety of single- and multi-task graph neural network (GNN) models against lower-bar and higher-bar traditional machine learning approaches employing human-engineered molecular features. We consider four GNN variants: Graph Convolutional Network (GCN), Graph Attention Network (GAT), Message Passing Neural Network (MPNN), and Attentive Fingerprint (AttentiveFP). So far, deep learning models have been primarily benchmarked using lower-bar traditional models solely based on fingerprints, while more realistic benchmarks employing fingerprints, whole-molecule descriptors and predictions from other related endpoints (e.g., LogD7.4) appear to be scarce for industrial ADME datasets. In addition to time-split test sets based on Genentech data, this study benefits from the availability of measurements from an external chemical space (Roche data). We identify GAT as a promising approach to implementing deep learning models. While all the deep learning models significantly outperform lower-bar benchmark traditional models solely based on fingerprints, only GATs seem to offer a small but consistent improvement over higher-bar benchmark traditional models. Finally, the accuracy of in vitro assays from different laboratories predicting the same experimental endpoints appears to be comparable with the accuracy of GAT single-task models, suggesting that most of the observed error from the models is a function of the experimental error propagation.

Original language	English (US)
Article number	2100321
Journal	Molecular Informatics
Volume	41
Issue number	8
DOIs	https://doi.org/10.1002/minf.202100321
State	Published - Aug 2022
Externally published	Yes

Bibliographical note

Publisher Copyright:
© 2022 Wiley-VCH GmbH.

Keywords

ADME
deep learning
graph neural network
in vitro assays
multi-task learning

Cite this

@article{f00f15a8972d4d02a21b14488a9efa70,

title = "Benchmarking Accuracy and Generalizability of Four Graph Neural Networks Using Large In Vitro ADME Datasets from Different Chemical Spaces",

abstract = "In this work, we benchmark a variety of single- and multi-task graph neural network (GNN) models against lower-bar and higher-bar traditional machine learning approaches employing human-engineered molecular features. We consider four GNN variants: Graph Convolutional Network (GCN), Graph Attention Network (GAT), Message Passing Neural Network (MPNN), and Attentive Fingerprint (AttentiveFP). So far, deep learning models have been primarily benchmarked using lower-bar traditional models solely based on fingerprints, while more realistic benchmarks employing fingerprints, whole-molecule descriptors and predictions from other related endpoints (e.g., LogD7.4) appear to be scarce for industrial ADME datasets. In addition to time-split test sets based on Genentech data, this study benefits from the availability of measurements from an external chemical space (Roche data). We identify GAT as a promising approach to implementing deep learning models. While all the deep learning models significantly outperform lower-bar benchmark traditional models solely based on fingerprints, only GATs seem to offer a small but consistent improvement over higher-bar benchmark traditional models. Finally, the accuracy of in vitro assays from different laboratories predicting the same experimental endpoints appears to be comparable with the accuracy of GAT single-task models, suggesting that most of the observed error from the models is a function of the experimental error propagation.",

keywords = "ADME, deep learning, graph neural network, in vitro assays, multi-task learning",

author = "Fabio Broccatelli and Richard Trager and Michael Reutlinger and George Karypis and Mufei Li",

note = "Publisher Copyright: {\textcopyright} 2022 Wiley-VCH GmbH.",

year = "2022",

month = aug,

doi = "10.1002/minf.202100321",

language = "English (US)",

volume = "41",

journal = "Molecular Informatics",

issn = "1868-1743",

publisher = "Wiley-VCH Verlag GmbH \& Co. KGaA",

number = "8",

}

TY - JOUR

T1 - Benchmarking Accuracy and Generalizability of Four Graph Neural Networks Using Large In Vitro ADME Datasets from Different Chemical Spaces

AU - Broccatelli, Fabio

AU - Trager, Richard

AU - Reutlinger, Michael

AU - Karypis, George

AU - Li, Mufei

PY - 2022/8

Y1 - 2022/8

AB - In this work, we benchmark a variety of single- and multi-task graph neural network (GNN) models against lower-bar and higher-bar traditional machine learning approaches employing human-engineered molecular features. We consider four GNN variants: Graph Convolutional Network (GCN), Graph Attention Network (GAT), Message Passing Neural Network (MPNN), and Attentive Fingerprint (AttentiveFP). So far, deep learning models have been primarily benchmarked using lower-bar traditional models solely based on fingerprints, while more realistic benchmarks employing fingerprints, whole-molecule descriptors and predictions from other related endpoints (e.g., LogD7.4) appear to be scarce for industrial ADME datasets. In addition to time-split test sets based on Genentech data, this study benefits from the availability of measurements from an external chemical space (Roche data). We identify GAT as a promising approach to implementing deep learning models. While all the deep learning models significantly outperform lower-bar benchmark traditional models solely based on fingerprints, only GATs seem to offer a small but consistent improvement over higher-bar benchmark traditional models. Finally, the accuracy of in vitro assays from different laboratories predicting the same experimental endpoints appears to be comparable with the accuracy of GAT single-task models, suggesting that most of the observed error from the models is a function of the experimental error propagation.

KW - ADME

KW - deep learning

KW - graph neural network

KW - in vitro assays

KW - multi-task learning

UR - http://www.scopus.com/inward/record.url?scp=85125072817&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85125072817&partnerID=8YFLogxK

U2 - 10.1002/minf.202100321

DO - 10.1002/minf.202100321

M3 - Article

C2 - 35156325

AN - SCOPUS:85125072817

SN - 1868-1743

VL - 41

JO - Molecular Informatics

JF - Molecular Informatics

IS - 8

M1 - 2100321

ER -
