The K-mer antibiotic resistance gene variant analyzer (KARGVA)

Simone Marini, Christina Boucher, Noelle Noyes, Mattia Prosperi

Research output: Contribution to journal › Article › peer-review

Abstract

Characterization of antibiotic resistance genes (ARGs) from high-throughput sequencing data of metagenomics and cultured bacterial samples is a challenging task, with the need to account for both computational (e.g., string algorithms) and biological (e.g., gene transfers, rearrangements) aspects. Curated ARG databases exist together with assorted ARG classification approaches (e.g., database alignment, machine learning). Besides ARGs that naturally occur in bacterial strains or are acquired through mobile elements, there are chromosomal genes that can render a bacterium resistant to antibiotics through point mutations, i.e., ARG variants (ARGVs). While ARG repositories also collect ARGVs, there are only a few tools that are able to identify ARGVs from metagenomics and high-throughput sequencing data, and these have a number of limitations (e.g., pre-assembly, a posteriori verification of mutations, or specification of species). In this work we present the k-mer (i.e., strings of fixed length k) ARGV analyzer – KARGVA – an open-source, multi-platform tool that provides: (i) an ad hoc, large ARGV database derived from multiple sources; (ii) input capability for various types of high-throughput sequencing data; (iii) a three-way, hash-based, k-mer search setup to process data efficiently, linking k-mers to ARGVs, k-mers to point mutations, and ARGVs to k-mers, respectively; (iv) a statistical filter on sequence classification to reduce type I and II errors. On semi-synthetic data, KARGVA provides very high accuracy even in the presence of high sequencing errors or mutations (99.2% and 86.6% accuracy within 1% and 5% base change rates, respectively), and genome rearrangements (98.2% accuracy), with robust performance on ad hoc false positive sets. On data from the worldwide MetaSUB consortium, comprising 3,700+ metagenomics experiments, KARGVA identifies more ARGVs than Resistance Gene Identifier (4.8x) and PointFinder (6.8x), yet all predictions are below the expected false positive estimates. The prevalence of ARGVs is correlated to ARGs, but ecological characteristics do not explain ARGV variance well. KARGVA is publicly available at https://github.com/DataIntellSystLab/KARGVA under the MIT license.
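
The three-way k-mer lookup described in point (iii) can be illustrated with a minimal Python sketch. This is not KARGVA's code: the function names (`kmers`, `build_index`, `classify_read`), the toy variant `toyA`, its sequence, and the rule of reporting a variant only when a mutation-spanning k-mer matches are all assumptions made for illustration. The statistical filter of point (iv) is not modeled.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield (offset, k-mer) for every window of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def build_index(variants, k):
    """Build three hash maps from a toy variant database.

    variants: {name: (sequence, set of 0-based mutated positions)}
    (hypothetical layout, not KARGVA's database format)
    """
    kmer_to_variants = defaultdict(set)   # k-mer  -> variants containing it
    kmer_to_mutations = defaultdict(set)  # k-mer  -> (variant, position) it spans
    variant_to_kmers = defaultdict(set)   # variant -> all of its k-mers
    for name, (seq, mut_positions) in variants.items():
        for i, km in kmers(seq, k):
            kmer_to_variants[km].add(name)
            variant_to_kmers[name].add(km)
            for pos in mut_positions:
                if i <= pos < i + k:
                    kmer_to_mutations[km].add((name, pos))
    return kmer_to_variants, kmer_to_mutations, variant_to_kmers


def classify_read(read, k, kmer_to_variants, kmer_to_mutations):
    """Count k-mer hits per variant and record which mutations were covered.

    Assumed rule: a variant is reported only if at least one read k-mer
    spans one of its annotated point mutations.
    """
    hits = defaultdict(int)
    mutation_hits = defaultdict(set)
    for _, km in kmers(read, k):
        for name in kmer_to_variants.get(km, ()):
            hits[name] += 1
        for name, pos in kmer_to_mutations.get(km, ()):
            mutation_hits[name].add(pos)
    return {name: (hits[name], sorted(mutation_hits[name]))
            for name in mutation_hits}


# Toy database: one variant with a point mutation at position 8.
toy_db = {"toyA": ("ACGTACGTTAGC", {8})}
k2v, k2m, v2k = build_index(toy_db, k=4)

print(classify_read("GTTAGC", 4, k2v, k2m))  # read covering the mutation
print(classify_read("ACGTAC", 4, k2v, k2m))  # read missing the mutation
```

In this sketch, a read that matches the gene but never covers the mutated position is not reported, which mirrors the distinction the abstract draws between matching an ARGV sequence and matching its point mutation.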

Original language: English (US)
Article number: 1060891
Journal: Frontiers in Microbiology
Volume: 14
State: Published - 2023

Bibliographical note

Publisher Copyright:
Copyright © 2023 Marini, Boucher, Noyes and Prosperi.

Keywords

  • antibiotic resistance
  • bioinformatics
  • gene variants
  • high-throughput sequencing
  • metagenomics
  • statistical learning

PubMed: MeSH publication types

  • Journal Article
