ChURP: A lightweight CLI framework to enable novice users to analyze sequencing datasets in parallel

Joshua Baller, Thomas Kono, Adam Herman, Ying Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Scopus citations

Abstract

Progressive decreases in the cost of DNA sequencing have contributed to a decades-long exponential increase in the production of new sequencing datasets. The processing of these datasets has in turn led biology, a field that has traditionally relied on local "lab" servers to address its computational needs, to become increasingly reliant on High Performance Computing (HPC) resources. Though many operations on sequencing datasets are trivially parallelizable on multiple levels, the lack of an HPC tradition in biological research has hampered fully parallelized deployments. Here we present a lightweight, flexible framework for performing parallelized processing of raw gene expression data. The framework uses a Python 3-based frontend for specifying analysis options, data paths, and reference datasets. This frontend sanitizes and resolves the options, providing verbose error checking before writing a human-readable configuration file and basic scripts for batch submission. The submission scripts leverage the scheduler to implement a scatter-gather approach, submitting potentially hundreds of individual jobs via a job array, each small enough to take advantage of backfill in a high-contention HPC environment. The gather component is handled through a script submitted with an "after-okay" dependency.
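
The scatter-gather submission described in the abstract can be illustrated with a minimal sketch. It is not ChURP's actual code: it assumes a Slurm scheduler (the abstract does not name one), and the script names `scatter.sh` and `gather.sh` and all function names are hypothetical. The sketch only builds the `sbatch` command lines, so it runs without a cluster. The array job fans out one task per sample, and the gather job is held by an `afterok` dependency until every array task has finished successfully.

```python
import shlex


def scatter_command(n_samples, script="scatter.sh"):
    """Build an sbatch command that submits one array task per sample.

    Assumption: Slurm; `script` is a hypothetical per-sample batch script.
    """
    if n_samples < 1:
        raise ValueError("need at least one sample")
    # --parsable makes sbatch print just the job ID (optionally ";cluster")
    return ["sbatch", "--parsable", f"--array=1-{n_samples}", script]


def parse_job_id(sbatch_output):
    """Extract the job ID from `sbatch --parsable` output ("id" or "id;cluster")."""
    return sbatch_output.strip().split(";")[0]


def gather_command(array_job_id, script="gather.sh"):
    """Build an sbatch command that starts only after all array tasks succeed."""
    return ["sbatch", f"--dependency=afterok:{array_job_id}", script]


if __name__ == "__main__":
    scatter = scatter_command(200)
    print(shlex.join(scatter))
    # On a real cluster the ID would come from running the scatter command,
    # e.g. subprocess.run(scatter, capture_output=True, text=True).stdout.
    # A placeholder value stands in for it here.
    job_id = parse_job_id("123456\n")
    print(shlex.join(gather_command(job_id)))
```

Keeping each array task small, as the abstract notes, is what lets the scheduler fit tasks into otherwise idle gaps through backfill. The single dependent gather job avoids any polling from the login node.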

Original language: English (US)
Title of host publication: Proceedings of the Practice and Experience in Advanced Research Computing
Subtitle of host publication: Rise of the Machines (Learning), PEARC 2019
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450372275
State: Published - Jul 28 2019
Event: 2019 Conference on Practice and Experience in Advanced Research Computing: Rise of the Machines (Learning), PEARC 2019 - Chicago, United States
Duration: Jul 28 2019 to Aug 1 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2019 Conference on Practice and Experience in Advanced Research Computing: Rise of the Machines (Learning), PEARC 2019
Country/Territory: United States
City: Chicago
Period: 7/28/19 to 8/1/19

Bibliographical note

Publisher Copyright:
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
