Power Laws for Heavy-Tailed Distributions: Modeling Allele and Haplotype Diversity for the National Marrow Donor Program

Noa Slater, Yoram Louzoun, Loren Gragert, Martin Maiers, Ansu Chatterjee, Mark Albrecht

Research output: Contribution to journalArticlepeer-review

18 Scopus citations

Abstract

Measures of allele and haplotype diversity, which are fundamental properties in population genetics, often follow heavy tailed distributions. These measures are of particular interest in the field of hematopoietic stem cell transplant (HSCT). Donor/Recipient suitability for HSCT is determined by Human Leukocyte Antigen (HLA) similarity. Match predictions rely upon a precise description of HLA diversity, yet classical estimates are inaccurate given the heavy-tailed nature of the distribution. This directly affects HSCT matching and diversity measures in broader fields such as species richness. We, therefore, have developed a power-law based estimator to measure allele and haplotype diversity that accommodates heavy tails using the concepts of regular variation and occupancy distributions. Application of our estimator to 6.59 million donors in the Be The Match Registry revealed that haplotypes follow a heavy tail distribution across all ethnicities: for example, 44.65% of the European American haplotypes are represented by only 1 individual. Indeed, our discovery rate of all U.S. European American haplotypes is estimated at 23.45% based upon sampling 3.97% of the population, leaving a large number of unobserved haplotypes. Population coverage, however, is much higher at 99.4% given that 90% of European Americans carry one of the 4.5% most frequent haplotypes. Alleles were found to be less diverse suggesting the current registry represents most alleles in the population. Thus, for HSCT registries, haplotype discovery will remain high with continued recruitment to a very deep level of sampling, but population coverage will not. Finally, we compared the convergence of our power-law versus classical diversity estimators such as Capture recapture, Chao, ACE and Jackknife methods. When fit to the haplotype data, our estimator displayed favorable properties in terms of convergence (with respect to sampling depth) and accuracy (with respect to diversity estimates). This suggests that power-law based estimators offer a valid alternative to classical diversity estimators and may have broad applicability in the field of population genetics.

Original languageEnglish (US)
Article numbere1004204
JournalPLoS computational biology
Volume11
Issue number4
DOIs
StatePublished - Apr 1 2015

Bibliographical note

Publisher Copyright:
© 2015 Slater et al.

Fingerprint

Dive into the research topics of 'Power Laws for Heavy-Tailed Distributions: Modeling Allele and Haplotype Diversity for the National Marrow Donor Program'. Together they form a unique fingerprint.

Cite this