The personal and clinical utility of polygenic risk scores

Ali Torkamani, Nathan E. Wineinger and Eric J. Topol



Paper Discussion


  • Review
    • Common vs. rare variants
    • Heritability
      • population vs. individual perspective
  • Polygenic Risk Scores (PRS)
    • Calculation
    • Utility 
  • Perspectives

common (MAF>5%) vs. rare (MAF<0.5%) variants

Genetic architecture of common adult-onset diseases

  • Familial form
    • 1–10% of disease incidence
    • highly penetrant rare variants
    • small set of genes 
  • Nonfamilial form 
    • common variants of small effects, throughout genome
    • smaller contribution from rare variants of moderate effect, in genes driving familial disease

common disease, common variant hypothesis

History of GWAS risk profiling (Box 1)

2007: first large-scale GWAS

  • N=1,000–5,000 affected individuals
  • 1–3 associated loci, <5% of disease heritability
  • Alzheimer disease: APOE-ε4 explains ~5% of heritability

2007 - 2012:

  • N~10,000s affected individuals
  • Per disease: tens of associated loci, ~10% of heritability

Latest GWAS meta-analysis:

  • N>100,000
  • Per disease: 80-150 loci: 20-30% heritability

Heritability in a population vs. individual disease risk

Although the total heritability explained by BRCA1 and BRCA2 variants is low [due to low prevalence of mutations], BRCA1 and BRCA2 testing can identify a subset of individuals whose absolute risk of disease is significantly higher than that of the average individual in the general population.

Disease risk:

  • environmental exposures
  • lifestyle factors
  • genetic susceptibility ← heritability

Computing polygenic risk scores (PRS)

  • select SNPs based on the significance of individual association test statistics
  • weight SNPs according to corresponding estimated regression coefficients 



PRS_j = Σ_i β_i × SNP_{ij}

β_i: effect size of SNP i estimated in the discovery sample, from OLS (continuous trait) or logistic regression (binary trait; β = log(OR))

SNP_{ij}: number of effect alleles (0, 1, 2) for SNP i of person j in the target sample
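A minimal sketch of the weighted sum described above, assuming the SNPs have already been selected and that genotypes are stored as one row per SNP and one column per person. The function name and the example values are illustrative, not from the paper.

```python
def polygenic_risk_scores(betas, genotypes):
    """Return one score per person: sum over SNPs i of beta_i * SNP_ij.

    betas:     list of effect sizes, one per SNP (from the discovery sample)
    genotypes: list of rows, one per SNP; each row holds allele counts
               (0, 1 or 2) for every person in the target sample
    """
    n_snps = len(betas)
    if len(genotypes) != n_snps:
        raise ValueError("need exactly one beta per genotype row (SNP)")
    n_people = len(genotypes[0])
    return [
        sum(betas[i] * genotypes[i][j] for i in range(n_snps))
        for j in range(n_people)
    ]


# Illustrative values only: 3 SNPs, 2 people
example_betas = [0.10, -0.05, 0.20]
example_genotypes = [
    [0, 2],  # SNP 0: person 0 carries 0 alleles, person 1 carries 2
    [1, 1],  # SNP 1
    [2, 0],  # SNP 2
]
example_scores = polygenic_risk_scores(example_betas, example_genotypes)
```

A negative β lowers the score, so a protective allele reduces a person's estimated susceptibility.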


Goal of PRS

Probabilistic susceptibility

→ identify groups of individuals who could benefit

→ prioritize interventions and screening

→  life planning


Leading heritable causes of death:

  • coronary artery disease
  • type 2 diabetes mellitus
  • prostate cancer
  • breast cancer
  • Alzheimer disease

When combined with clinical risk estimates, a PRS may modify the estimated risk of some individuals so that their combined risk is at or above the level of risk recommended for the initiation of statin therapy.

Model effect.


Two models with equivalent relative distributions of diseased vs. healthy individuals can yield different conclusions about utility

Utility metric: # individuals who benefit from the intervention / # individuals treated, by PRS tier

PRS Utility

depends on a fairly complex interplay between disease-specific and intervention-specific risks and benefits


  • Therapeutic intervention: selection of interventions to treat or prevent disease
  • Disease screening: decision to initiate and the interpretation of disease screens

  • Life planning: the personal utility that PRSs can provide, even in the absence of preventive action

Therapeutic intervention

Coronary artery disease

  • Highest quintile of genetic risk
    • ~30% increased risk of adverse coronary event
    • Statin therapy → ~45% reduction in 10-year risk of heart attack/CAD-related death
  • Intermediate quintiles: ~25% risk reduction
  • PRS reclassifies ~12% of intermediate-risk individuals → high risk

Individualized management of disease is central to the philosophy of precision medicine, with genetic factors often invoked for this strategy to personalize health care.
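The relative risk reductions in the CAD bullets above only become a "benefit per person treated" once they are combined with a baseline risk. A rough sketch of that arithmetic follows. The 45% and 25% reductions come from the slide; the baseline 10-year risks are hypothetical placeholders chosen for illustration, not values from the paper.

```python
def absolute_risk_reduction(baseline_risk, relative_risk_reduction):
    """Fraction of treated individuals who avoid the event because of treatment."""
    return baseline_risk * relative_risk_reduction


def number_needed_to_treat(baseline_risk, relative_risk_reduction):
    """How many individuals must be treated to prevent one event."""
    arr = absolute_risk_reduction(baseline_risk, relative_risk_reduction)
    if arr <= 0:
        raise ValueError("no absolute benefit, NNT is undefined")
    return 1.0 / arr


# tier name -> (HYPOTHETICAL baseline 10-year risk, relative reduction from the slide)
tiers = {
    "highest quintile": (0.10, 0.45),
    "intermediate quintiles": (0.08, 0.25),
}

for name, (baseline, rrr) in tiers.items():
    arr = absolute_risk_reduction(baseline, rrr)
    nnt = number_needed_to_treat(baseline, rrr)
    print(f"{name}: benefit per person treated = {arr:.3f}, NNT = {nnt:.1f}")
```

The absolute risk reduction is the same quantity as "# individuals who benefit / # individuals treated", which is why a higher-risk tier can show more utility even with the same drug.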

Disease screening

Breast cancer

  • Age-based criterion: based on average risk, weighed against harms from false-positive (FP) mammography
  • PRS identifies 16% of the population who could start screening at age 40 and 32% who could delay screening

Colorectal cancer

  • Suggested screening start age: 42 years for the highest PRS decile vs. 52 years for the lowest PRS decile

Prostate cancer

  • Prioritize screening for the high-risk subgroup

Life planning

Clarifying susceptibility and quantifying benefits of healthy behavior → induce & maintain behavior change

  • CAD: genetic risk can be offset by optimal lifestyle habits (overall risk reduced by ~50%)

Within each subgroup of genetic risk, a significant trend was observed toward decreased coronary-artery calcification among participants who were more adherent to a healthy lifestyle

  • Breast cancer: ~20% (?) of cases would be avoided if individuals in the top risk decile adopted healthy lifestyle choices


PRS informs financial, legal and care planning.

  • AD: Life choices may mitigate AD onset?
  • Difference in average age of onset was 10 years in the top versus bottom decile of genetic risk (APOE effect removed)


Imperfect correlation with causal genetic factor(s) → Uncertainty in variant's estimated effect size → Poor transferability → Inequities

Improve comprehensiveness and generalizability

  • Capture uncertainty in an individual's risk estimate from measured and unmeasured factors → report a distribution rather than a single value
  • Dynamic model: demographics, lifestyle, clinical risk factors
  • Integrate familial and polygenic risk
  • Whole-genome prediction models
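One simple way to turn a point score into a distribution is to resample each effect size around its estimate. The sketch below assumes independent, normally distributed sampling errors for each β and covers only that one measured source of uncertainty; unmeasured factors and LD between SNPs are ignored. All names and numbers are illustrative.

```python
import random
import statistics


def sample_score_distribution(betas, standard_errors, allele_counts,
                              n_draws=2000, seed=0):
    """Draw plausible scores for ONE person by resampling each beta.

    Assumes each estimated beta has an independent normal sampling error
    with the given standard error (a simplifying assumption).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        score = 0.0
        for beta, se, count in zip(betas, standard_errors, allele_counts):
            score += rng.gauss(beta, se) * count
        draws.append(score)
    return draws


def central_interval(draws, level=0.95):
    """Empirical interval containing roughly `level` of the draws."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lower, upper


# Illustrative values only
demo_betas = [0.10, -0.05, 0.20]
demo_ses = [0.02, 0.03, 0.05]
demo_counts = [2, 1, 1]  # this person's allele counts

demo_draws = sample_score_distribution(demo_betas, demo_ses, demo_counts)
demo_low, demo_high = central_interval(demo_draws)
demo_mean = statistics.mean(demo_draws)
```

Reporting the interval alongside the point score makes it visible when two individuals' scores are not meaningfully different.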

PRS model via ML and AI

  • Requires large-scale data set
  • Difficult to interpret

Discussion points


  • description of common vs. rare risk variants
  • result summary precision

Current PRS projects

Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations

Amit V. Khera, Mark Chaffin, Krishna G. Aragam, Mary E. Haas, Carolina Roselli, Seung Hoan Choi, Pradeep Natarajan, Eric S. Lander, Steven A. Lubitz, Patrick T. Ellinor & Sekar Kathiresan


  • tested a polygenic score built from 6 million variants in two groups of UK Biobank participants (validation n = 120,280; testing n = 288,978)
  • 5 diseases: CAD, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, breast cancer
  • APOB: most common single mutation that increases the risk of heart disease; causes heterozygous familial hypercholesterolemia (prevalence 1/250) and triples a person's risk of having a heart attack
  • 5% to 8% of people have a polygenic score that also at least triples their risk of having a heart attack: about 20× as many people as carry the single mutation (8% vs. 1/250 = 0.4%)
  • future: study effect of statin drugs on highest-risk individuals
  • genotyping services: 23andMe and Ancestry

Twitter rounds

  • use of UK BioBank data: bias? 
  • theory -> practice
  • commercial benefits to the US?





Statistical analysis (testing):

  • most discriminative GPS, computed from genotyped and imputed variants with Hail
  • covariates: age, gender, first 4 principal components (PCs) of ancestry, genotyping array
  • average predicted probability of disease within each of 100 percentile groups


GPSs derivation:

  • 31 candidate GPSs: 7 from LDPred, 24 from pruning and thresholding (p-value and LD-driven clumping, PLINK)
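The idea behind pruning and thresholding can be sketched as a greedy filter: keep SNPs below a p-value threshold, most significant first, and drop any SNP in strong LD with one already kept. This is a simplified illustration and not PLINK's clumping implementation (which also restricts comparisons to a physical window); the function name, default thresholds, and example values are assumptions.

```python
def prune_and_threshold(p_values, r2, p_threshold=5e-8, r2_threshold=0.2):
    """Return indices of SNPs kept after thresholding and greedy LD pruning.

    p_values: association p-value per SNP
    r2:       symmetric matrix (list of lists) of pairwise LD r^2 values
    """
    # Thresholding: only SNPs with p below the cutoff are candidates
    candidates = [i for i, p in enumerate(p_values) if p < p_threshold]
    # Visit the most significant SNPs first
    candidates.sort(key=lambda i: p_values[i])

    kept = []
    for i in candidates:
        # Pruning: skip SNP i if it is in LD with any SNP already kept
        if all(r2[i][k] < r2_threshold for k in kept):
            kept.append(i)
    return kept


# Illustrative values only: SNP 0 and SNP 1 are in strong LD
demo_p = [1e-10, 1e-9, 0.3, 1e-8]
demo_r2 = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
demo_strict = prune_and_threshold(demo_p, demo_r2)
demo_lenient = prune_and_threshold(demo_p, demo_r2, p_threshold=1.0)
```

Running the same filter over a grid of p-value and r² thresholds yields a family of candidate scores, from which the best one can be chosen on a validation set.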

GPSs calculation:

  • validation: 120,280 p1
  • testing: 288,978 p2

Genome-wide polygenic scores (GPS): motivation

  • small GWAS limit the precision of the estimated impact of variants
  • algorithms
  • validation datasets

Fig. 2: Risk for CAD according to GPS.


2a: Distribution of GPS_CAD in the population. Among those in the top 1% of the CAD score, 11% had a heart attack by mean age 57 (vs. 0.8% in the lowest 1%).

2c: Y axis = PPV for CAD (by mean age 57). X axis = percentile of the score, in 100 bins; the prevalence of CAD within each bin is plotted.
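The binning behind a plot like Fig. 2c can be sketched as follows: rank individuals by score, split the ranking into equal-sized bins, and compute disease prevalence in each bin. This is an unadjusted illustration with made-up data; it does not include the covariate adjustment used in the paper's analysis.

```python
def prevalence_by_score_bin(scores, has_disease, n_bins=100):
    """Return disease prevalence in each score bin, lowest scores first.

    scores:      one score per person
    has_disease: one boolean (or 0/1) per person
    Bins are formed by rank, so each holds roughly the same number of people.
    A bin with nobody in it is reported as None.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda j: scores[j])
    totals = [0] * n_bins
    cases = [0] * n_bins
    for rank, person in enumerate(order):
        bin_index = rank * n_bins // n  # always in 0 .. n_bins - 1
        totals[bin_index] += 1
        if has_disease[person]:
            cases[bin_index] += 1
    return [
        cases[b] / totals[b] if totals[b] else None
        for b in range(n_bins)
    ]


# Illustrative values only: 4 people split into 2 bins
demo_prevalence = prevalence_by_score_bin(
    scores=[0.4, 0.1, 0.3, 0.2],
    has_disease=[True, False, True, False],
    n_bins=2,
)
```

With 100 bins each entry corresponds to one percentile of the score, which is the x axis described above.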

Fig. 3: Risk gradient for disease according to the GPS percentile.


  • Compared with screening for rare monogenic mutations, a PRS identifies a larger fraction of the population at comparable or higher risk of one of these common diseases.
  • Those in the top 8% of scores have a 3- to 4-fold higher risk than everyone else.

  • "Germline inherited component to any common disease can be captured by a single number that follows normal distribution."

  • Integration with clinical and environmental factors
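A statement such as "the top 8% have a 3- to 4-fold higher risk than all others" can be quantified as an odds ratio between the top slice of the score distribution and the remainder. The sketch below is an unadjusted version on made-up data; a covariate-adjusted estimate would come from a regression model instead.

```python
def odds_ratio_top_fraction(scores, has_disease, top_fraction=0.08):
    """Unadjusted odds ratio of disease: top score fraction vs. everyone else."""
    n = len(scores)
    n_top = max(1, round(n * top_fraction))
    order = sorted(range(n), key=lambda j: scores[j], reverse=True)
    top, rest = order[:n_top], order[n_top:]

    def odds(group):
        cases = sum(1 for j in group if has_disease[j])
        controls = len(group) - cases
        if cases == 0 or controls == 0:
            raise ValueError("odds undefined: group needs cases and controls")
        return cases / controls

    return odds(top) / odds(rest)


# Illustrative data only: 100 people with scores 0..99.
# Top 8 people (scores 92..99): 4 cases, 4 controls -> odds 1
# Other 92 people: 23 cases, 69 controls            -> odds 1/3
demo_scores = list(range(100))
demo_disease = [
    (j % 2 == 0) if j >= 92 else (j % 4 == 0)
    for j in range(100)
]
demo_or = odds_ratio_top_fraction(demo_scores, demo_disease)
```

An odds ratio approximates a risk ratio only when the disease is uncommon in both groups, so the two should not be read as interchangeable.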


Paper discussion: The personal and clinical utility of polygenic risk scores

By Trang Le

Presentation on 2018-10-01 for the joint Lunch&Learn between the Moore Lab and Ritchie Lab.
