Data Science Tapas

Genome-wide association studies with Hail, scalable genomic software

Heidi Steiner

March 16, 2023

On Today’s Menu

  • Installations

  • Discuss

    • Hail + scalability 🏗️

    • Genome-wide association studies (GWAS) 🧬

  • Hands-on

    • Perform GWAS with Hail 🌨️

Prepare Workspace

  • Load CyVerse

  • Launch JupyterLab DataScience App

  • Install Java

sudo apt update
sudo apt install -y openjdk-8-jdk

  • Install Hail

pip install hail
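Before moving on, it can help to confirm that both installs worked. The sketch below is a hypothetical helper, not part of Hail: it uses only the Python standard library to check whether a `java` executable is on the `PATH` and whether the `hail` package can be imported in the current environment.

```python
import importlib.util
import shutil
import subprocess
from typing import Optional


def java_version() -> Optional[str]:
    """Return the first line of `java -version`, or None if Java is not on PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` prints its banner to stderr rather than stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else None


def hail_installed() -> bool:
    """True if the `hail` package can be found by this Python interpreter."""
    return importlib.util.find_spec("hail") is not None


if __name__ == "__main__":
    print("Java:", java_version() or "not found (rerun the apt install step)")
    print("Hail installed:", hail_installed())
```

Once both checks pass, running `import hail as hl` followed by `hl.init()` in a notebook cell should start a local Hail session.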

What is Hail?

  • Open-source data science library

  • Scalable genomic software

  • Unified genomic data representation

  • Community 🤗
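Hail's unified representation is the MatrixTable: a two-dimensional dataset with row fields (one record per variant), column fields (one record per sample), and entry fields (one value per variant-sample pair, such as a genotype call). The toy class below is only an in-memory analogy of that layout, not Hail's API; the class name, field layout, and the 0/1/2 alternate-allele coding of genotypes are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyMatrixTable:
    """Toy, in-memory analogy of a variants-by-samples matrix (NOT Hail's API).

    rows:    one dict of row fields per variant (e.g. locus, alleles)
    cols:    one dict of column fields per sample (e.g. sample ID, phenotype)
    entries: entries[i][j] is the genotype of variant i in sample j, coded as the
             number of alternate alleles (0, 1 or 2), or None when the call is missing
    """

    rows: List[Dict[str, object]]
    cols: List[Dict[str, object]]
    entries: List[List[Optional[int]]]

    def __post_init__(self) -> None:
        # Every variant must have exactly one entry per sample.
        if len(self.entries) != len(self.rows):
            raise ValueError("need one entry row per variant")
        for entry_row in self.entries:
            if len(entry_row) != len(self.cols):
                raise ValueError("need one entry per sample in every variant row")

    def alt_allele_frequency(self, i: int) -> Optional[float]:
        """Alternate-allele frequency of variant i over non-missing diploid calls."""
        calls = [g for g in self.entries[i] if g is not None]
        if not calls:
            return None
        return sum(calls) / (2 * len(calls))
```

In Hail itself, `mt.rows()`, `mt.cols()`, and `mt.entries()` return each of these views as a Table, and `hl.variant_qc(mt)` computes per-variant statistics such as allele frequency across the whole dataset.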

The scalability problem

  • Problem:

    It is not feasible to process tens or hundreds of thousands of whole genomes on a single computer

  • Solution:

    Focus on the contents of the pipeline rather than on how to parallelize it (Hail handles the parallelization)
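The idea of separating what a pipeline computes from how it is executed can be sketched in plain Python. Hail runs on Apache Spark, which splits data into partitions across machines; the stand-in below instead uses a local thread pool, so it illustrates the separation, not a speedup (Python threads do not speed up CPU-bound work). The per-variant function `call_rate` and both runner functions are hypothetical names for this example.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def call_rate(genotypes: Sequence[Optional[int]]) -> float:
    """Pipeline contents: fraction of samples with a non-missing call at one variant."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def run_serial(fn: Callable[[T], R], items: Sequence[T]) -> List[R]:
    """Execution strategy 1: a single loop on one machine."""
    return [fn(item) for item in items]


def run_partitioned(fn: Callable[[T], R], items: Sequence[T], n_parts: int = 4) -> List[R]:
    """Execution strategy 2: split items into contiguous partitions, process them concurrently."""
    size = max(1, math.ceil(len(items) / n_parts))
    partitions = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        # map() yields results in partition order, so output order matches input order.
        per_partition = pool.map(lambda part: [fn(item) for item in part], partitions)
        return [result for part in per_partition for result in part]
```

The analysis code (`call_rate`) never changes; only the runner does. That is the kind of separation Hail provides at much larger scale.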

Genome-wide association studies

A statistical method that tests large numbers of genetic variants across the genome for association with a disease (or another trait)
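At its core, a GWAS repeats one small statistical test per variant. The sketch below assumes a quantitative phenotype, additive genotype coding (0, 1, or 2 alternate alleles), no covariates, and a normal approximation to the t statistic for the p-value, which is only reasonable at the large sample sizes typical of GWAS. The function names are hypothetical; 5e-8 is the conventional genome-wide significance threshold (roughly 0.05 divided by about one million independent tests).

```python
import math
from typing import Dict, List, NamedTuple, Optional, Sequence

# Conventional genome-wide significance threshold.
GENOME_WIDE_SIGNIFICANCE = 5e-8


class AssocResult(NamedTuple):
    variant: str
    n: int          # samples with a non-missing genotype
    beta: float     # estimated phenotype change per additional alternate allele
    se: float       # standard error of beta
    p_value: float  # two-sided, normal approximation


def associate_variant(
    variant: str, genotypes: Sequence[Optional[int]], phenotype: Sequence[float]
) -> Optional[AssocResult]:
    """Simple linear regression of phenotype on genotype dosage for one variant."""
    # Complete-case analysis: drop samples whose genotype call is missing.
    pairs = [(float(g), y) for g, y in zip(genotypes, phenotype) if g is not None]
    n = len(pairs)
    if n < 3:
        return None  # need n - 2 > 0 residual degrees of freedom
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    if sxx == 0:
        return None  # monomorphic in these samples: nothing to test
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    beta = sxy / sxx
    rss = max(syy - beta * sxy, 0.0)  # residual sum of squares, clamped against rounding
    se = math.sqrt(rss / (n - 2) / sxx)
    # Two-sided p-value from a normal approximation to the t statistic.
    p_value = 0.0 if se == 0 else math.erfc(abs(beta / se) / math.sqrt(2))
    return AssocResult(variant, n, beta, se, p_value)


def run_gwas(
    variants: Dict[str, Sequence[Optional[int]]], phenotype: Sequence[float]
) -> List[AssocResult]:
    """Test every variant and return results sorted from most to least significant."""
    results = [associate_variant(name, genos, phenotype) for name, genos in variants.items()]
    return sorted((r for r in results if r is not None), key=lambda r: r.p_value)
```

Filtering `run_gwas(...)` for `p_value < GENOME_WIDE_SIGNIFICANCE` gives the candidate hits. In Hail, a roughly corresponding call is `hl.linear_regression_rows(y=mt.pheno, x=mt.GT.n_alt_alleles(), covariates=[1.0])`, where `mt.pheno` is a hypothetical phenotype column field and `1.0` supplies the intercept; Hail also accepts additional covariates such as ancestry principal components.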

Let’s try it ▶️

Back to CyVerse…

Buen provecho! (Enjoy your meal!)