High throughput sequencing

BSX-3139 Molecular Ecology and Evolution

Dr Axel Barlow
email: a.barlow@bangor.ac.uk

High throughput sequencing

  • First generation sequencing (Sanger)
  • Second generation sequencing (Illumina)
  • Third generation sequencing (PacBio and Oxford Nanopore)

First generation sequencing

Sanger sequencing

Sanger overview

  • Read length 700-1000 bp
  • High accuracy
  • 384 samples
  • 1 day
  • Dominated DNA sequencing for 3 decades

Second generation sequencing

Illumina

Data output

Platform       | Read pairs  | Read length | Data output | Genome coverage
---------------|-------------|-------------|-------------|----------------
MiniSeq        | 25 million  | 2 x 150 bp  | 7.5 Gb      | 2x
MiSeq          | 25 million  | 2 x 300 bp  | 15 Gb       | 4x
NextSeq 550    | 400 million | 2 x 150 bp  | 120 Gb      | 33x
NextSeq 2000   | 900 million | 2 x 300 bp  | 540 Gb      | 150x
HiSeq X        | 6 billion   | 2 x 150 bp  | 1.8 Tb      | 500x
NovaSeq X Plus | 52 billion  | 2 x 150 bp  | 16 Tb*      | 4444x

  • *16 Tb = 16,000,000,000,000 bp
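The data output column is simple arithmetic: read pairs x 2 reads x read length. Coverage is output divided by genome size. The slide does not state the genome size; the coverage values are consistent with a genome of about 3.6 Gb, so the sketch below assumes that figure (our inference, not the author's statement). It also adds the idealised Poisson (Lander-Waterman) expectation that a fraction e^(-C) of positions receive no reads at mean coverage C, which assumes reads land uniformly at random.

```python
import math

# Assumption (inferred from the coverage column, not stated on the slide):
# coverage is calculated against a genome of about 3.6 Gb.
GENOME_SIZE_BP = 3.6e9

# Platform -> (read pairs, read length in bp), copied from the table above.
PLATFORMS = {
    "MiniSeq": (25e6, 150),
    "MiSeq": (25e6, 300),
    "NextSeq 550": (400e6, 150),
    "NextSeq 2000": (900e6, 300),
    "HiSeq X": (6e9, 150),
    "NovaSeq X Plus": (52e9, 150),
}


def data_output_bp(read_pairs: float, read_length: int) -> float:
    """Total bases sequenced: each pair is two reads of read_length bp."""
    return read_pairs * 2 * read_length


def mean_coverage(total_bp: float, genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Average depth: total bases divided by genome size."""
    return total_bp / genome_size_bp


def fraction_uncovered(coverage: float) -> float:
    """Poisson expectation of positions with zero reads (uniform random reads)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    for name, (pairs, length) in PLATFORMS.items():
        total = data_output_bp(pairs, length)
        cov = mean_coverage(total)
        print(f"{name:15s} {total / 1e9:9.1f} Gb {cov:7.0f}x "
              f"uncovered: {fraction_uncovered(cov):.2%}")
```

For NovaSeq X Plus the product is 15.6 Tb (about 4,333x), which the table rounds to 16 Tb. At the MiniSeq's roughly 2x, about 12% of positions would be expected to have no reads at all.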

Sequencing by synthesis

  1. Sample preparation
  2. Bind DNA to flowcell, generate clusters
  3. Sequencing by synthesis
  4. Data analysis (in the machine)

Sample preparation

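Indexes (short barcode sequences added to each sample's DNA during library preparation) allow multiple samples to be pooled and sequenced at the same time (multiplexing)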
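After sequencing, pooled reads are separated ("demultiplexed") by their index sequence. The sketch below assumes a hypothetical sample sheet (`SAMPLE_INDEXES`) and reads given as (read ID, index read, sequence) tuples; a read is assigned to a sample only if exactly one index lies within one mismatch, otherwise it is "undetermined". In practice Illumina's own software (for example bcl2fastq or BCL Convert) performs this step; this is only an illustration of the idea.

```python
from collections import defaultdict

# Hypothetical sample sheet: sample name -> 8 bp index sequence.
SAMPLE_INDEXES = {
    "leopard_01": "ACGTACGT",
    "leopard_02": "TTGCAAGC",
    "leopard_03": "GGATCCTA",
}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, indexes: dict[str, str], max_mismatches: int = 1) -> str:
    """Return the only sample whose index is within max_mismatches, else 'undetermined'."""
    hits = [
        name
        for name, index in indexes.items()
        if len(index) == len(index_read) and hamming(index, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else "undetermined"


def demultiplex(reads, indexes, max_mismatches=1):
    """Sort (read_id, index_read, sequence) tuples into per-sample bins."""
    bins = defaultdict(list)
    for read_id, index_read, sequence in reads:
        sample = assign_sample(index_read, indexes, max_mismatches)
        bins[sample].append((read_id, sequence))
    return dict(bins)


if __name__ == "__main__":
    reads = [
        ("r1", "ACGTACGT", "AAAA"),  # exact match to leopard_01
        ("r2", "ACGTACGA", "CCCC"),  # one mismatch, still leopard_01
        ("r3", "TTGCAAGC", "GGGG"),  # exact match to leopard_02
        ("r4", "NNNNNNNN", "TTTT"),  # unreadable index
    ]
    for sample, binned in demultiplex(reads, SAMPLE_INDEXES).items():
        print(sample, binned)
```

Allowing one mismatch only works safely because the example indexes differ from each other at five or more positions, so a single sequencing error cannot make a read match two samples.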

Flow cell

Cluster generation

Sequencing by synthesis

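In each cycle every cluster incorporates one fluorescently labelled nucleotide, the flow cell is imaged, and the base is called from the colour. The sketch assumes a hypothetical four-colour chemistry with one dye per base (several current Illumina instruments use fewer channels) and made-up intensity values. Alongside each call it reports the brightest intensity divided by the sum of the two brightest, similar in spirit to Illumina's "chastity" measure: values near 0.5 mean two dyes are almost equally bright and the call is doubtful.

```python
BASES = "ACGT"  # assumed dye order in each intensity tuple


def call_bases(cycles):
    """Call one base per cycle for a single cluster.

    cycles: list of (A, C, G, T) intensity tuples, one per sequencing cycle
    (hypothetical four-colour data). Returns the called sequence and, for each
    cycle, brightest / (brightest + second brightest).
    """
    sequence = []
    purity = []
    for intensities in cycles:
        # Rank the four channels from brightest to dimmest.
        ranked = sorted(range(4), key=lambda i: intensities[i], reverse=True)
        best = intensities[ranked[0]]
        second = intensities[ranked[1]]
        sequence.append(BASES[ranked[0]])
        total = best + second
        purity.append(best / total if total > 0 else 0.0)
    return "".join(sequence), purity


if __name__ == "__main__":
    cycles = [(900, 50, 40, 30), (20, 30, 800, 60), (400, 380, 10, 20)]
    seq, purity = call_bases(cycles)
    print(seq, [round(p, 2) for p in purity])
```

In this example the third cycle (A at 400 against C at 380) gets a purity of about 0.51, so that call would be treated as low quality.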

Data analysis (in the machine)

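The instrument turns images into base calls plus a quality score for every base, written to FASTQ files. Quality scores use the Phred scale, Q = -10 log10(P), where P is the estimated probability that the call is wrong: Q20 means 1 error in 100 and Q30 means 1 in 1,000. Current Illumina FASTQ files store each Q as one ASCII character with an offset of 33. The sketch below decodes a single invented FASTQ record.

```python
import math


def parse_fastq(text: str):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    lines = [line for line in text.strip().splitlines() if line]
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual


def decode_quality(qual: str, offset: int = 33) -> list[int]:
    """Convert ASCII quality characters to Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in qual]


def phred_to_error(q: float) -> float:
    """Probability that a base call is wrong, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Inverse of phred_to_error."""
    return -10 * math.log10(p)


# Invented example record: 'I' encodes Q40, '5' Q20 and '#' Q2.
EXAMPLE = """
@read_1
ACGTAC
+
II5#II
"""

if __name__ == "__main__":
    for read_id, seq, qual in parse_fastq(EXAMPLE):
        for base, q in zip(seq, decode_quality(qual)):
            print(read_id, base, q, f"{phred_to_error(q):.4f}")
```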

What do we sequence?

[Not an exhaustive list]

  • Whole genome resequencing (pure DNA sample from a single individual)
  • Reduced representation genome data (RADseq, targeted SNPs, single individual)
  • Poolseq (multiple individuals)
  • Transcriptome (RNA sample from single tissue/individual)
  • Metabarcoding (PCR amplicon, multiple individuals/species)
  • Metagenomics (whole genomes, multiple individuals/species)

Whole genome resequencing

  • Sequencing reads mapped to an existing reference genome
  • Easy to identify SNPs relative to reference, and to other samples
  • Accuracy depends on the number of reads covering each position (the "stack"), termed read depth or coverage
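The read stack can be sketched as a pileup: count the bases aligned at each reference position, then flag positions where enough reads agree on a base that differs from the reference. This is a deliberately simple sketch assuming ungapped, already-mapped reads given as hypothetical (start, sequence) pairs; the thresholds `min_depth` and `min_fraction` are illustrative, and it only reports homozygous-looking differences. Real variant callers such as bcftools or GATK model genotype likelihoods and base qualities instead.

```python
from collections import Counter


def pileup(reference: str, alignments):
    """Count bases at each reference position from (start, sequence) alignments.

    Assumes reads are already mapped, 0-based and ungapped (no indels).
    """
    columns = [Counter() for _ in reference]
    for start, sequence in alignments:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < len(reference):
                columns[position][base] += 1
    return columns


def call_snps(reference: str, alignments, min_depth: int = 3, min_fraction: float = 0.8):
    """Report (position, ref_base, alt_base, depth) where the stack disagrees with the reference."""
    snps = []
    for position, counts in enumerate(pileup(reference, alignments)):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to trust any call here
        base, n = counts.most_common(1)[0]
        if base != reference[position] and n / depth >= min_fraction:
            snps.append((position, reference[position], base, depth))
    return snps


# Hypothetical toy data: a true A->G difference at position 4,
# plus one sequencing error (T instead of A) at position 8.
REFERENCE = "ACGTACGTAC"
READS = [(0, "ACGTG"), (2, "GTGCG"), (3, "TGCGT"), (4, "GCGTA"), (6, "GTTC")]

if __name__ == "__main__":
    for position, counts in enumerate(pileup(REFERENCE, READS)):
        print(position, REFERENCE[position], sum(counts.values()), dict(counts))
    print(call_snps(REFERENCE, READS))
```

All four reads covering position 4 carry G, so it is reported as a SNP; the single T at position 8 is ignored because only two reads cover that site, illustrating why depth matters.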

Example: leopard population genomics

Paijmans et al. 2021. Current Biology

Illumina summary

  • The current market leader
  • Massive output
  • But keep an eye on Ultima Genomics
  • High accuracy
  • Many applications (genome resequencing, RADseq, transcriptomes, metabarcoding)
  • Cheap (£9 per Gb)
  • Major limitation is the read length
  • Unsuitable for assembly of reference genomes
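Why short reads struggle with assembly: a read that lies entirely inside a repeat matches every copy of that repeat, so its origin is ambiguous. The toy sketch below builds a hypothetical genome with two copies of a 50 bp repeat separated by random sequence (assumed unique), then measures the fraction of error-free reads that match more than one location for a short and a longer read length. It illustrates the principle only; real genomes contain repeats ranging from a few bp to many kb.

```python
import random


def count_matches(genome: str, read: str) -> int:
    """Number of (possibly overlapping) exact occurrences of read in genome."""
    count = 0
    pos = genome.find(read)
    while pos != -1:
        count += 1
        pos = genome.find(read, pos + 1)
    return count


def ambiguous_fraction(genome: str, read_length: int) -> float:
    """Fraction of all error-free reads of read_length that match >1 place."""
    n_reads = len(genome) - read_length + 1
    ambiguous = sum(
        count_matches(genome, genome[i:i + read_length]) > 1
        for i in range(n_reads)
    )
    return ambiguous / n_reads


def toy_genome(seed: int = 1) -> str:
    """Hypothetical genome: random sequence with two copies of a 50 bp repeat."""
    rng = random.Random(seed)

    def random_seq(n: int) -> str:
        return "".join(rng.choice("ACGT") for _ in range(n))

    repeat = random_seq(50)
    return random_seq(200) + repeat + random_seq(200) + repeat + random_seq(200)


if __name__ == "__main__":
    genome = toy_genome()
    for read_length in (30, 150):
        print(read_length, f"{ambiguous_fraction(genome, read_length):.1%}")
```

Reads longer than the repeat, anchored in unique flanking sequence, can be placed unambiguously, which is why long reads make genome assembly so much easier.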

Third generation sequencing

PacBio

Single Molecule, Real-Time (SMRT) sequencing

HiFi reads

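HiFi reads come from circular consensus sequencing: the polymerase reads the same circular molecule several times, and the passes are combined into a consensus far more accurate than any single pass. The sketch shows the principle with a per-position majority vote over hypothetical, already-aligned passes containing only substitution errors, plus the chance that most passes are wrong at a site, pessimistically assuming independent errors that all agree on the same wrong base. The 10% per-pass error rate is an illustrative assumption, and PacBio's actual consensus algorithm also handles insertions and deletions, which this sketch ignores.

```python
from collections import Counter
from math import comb, log10


def consensus(passes: list[str]) -> str:
    """Majority base at each position across equal-length, pre-aligned passes."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


def majority_error(per_pass_error: float, n_passes: int) -> float:
    """Worst-case probability that a majority of n_passes (odd) is wrong at a site.

    Assumes independent errors that all agree on the same wrong base.
    """
    k_min = n_passes // 2 + 1
    return sum(
        comb(n_passes, k) * per_pass_error**k * (1 - per_pass_error) ** (n_passes - k)
        for k in range(k_min, n_passes + 1)
    )


if __name__ == "__main__":
    # Hypothetical passes of one molecule, each with a different substitution error.
    passes = ["ACGTTA", "ACCTTA", "ACGTAA", "TCGTTA", "ACGTTA"]
    print(consensus(passes))
    for n in (1, 3, 5, 7):
        p = majority_error(0.1, n)
        print(n, f"{p:.5f}", f"Q{-10 * log10(p):.1f}")
```

With five passes at 10% error each, the worst-case majority error is about 0.86% (roughly Q21), and it keeps falling as more passes are added.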

PacBio summary

  • Single molecule sequencing (no cluster generation)
  • Long reads (around 25 kb)
  • 75 Gb per SMRT Cell for Revio
  • Fantastic for assembly of reference genomes
  • Historically high sequencing error rate, solved by HiFi sequencing
  • Still more expensive than Illumina (~£3k per SMRT Cell including library prep)
  • Price falling rapidly
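Using the approximate prices quoted on these slides (£9 per Gb for Illumina; about £3k for a Revio SMRT Cell yielding 75 Gb, i.e. about £40 per Gb), the sketch below compares the cost of sequencing a hypothetical 3 Gb genome to 30x. Prices change quickly, so treat the constants as placeholders to update.

```python
import math

# Approximate figures from the slides; update as prices change.
ILLUMINA_GBP_PER_GB = 9.0
PACBIO_GBP_PER_CELL = 3000.0  # ~£3k per SMRT Cell including library prep
PACBIO_GB_PER_CELL = 75.0     # Revio output per SMRT Cell


def gb_needed(genome_size_gb: float, target_coverage: float) -> float:
    """Data required: genome size multiplied by the target mean coverage."""
    return genome_size_gb * target_coverage


def illumina_cost(gb: float) -> float:
    """Illumina is treated as priced per Gb."""
    return gb * ILLUMINA_GBP_PER_GB


def pacbio_cost(gb: float) -> tuple[int, float]:
    """PacBio is bought in whole SMRT Cells, so round the cell count up."""
    cells = math.ceil(gb / PACBIO_GB_PER_CELL)
    return cells, cells * PACBIO_GBP_PER_CELL


if __name__ == "__main__":
    gb = gb_needed(3.0, 30)  # hypothetical 3 Gb genome at 30x
    cells, pb_cost = pacbio_cost(gb)
    print(f"{gb:.0f} Gb: Illumina £{illumina_cost(gb):,.0f}, "
          f"PacBio {cells} SMRT Cells £{pb_cost:,.0f}")
```

Because PacBio data come in whole SMRT Cells, 90 Gb needs two cells (£6,000) against about £810 on Illumina.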

What can we do with the data?

Nowoshilow et al. 2018. Nature

Oxford Nanopore

How it works

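In nanopore sequencing a DNA strand is pulled through a protein pore while the ionic current through the pore is recorded, and the current at any moment depends on the several bases sitting in the pore rather than on a single base. The toy forward model below assumes a hypothetical pore in which exactly K = 3 bases set the current, with an invented, distinct mean level for each 3-mer in arbitrary units. In this noise-free toy the sequence can be read back by looking up each level and stitching overlapping k-mers; real signals are noisy, levels of different k-mers overlap, and the number of samples per base varies, which is why ONT's basecallers use neural networks.

```python
from itertools import product

K = 3                 # hypothetical: number of bases that influence the current
SAMPLES_PER_KMER = 3  # hypothetical: current samples recorded per k-mer

# Hypothetical pore model: every k-mer gets a distinct mean current level
# (arbitrary units). Real pore models are measured, not generated like this.
PORE_MODEL = {
    "".join(kmer): 60.0 + 0.5 * i
    for i, kmer in enumerate(product("ACGT", repeat=K))
}


def expected_signal(sequence: str) -> list[float]:
    """Noise-free current trace as the strand moves one base at a time."""
    signal = []
    for i in range(len(sequence) - K + 1):
        signal.extend([PORE_MODEL[sequence[i:i + K]]] * SAMPLES_PER_KMER)
    return signal


def decode_noise_free(signal: list[float]) -> str:
    """Invert the toy model: look up each level, then stitch overlapping k-mers."""
    level_to_kmer = {level: kmer for kmer, level in PORE_MODEL.items()}
    kmers = [level_to_kmer[signal[j]] for j in range(0, len(signal), SAMPLES_PER_KMER)]
    return kmers[0] + "".join(kmer[-1] for kmer in kmers[1:])


if __name__ == "__main__":
    trace = expected_signal("ACGTTGCA")
    print(trace[::SAMPLES_PER_KMER])
    print(decode_noise_free(trace))
```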

Field based sequencing

Quick et al. 2016. Real-time, portable genome sequencing for Ebola surveillance. Nature

Oxford Nanopore summary

  • Variable output, up to terabases (Tb) with the larger platforms
  • Long reads: the record is 2.3 Mb!
  • Output of 20-30 Gb (up to 50 Gb) per MinION flow cell
  • High error rate, currently 5-10% but improving
  • Still more expensive than Illumina and PacBio (~£750 per MinION flow cell including library prep)
  • True portability and real-time sequencing and analysis
  • But you need to buy the sequencer: MinION £4,650 including 5 flow cells

Recommended reading

Next time:

Gene trees and molecular dating