Dr Axel Barlow
email: a.barlow@bangor.ac.uk
Date | Time | Activity |
---|---|---|
Tue 05/11/2024 | 10:00 to 16:00 | Unix-like systems, bash, SCWales, slurm |
Wed 06/11/2024 | 10:00 to 16:00 | Illumina data, BEARCAVE, data processing |
Thu 07/11/2024 | 10:00 to 16:00 | ANGSD, covariance and distance matrices, heterozygosity |
Fri 08/11/2024 | 10:00 to 16:00 | Intro to R, PCA, NJ trees, Manhattan plots |
https://drabarlow.github.io/bioinformatics_bootcamp/
https://drabarlow.github.io/bioinformatics_bootcamp/bootcamp_worksheet.html
https://github.com/drabarlow/bioinformatics_bootcamp
Interests
Bioinformatics experience
bash and R.
sudo
Mac OS
Linux
R (typically via Rstudio)
bash
DOS
and Unix
not yet possible
Feature | Windows | Mac | Linux |
---|---|---|---|
standard PC functions | yes | yes | yes |
cost | yes | yes | free |
hardware choice | yes | no | yes |
bioinformatics | no | yes | yes |
HPC | no | no | yes |
open source | no | no | yes |
active community | no | no | yes |
games | yes | no | no |
sh, developed by Stephen Bourne in 1979
bash
bash or something like it
Slurm
job scheduler
modules
/
[root] is the uppermost level of the filesystem
/
DOS
working directory
/home/b.xlb21brx/
/scratch/b.xlb21brx/
Slurm
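A minimal sketch of a Slurm batch script, to show what a job handed to the scheduler can look like. The job name, time limit and log file name are hypothetical, and a real cluster will often require extra options (for example an account or partition) that are not shown here.

```shell
#!/bin/bash
# Lines starting with #SBATCH are read by Slurm as job options
#SBATCH --job-name=hello_slurm
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=hello_slurm_%j.log

# Slurm sets SLURM_JOB_ID inside a job; use a placeholder when run directly
job_label="${SLURM_JOB_ID:-not-in-slurm}"
echo "job: ${job_label} on host: $(uname -n)"
```

Saved as hello_slurm.sh (a hypothetical file name), this would be submitted with `sbatch hello_slurm.sh` and monitored with `squeue -u $USER`.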
Platform | Million reads | Read length | Gb data | Genome coverage |
---|---|---|---|---|
MiniSeq | 25 | 2 x 150 bp | 7.5 | 2 |
MiSeq | 25 | 2 x 300 bp | 15 | 4 |
NextSeq | 400 | 2 x 150 bp | 120 | 33 |
HiSeq X | 6000 | 2 x 150 bp | 1800 | 500 |
NovaSeq | 20000 | 2 x 150 bp | 6000 | 1667 |
*Indexes allow multiple samples to be sequenced at the same time
[Not an exhaustive list]
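The columns of the platform table are linked by simple arithmetic, sketched below. Two things are assumptions here rather than statements from the slides: that "Million reads" counts read pairs, and that coverage was calculated for a genome of 3.6 Gb (the size that reproduces the table values).

```shell
# Assumed genome size in Gb, inferred from the table (not stated in the slides)
genome_gb=3.6

coverage() {
  # $1 = million read pairs, $2 = read length in bp of each paired-end read
  awk -v m="$1" -v len="$2" -v g="$genome_gb" \
    'BEGIN { gb = m * 2 * len / 1000; printf "%.1f Gb, %.0fx coverage\n", gb, gb / g }'
}

coverage 25 150     # MiniSeq row
coverage 400 150    # NextSeq row
```

The same function can be used to ask how many samples could share a run at a target coverage, which is where indexes come in.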
Short reads from a single individual can be mapped to a reference genome assembly
sample | locality |
---|---|
adder01-04 | Dublin |
adder05-08 | Belfast |
adder09-12 | Cork |
adder13-16 | Limerick |
adder17-20 | Galway |
adder21-24 | Dundalk |
adder25-27 | Bray |
adder28 | outgroup |
@A00551:758:HKTVJDSX7:4:1101:3595:6872 1:N:0:CCTGAGATGT+GGTCTAGTTG
CTGAATATGGATTTTAATTGAATCCTAAGATATTATAGCATCTTTCACTCCCTGTCCTGTGCATGTCAGA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
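The record above shows the four-line FASTQ layout: header, sequence, separator and quality string. A small sketch of how that layout can be used from the command line, with a toy two-read file (hypothetical reads, for illustration only):

```shell
# Write a toy two-read FASTQ file
tmp_fastq=$(mktemp)
printf '@read1\nACGT\n+\nFFFF\n@read2\nGGCA\n+\nFF:F\n' > "$tmp_fastq"

# Each read occupies four lines, so reads = lines / 4
n_reads=$(awk 'END { print NR / 4 }' "$tmp_fastq")
echo "reads: $n_reads"

# Print only the sequence lines (line 2 of every four-line record)
awk 'NR % 4 == 2' "$tmp_fastq"

rm -f "$tmp_fastq"
```

For a gzipped file, the same awk commands can read from `zcat file.fastq.gz` through a pipe.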
45 ka cave bear (Ursus kudarensis)
Cutadapt
FLASH
Expected output in /BEARCAVE2/trimdata/*processing/
*_mappable.fastq.gz [big file]
*_mappable_R1.fastq.gz [big file]
*_mappable_R2.fastq.gz [big file]
*_trim_report.log and merge report *_merge_report.log
bwa mem algorithm
samtools
samtools
Expected output in /BEARCAVE2/mapped*/*processing/
*.bam [big file]
*.bam.bai
*_mapping.log
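The .bam and .bam.bai files listed above are the kind of output a bwa mem plus samtools pipeline produces. The function below is a generic sketch of that pipeline, not the BEARCAVE script, which may use different options and filtering steps. The function name and file names are hypothetical, and it assumes bwa and samtools are installed and the reference has already been indexed with `bwa index`.

```shell
# Sketch only: not the BEARCAVE mapping script
map_reads() {
  ref="$1"      # reference genome FASTA, already indexed with bwa index
  reads="$2"    # FASTQ file of reads to map
  out="$3"      # output BAM file name

  # Align with bwa mem, then sort alignments by coordinate into a BAM file
  bwa mem "$ref" "$reads" | samtools sort -o "$out" -
  # Index the sorted BAM, which writes a .bai file alongside it
  samtools index "$out"
}
```

A call would look like `map_reads reference.fasta sample_mappable.fastq.gz sample.bam`.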
angsd
plink, admixtools, etc.
Allele1 | Allele2 | prob11 | prob12 | prob22 |
---|---|---|---|---|
A | T | 0.05 | 0.9 | 0.05 |
NGSadmix
PCangsd
NGSrelate
realSFS
Covariance matrix
Distance matrix
Heterozygosity
realSFS
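For a single diploid sample, the site frequency spectrum estimated by realSFS has three categories: sites with 0, 1 or 2 copies of the derived allele. A common way to obtain heterozygosity is the proportion of sites in the middle category. The sketch below shows only that final arithmetic, using made-up numbers rather than real realSFS output.

```shell
# Hypothetical single-sample SFS: sites with 0, 1 and 2 derived alleles
sfs="990 10 0"   # toy numbers, not real output

# Heterozygosity = heterozygous sites / all sites
het=$(echo "$sfs" | awk '{ printf "%.4f", $2 / ($1 + $2 + $3) }')
echo "heterozygosity: $het"
```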
R
R
Rstudio
R
works
R
Suppose you're a survey company. To carry out your survey you need all the people seated in a classroom, which you have to build. You're not sure how many, so you build an ordinary classroom, with 5 rows of 6 desks for 30 people, after 30 people file in you notice there's a 31st. You build a second 30-person classroom right next to the first, and now you can accept 60 people, but then you notice a 61st. So you ask them to wait, and you build two more classrooms, so now you've got a nice 2x2 grid of 30-person classrooms, but the people keep coming and soon enough the 121st person shows up and there's not enough room. So you build a big 5-story building next door with 50-person classrooms, 5 on each floor, for a total of 50 x 5 x 5 = 1,250 desks, and you have the first 120 people file out of the old rooms into the new building, and you hire some wreckers to demolish the old classrooms and recycle some of the materials, and the people keep coming. And when you're all done with all this, the only "survey question" you're going to ask is "How many rows are there?"
Meanwhile, Bob's discount survey company, who can only tell you how many people he surveyed, is down there on the streetcorner, and the people are filing by, and Bob is jotting down tally marks on his clipboards, and the people, once surveyed, are walking away and going about their business, and Bob isn't wasting time and money building any classrooms at all.
R
An abridged version of https://stackoverflow.com/questions/30948366/why-is-unix-terminal-faster-than-r
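One way to read the analogy: Bob's tally marks correspond to a streaming tool such as `wc -l`, which passes over a file once and keeps only a running count, rather than building a table in memory first. A minimal sketch on a toy file:

```shell
# Make a small example file with 1000 lines (toy data)
tmp_file=$(mktemp)
seq 1 1000 > "$tmp_file"

# wc reads the file as a stream and keeps only a running count;
# tr strips the padding that some systems add to wc output
n_lines=$(wc -l < "$tmp_file" | tr -d ' ')
echo "lines: $n_lines"

rm -f "$tmp_file"
```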
Rstudio
tidyverse
R markdown
Tidyverse
ggplot2
tibble
tidyr
readr
dplyr
stringr
purrr
forcats
R from a heretic
Most people disagree (in some cases strongly)
Rstudio is terrible (except for R markdown)
R is really good
ggplot2 code is hellishly complex
tidyverse is not the way to teach R to beginners
Objects
<-
Functions
function()
?function
Vector
c()
my_vector[]
Matrix
my_matrix[row, column]
Dataframe
$, which can then be indexed like vectors
List
$
eigen()
ape library
See you next year :)