| Date | Time | Activity |
|---|---|---|
| Mon 03/11/2024 | 10:00 to 16:00 | Unix-like systems, bash, connecting to SCWales |
| Tue 04/11/2024 | 10:00 to 16:00 | More bash, SCWales, using SCWales and slurm |
| Wed 05/11/2024 | 10:00 to 16:00 | Illumina data, BEARCAVE, data processing |
| Thu 06/11/2024 | 10:00 to 16:00 | ANGSD, covariance and distance matrices, heterozygosity, intro to R |
| Fri 07/11/2024 | 10:00 to 16:00 | Maps, PCAs, NJ trees, Manhattan plots and Rmarkdown |
https://drabarlow.github.io/bioinformatics_bootcamp/
https://drabarlow.github.io/bioinformatics_bootcamp/bootcamp_worksheet_2025.html
https://github.com/drabarlow/bioinformatics_bootcamp
bash and R
sudo

Mac OS
Linux

R (typically via Rstudio)
bash

DOS and Unix not yet possible

| | Windows | Mac | Linux |
|---|---|---|---|
| standard PC functions | yes | yes | yes |
| cost | yes | yes | free |
| hardware choice | yes | no | yes |
| bioinformatics | no | yes | yes |
| HPC | no | no | yes |
| open source | no | no | yes |
| active community | no | no | yes |
| games | yes | no | no |
Bourne shell (sh), developed by Stephen Bourne in 1979
Bourne Again Shell (bash)
bash or something like it
ssh
scp or sftp
slurm job scheduler
modules

Connecting to the jump host (with MFA)
ssh you25usr@ssh.bangor.ac.uk
Note: most UNIX systems do not show anything when you’re typing your password!
If successful, connecting to Hawk
ssh b.you25usr@hawklogin.cf.ac.uk
Raise your hand if you are having issues 🙌
/ [root] is the uppermost level of the filesystem
working directory
/home/b.xlb21brx/
/scratch/b.xlb21brx/
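Moving between a home and a scratch area like the ones above takes only a few commands. The sketch below builds a throwaway directory tree in a temporary folder so that it runs anywhere; the names `sandbox`, `home/user` and `scratch/user` are placeholders invented for the example, not real Hawk paths.

```shell
#!/usr/bin/env bash
# Build a throwaway tree that mimics a home and a scratch area.
# "sandbox", "home/user" and "scratch/user" are hypothetical names.
sandbox="$(mktemp -d)"
mkdir -p "${sandbox}/home/user" "${sandbox}/scratch/user"

# Move into the mock home directory using an absolute path.
cd "${sandbox}/home/user" || exit 1
pwd    # print the current working directory

# Move to the mock scratch directory using a relative path;
# ".." means "one level up".
cd ../../scratch/user || exit 1
where_am_i="$(pwd)"
echo "${where_am_i}"

# List what sits at the top of the sandbox.
ls "${sandbox}"
```

On the cluster the same `cd`, `pwd` and `ls` commands apply, with your own home and scratch paths in place of the mock ones.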
slurm

| Platform | Million reads | Read length | Gb data | Genome coverage |
|---|---|---|---|---|
| iSeq | 4 | 2 x 150 bp | 1.2 | 0.4x |
| MiniSeq | 25 | 2 x 150 bp | 7.5 | 2.5x |
| MiSeq | 100 | 2 x 500 bp | 30 | 10x |
| NextSeq 550 | 400 | 2 x 150 bp | 120 | 40x |
| NextSeq 1000/2000 | 1800 | 2 x 300 bp | 540 | 180x |
| NovaSeq 6000 | 20000 | 2 x 250 bp | 3000 | 1000x |
| NovaSeq X | 52000 | 2 x 150 bp | 8000 | 2667x |
*Indexes allow multiple samples to be sequenced at the same time
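The coverage column is consistent with dividing the data yield by a genome size of about 3 Gb (for example, 120 Gb / 3 Gb = 40x). That genome size is an assumption inferred from the table's numbers, not something stated here. A minimal sketch of the arithmetic, with an optional second argument for the number of indexed samples sharing a run (assuming the data are split evenly between them):

```shell
#!/usr/bin/env bash
# Expected coverage = data yield (Gb) / genome size (Gb) / number of samples.
# genome_gb=3 is an assumption inferred from the table above.
genome_gb=3

coverage() {
    # $1 = data yield in Gb
    # $2 = number of samples pooled on the run (defaults to 1)
    awk -v data="$1" -v genome="${genome_gb}" -v samples="${2:-1}" \
        'BEGIN { printf "%.1f\n", data / genome / samples }'
}

iseq_cov="$(coverage 1.2)"         # iSeq row, one sample
nextseq_cov="$(coverage 120)"      # NextSeq 550 row, one sample
pooled_cov="$(coverage 120 4)"     # same run shared by 4 indexed samples

echo "iSeq: ${iseq_cov}x"
echo "NextSeq 550: ${nextseq_cov}x"
echo "NextSeq 550, 4 samples: ${pooled_cov}x each"
```

In practice pooled samples rarely receive exactly equal shares of the reads, so the per-sample figure is a planning estimate only.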
[Not an exhaustive list]
Short reads from a single individual can be mapped to a reference genome assembly
+-------------+---------------+
| Sample | Locality |
+-------------+---------------+
| adder01-04 | Leeds |
| adder05-08 | Wensleydale |
| adder09-12 | Manchester |
| adder13-16 | Caerphilly |
| adder17-20 | Gouda |
| adder21-24 | Stockholm |
| adder25-27 | Cheddar |
| adder28-31 | Huddersfield |
| adder32-35 | Sheffield |
| adder36-39 | Leicester |
| adder40-43 | Nottingham |
| adder44-47 | Stilton |
| adderout | outgroup |
+-------------+---------------+
@A00551:758:HKTVJDSX7:4:1101:3595:6872 1:N:0:CCTGAGATGT+GGTCTAGTTG
CTGAATATGGATTTTAATTGAATCCTAAGATATTATAGCATCTTTCACTCCCTGTCCTGTGCATGTCAGA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
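Each read in a FASTQ file occupies four lines: an `@` header, the sequence, a `+` separator, and one quality character per base. Assuming an uncompressed file with unwrapped records, the number of reads is the line count divided by four. The sketch below writes a toy two-read file (invented reads, not real data) to a temporary location and counts them.

```shell
#!/usr/bin/env bash
# Write a toy FASTQ file; the read names and sequences are invented.
fastq="$(mktemp)"
cat > "${fastq}" <<'EOF'
@read1
ACGTACGT
+
FFFFFFFF
@read2
TTGACCAG
+
FFFF:FFF
EOF

# Four lines per record, so reads = lines / 4.
n_lines="$(wc -l < "${fastq}")"
n_reads=$(( n_lines / 4 ))
echo "reads: ${n_reads}"

# Print the length of each sequence (line 2 of every 4-line record).
seq_lengths="$(awk 'NR % 4 == 2 { print length($0) }' "${fastq}")"
echo "${seq_lengths}"

rm -f "${fastq}"
```

For gzipped files the same counting works on the decompressed stream, for example by piping the output of `gzip -dc` into `wc -l`.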
45 ka cave bear (Ursus kudarensis)
Cutadapt
FLASH

Expected output in /BEARCAVE2/trimdata/*processing/
*_mappable.fastq.gz [big file]
*_mappable_R1.fastq.gz [big file]
*_mappable_R2.fastq.gz [big file]
trim report *_trim_report.log and merge report *_merge_report.log

bwa mem algorithm
samtools

Expected output in /BEARCAVE2/mapped*/*processing/
*.bam [big file]
*.bam.bai
*_mapping.log

plink, admixtools, etc.

| Allele1 | Allele2 | prob11 | prob12 | prob22 |
|---|---|---|---|---|
| A | T | 0.05 | 0.9 | 0.05 |
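Reading prob11, prob12 and prob22 as the probabilities of the genotypes Allele1/Allele1, Allele1/Allele2 and Allele2/Allele2, the example row says the heterozygote A/T (0.9) is far better supported than either homozygote. The sketch below picks the best-supported genotype from such a row. The pipe-delimited input simply mirrors the example above and is not an actual ANGSD output format.

```shell
#!/usr/bin/env bash
# One site, laid out as: Allele1|Allele2|prob11|prob12|prob22
site='A|T|0.05|0.9|0.05'

# Choose the genotype with the highest probability.
best_genotype="$(
    echo "${site}" | awk -F'|' '{
        geno[1] = $1 $1; p[1] = $3   # homozygous for allele 1
        geno[2] = $1 $2; p[2] = $4   # heterozygous
        geno[3] = $2 $2; p[3] = $5   # homozygous for allele 2
        best = 1
        for (i = 2; i <= 3; i++) if (p[i] + 0 > p[best] + 0) best = i
        print geno[best], p[best]
    }'
)"
echo "${best_genotype}"
```

Taking the maximum here is only for illustration: probabilistic methods generally carry all three values forward into downstream analyses instead of committing to a single hard genotype call.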
NGSadmix
PCangsd
NGSrelate
realSFS

Covariance matrix
Distance matrix
Heterozygosity
realSFS

R
R was born in 1992 from S, which was developed at Bell Labs
Rstudio
R from a heretic and a pragmatist (guess who is who)
Most people disagree (in some cases strongly)
R is really good!
R ≠ Rstudio
tidyverse is not the way to teach R to beginners
tidy
ggplot2 code is restrictive

R works
“Everything is an object; everything that happens is a function”

Objects
<-

Functions
function()
?function, example(function)

Vector
c()
my_vector[]

Matrix
my_matrix[row, column]

Dataframe
$, which can then be indexed like vectors

List
$

Rstudio
R
tidyverse
R markdown
git and other development tools
Rstudio
eigen()
ape library

See you next year :)