Learning objectives:
Boot an m1.medium Jetstream instance and log in.
We will be using salmon and edgeR. Salmon can be installed with conda, but edgeR requires an additional install script:
cd ~
conda install -y salmon
curl -L -O https://raw.githubusercontent.com/ngs-docs/angus/2018/scripts/install-edgeR.R
sudo Rscript --no-save install-edgeR.R
We will be using the same data as before (Schurch et al., 2016). The following commands create a new folder, rnaseq, and link the data into it:
mkdir -p rnaseq
cd rnaseq
ln -fs ~/data/*.fastq.gz .
ls
Download the yeast reference ORF sequences and build a salmon index from them:
curl -O https://downloads.yeastgenome.org/sequence/S288C_reference/orf_dna/orf_coding.fasta.gz
salmon index --index yeast_orfs --type quasi --transcripts orf_coding.fasta.gz
Run salmon quant on each input file:
for i in *.fastq.gz
do
    salmon quant -i yeast_orfs --libType U -r "$i" -o "$i.quant" --seqBias --gcBias
done
Read up on the libType option in the salmon documentation.
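Each salmon quant run writes a directory named after its -o argument, containing a tab-separated quant.sf file with the columns Name, Length, EffectiveLength, TPM and NumReads. As a minimal sketch of how to inspect one (written for Python 3, with a made-up two-transcript table in place of real output; the helper name read_quant and all values are illustrative assumptions):

```python
import csv
import os
import tempfile

def read_quant(path):
    """Return {transcript name: (TPM, NumReads)} from a salmon quant.sf file."""
    table = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            table[row["Name"]] = (float(row["TPM"]), float(row["NumReads"]))
    return table

# Made-up example in the quant.sf column layout; the numbers are NOT real results.
SAMPLE = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "YAL001C\t3483\t3234.0\t12.5\t40.0\n"
    "YAL002W\t3825\t3576.0\t0.0\t0.0\n"
)

tmpdir = tempfile.mkdtemp()
demo_path = os.path.join(tmpdir, "quant.sf")
with open(demo_path, "w") as handle:
    handle.write(SAMPLE)

quant = read_quant(demo_path)
# Names of transcripts with a non-zero TPM estimate.
expressed = sorted(name for name, (tpm, _) in quant.items() if tpm > 0)
print(expressed)
```

On real data you would point read_quant at one of the quant.sf files inside the *.quant directories created by the loop.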
Collect the counts from the salmon output directories:
curl -L -O https://raw.githubusercontent.com/ngs-docs/2018-ggg201b/master/lab6-rnaseq/gather-counts.py
python2 gather-counts.py
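The gather step turns each salmon output directory into a simple two-column counts file that R can load. The following is a rough sketch of that idea, not the gather-counts.py script itself: the function name gather_counts, the output naming (directory name plus ".counts") and the choice of the NumReads column are assumptions made for illustration, and it is written for Python 3.

```python
import csv
import os
import tempfile

def gather_counts(top):
    """Write one '<dir>.counts' file for every '<dir>/quant.sf' under top.

    Hypothetical helper: only directories ending in '.quant' are considered.
    """
    written = []
    for entry in sorted(os.listdir(top)):
        quant_file = os.path.join(top, entry, "quant.sf")
        if not (entry.endswith(".quant") and os.path.isfile(quant_file)):
            continue
        out_path = os.path.join(top, entry + ".counts")
        with open(quant_file, newline="") as src, open(out_path, "w", newline="") as dst:
            writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
            writer.writerow(["transcript", "count"])
            for row in csv.DictReader(src, delimiter="\t"):
                writer.writerow([row["Name"], row["NumReads"]])
        written.append(out_path)
    return written

# Demo on a temporary directory holding one made-up sample (values are not real).
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "sampleA.fastq.gz.quant"))
with open(os.path.join(top, "sampleA.fastq.gz.quant", "quant.sf"), "w") as handle:
    handle.write("Name\tLength\tEffectiveLength\tTPM\tNumReads\n")
    handle.write("YAL001C\t3483\t3234.0\t12.5\t40.0\n")

outputs = gather_counts(top)
with open(outputs[0]) as handle:
    contents = handle.read()
print(contents)
```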
Download and run the edgeR analysis script:
curl -L -O https://raw.githubusercontent.com/ngs-docs/angus/2018/scripts/yeast.salmon.R
Rscript --no-save yeast.salmon.R
This will produce two plots, yeast-edgeR-MA-plot.pdf and yeast-edgeR-MDS.pdf. You can view them by going to your RStudio server file viewer, changing to the rnaseq directory, and clicking on them. If you see a “Popup Blocked” error, click the “Try again” button.
The yeast-edgeR.csv file contains the fold expression & significance information in a spreadsheet.
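One way to explore such a spreadsheet outside of a GUI is to filter it on both effect size and significance. The sketch below assumes the CSV has a gene identifier column plus logFC and FDR columns, as edgeR result tables commonly do; the exact headers in yeast-edgeR.csv may differ, the column name "gene" is hypothetical, and the inline rows are made up (Python 3).

```python
import csv
import io

def select_genes(handle, max_fdr=0.05, min_abs_logfc=1.0):
    """Return gene IDs passing both an FDR cutoff and an absolute log fold-change cutoff."""
    hits = []
    for row in csv.DictReader(handle):
        if float(row["FDR"]) <= max_fdr and abs(float(row["logFC"])) >= min_abs_logfc:
            hits.append(row["gene"])
    return hits

# Made-up rows; column names are assumptions, values are NOT real results.
DEMO = io.StringIO(
    "gene,logFC,FDR\n"
    "geneA,2.5,0.001\n"    # large change, significant: kept
    "geneB,0.2,0.0001\n"   # significant but tiny change: dropped
    "geneC,-3.0,0.5\n"     # large change but not significant: dropped
    "geneD,-1.5,0.01\n"    # kept
)

hits = select_genes(DEMO)
print(hits)
```

With the real file, replace DEMO with open("yeast-edgeR.csv") after checking the header line.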
What is the point or value of the multidimensional scaling (MDS) plot?
Why does the MA-plot have that shape?
Related: Why can’t we just use fold expression to select the things we’re interested in?
Related: How do we pick the FDR (false discovery rate) threshold?
How do we know how many replicates (bio and/or technical) to do?
Related: What confounding factors are there for RNA-seq analysis?
Related: What is our false positive/false negative rate?
What happens when you add new replicates?
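As background for the FDR question above, here is a minimal sketch of the Benjamini-Hochberg adjustment, a standard procedure for turning p-values into FDR-adjusted values. It is offered as an illustration of the idea rather than as a description of what the analysis script does; the function name and the example p-values are made up (Python 3).

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values for four hypothetical genes.
pvals = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(pvals)
# Indices of genes that pass a 5% FDR threshold.
passing = [i for i, q in enumerate(adj) if q <= 0.05]
print(adj, passing)
```

Note that the second and third genes have raw p-values below 0.05 but adjusted values above it, which is one reason the choice of threshold deserves thought.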
“How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?” Schurch et al., 2016.
“Salmon provides accurate, fast, and bias-aware transcript expression estimates using dual-phase inference” Patro et al., 2016.
Also see seqanswers and biostars.