Learning objectives:
Boot an m1.medium Jetstream instance and log in.
We will be using salmon and edgeR. Salmon is installed through conda, but edgeR will require an additional script:
cd ~
conda install -y salmon
curl -L -O https://raw.githubusercontent.com/ngs-docs/angus/2018/scripts/install-edgeR.R
sudo Rscript --no-save install-edgeR.R
We will be using the same data as before (Schurch et al., 2016), so the following commands will create a new folder rnaseq and link the data in:
mkdir -p rnaseq
cd rnaseq
ln -fs ~/data/*.fastq.gz .
ls
Download the yeast reference transcriptome (coding ORFs):
curl -O https://downloads.yeastgenome.org/sequence/S288C_reference/orf_dna/orf_coding.fasta.gz
Index the yeast transcriptome:
salmon index --index yeast_orfs --type quasi --transcripts orf_coding.fasta.gz
Run salmon on all the samples:
for i in *.fastq.gz
do
salmon quant -i yeast_orfs --libType U -r "$i" -o "$i.quant" --seqBias --gcBias
done
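Each pass through the loop writes a directory named after the input file with .quant appended, and salmon places its per-transcript estimates in a quant.sf file inside it. As a minimal sketch of how that file can be read, assuming the usual tab-separated layout with the columns Name, Length, EffectiveLength, TPM and NumReads (the function name and the sample rows below are made up for illustration):

```python
import csv
import io

def read_quant(handle):
    """Return {transcript name: (TPM, NumReads)} from an open quant.sf file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: (float(row["TPM"]), float(row["NumReads"])) for row in reader}

# Made-up rows standing in for a real quant.sf; not actual results.
example = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "YAL001C\t3483\t3234.0\t12.5\t410.0\n"
    "YAL002W\t3825\t3576.0\t0.0\t0.0\n"
)
quant = read_quant(example)
print(quant["YAL001C"])
```

To try it on real output, open one of the quant.sf files from the loop above and pass the file handle to read_quant instead of the in-memory example.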
Read up on the libType option in the Salmon documentation.
Collect all of the sample counts using this Python script:
curl -L -O https://raw.githubusercontent.com/ngs-docs/2018-ggg201b/master/lab6-rnaseq/gather-counts.py
python2 gather-counts.py
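The general idea behind gathering counts is to line up the per-sample estimates so that every transcript has one value per sample. The sketch below is not the gather-counts.py script itself, whose exact output is not reproduced here. It only illustrates the idea, using a hypothetical counts_matrix helper and made-up sample names and numbers:

```python
import csv
import io

def counts_matrix(per_sample):
    """Turn {sample: {transcript: count}} into (header, rows), one row per transcript."""
    samples = sorted(per_sample)
    # Union of all transcript names seen in any sample
    transcripts = sorted(set().union(*(per_sample[s] for s in samples)))
    header = ["transcript"] + samples
    # A transcript missing from a sample is treated as zero reads (an assumption)
    rows = [[t] + [per_sample[s].get(t, 0.0) for s in samples] for t in transcripts]
    return header, rows

# Made-up counts for two hypothetical samples
per_sample = {
    "wt_rep1": {"YAL001C": 410.0, "YAL002W": 0.0},
    "mut_rep1": {"YAL001C": 95.0, "YAL002W": 3.0},
}
header, rows = counts_matrix(per_sample)

out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
print(out.getvalue())
```

A table of this shape (transcripts as rows, samples as columns) is the kind of input that count-based tools such as edgeR expect.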
Run edgeR (in R) using this script:
curl -L -O https://raw.githubusercontent.com/ngs-docs/angus/2018/scripts/yeast.salmon.R
Rscript --no-save yeast.salmon.R
This will produce two plots, yeast-edgeR-MA-plot.pdf and yeast-edgeR-MDS.pdf. You can view them by going to your RStudio server file viewer, changing to the directory rnaseq, and then clicking on them. If you see a “Popup Blocked” error, click the “Try again” button.
The yeast-edgeR.csv file contains the fold expression & significance information in a spreadsheet.
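If you would rather filter the spreadsheet programmatically than by eye, something like the following sketch may help. It assumes the CSV has a gene identifier column plus logFC and FDR columns, as edgeR result tables typically do; the column name "gene", the cutoffs, and the rows below are assumptions for illustration, so check the header of your own file first:

```python
import csv
import io

def significant_genes(handle, fdr_cutoff=0.05, min_abs_logfc=1.0):
    """Return gene names passing both an FDR cutoff and a fold-change cutoff."""
    hits = []
    for row in csv.DictReader(handle):
        passes_fdr = float(row["FDR"]) <= fdr_cutoff
        passes_fold = abs(float(row["logFC"])) >= min_abs_logfc
        if passes_fdr and passes_fold:
            hits.append(row["gene"])
    return hits

# Made-up rows; the real yeast-edgeR.csv may use different column names.
example = io.StringIO(
    "gene,logFC,logCPM,PValue,FDR\n"
    "geneA,2.3,8.1,1e-9,1e-7\n"
    "geneB,4.0,0.5,0.04,0.20\n"
    "geneC,-0.3,9.0,1e-6,1e-4\n"
)
selected = significant_genes(example)
print(selected)
```

In this made-up example geneB has the largest fold change but does not pass the FDR cutoff, and geneC is significant but barely changes, so only geneA is kept.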
What is the point or value of the multidimensional scaling (MDS) plot?
Why does the MA-plot have that shape?
Related: Why can’t we just use fold expression to select the things we’re interested in?
Related: How do we pick the FDR (false discovery rate) threshold?
How do we know how many replicates (bio and/or technical) to do?
Related: what confounding factors are there for RNAseq analysis?
Related: what is our false positive/false negative rate?
What happens when you add new replicates?
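As background for the FDR question above, here is a small sketch of the Benjamini-Hochberg adjustment, which is the default multiple-testing correction in edgeR's topTags. Whether yeast.salmon.R relies on that default is an assumption, and the p-values below are made up:

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values for four hypothetical genes
pvals = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(pvals)
print(fdr)
```

Calling a gene significant at an FDR threshold of, say, 0.05 means accepting that roughly 5% of the genes on the resulting list are expected to be false positives, which is one way to frame the choice of threshold.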
“How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?” Schurch et al., 2016.
“Salmon provides accurate, fast, and bias-aware transcript expression estimates using dual-phase inference” Patro et al., 2016.
Also see seqanswers and biostars.