Trinity and Transcriptome Evaluation
Trinity: http://trinityrnaseq.github.io/
Transrate: http://hibberdlab.com/transrate/installation.html
Step 1: Launch an AMI. For this exercise, we will use a c4.4xlarge instance with a 500 GB EBS volume. Remember to restrict the permissions on your key file:
chmod 400 ~/Downloads/????.pem
(change ????.pem to whatever you named it)
ssh -i ~/Downloads/?????.pem ubuntu@ec2-???-???-???-???.compute-1.amazonaws.com
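SSH typically refuses a private key whose file permissions allow access by other users, which is why the chmod step matters. A minimal sketch of a pre-flight check, assuming a hypothetical key path in the KEY variable (substitute your own file name); key_mode_ok is an illustrative helper, not part of any tool:

```shell
# Hypothetical key location; replace with the name of your own .pem file.
KEY="${KEY:-$HOME/Downloads/mykey.pem}"

# Succeed only if the permission string shows no group or other bits,
# i.e. characters 5-10 of the "ls -l" mode column are all dashes.
key_mode_ok() {
  perms=$(ls -l "$1" | cut -c5-10)
  [ "$perms" = "------" ]
}

if [ -f "$KEY" ]; then
  if key_mode_ok "$KEY"; then
    echo "$KEY: permissions look fine"
  else
    echo "$KEY: too permissive, run: chmod 400 $KEY"
  fi
fi
```

If the check reports the key as too permissive, rerun the chmod 400 command before trying ssh again.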
Update Software
sudo apt-get update
Install updates
sudo apt-get -y upgrade
Install other software: Note that you can install a large amount of software from the Ubuntu “App Store” using a single command. We will not use all of this software in this tutorial.
sudo apt-get -y install build-essential tmux git gcc make g++ python-dev unzip default-jre libcurl4-openssl-dev zlib1g-dev python-pip fastqc samtools bowtie ncbi-blast+ hmmer emboss
Mount hard drive: The EBS volume we asked for is not automatically mounted, so we need to do that ourselves.
sudo mkfs -t ext4 /dev/xvdb
sudo mount /dev/xvdb /mnt
sudo chown -R ubuntu:ubuntu /mnt
df -h
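The df -h output should now list the new volume mounted at /mnt. A small sketch of the same check in script form, using an illustrative helper avail_gb (not from the original tutorial) that reports free space in whole GB for any path:

```shell
# Print the free space, in whole GB, on the filesystem that holds the given path.
# -P forces one output line per filesystem; -k reports sizes in 1 KB blocks.
avail_gb() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# /mnt is the mount point used in this tutorial; fall back to / if it is absent.
target=/mnt
[ -d "$target" ] || target=/
echo "Free space on $target: $(avail_gb "$target") GB"
```

If the number reported for /mnt is far below the size of the EBS volume you requested, the mount step probably did not succeed.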
Install Transrate
cd $HOME
curl -LO https://bintray.com/artifact/download/blahah/generic/transrate-1.0.1-linux-x86_64.tar.gz
tar -zxf transrate-1.0.1-linux-x86_64.tar.gz
cd transrate-1.0.1-linux-x86_64
PATH=$PATH:$(pwd)
Install Augustus
cd $HOME
curl -O http://augustus.gobics.de/binaries/augustus.2.5.5.tar.gz
tar -zxf augustus.2.5.5.tar.gz
cd augustus.2.5.5/
make
PATH=$PATH:$(pwd)/bin
export AUGUSTUS_CONFIG_PATH=$(pwd)/config
Install BUSCO
cd $HOME
curl -O http://busco.ezlab.org/files/BUSCO_v1.1b1.tar.gz
tar -zxf BUSCO_v1.1b1.tar.gz
cd BUSCO_v1.1b1/
PATH=$PATH:$(pwd)
Install Trinity
cd $HOME
git clone https://github.com/trinityrnaseq/trinityrnaseq.git
cd trinityrnaseq
make -j4
PATH=$PATH:$(pwd)
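Each PATH=$PATH:$(pwd) line only affects the current shell session, and repeating it appends the same directory again. A sketch of an idempotent alternative, using a hypothetical helper named add_to_path:

```shell
# Append a directory to PATH only if it is not already listed.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present, do nothing
    *) PATH="$PATH:$1" ;;
  esac
}

# Calling it twice with the same directory leaves a single PATH entry.
add_to_path "$(pwd)"
add_to_path "$(pwd)"
export PATH
```

To keep a directory on PATH after logging out and back in, the equivalent PATH line can be added to ~/.bashrc.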
Install Trimmomatic
cd $HOME
wget http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.33.zip
unzip Trimmomatic-0.33.zip
cd Trimmomatic-0.33
chmod +x trimmomatic-0.33.jar
Download data: For this lab, we’ll be using two subsampled read files, subsamp.1.fq.gz and subsamp.2.fq.gz.
mkdir /mnt/reads
cd /mnt/reads/
curl -LO https://www.dropbox.com/s/4o6eduzcw11gz53/subsamp.2.fq.gz
curl -LO https://www.dropbox.com/s/i4wst01yz10i9x9/subsamp.1.fq.gz
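A download from a file-sharing link can occasionally save an HTML page or a truncated file instead of the data. A minimal sketch of an integrity check using gzip -t; is_valid_gz is an illustrative helper name:

```shell
# Succeed only if the file is a complete, readable gzip archive.
is_valid_gz() {
  gzip -t "$1" 2>/dev/null
}

for f in /mnt/reads/subsamp.1.fq.gz /mnt/reads/subsamp.2.fq.gz; do
  if [ ! -f "$f" ]; then
    echo "$f: not downloaded yet"
  elif is_valid_gz "$f"; then
    echo "$f: OK"
  else
    echo "$f: not a valid gzip file, try downloading again"
  fi
done
```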
Trim at 2 different levels, Phred=2 and Phred=30: One of these is very harsh, the other is probably more appropriate. Which one is which?
Look at the output from each command, which should start with Input Read Pairs:
mkdir /mnt/trimming
cd /mnt/trimming
#paste the lines below together as 1 command
java -Xmx10g -jar $HOME/Trimmomatic-0.33/trimmomatic-0.33.jar PE \
-threads 8 -baseout subsamp.Phred2.fq \
/mnt/reads/subsamp.1.fq.gz \
/mnt/reads/subsamp.2.fq.gz \
ILLUMINACLIP:$HOME/Trimmomatic-0.33/adapters/TruSeq3-PE.fa:2:30:10 \
SLIDINGWINDOW:4:2 \
LEADING:2 \
TRAILING:2 \
MINLEN:25
#and
java -Xmx10g -jar $HOME/Trimmomatic-0.33/trimmomatic-0.33.jar PE \
-threads 8 -baseout subsamp.Phred30.fq \
/mnt/reads/subsamp.1.fq.gz \
/mnt/reads/subsamp.2.fq.gz \
ILLUMINACLIP:$HOME/Trimmomatic-0.33/adapters/TruSeq3-PE.fa:2:30:10 \
SLIDINGWINDOW:4:30 \
LEADING:30 \
TRAILING:30 \
MINLEN:25
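To compare the two trimming levels directly, one option is to count the read pairs that survived each run. A sketch under two assumptions: the output names follow the -baseout values used above (subsamp.Phred2_1P.fq and subsamp.Phred30_1P.fq), and each FASTQ record occupies exactly 4 lines. The helper fq_reads is illustrative:

```shell
# Count reads in an uncompressed FASTQ file (assumes 4 lines per record).
fq_reads() {
  echo $(( $(wc -l < "$1") / 4 ))
}

# Report surviving pairs for each trimming level, if the files exist.
for level in Phred2 Phred30; do
  f="/mnt/trimming/subsamp.${level}_1P.fq"
  if [ -f "$f" ]; then
    echo "${level}: $(fq_reads "$f") surviving pairs"
  else
    echo "${level}: $f not found (run the trimming step first)"
  fi
done
```

The level that keeps far fewer pairs is the harsher of the two.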
Run Trinity
mkdir /mnt/assembly
cd /mnt/assembly
#Open tmux window
tmux new -s trinity
#Phred30 dataset
Trinity --seqType fq --max_memory 10G --left /mnt/trimming/subsamp.Phred30_1P.fq \
--right /mnt/trimming/subsamp.Phred30_2P.fq --CPU 16
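#Phred2 dataset (written to its own output directory so it does not collide with the Phred30 run)
Trinity --seqType fq --max_memory 10G --left /mnt/trimming/subsamp.Phred2_1P.fq \
--right /mnt/trimming/subsamp.Phred2_2P.fq --CPU 16 --output /mnt/assembly/trinity_out_dir_Phred2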
Fix Trinity Headers
sed -i 's_|_-_g' /mnt/assembly/trinity_out_dir/Trinity.fasta
Control-b d #to detach from tmux (the session keeps running)
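The sed command replaces every | character in the assembly file with -, using _ as the delimiter; in sed's basic regular expressions a bare | is a literal character. A quick demonstration on a made-up header line (the identifier is illustrative, not taken from a real assembly; fix_header is a hypothetical wrapper):

```shell
# Apply the same substitution to text on standard input instead of editing a file.
fix_header() {
  sed 's_|_-_g'
}

printf '>TR1|c0_g1_i1 len=300\n' | fix_header
# prints: >TR1-c0_g1_i1 len=300
```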
Run BUSCO for assemblies: Eukaryote, Metazoa, Arthropod, Vertebrate, and Plant reference sets are available for use with different organisms. Here we use the Vertebrate set.
mkdir /mnt/busco
cd /mnt/busco
tmux new -s busco
#Download busco database
curl -LO http://busco.ezlab.org/files/vertebrata_buscos.tar.gz
tar -zxf vertebrata_buscos.tar.gz
python3 /home/ubuntu/BUSCO_v1.1b1/BUSCO_v1.1b1.py \
-m trans -in /mnt/assembly/trinity_out_dir/Trinity.fasta \
--cpu 16 -l vertebrata -o trin.assembly
less run*/short*
Control-b d #to detach from tmux (the session keeps running)
Run Transrate
tmux new -s transrate
mkdir /mnt/transrate
cd /mnt/transrate
$HOME/transrate-1.0.1-linux-x86_64/transrate -a /mnt/assembly/trinity_out_dir/Trinity.fasta -t 16 \
--left /mnt/trimming/subsamp.Phred30_1P.fq \
--right /mnt/trimming/subsamp.Phred30_2P.fq
Control-b d #to detach from tmux (the session keeps running)
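While Transrate runs, a quick independent sanity check on the assembly is to count its sequences and total bases. A minimal sketch, assuming the assembly sits at the path used above; fasta_stats is an illustrative helper and is not part of Transrate:

```shell
# Print the number of sequences and total bases in a FASTA file.
fasta_stats() {
  awk '/^>/ { n++; next }
       { bases += length($0) }
       END { printf "contigs=%d bases=%d\n", n, bases }' "$1"
}

asm=/mnt/assembly/trinity_out_dir/Trinity.fasta
if [ -f "$asm" ]; then
  fasta_stats "$asm"
else
  echo "$asm not found (run Trinity first)"
fi
```

An empty file or a contig count of zero points to a failed assembly rather than a poor one.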
CHALLENGE: Talk to me for details...
What Genus/Species did I sequence? What tissue?
Terminate your instance
LICENSE: This documentation and all textual/graphic site content is licensed under the Creative Commons - 0 License (CC0) -- fork @ github. Presentations (PPT/PDF) and PDFs are the property of their respective owners and are under the terms indicated within the presentation.