The goal of this tutorial is to run you through (part of) a real mRNAseq analysis protocol, using a small data set that will complete quickly.
Prepare for this tutorial by working through Start up an EC2 instance, but follow the instructions in Starting up a custom operating system instead, using AMI ami-7607d01e.
Copy and paste the following two commands:
apt-get update
apt-get -y install screen git curl gcc make g++ python-dev unzip \
default-jre pkg-config libncurses5-dev r-base-core \
r-cran-gplots python-matplotlib sysstat samtools python-pip
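As an optional sanity check (not part of the original protocol), a minimal sketch like the following reports any of the newly installed command-line tools that are not on your PATH. The command names are assumptions about what each package provides (for example `R` from r-base-core, `java` from default-jre, `sar` from sysstat). Library packages such as libncurses5-dev, r-cran-gplots and python-matplotlib install no command, so they are not checked.

```shell
#!/bin/sh
# Print "missing: NAME" for every command that cannot be found on the PATH.
# Returns 0 if all commands were found, 1 otherwise.
check_tools() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Command names assumed to come from the packages installed above.
check_tools screen git curl gcc make g++ unzip java pkg-config R samtools pip sar \
  && echo "all tools found"
```

If anything is reported as missing, rerunning the `apt-get -y install` command above is usually the simplest fix.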
If you started up a custom operating system, this should finish quickly; if you started up a blank Ubuntu 14.04 instance instead, it will take a minute or two.
The mRNAseq protocol works with the data set that you put in ‘/data’. Here, we will download a small data set (a subset of the data from this paper, taken from embryonic Nematostella) and put it in /data, which we set up as a link to /mnt/data:
mkdir /mnt/data
ln -fs /mnt/data /data
cd /data
curl -O http://athyra.idyll.org/~t/mrnaseq-subset.tar
tar xvf mrnaseq-subset.tar
Check it out:
ls
You’ll see a bunch of different files – these are the kinds of files you’ll get from your sequencing facility.
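To get a feel for how much data you have, you could count the reads in each file. This is a sketch that assumes the reads are gzip-compressed FASTQ files ending in `.fastq.gz` or `.fq.gz` and that every record spans exactly four lines; the helper names `count_reads` and `report_reads` are hypothetical, and you should adjust the file patterns if your files are named differently.

```shell
#!/bin/sh
# Count reads in one FASTQ file, assuming 4 lines per record.
# Files ending in .gz are decompressed on the fly.
count_reads() {
  case "$1" in
    *.gz) lines=$(gzip -dc "$1" | wc -l) ;;
    *)    lines=$(wc -l < "$1") ;;
  esac
  echo $(( $lines / 4 ))
}

# Print "<file name><TAB><read count>" for each gzipped FASTQ file in a directory.
report_reads() {
  for f in "$1"/*.fq.gz "$1"/*.fastq.gz; do
    [ -e "$f" ] || continue   # skip patterns that matched nothing
    printf '%s\t%s\n' "$(basename "$f")" "$(count_reads "$f")"
  done
}

if [ -d /data ]; then
  report_reads /data
fi
```

Decompressing each file makes this slow on full-sized data sets, but for this small subset it should finish quickly.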
We’re going to work with a special version of the protocols today, one that we adapted specifically for this course.
In general, you should use the latest version, which will be at https://khmer-protocols.readthedocs.org/.
For today, we’ll be using http://khmer-protocols.readthedocs.org/en/ngs2014/ instead.
Work through the following:
To connect to your BLAST Web server, you need to enable inbound HTTP traffic to your EC2 instance. Briefly:
in the EC2 console, find the name of your instance’s security group (it should be ‘launch-wizard-’ something). On the left panel, under Network and Security, go into Security Groups. Select your security group, select Inbound, and click Edit. Click “Add rule”, change “Custom TCP rule” to “HTTP”, and then click “Save”. Done!
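Once the rule is saved, you can check from your own computer whether the instance now answers on port 80. This is a minimal sketch using curl; the host name in the commented-out last line is a placeholder for your instance’s public DNS name, not a real address. A status code such as 200 means the Web server is reachable, while “no response” usually means the rule is missing or the server is not running.

```shell
#!/bin/sh
# Print the HTTP status code returned by a URL.
# Returns 1 (and prints a message on stderr) if no connection could be made.
check_http() {
  code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1") || {
    echo "no response from $1" >&2
    return 1
  }
  echo "$code"
}

# Placeholder host name (hypothetical): replace it with your instance's public DNS.
# check_http http://ec2-XX-XX-XX-XX.compute-1.amazonaws.com/
```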
To test your BLAST server, try pasting in this protein sequence:
MDRSVNVIQCAAAPTRIQCEEINAKLMLGVGVFGLCMNIVLAVIMSFGAAHPHSHGMLSSVEFDHDVDYH
SRDNHHGHSHLHHEHQHRDGCSHSHGNGGADMQRLECASPESEMMEEVVVETTSSNAESICSHERGSQSM
NLRAAVLHVFGDCLQSLGVVLAACVIWAGNNSSVGVPSSAHSYYNLADPLLSVLFGVITVYTTLNLFKEV
IVILLEQVPPAVEYTVARDALLSVEKVQAVDDLHIWAVGPGFSVLSAHLCTNGCATTSEANAVVEDAECR
CRQLGIVHTTIQLKHAADVRNTGA