Arabidopsis - De Novo Assembly vs Reference-Guided Assembly


Split into teams and choose one of these strategies:

  1. De novo assembly with Trinity, following the instructions from earlier.

  2. Reference-guided assembly with Trinity, following the genome-guided assembly instructions in the Trinity documentation.

    This will follow a slightly altered workflow:

    1. correct reads (same as before)

    2. run skewer (same as before)

    3. map with STAR (new step)

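      Note: In addition to the aforementioned options, GFF3-formatted annotations also need --sjdbGTFtagExonParentTranscript Parent, which tells STAR to link each exon to its transcript through the GFF3 Parent attribute instead of the GTF transcript_id attribute.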

    4. run Trinity (altered command that takes the STAR BAM file as input)

    5. run busco (same as before)

    6. run transrate (same as before)
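The new STAR step and the altered Trinity step could be scripted roughly as below. This is a minimal sketch, not the official course commands: only TAIR10_chr_all.fas and --sjdbGTFtagExonParentTranscript Parent come from these notes, while the annotation file name (TAIR10_GFF3_genes.gff), the function names, the thread count, the 10G memory limit, the 10000 bp maximum intron and paired-end reads are all assumptions to adjust. The --genomeSAindexNbases value follows the STAR manual's advice to shrink the suffix-array index for small genomes, min(14, log2(GenomeLength)/2 - 1); for the roughly 120 Mb A. thaliana genome that comes to 12.

```shell
#!/bin/sh
# Sketch of the reference-guided branch: STAR index -> STAR mapping ->
# genome-guided Trinity. Sourcing this only defines the functions.
# ASSUMPTIONS: annotation file name, thread count, memory and intron limit.

GENOME=TAIR10_chr_all.fas      # renamed genome FASTA from "Get Data"
ANNOT=TAIR10_GFF3_genes.gff    # hypothetical name for the GFF3 annotation
THREADS=8                      # assumption: set to your core count

# STAR's manual suggests min(14, log2(GenomeLength)/2 - 1) for
# --genomeSAindexNbases on small genomes; this rounds the result down.
sa_index_nbases() {
  awk '!/^>/ { len += length($0) }     # count bases, skip header lines
       END {
         if (len == 0) exit 1
         n = int(log(len) / log(2) / 2 - 1)
         if (n > 14) n = 14
         print n
       }' "$1"
}

# Build the index, inserting splice junctions from the GFF3 annotation.
star_build_index() {
  mkdir -p star_index
  STAR --runMode genomeGenerate \
       --runThreadN "$THREADS" \
       --genomeDir star_index \
       --genomeFastaFiles "$GENOME" \
       --sjdbGTFfile "$ANNOT" \
       --sjdbGTFtagExonParentTranscript Parent \
       --genomeSAindexNbases "$(sa_index_nbases "$GENOME")"
}

# Map one pair of trimmed read files (paired-end data assumed) and write a
# coordinate-sorted BAM, which genome-guided Trinity expects.
star_map() {
  local r1=$1 r2=$2 prefix=$3
  STAR --runThreadN "$THREADS" \
       --genomeDir star_index \
       --readFilesIn "$r1" "$r2" \
       --outSAMtype BAM SortedByCoordinate \
       --outFileNamePrefix "${prefix}_"
}

# Assemble from the sorted BAM that star_map wrote with the same prefix.
trinity_gg() {
  local prefix=$1
  Trinity --genome_guided_bam "${prefix}_Aligned.sortedByCoord.out.bam" \
          --genome_guided_max_intron 10000 \
          --max_memory 10G \
          --CPU "$THREADS"
}
```

Usage, with hypothetical read file names: source the file, then run star_build_index, then star_map lyrata_R1.trimmed.fq lyrata_R2.trimmed.fq lyrata, then trinity_gg lyrata. Genome-guided Trinity should write its assembly to trinity_out_dir/Trinity-GG.fasta, which then goes into BUSCO and TransRate as before.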

I have notes for all of these commands if you need help.
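For the reference-guided team, the STAR mapping rate of the A. lyrata reads against the A. thaliana genome may be worth logging next to the BUSCO and TransRate results. STAR writes a run summary to <prefix>Log.final.out with one "label |<tab>value" line per statistic. The helper below is a hypothetical sketch (the function name is not part of STAR) that prints a single value by its label.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one statistic from a STAR
# Log.final.out file. $1 = log file ("-" for stdin), $2 = exact label.
star_log_field() {
  awk -F'|' -v key="$2" '
    {
      name = $1
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim padding around the label
    }
    name == key {
      val = $2
      gsub(/[ \t]/, "", val)              # drop the tab before the value
      print val
    }' "$1"
}
```

For example, star_log_field lyrata_Log.final.out 'Uniquely mapped reads %' prints the unique mapping rate, where lyrata_ stands for whatever --outFileNamePrefix was given to STAR.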

Log your results here:

Get Data

Arabidopsis lyrata RNA-Seq flower reads:


Arabidopsis thaliana Genome:


Arabidopsis thaliana Genome Annotation:


We still have a slight problem: the chromosome names in the FASTA file (1 to 5, mitochondria, chloroplast) don't match the names used in the annotation file (Chr1 to Chr5, ChrM, ChrC). This is a surprisingly common problem, and it breaks any tool that needs both files. So let's fix the names:

sed -i 's/>\([1-5]\)/>Chr\1/' TAIR10_chr_all.fas
sed -i 's/>mitochondria/>ChrM/' TAIR10_chr_all.fas
sed -i 's/>chloroplast/>ChrC/' TAIR10_chr_all.fas
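Before moving on, it can be worth confirming that the renaming worked. The function below is a sketch with a hypothetical name: it collects the first word of every FASTA header and every sequence name in column 1 of the GFF3 feature lines, then prints any name found in only one of the two files. No output means the names agree.

```shell
#!/bin/sh
# Hypothetical helper: report sequence names present in only one of a
# FASTA file ($1) and a GFF3 file ($2). Empty output means they agree.
seqid_mismatches() {
  awk -F'\t' '
    FNR == 1 { file++ }                     # 1 = FASTA, 2 = GFF3
    file == 1 && /^>/ {
      split(substr($0, 2), w, /[ \t]+/)     # name = first word of header
      fa[w[1]] = 1
      next
    }
    file == 2 && !/^#/ && NF >= 9 { gff[$1] = 1 }   # column 1 = seqid
    END {
      for (n in fa)  if (!(n in gff)) print "FASTA only: " n
      for (n in gff) if (!(n in fa))  print "GFF3 only: " n
    }' "$1" "$2" | sort
}
```

Usage: seqid_mismatches TAIR10_chr_all.fas TAIR10_GFF3_genes.gff, substituting the annotation file you actually downloaded (that file name is an assumption).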

BUSCO plant:

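To get the plant database, unpack the downloaded archive (it is a .tar.gz, so gunzip alone would only leave an unextracted .tar file):

tar -xzf plant_early_release.tar.gz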

LICENSE: This documentation and all textual/graphic site content is licensed under the Creative Commons - 0 License (CC0) -- fork @ github. Presentations (PPT/PDF) and PDFs are the property of their respective owners and are under the terms indicated within the presentation.