Bacterial genome annotation using Prokka

After you have de novo assembled your genome sequencing reads into contigs, it is useful to know what genomic features are on those contigs. The process of identifying and labelling those features is called genome annotation.

In this tutorial you will:

  1. Download and install Prokka
  2. Annotate a FASTA file of contigs
  3. Visualize the annotation using Artemis

The instructions below will work on a Ubuntu 14.04 Amazon instance.

Install Prokka dependencies

sudo apt-get -y update
sudo apt-get install bioperl libxml-simple-perl default-jre git curl

Install Prokka

git clone https://github.com/tseemann/prokka.git
export PATH=$PWD/prokka/bin:$PATH
prokka --setupdb
prokka --version

Get some contigs

We will download a genome from NCBI, decompress it, and rename it to something shorter:

curl ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000144955.1_ASM14495v1/GCA_000144955.1_ASM14495v1_genomic.fna.gz > contigs.fasta.gz
gunzip contigs.fasta

We can use the grep command to look at the FASTA sequence names in the file:

grep '>' contigs.fasta

How many sequences/contigs are in this file ?

We can use the wc (word count) command to get a rough idea of the number of basepairs in the FASTA file too by counting how many characters (bytes) are in the file, as it uses 1 charactert (A,G,T,C) per nucleotide.

wc -c contigs.fasta

How big is the genome in Mbp (mega base-pairs) ?

Why aren’t the wc result exactly correct?

Run Prokka on the contigs

Prokka is a pipeline script which coordinates a series of genome feature predictor tools and sequence similarity tools to annotate the genome sequence (contigs).

prokka --outdir anno --prefix prokka contigs.fasta
cat ./anno/prokka.txt

How many genes did Prokka find in the contigs?

Install Artemis

Artemis is a graphical Java program to browse annotated genomes. It is a a bit like IGV but sepcifically designed for bacteria. You will need to install this on your laptop computer instead of the Amazon instance.

Download: http://www.sanger.ac.uk/science/tools/artemis

Copy the annotation to your laptop

Copy the anno/prokka.gff file to your latop using the scp command

scp -i your_key.pem ubuntu@your-machine-name.amazon.com:/home/ubuntu/anno/prokka.gff ~/Downloads

Load the annotated genome

  • Start Artemis
  • Click OK
  • Go to File -> Open File Manager
  • Navigate to the ~/Downloads folder
  • Choose the prokka.gff file yoiu copied from Amazon

Browse the genome

You will be overwhelmed and/or confused at first, and possibly permanently. Here are some tips:

  • There are 3 panels: feature map (top), sequence (middle), feature list (bottom)
  • Click right-mouse-button on bottom panel and select Show products
  • Zooming is done via the verrtical scroll bars in the two top panels

LICENSE: This documentation and all textual/graphic site content is licensed under the Creative Commons - 0 License (CC0) -- fork @ github. Presentations (PPT/PDF) and PDFs are the property of their respective owners and are under the terms indicated within the presentation.
comments powered by Disqus