First, run Bowtie to produce a mapping file:
cd /mnt
time bowtie -p 2 drosophila_bowtie -q /data/drosophila/RAL357_1.fastq RAL357_1_bowtie.map
This will produce a file with one line per mapped read, with any mismatches listed in the last column; check it out by doing ‘head RAL357_1_bowtie.map’.
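To make the layout of the map file concrete, here is a minimal Python sketch that splits one line into fields and pulls out the mismatch positions. It assumes Bowtie's default tab-separated output, where the last (eighth) column holds comma-separated descriptors of the form offset:reference-base>read-base. The function name parse_mismatches is hypothetical and the example line is made up, not taken from RAL357_1_bowtie.map.

```python
def parse_mismatches(line):
    """Return the 0-based mismatch offsets from one line of Bowtie output."""
    fields = line.rstrip("\n").split("\t")
    # Assumption: default Bowtie output has 8 tab-separated columns and the
    # last one holds the mismatch descriptors (empty for a perfect match).
    if len(fields) < 8 or not fields[7]:
        return []
    # Each descriptor looks like "12:T>G"; keep only the offset before the colon.
    return [int(d.split(":")[0]) for d in fields[7].split(",")]


# Made-up example line, for illustration only.
example = "read1\t+\tchr2L\t1000\tACGTACGTACGTACGT\tIIIIIIIIIIIIIIII\t0\t3:A>C,12:T>G"
print(parse_mismatches(example))  # prints [3, 12]
```

Comparing this with a few lines from ‘head’ should show which reads mapped perfectly (empty last column) and which did not.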
Next, get an up-to-date copy of the ngs-scripts repository:
git clone https://github.com/ngs-docs/ngs-scripts.git /root/ngs-scripts
and run the map-profile.py script on the map file:
python /root/ngs-scripts/bowtie/map-profile.py RAL357_1_bowtie.map > RAL357_1_bowtie.count
This will produce a .count file, which, again, you can check out with ‘head’.
(You can look at the script by doing ‘more /root/ngs-scripts/bowtie/map-profile.py’ or by viewing it online at GitHub.)
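If you would rather see the idea before reading the script, the following is a rough sketch of a per-position mismatch tally. It is not the actual map-profile.py and may differ from it in detail. It assumes Bowtie's default output with mismatch descriptors in the eighth column; the name profile_mismatches and the sample lines are made up for illustration.

```python
from collections import Counter


def profile_mismatches(lines):
    """Count how many mismatches fall at each read position."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8 or not fields[7]:
            continue  # no mismatch column, or a perfect match
        for descriptor in fields[7].split(","):
            # Descriptor format assumed to be "offset:ref>read".
            counts[int(descriptor.split(":")[0])] += 1
    return counts


# Made-up input lines standing in for a Bowtie map file.
sample = [
    "r1\t+\tchr2L\t10\tACGT\tIIII\t0\t1:A>C",
    "r2\t-\tchr2L\t50\tACGT\tIIII\t0\t1:G>T,3:C>A",
    "r3\t+\tchr2L\t90\tACGT\tIIII\t0\t",
]
for position, n in sorted(profile_mismatches(sample).items()):
    print(position, n)  # prints "1 2" then "3 1"
```

The two printed columns (position, count) have the same shape as the two columns the notebook code reads from the .count file.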
Now, go to ‘https://’ + YOUR MACHINE NAME, and click on “New notebook”. In the new notebook, paste:
counts = numpy.loadtxt('/mnt/RAL357_1_bowtie.count')
plot(counts[:,0], counts[:,1])
axis(ymax=50000, xmax=50)
and hit “shift-ENTER” to execute this code.
Note the spike around position 12. Try using the ‘map-profile-N.py’ script (in the same place as the map-profile script) to plot the distribution of positions where N occurs in the reads. Do the spikes align?
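If you want to compare the spikes numerically rather than by eye, a small sketch like the one below reports the position with the largest count. It assumes a two-column, whitespace-separated file (position, then count), which is what the plotting code reads; the function name peak_position and the sample rows are made up.

```python
def peak_position(rows):
    """Return (position, count) for the row with the largest count."""
    best = None
    for row in rows:
        parts = row.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        position, count = float(parts[0]), float(parts[1])
        if best is None or count > best[1]:
            best = (position, count)
    return best


# Made-up rows standing in for the contents of a .count file.
rows = ["10 120", "11 150", "12 900", "13 140"]
print(peak_position(rows))  # prints (12.0, 900.0)
```

To use it on real data, pass an open file instead of the list, for example ‘with open('/mnt/RAL357_1_bowtie.count') as fh: print(peak_position(fh))’, and do the same with whatever file you save the map-profile-N.py output to.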