=============================================== Plotting the distribution of mapping mismatches =============================================== Mapping and calculating mismatch positions ------------------------------------------ First, run Bowtie to produce a mapping file:: cd /mnt time bowtie -p 2 drosophila_bowtie -q /data/drosophila/RAL357_1.fastq RAL357_1_bowtie.map This will produce a file that shows the mismatches in the mapping -- check it out by doing 'head RAL357_1_bowtie.map'. Next, get an updated copy of the ngs-scripts:: git clone https://github.com/ngs-docs/ngs-scripts.git /root/ngs-scripts and run it on the map file:: python /root/ngs-scripts/bowtie/map-profile.py RAL357_1_bowtie.map > RAL357_1_bowtie.count This will produce a .count file, which, again, you can check out with 'head'. (You can look at the script by doing 'more /root/ngs-scripts/bowtie/map-profile.py' or by viewing it online `at github `__.) Plotting -------- Now, go to 'https://' + YOUR MACHINE NAME, and click on "New notebook". In the new notebook, paste:: counts = numpy.loadtxt('/mnt/RAL357_1_bowtie.count') plot(counts[:,0], counts[:,1]) axis(ymax=50000, xmax=50) and hit "shift-ENTER" to execute this code. Exercise -------- Note the spike around 12 -- try using the 'map-profile-N.py' script (in the same place as the map-profile script) to plot the distribution of mismatches where N is in the read. Do the spikes align?