==========================================
Next-Gen Sequence Analysis Workshop (2013)
==========================================
.. Warning:: These documents are not maintained and their instructions may be
out of date. However the GED Lab does maintain the `khmer protocols
`__ which may cover similar
topics. See also the `installation instructions for the current version
of the khmer project `__.
This is the schedule for the `2013 MSU NGS course `__, which ran from June 10th to June 20th, 2013. If you're interested in this course in 2014, please see `the 2014 announcement `__.
=============== =============================================================
Day Schedule
=============== =============================================================
Monday 6/10 * 1:30pm tutorial: :doc:`day1` (Adina)
* 7pm: research presentations
Tuesday 6/11 * :doc:`day2`
* 9:30am: lecture, `Welcome! <_static/lecture1-welcome.pptx.pdf>`__ (Titus)
* 10:45am: tutorial, :doc:`running-command-line-blast`
* 1:15pm: tutorial, :doc:`short-read-quality-evaluation`
* Evening: *firepit social*
Wednesday 6/12 * :doc:`day3`
* 9:30am: lecture, `Mapping. <_static/lecture2-mapping.pptx.pdf>`__ (Titus)
* 10:45am: tutorial, :doc:`bwa-tutorial` (Likit)
* 1:15pm: tutorial cont'd; also, :doc:`plot-mapping-mismatches`
* 8pm: `General bioinformatics overview <_static/challenges-bioinfo-ngs.pdf>`__ (Istvan)
Thursday 6/13 * 9:15am: lecture, `Assembly. <_static/lecture3-assembly.pptx.pdf>`__ (Titus)
* 10:45am: tutorial, :doc:`assembling-ecoli-with-velvet`
* 1:15pm: tutorial, cont'd, evaluating assemblies.
* Evening: *brew pub in Kalamazoo.*
Friday 6/14 * 9:15am: lecture, `Intervals <_static/interval-datatypes.pdf>`__ (Istvan)
* 10:45am: tutorial, :doc:`interval-analysis-tutorial` (Istvan)
* 1:15pm: lecture/tutorial, `Statistics <_static/stats-lecture.pptx.pdf>`__ (Ian)
* 8 Tutorial: :doc:`teach-me-intervals` (Istvan)
Saturday 6/15 * 9:15am: lecture, `Pipelines and Automation <_static/lecture5-pipelines.pptx.pdf>`__ (Titus)
* 10:45am: tutorial: Shell scripts and pipelines.
* 1:15pm: tutorial, R (`text `__ | `code `__) (Josh)
* Evening: BBQ/dinner.
Sunday 6/16 Day of rest. Brunch in the morning;
takeout dinner in evening.
Monday 6/17 * 9:15am: `lecture, mRNAseq I <_static/NGS2013_RNAseq_1.pptx.pdf>`__ (Ian Dworkin)
* Tutorials 10:45am, 1:15pm as usual; topics:
1. :doc:`rnaseq_bwa`
2. :doc:`rnaseq_tophat`
3. :doc:`DGE_analysis_with_MISO_cuffdiff`
4. :doc:`transcriptome_de_novo_assembly`
* 7:30pm: :doc:`git-koans` (Titus)
* 8-9pm: look busy
* 9pm: firepit
Tuesday 6/18 * 9:15am: `lecture, mRNAseq II <_static/NGS2013_RNAseq_2.pptx.pdf>`__ (Ian Dworkin)
* Impromptu tutorial: :doc:`seqtk_tools`
* Tutorials 10:45am, 1:15pm as usual; mRNAseq, continued.
* :doc:`rnaseq_bwa_counting` (`edgeR code <_static/analyze_edgeR_bwa_transcriptome.R>`__)
* 8:30pm: `Single-cell mRNAseq <_static/MSU_linker_2013.06.17.pdf>`__ (Erich Schwarz)
* 9:15pm: `PacBio intro <_static/PacBio_overview_angus_2013.pdf>`__ (Tristan De Buysscher)
* 8-11pm: look busy
Wednesday 6/19 * 9:15am: `Advanced assembly <_static/rayan-2013-june-18-msu.pdf>`__ (Rayan)
* 10:45am: :doc:`kmer-abundance-and-diginorm` and `presentation <_static/titus-kmers.pptx.pdf>`__ (Titus)
* 1:15pm: `Computing without Amazon <_static/crusoe-amazon-free-analysis.pdf>`__ (Michael Crusoe)
* 2pm: :doc:`snp_tutorial` (Likit)
* 3-5pm: look busy.
* 7:30pm: BBQ, gin, and firepit social
Thursday 6/20 * 9:15am: lecture, `Genome assembly and analysis <_static/NGS_Acey_2013.06.20.01.pdf>`__ (Erich Schwarz)
* 10:45am: Genome assembly treasure hunt.
* 1:15pm: :doc:`ucsc-genome-browser` (Likit)
* 5pm: Joe Graves, Genome-Wide Convergence with Repeated Evolution in Drosophila melanogaster.
* 8:30pm: Sequencing technology Q&A (Nick Beckloff)
* 8-11pm: look busy
Friday 6/21 * 9:15am: meet at classroom with bags; `final lecture <_static/final-lecture.pptx.pdf>`__.
* 10am: course post-mortem/analysis
* 11am: lunch at Frona's Pantry (optional!)
=============== =============================================================
Cheat sheet for starting up an EC2 instance
===========================================
- use Amazon Machine Instance "ami-c17ec8a8";
- m1.large or larger;
- make sure you are in the US East zone (Virgina) -- see upper right;
- make sure the security group you use has SSH and HTTPS enabled for inbound;
Dramatis personae
=================
Instructors:
* Istvan Albert
* C Titus Brown
* Ian Dworkin
TAs:
* Amanda Charbonneau
* Michael Crusoe
* Tristan De Buysscher
* Joshua Herr
* Elijah Lowe
* Likit Preeyanon
Lecturers:
* Nick Beckloff
* Rayan Chaikhi
* Chris Chandler
* Adina Chuang Howe
* Erich Schwarz
Papers and References
=====================
Books
-----
* `Practical Computing for Biologists `__
This is a highly recommended book for people looking for a systematic
presentation on shell scripting, programming, UNIX, etc.
RNAseq
------
* `Differential gene and transcript expression analysis of RNA-seq
experiments with TopHat and Cufflinks
`__, Trapnell et al.,
Nat. Protocols.
One paper that outlines a pipeline with the tophat, cufflinks, cuffdiffs and
some associated R scripts.
* `Statistical design and analysis of RNA sequencing
data. `__, Auer and
Doerge, Genetics, 2010.
* `A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae. `__ Nookaew et al., Nucleic Acids Res. 2012.
* `Challenges and strategies in transcriptome assembly and differential gene expression quantification. A comprehensive in silico assessment of RNA-seq experiments `__ Vijay et al., 2012.
* `Computational methods for transcriptome annotation and quantification using RNA-seq `__, Garber et al., Nat. Methods, 2011.
* `Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. `__, Bullard et al., 2010.
* `A comparison of methods for differential expression analysis of RNA-seq data `__, Soneson and Delorenzi, BMC Bioinformatics, 2013.
* `Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. `__, Wagner et al., Theory Biosci, 2012. Also see `this blog post `__ explaining the paper in detail.
Computing and Data
------------------
* `A Quick Guide to Organizing Computational Biology Projects `__, Noble, PLoS Comp Biology, 2009.
* `Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results `__, Wicherts et al., PLoS One, 2011.
* `Got replicability? `__, McCullough, Economics in Practice, 2007.
Also see this great pair of blog posts on `organizing projects `__ and `research workflow `__.
Links
=====
Humor
-----
* `Data Sharing and Management Snafu in 3 Short Acts `__
Resources
---------
* `Biostar `__
A high quality question & answer Web site.
* `SEQanswers `__
A discussion and information site for next-generation sequencing.
* `Software Carpentry lessons `__
A large number of open and reusable tutorials on the shell, programming,
version control, etc.
Blogs
-----
* http://www.genomesunzipped.org/
Genomes Unzipped.
* http://ivory.idyll.org/blog/
Titus's blog.
* http://bcbio.wordpress.com/
Blue Collar Bioinformatics
* http://massgenomics.org/
Mass Genomics
* http://blog.nextgenetics.net/
Next Genetics
* http://gettinggeneticsdone.blogspot.com/
Getting Genetics Done
* http://omicsomics.blogspot.com/
Omics! Omics!