Next-Gen Sequence Analysis Workshop (2014)
This is the schedule for the 2014 MSU NGS course.
This workshop has a Workshop Code of Conduct.
Download all of these materials or visit the GitHub repository.
Day | Schedule |
---|---|
Monday 8/4 | |
Tuesday 8/5 | |
Wed 8/6 | |
Thursday 8/7 | |
Friday 8/8 | |
Saturday 8/9 | |
Monday 8/11 | |
Tuesday 8/12 | |
Wed 8/13 | |
Thursday 8/14 | |
Friday 8/15 | |
Dramatis personae
Instructors:
- Istvan Albert
- C Titus Brown
- Ian Dworkin
TAs:
- Amanda Charbonneau
- Elijah Lowe
- Will Pitchers
- Aswathy Sebastian
- Qingpeng Zhang
Lecturers:
- Chris Chandler
- Adina Chuang Howe
- Matt MacManes
- Martin Schilling
- Daniel Standage
- Meg Staton
He Who Drives Many Places:
- Cody Nicks
Papers and References
Books
Practical Computing for Biologists
This is a highly recommended book for people looking for a systematic presentation of shell scripting, programming, UNIX, etc.
RNAseq
Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks, Trapnell et al., Nat. Protocols.
A paper that outlines a pipeline using TopHat, Cufflinks, Cuffdiff, and some associated R scripts.
Statistical design and analysis of RNA sequencing data, Auer and Doerge, Genetics, 2010.
A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae, Nookaew et al., Nucleic Acids Res., 2012.
Challenges and strategies in transcriptome assembly and differential gene expression quantification. A comprehensive in silico assessment of RNA-seq experiments, Vijay et al., 2012.
Computational methods for transcriptome annotation and quantification using RNA-seq, Garber et al., Nat. Methods, 2011.
Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments, Bullard et al., 2010.
A comparison of methods for differential expression analysis of RNA-seq data, Soneson and Delorenzi, BMC Bioinformatics, 2013.
Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples, Wagner et al., Theory Biosci, 2012. Also see this blog post explaining the paper in detail.
Computing and Data
- A Quick Guide to Organizing Computational Biology Projects, Noble, PLoS Comp Biology, 2009.
- Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results, Wicherts et al., PLoS One, 2011.
- Got replicability?, McCullough, Economics in Practice, 2007.
Also see this great pair of blog posts on organizing projects and research workflow.
Links
Resources
- A high quality question & answer Web site.
- A discussion and information site for next-generation sequencing.
- A large number of open and reusable tutorials on the shell, programming, version control, etc.
Blogs
- Genomes Unzipped (http://www.genomesunzipped.org/)
- Titus’s blog
- Blue Collar Bioinformatics
- Mass Genomics
- Next Genetics
- Getting Genetics Done (http://gettinggeneticsdone.blogspot.com/)
- Omics! Omics! (http://omicsomics.blogspot.com/)
Complete table of contents
- Day 1 - Getting started with Amazon
- Day 2 - Running BLAST and other things at the command line
- Variant calling
- Assembling E. coli sequences with Velvet
- Interval Analysis and Visualization
- Running bedtools
- Understanding the SAM format
- R Tutorial for NGS2014
- What is R?
- Installing R
- What is R, really....
- How to close R
- R Basics
- R as a calculator
- GETTING HELP in R
- Simple functions in base R
- Objects in R, classes of objects, mode of objects.
- Workspaces, and objects in them
- SCRIPT!
- Writing our own functions in R
- Using source() to load your functions
- Regular Sequences
- Indexing, extracting values and subsetting from the objects we have created
- Where to go from here?
- A few advanced topics... For your own amusement (not necessary for now, but helps for more advanced R programming).
- Syntax style guide
- Random bits
- session info
- R indexing begins at 1 (not 0 like Python). Negative values of indexes in R mean something very different. For instance
- TOC
- Section 1: What is R; R at the console; quitting R
- Section 2: R basics; R as a calculator; assigning variables; vectorized computation in R
- Section 3: pre-built functions in R
- Section 4: Objects, classes, modes - Note: should I add attributes?
- Section 5: The R workspace; listing objects, removing objects (should I add attach and detach?)
- Section 6: Getting Help in R
- Section 7: Using a script editor for R
- Section 8: Writing simple functions in R
- Section 8b: Using source() to call a set of functions
- Section 9: Regular sequences in R
- Section 10: Extracting (and replacing), indexing & subsetting (using the index). Can also be used for sorting.
- ..... setting attributes of objects.... (names, class, dim )
- ..... environments (see ?environments)
- Control Flow and loops in R
- Control Flow
- ifelse()
- Other vectorized ways of control flow.
- Simple loops
- for loop
- So for the for loop we would do the following:
- More avoiding loops
- The step above creates a vector of n NA’s. They will be replaced sequentially with the random numbers as we generate them (using a function like the above one).
- Variant calling and exploration of polymorphisms
- A complete de novo assembly and annotation protocol for mRNASeq
- Assembly with SOAPdenovo-Trans
- Mapping and Counting
- Analyzing RNA-seq counts with DESeq
- RNA-seq: mapping to a reference genome with TopHat and counting with HTSeq
- RNA-seq: mapping to a reference genome with BWA and counting with HTSeq
- Booting an Amazon AMI
- Updating the operating system
- Install software
- Preparing the reference
- Mapping
- Genome comparison and phylogeny
- Interactive visual genome comparison with Mauve
- Running a genome alignment
- Booting an Amazon AMI
- Logging in & updating the operating system
- Packages to install
- Getting the E. coli genome data
- What is the nearest reference genome?
- Ordering the assembly contigs against a nearby reference
- Making a phylogeny of many E. coli assemblies
- From tree file to figures
- Automation, scripts, git, and GitHub
- MG-RAST and its API
- So you want to get some sequencing data in NCBI?
- Looking at k-mer abundance distributions
- PacBio Tutorial
- RNASeq Transcript Mapping and Counting (BWA and HTSeq Flavor)
- Evaluating the quality of your short reads, and trimming them
- Amazon Web Services instructions
- Start up an EC2 instance
- Logging into your new instance “in the cloud” (Mac version)
- Logging into your new instance “in the cloud” (Windows version)
- Installing Dropbox on your EC2 machine
- Terminating (shutting down) your EC2 instance
- Storing data persistently with Amazon EBS Volumes
- Using Amazon EBS Snapshots for sharing and backing up data
- Transferring Files between your laptop and Amazon instance
- Uploading files to Amazon S3 to share
- Starting up a custom operating system
- Technical guide to the ANGUS course
- Instructor’s Guide to ANGUS Materials
- Workshop Code of Conduct
LICENSE: This documentation and all textual/graphic site content is licensed under the Creative Commons - 0 License (CC0) -- fork @ github. Presentations (PPT/PDF) and PDFs are the property of their respective owners and are under the terms indicated within the presentation.