angus 6.0 documentation

Table of Contents

Workshop Code of Conduct
- Need Help?
- The Quick Version
- The Less Quick Version
Next-Gen Sequence Analysis Workshop (2017)
- The main workshop materials
Booting a Jetstream Computer Instance for your use!
- Request to log in to the Jetstream Portal
- Use “XSEDE”
- Fill in the username and password and click “Sign in”
- Select Projects and “Create New Project”
- Name the project for yourself, click “create”
- Select the newly created project
- Within the project, select “new”
- Find the “Ubuntu 16.04” image, click on it
- Name it something simple and select ‘m1.medium’
- Wait for it to become active
- Click on your new instance to get more information!
- Miscellany
- Suspend your instance
- Shutting down your instance
- Deleting your instance
Logging in to Jetstream from your local terminal with a key file
- Concerning Keys
- Getting the Private Key
- Getting your instance IP address
- On MacOS/Linux
- On Windows
Downloading and Transferring Files
- Download file from the internet to your remote machine
- Transfer Files From Your Instance with Filezilla
Running command-line BLAST
- Updating the software on the machine
- Running BLAST
Running large and long command line jobs - using shmlast!
- Installing shmlast
- Download some data
- Run shmlast!
- Digression: What is shmlast doing?
- Looking at the output
- Some points for discussion
Visualizing BLAST score distributions in RStudio
- Installing and running RStudio on Jetstream
- Enter some R commands
- Some questions for discussion/points to make:
Command-line and RStudio
Short read quality and trimming
- Installing some software
- Data source
BWA and samtools and variant calling
- Getting started
- Download data
- Map data
- Visualize mapping
- Call variants!
- Look at the VCF file
- Look at the VCF file with bedtools.
- Extract reads with samtools.
- Discussion points / extra things to cover
An Introduction to R and Data Analysis
- Install RStudio Web server
- Install the tidyverse packages
- Learn!
RNAseq expression analysis
- Make sure R & RStudio are installed:
- Install edgeR
- Install salmon:
- Change to the appropriate directory:
- Download some data
- Download the yeast reference transcriptome:
- Index the yeast transcriptome:
- Run salmon on all the samples:
- Collect all of the sample counts
- Run edgeR (in R)
- Extra plotting in R
- Questions to ask/address
- More reading
Genome assembly - some basics
- Start up a Jetstream instance
- Install the MEGAHIT assembler
- Download an E. coli data set
- Run the assembler
- Looking at the assembly
- Measuring the assembly
- What are other metrics you could use to evaluate your assembly?
- End of day
Bacterial genome annotation using Prokka
- Installing Prokka
- Running Prokka
- Searching the annotated genes
- References
Introduction to automation
- Keeping a log of what you ran
- While it’s running...
- After it’s done running: put it in a shell script
- Passing variables to a script
- Long-running jobs more generally
- Scripts: ‘bash’ shell scripts, R scripts, and Python scripts
- Final thoughts
K-mers, k-mer specificity, and comparing samples with k-mer Jaccard distance.
- At the beginning
- K-mers!
- K-mers and assembly graphs
- Why k-mers, though? Why not just work with the full read sequences?
- Long k-mers are species specific
- Using k-mers to compare samples against each other
- Installing sourmash
- Generate a signature for Illumina reads
- Compare reads to assemblies
- Make and search a database quickly.
- Compare many signatures and build a tree.
- What’s in my metagenome?
- Final thoughts on sourmash
Exploratory RNAseq data analysis using RMarkdown
- Getting started
- Make sure R & RStudio are installed:
- Download the data for today’s tutorial
- Introduction to RMarkdown
- Markdown
- RMarkdown
- Creating a .Rmd File
- Anatomy of an RMarkdown file
- Chunk Labels
- Chunk Options
- Global Chunk Options
- Tables
- Citations and Bibliography
- Bibliography
- Placement
- Citation Styles
- Citations
- Publishing on RPubs
- Updating RPubs
- Exploratory data analysis with Yeast RNAseq data
Amazing resources for learning RMarkdown
Variant calling pipeline for a mammalian genome
- Getting started
- Download trimmed Fastq files
- Mapping
- Generate sorted BAM files
- Merge replicates (one library running on two lanes):
- Mark duplicates
- Prepare for the Genome Analysis Toolkit (GATK) analysis
- Recalibrate Bases
- Variant calling
- Filter Variants
Genome Wide Association analysis (GWAS)
- Install PLINK 1.9
- Install vcftools
- Install R and RStudio
- Make a working directory for the GWAS analysis
- Download the sample VCF file and phenotype data
- Convert VCF into PLINK-readable format (map, ped), then PLINK binary format (fam, bed, bim)
- Create a list of alternative alleles
- Run a simple association analysis
- Create a Manhattan plot
Meta-Analysis of Genome Wide Association Studies
- Download METAL
- Visualize Meta-Analysis Results
Jupyter Notebook and Python for data science.
- So, what is a Jupyter notebook?
- Cool! How do I install Jupyter?
- Ok, I’m set! What’s next?
- The Future of Jupyter
- Reading material
GitHub
- The Dashboard
- Making Repositories
- Cloning Repositories
- Commits
- Issues
- Forking and Pull Requests
- GitHub Education
Publicly available databases
- Finding data of interest
- NCBI
- Downloading data from NCBI
- Other Protein data
- Genomes & Genome Browsers
- Other databases full of many things
- BLAST
- Pathways
- Metagenomes
- Marine organism resources
- Human Genomes
- Tool Aggregators
- Blogs and other useful links
Assessing and Assembling Nanopore data
- Start a Jetstream instance and install software:
- Get Oxford Nanopore MinION data and convert it
- Assess the Data
- Assemble the data
- All-by-all comparisons
- Annotate with Prokka
- References:
- Acknowledgements
- canu.report stats from Fundulus olivaceus reads
Differential expression analysis with DESeq2
- Upgrade R to the very latest (3.4.x)
- Make sure you’re running RStudio
- Install RStudio Web server
- Install the DESeq2 prereqs
- Learn!
Analyzing ChIP-seq data
- What is ChIP-seq?
- Our goal
- Get some sample data
- Setting up the tools
- Let’s do mapping!
- Visualization
- Aligning the control sample
- Finding enriched areas using MACS
- Building a histogram from some ATAC-seq
- Adding a custom track
- References
De novo transcriptome assembly with Trinity
- Installation
- Check that your data is where it should be
- Quality trimming and light quality filtering
- Applying Digital Normalization
- Running the Actual Assembly!
- Evaluation
Annotating de novo transcriptomes with dammit
- Installation
- Database Preparation
- Annotation
- References
For instructors!
- Tutorial authoring guide
- E-mail lists