angus 6.0 documentation
Table of Contents
Workshop Code of Conduct
Need Help?
The Quick Version
The Less Quick Version
Next-Gen Sequence Analysis Workshop (2017)
The main workshop materials
Booting a Jetstream Computer Instance for your use!
Request to log in to the Jetstream Portal
Use “XSEDE”
Fill in the username and password and click “Sign in”
Select Projects and “Create New Project”
Name the project for yourself, click “create”
Select the newly created project
Within the project, select “new”
Find the “Ubuntu 16.04” image, click on it
Name it something simple and select ‘m1.medium’
Wait for it to become active
Click on your new instance to get more information!
Miscellany
Suspend your instance
Shutting down your instance
Deleting your instance
Logging in to Jetstream from your local terminal with a key file
Concerning Keys
Getting the Private Key
Getting your instance IP address
On MacOS/Linux
On Windows
Downloading and Transferring Files
Download file from the internet to your remote machine
Transfer Files From Your Instance with Filezilla
Running command-line BLAST
Updating the software on the machine
Running BLAST
Running large and long command line jobs - using shmlast!
Installing shmlast
Download some data
Run shmlast!
Digression: What is shmlast doing?
Looking at the output
Some points for discussion
Visualizing BLAST score distributions in RStudio
Installing and running RStudio on Jetstream
Enter some R commands
Some questions for discussion/points to make:
Command-line and RStudio
Short read quality and trimming
Installing some software
Data source
BWA and samtools and variant calling
Getting started
Download data
Map data
Visualize mapping
Call variants!
Look at the VCF file
Look at the VCF file with bedtools.
Extract reads with samtools.
Discussion points / extra things to cover
An Introduction to R and Data Analysis
Install RStudio Web server
Install the tidyverse packages
Learn!
RNAseq expression analysis
Make sure R & RStudio are installed:
Install edgeR
Install salmon:
Change to the appropriate directory:
Download some data
Download the yeast reference transcriptome:
Index the yeast transcriptome:
Run salmon on all the samples:
Collect all of the sample counts
Run edgeR (in R)
Extra plotting in R
Questions to ask/address
More reading
Genome assembly - some basics
Start up a Jetstream instance
Install the MEGAHIT assembler
Download an E. coli data set
Run the assembler
Looking at the assembly
Measuring the assembly
What are other metrics you could use to evaluate your assembly?
End of day
Bacterial genome annotation using Prokka
Installing Prokka
Running Prokka
Searching the annotated genes
References
Introduction to automation
Keeping a log of what you ran
While it’s running...
After it’s done running: put it in a shell script
Passing variables to a script
Long-running jobs more generally
Scripts: ‘bash’ shell scripts, R scripts, and Python scripts
Final thoughts
K-mers, k-mer specificity, and comparing samples with k-mer Jaccard distance.
At the beginning
K-mers!
K-mers and assembly graphs
Why k-mers, though? Why not just work with the full read sequences?
Long k-mers are species specific
Using k-mers to compare samples against each other
Installing sourmash
Generate a signature for Illumina reads
Compare reads to assemblies
Make and search a database quickly.
Compare many signatures and build a tree.
What’s in my metagenome?
Final thoughts on sourmash
Exploratory RNAseq data analysis using RMarkdown
Getting started
Make sure R & RStudio are installed:
Download the data for today’s tutorial
Introduction to RMarkdown
Markdown
RMarkdown
Creating a .Rmd File
Anatomy of Rmarkdown file
Chunk Labels
Chunk Options
Global Chunk Options
Tables
Citations and Bibliography
Bibliography
Placement
Citation Styles
Citations
Publishing on RPubs
Updating RPubs
Exploratory data analysis with Yeast RNAseq data
Amazing Resources for learning Rmarkdown
Variant calling pipeline for a mammalian genome
Getting started
Download trimmed Fastq files
Mapping
Generate sorted BAM files
Merge replicates (one library running on two lanes):
Mark duplicates
Prepare for the Genome Analysis Toolkit (GATK) analysis
Recalibrate Bases
Variant calling
Filter Variants
Genome Wide Association analysis (GWAS)
Install PLINK 1.9
Install vcftools
Install R and RStudio
Make a working directory for the GWAS analysis
Download the sample VCF file and phenotype data
Convert VCF into PLINK-readable format (map, ped), then PLINK binary format (fam, bed, bim)
Create list of alternative alleles
Run a simple association analysis
Create Manhattan plot
Meta-Analysis of Genome Wide Association Studies
Download METAL
Visualize Meta-Analysis Results
Jupyter Notebook and Python for data science.
So, what is a Jupyter notebook?
Cool! How do I install Jupyter?
Ok, I’m set! What’s next?
The Future of Jupyter
Reading material
GitHub
The Dashboard
Making Repositories
Cloning Repositories
Commits
Issues
Forking and Pull Requests
GitHub Education
Publicly available databases
Finding data of interest
NCBI
Downloading data from NCBI
Other Protein data
Genomes & Genome Browsers
Other databases full of many things
BLAST
Pathways
Metagenomes
Marine organism resources
Human Genomes
Tool Aggregators
Blogs and other useful links
Assessing and Assembling Nanopore data
Start a Jetstream instance and install software:
Get Oxford Nanopore MinION data and convert it
Assess the Data
Assemble the data
All-by-all comparisons
Annotate with prokka:
References:
Acknowledgements
canu.report stats from Fundulus olivaceus reads
Differential expression analysis with DESeq2
Upgrade R to the very latest (3.4.x)
Make sure you’re running RStudio
Install RStudio Web server
Install the DESeq2 prereqs
Learn!
Analyzing ChIP-seq data
What is ChIP-seq?
Our goal
Get some sample data
Setting up the tools
Let’s do mapping!
Visualization
Aligning the control sample
Finding enriched areas using MACS
Building a histogram from some ATAC-seq
Adding a custom track
References
De novo transcriptome assembly with Trinity
Installation
Check that your data is where it should be
Quality trimming and light quality filtering
Applying Digital Normalization
Running the Actual Assembly!
Evaluation
Annotating de novo transcriptomes with dammit
Installation
Database Preparation
Annotation
References
For instructors!
Tutorial authoring guide
E-mail lists