| Author: | Mark Robinson |
|---|---|
| Date: | June 10, 2010 |
In this tutorial we are going to work through the process of taking a set of Illumina reads ffrom a ChIP-seq experiment, aligning them to a reference genome and then identifying peaks of enrichment for ChIP specific reads. We’ll be using Myxococcus xanthus as our reference genome. Our ChIP-seq data comes from experiments performed with antibodies for a transcription factor known to be developmentally important, MrpC.
Download the data files from the private data collection and gunzip them:
%% mkdir /mnt/myxo
%% cd /mnt/myxo
<curl commands here>
%% gunzip *.gz
You should now have three files:
% ls
M2_pf.fastq MP1_pf.fastq myxo-genome.fa
Install bowtie as in Mapping with bowtie.
Build your index for the M.xanthus genome:
%% cd /mnt/myxo
%% bowtie-build myxo-genome.fa myxo
You should now be ready to run bowtie for each of the two sets of reads. We’ll simply use the default bowtie parameters here (-n 2 -l 28 -e 70) which allows a maximum of 2 mismatches in the first 28 bases of the read and requires that the sum of the Phred values for the mismatches positions not exceed 70. These are parameters that you might want to modify if you find many of your reads are not alignable, feel free to play! The ‘-m’ switch being set to one ensures that only uniquely alignable reads will be reported; the default setting would report one of potentially many, with no guarantee that the reported alignment was the best.
%% bowtie -p 2 -m 1 myxo MP1_pf.fastq > control.map
%% bowtie -p 2 -m 1 myxo M2_pf.fastq > chip.map
You should see
# reads processed: 12460030
# reads with at least one reported alignment: 8902883 (71.45%)
# reads that failed to align: 3245241 (26.05%)
# reads with alignments suppressed due to -m: 311906 (2.50%)
Reported 8902883 alignments to 1 output stream(s)
from the first bowtie, and
# reads processed: 12972579
# reads with at least one reported alignment: 9625057 (74.20%)
# reads that failed to align: 3090348 (23.82%)
# reads with alignments suppressed due to -m: 257174 (1.98%)
Reported 9625057 alignments to 1 output stream(s)
from the second bowtie.
It’s time to take our aligned reads and try and make some sense of them.
Firstly we’ll need to install Quest:
Download and unpack the code:
%% cd
%% curl -O http://www.stanford.edu/%7evalouev/QuEST/QuEST_2.4.tar.gz
%% tar -zxf QuEST_2.4.tar.gz
%% cd QuEST_2.4
Configure the installation parameters for your platform and compile:
%% perl configure.pl
%% make
Create a genome table:
%% cd /mnt/myxo
%% echo chr 9139764 > myxo.gt
and run the analysis:
%% /root/QuEST_2.4/generate_QuEST_parameters.pl -bowtie_align_ChIP chip.map -bowtie_align_RX_noIP control.map -gt ./myxo.gt -ap ./QuEST_analysis -ChIP_name MrpC
You should now see some activity as QuEST validates your input and attempts to convert your alignments to its own internal format. Eventually you will be prompted with a summary of your data.
Press “y” to continue.
Again, after some further activity you will be prompted to select appropriate running parameters. In this case, we will select the default parameters suggested for TF data.
Press 1 and Enter
You’ll now be asked to select some stringency measures. For our first run through we’ll select option 1 for the most stringent criteria (Note for the record I believe even the relaxed criteria generate high quality peak calls, just lots more of them!), again feel free to repeat and experiment with the consequences of these choices.
Press 1 and Enter
There will be one final prompt asking if you want to go ahead with your selected options...
Press ‘y’ to continue
In a few moments (thank god for small genomes!) QuEST should have completed, leaving you with lots of juicy files full of output.
To see the summary of the analysis:
%% less ./QuEST_analysis/module_outputs/QuEST.out
To view individual peak and region summaries complete with genome coordinates and associated statistics:
%% less ./QuEST_analysis/calls/peak_caller.ChIP.out.accepted
These files are information rich and will form the starting point for much of your analysis from this point on.
Region fields:
Peak fields:
QuEST also generates a number of pre-formatted tracks that can be uploaded and visualised in a genome browser, such as the UCSC genome browser.
To view them, find and view the reference genome of interest in the genome browser. In this case we’ll want to use the archaea and bacterial version of the UCSC genome browser at
First, copy all the files to a common location:
%% mkdir /mnt/quest-data
%% cp QuEST_analysis/tracks/wig_profiles/by_chr/background_unnormalized/chr.wig.gz QuEST_analysis/tracks/data_bed_files/by_chr/RX_noIP/RX_noIP.chr.bed.gz QuEST_analysis/tracks/ChIP_calls.filtered.bed QuEST_analysis/tracks/data_bed_files/by_chr/ChIP/MrpC.chr.bed.gz QuEST_analysis/tracks/bed_graph/MrpC.bedGraph.gz /mnt/quest-data
Now, download all these files to your local computer.
Go to the Web site,
Then:
click on “manage custom tracks” button underneath the main browser display
click on “add custom tracks” button
add the QuEST calls track:
ChIP_calls.filtered.bedadd the sequence read data track:
MrpC.chr.bed.gzadd the sequence read bedgraph display:
MrpC.bedGraph.gzyou should now see all these tracks being uploaded into the browser:
click on the “go to the genome browser” button and navigate to a locus of interest. Here you can search for specific coordinates, spans of coordinates or gene names.
To see a more condensed view of the data disable the raw data display by choosing the “hide” option of the MrpC” track in the section under the main browser display area. hit “refresh” button:
Add control data to the display
click on “manage custom tracks” button underneath the main browser display
click on “add custom tracks” button in the “manage custom tracks” display page
Upload control data for chr11: select file:
RX_noIP.chr.bed.gzclick “Open” then “Submit” button (see the screen shot below)
Upload control data density track: click on “add custom tracks”, select file:
chr.wig.gzhit “Open” and “Submit” button (see the screen shot below)
Refresh to see the new tracks displayed.