Visualizing mappings with Samtools

Author:Rosangela Canino-Koning
Date:June 5, 2010

Installing samtools

Note: You can also use the AMI ‘ami-4403f42d’, which comes with everything installed, instead of running through the install process.

First, set up a new EC2 server as in Renting a computer from Amazon. Make sure to run all of the install commands in Installing your Amazon EC2 system!

Now, we’ll install Samtools, and its supporting libraries:

%% cd
%% curl -O https://s3.amazonaws.com/angus.ged.msu.edu/samtools-0.1.9.tar.bz2
%% tar xjf samtools-0.1.9.tar.bz2
%% cd samtools-0.1.9
%% make
%% cp samtools /usr/local/bin/
%% cd misc/
%% cp *.pl maq2sam-long maq2sam-short md5fa md5sum-lite wgsim /usr/local/bin/
%% cd

Downloading data

Download some data to work with; if you’ve worked through Mapping with bowtie, you’ve already either downloaded or generated these files.

%% cd /mnt
%% mkdir campy
%% cd /mnt/campy
%% curl -O https://s3.amazonaws.com/angus.ged.msu.edu/campy.fa.gz
%% curl -O https://s3.amazonaws.com/angus.ged.msu.edu/campy-pre-1m.fastq.gz
%% curl -O https://s3.amazonaws.com/angus.ged.msu.edu/campy-pre-1m.map.gz
%% gunzip campy*.gz
%% cd

Running samtools

Now that we have everything set up, we’ll convert and index our bowtie mappings into a format that Samtools can use.

Go back to the directory containing your data:

%% cd /mnt/campy

First, we’ll index the campylobacter reference genome.:

%% samtools faidx campy.fa

Then we’ll convert the bowtie-formatted mappings to SAM format. Samtools includes scripts to convert several different types of output formats into SAM.:

%% bowtie2sam.pl campy-pre-1m.map > campy-pre-1m.map.sam

Next, we’ll convert our newly minted .sam file into the compressed binary .BAM format. In order to browse the mappings, the .BAM file must also be sorted and indexed.

SAM to BAM:

%% samtools import campy.fa.fai campy-pre-1m.map.sam campy-pre-1m.map.bam

Sort BAM:

%% samtools sort campy-pre-1m.map.bam campy-pre-1m.map.sorted

Index BAM:

%% samtools index campy-pre-1m.map.sorted.bam

Phew! Now we’re ready to browse the mappings. There are several different ways to view the content. The simplest way is to simply dump out the content of the BAM file.:

%% samtools view campy-pre-1m.map.sorted.bam

This will produce an endless stack of lines that look like this::

HWI-EAS216_0019:4:3:5572:14132#0      16      campy_genome    16176   25      35M     *       0       0       CGATTTGTTCTGAAGAAAAAGCAAGTATGAGATAG     CCCCCCCCCCCBBBCCCCCCCCCCCCCCCCCCCCC     NM:i:0  X0:i:1  MD:Z:35
HWI-EAS216_0019:4:4:2099:16781#0      16      campy_genome    16176   25      35M     *       0       0       CGATTTGTTCTGAAGAAAAAGCAAGTATGAGATAG     CCCCCCCCCCBBBBCCCCCCCCCCCCCCCCCCBCC     NM:i:0  X0:i:1  MD:Z:35
HWI-EAS216_0019:4:2:8490:8096#0       0       campy_genome    16177   25      35M     *       0       0       GATTTGTTCTGAAGAAAAAGCAAGTATGAGATAGT     CCCCCCCCCCCCCCCCCBBBCCCCACCCCCCCBCB     NM:i:0  X0:i:1  MD:Z:35
HWI-EAS216_0019:4:3:9135:19700#0      16      campy_genome    16178   25      35M     *       0       0       ATTTGTTCTGAAGAAAAAGCAAGTATGAGATAGTT     CCCDCCCCDBBBCCCCCCCCCCCCCCCCCCCCCCC     NM:i:0  X0:i:1  MD:Z:35

This is probably not very useful. (Hit CTRL-C to stop the madness!)

Instead, we can target particular sections of the genome.

To dump out the content of a particular contig, address it directly::

%% samtools view campy-pre-1m.map.sorted.bam 'campy_genome'

To dump out the content of a contig, starting at a certain index, reference the contig name and the starting index.:

%% samtools view campy-pre-1m.map.sorted.bam 'campy_genome:1'

And to dump out a particular interval within a contig, reference the contig, the starting index, and the ending index in a range.:

%% samtools view campy-pre-1m.map.sorted.bam 'campy_genome:1-25000'
%% samtools view campy-pre-1m.map.sorted.bam 'campy_genome:24000-25000'
%% samtools view campy-pre-1m.map.sorted.bam 'campy_genome:24900-25000'

The previous commands give you every field in the bowtie/sam output, which might be useful if you are importing into another tool, but they aren’t particularly good for doing direct visual inspection. For this purpose, Samtools provides an interactive interface, ‘tview’.:

%% samtools tview campy-pre-1m.map.sorted.bam campy.fa

You can navigate the mappings using the arrow keys, and view other possible commands by typing a ‘?’.


BONUS

If you feel compelled to do any Python scripting for manipulating the mappings directly, Samtools supports a Python interface called Pysam. You can install it like so:

%% cd
%% wget http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/Pyrex-0.9.9.tar.gz
%% tar -zxvf Pyrex-0.9.9.tar.gz
%% cd Pyrex-0.9.9
%% python setup.py install
%% cd

And then:

%% wget http://angus.ged.msu.edu.s3.amazonaws.com/pysam-0.2-ngs.tar.gz
%% tar -zxvf pysam-0.2-ngs.tar.gz
%% cd pysam-0.2-ngs
%% python setup.py install
%% cd

Note, here we’re using a patched version of pysam that will work under the version of Python available on your EC2 machine.


LICENSE: This documentation and all textual/graphic site content is licensed under the Creative Commons - 0 License (CC0) -- fork @ github. Presentations (PPT/PDF) and PDFs are the property of their respective owners and are under the terms indicated within the presentation.
comments powered by Disqus