Automated rendering of PDB proteins using PyMol

Downloading the .pdb file

As the RCSB offers direct HTTP access, this step is trivial.

You can use this shell script to download any protein:

#!/bin/bash
# download-pdb.sh
# Usage: ./download-pdb.sh <PDB ID>
wget "http://www.rcsb.org/pdb/files/$1.pdb"

Call it with the PDB ID to download, e.g. 1ULI:

./download-pdb.sh 1ULI
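If you need several structures at once, the download can be wrapped in a loop. The following is a minimal sketch that assumes a POSIX shell and the same RCSB URL as above. `is_pdb_id` and `download_pdbs` are hypothetical helper names, and the ID check only covers the classic four-character format (a digit 1-9 followed by three letters or digits).

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original download-pdb.sh.

# Succeeds if $1 looks like a classic 4-character PDB ID such as 1ULI:
# a digit 1-9 followed by three letters or digits.
is_pdb_id() {
  case $1 in
    [1-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Downloads each given PDB ID to <ID>.pdb, skipping malformed IDs and
# removing the (empty) output file if a download fails.
download_pdbs() {
  for id in "$@"; do
    if ! is_pdb_id "$id"; then
      echo "Skipping invalid PDB ID: $id" >&2
      continue
    fi
    wget -q -O "$id.pdb" "http://www.rcsb.org/pdb/files/$id.pdb" \
      || { echo "Download failed: $id" >&2; rm -f "$id.pdb"; }
  done
}
```

After sourcing the file, `download_pdbs 1ULI 3ULI` would fetch both structures into the current directory.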

Rendering using PyMol

First, install PyMol: either download it from the website or use your preferred package manager, e.g.:

sudo apt-get install pymol

By running PyMol with a script instead of in interactive mode, we can automate the rendering of an image. Tuning the perspective etc. manually is likely to improve the results, however.

The following script combines the automatic download and the rendering step. It creates a temporary .pml (PyMol script) file with static settings, runs PyMol on it and deletes the temporary files afterwards:

#!/bin/bash
# render-pymol.sh
if [ $# -eq 0 ]
  then
    echo "Usage: render-pymol.sh <PDB ID>"
    exit 1
fi
#Download (abort if the download fails)
wget -qO $1.pdb http://www.rcsb.org/pdb/files/$1.pdb || exit 1

#Create the PyMol script
echo "load $1.pdb;" > $1.pml
echo "set ray_opaque_background, off;" >> $1.pml
echo "show cartoon;" >> $1.pml
echo "color purple, ss h;" >> $1.pml
echo "color yellow, ss s;" >> $1.pml
echo "ray 1500,1500;" >> $1.pml
echo "png $1.png;" >> $1.pml
echo "quit;" >> $1.pml

#Execute PyMol (-q quiet, -c no GUI, runs the script and exits)
pymol -qxci $1.pml
#Remove temporary files
rm -f $1.pml $1.pdb

Call it like this:

./render-pymol.sh 1ULI

This command will render the result and store it in 1ULI.png.

Customizing the render style

In order to change the size of the generated PNG image, change this line:

echo "ray 1500,1500;" >> $1.pml

The numbers represent the width and height of the generated image in pixels. Note that increasing the image size significantly increases the CPU time required for rendering, especially for complex proteins: rendering 1ULI at 1500×1500 px with render-pymol.sh took 209 seconds on my notebook, as opposed to 33 seconds at 500×500 px.
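To put these two timings in proportion: the larger image has nine times as many pixels, while the render took roughly 6.3 times as long.

```latex
\frac{1500 \times 1500}{500 \times 500} = \frac{2\,250\,000}{250\,000} = 9
\qquad \text{vs.} \qquad
\frac{209\,\mathrm{s}}{33\,\mathrm{s}} \approx 6.3
```

So in this single measurement, render time grew somewhat less than linearly with the pixel count. Whether that ratio holds for other proteins or settings is an assumption, not something these two numbers establish.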

These lines define the style of the rendered protein:

echo "show cartoon;" >> $1.pml
echo "color purple, ss h;" >> $1.pml
echo "color yellow, ss s;" >> $1.pml

while this line sets the background to transparent:

echo "set ray_opaque_background, off;" >> $1.pml

If you prefer a white (non-transparent) background instead, replace that line with these two (right after the line containing load $1.pdb):

echo "bg_color white;" >> $1.pml
echo "set ray_opaque_background, on;" >> $1.pml
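Instead of editing the echo lines for every render, the image size and background could be turned into parameters. This is a minimal sketch assuming a POSIX shell; `write_pml` is a hypothetical helper that prints the PyMol commands discussed above to stdout using a heredoc, with 1500×1500 and a transparent background as defaults.

```shell
#!/bin/sh
# write_pml: print a PyMol script for one PDB ID to stdout.
# Hypothetical helper; arguments: ID [WIDTH] [HEIGHT] [transparent|white]
# The body runs in a subshell, so variables don't leak and "exit" only
# leaves the function.
write_pml() (
  id=$1
  width=${2:-1500}
  height=${3:-1500}
  case ${4:-transparent} in
    transparent) bg="set ray_opaque_background, off;" ;;
    white) bg="bg_color white;
set ray_opaque_background, on;" ;;
    *) echo "Unknown background: $4" >&2; exit 1 ;;
  esac
  cat <<EOF
load $id.pdb;
$bg
show cartoon;
color purple, ss h;
color yellow, ss s;
ray $width,$height;
png $id.png;
quit;
EOF
)
```

With 1ULI.pdb already downloaded, `write_pml 1ULI 500 500 white > 1ULI.pml && pymol -qxci 1ULI.pml` would render a 500×500 image on a white background.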

Results

[Image: rendering of 1ULI]

[Image: rendering of 3ULI]

Accessing NCBI FTP via rsync

A little-known way of accessing data on the NCBI FTP servers is by using rsync. This method was first mentioned in this mailing list post in 2004.

Using rsync instead of FTP has a few key advantages:

  • Fully incremental downloads
  • Resumable downloads
  • Faster than FTP
  • Only a single connection, so there are no problems with FTP active/passive ports (important e.g. for Dual-Stack Lite; see this excellent post in German)
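A download via rsync could look roughly like the sketch below. It assumes NCBI's rsync service is reachable at rsync://ftp.ncbi.nlm.nih.gov/ (the same host as the FTP server); `ncbi_rsync` is a hypothetical wrapper, and the remote path is whatever you would otherwise fetch via FTP.

```shell
#!/bin/sh
# ncbi_rsync: mirror a path from NCBI's rsync server into a local directory.
# Hypothetical wrapper; arguments: <path on server> <local dir>
ncbi_rsync() (
  if [ $# -ne 2 ]; then
    echo "Usage: ncbi_rsync <path on server> <local dir>" >&2
    exit 2
  fi
  # --times:   keep modification times, so later runs skip unchanged files
  # --partial: keep partially transferred files, so aborted runs can resume
  rsync --recursive --times --partial --copy-links --verbose \
    "rsync://ftp.ncbi.nlm.nih.gov/$1" "$2"
)
```

Because `--times` preserves modification times, rerunning the same command should only transfer files that changed, and `--partial` lets an interrupted transfer continue from the partial file instead of starting over.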

A GeneOntology OBO v1.2 parser in Python

The GeneOntology Consortium provides bulk data download for the GO terms in the OBO v1.2 format.

If you google for a GO OBO parser, something is missing: you can easily find parsers in Perl and in Java, but not even BioPython has one in Python. The format itself, however, seems tailor-made for Python’s generator concept: only a few SLOCs are needed to get it working without storing everything in RAM.

I used this parser in a prototype project that allows searching GO interactively (and it’s fast). I’m not sure when or if I’ll publish that, but here is the parser code.
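The streaming idea behind such a parser (read one stanza at a time, never load the whole file) can be illustrated outside Python as well. The following is not the parser from this post but a minimal awk sketch of the same principle; `obo_terms` is a hypothetical name, and it only extracts the `id` and `name` tags of `[Term]` stanzas, ignoring `[Typedef]` stanzas and the header.

```shell
#!/bin/sh
# obo_terms: print "<GO ID><TAB><name>" for every [Term] stanza of an OBO
# file, processing it line by line so the ontology is never held in memory.
# Hypothetical helper, not the post's Python parser.
obo_terms() {
  awk '
    # A stanza header such as [Term] or [Typedef] starts a new record.
    /^\[/ { in_term = ($0 == "[Term]"); id = ""; next }
    # Inside a [Term] stanza, remember the id tag ...
    in_term && /^id: / { id = substr($0, 5) }
    # ... and print it together with the name tag.
    in_term && /^name: / && id != "" { print id "\t" substr($0, 7) }
  ' "$1"
}
```

For example, `obo_terms go.obo | grep -i kinase` would list matching terms from a downloaded go.obo file without building any in-memory data structure.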

A simple tool for FASTA statistics

The issue

It is surprisingly difficult to compute simple statistics of FASTA files using existing software. I recently needed to compute the nucleotide count and relative GC frequency of a single sequence in FASTA format. Unless you install dependency-heavy native software like FASTX, or write it yourself using BioPython or similar, there doesn’t seem to be a simple, dependency-free solution for this simple set of problems.
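For comparison, the two numbers mentioned above can be computed with nothing but awk. This is a minimal sketch, not the tool from this post; `fasta_gc` is a hypothetical name, it pools all sequences in the file, and it assumes the GC fraction should be taken over A/C/G/T only (so N and other ambiguity codes count towards the length but not the ratio).

```shell
#!/bin/sh
# fasta_gc: residue count and GC fraction of the sequence data in a FASTA
# file. Hypothetical helper, not the tool from the post.
fasta_gc() {
  awk '
    /^>/ { next }                     # skip header lines
    {
      seq = toupper($0)
      gsub(/[^A-Z]/, "", seq)          # drop whitespace and stray CR
      total += length(seq)
      gc += gsub(/[GC]/, "", seq)      # gsub returns the number of matches
      at += gsub(/[AT]/, "", seq)
    }
    END {
      if (gc + at == 0) { print "no A/C/G/T found" > "/dev/stderr"; exit 1 }
      printf "length\t%d\nGC\t%.4f\n", total, gc / (gc + at)
    }
  ' "$1"
}
```

For a file containing `>seq1`, `ACGT` and `ggcc`, this prints a length of 8 and a GC fraction of 0.7500.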