Wikipedia:WikiProject Molecular and Cellular Biology/PyMol tutorial


The WikiProject: Molecular and Cellular Biology supports the use of PyMOL, an open-source molecular visualization program, for the creation of images of biomolecules such as proteins, DNA, and related complexes. This page introduces the software and demonstrates how to create high-quality images of proteins. Specific requests for assistance can be posted at our help requests subpage.

Obtaining PyMol

The official PyMOL website is hosted at SourceForge: http://pymol.sourceforge.net/. PyMOL is copyrighted free software written and distributed by DeLano Scientific.

Installation

Please see the PyMOL download site for complete information.

The most recent version of PyMOL freely available as a precompiled build is 0.99r6, which was released in March 2006. Later versions are freely distributed only as source code, and precompiled builds of them require a subscription; the most recent version overall is 0.99r8. We do not intend to support compiling your own version, although it is not difficult.

Precompiled versions of 0.99r6 are available for Microsoft Windows 2000 or XP, Mac OS X, Linux, IRIX, and Solaris. Be sure that you download the build appropriate for your operating system, and follow the appropriate instructions from the PyMOL download site. PyMOL is copyrighted free software, so please pay attention to the license terms offered to you upon installation. You will be asked to consider sponsoring the project.

A number of external plugins and scripts have been developed for PyMOL; many of these are collected and documented in the PyMOL Wiki.

Basic usage

PyMOL interface overview

The PyMOL user interface consists of three components:

  • PyMOL Viewer: the window in which molecules are manipulated and displayed. This OpenGL window contains a central area in which the molecule is displayed, a right-aligned column containing information about different selections and representations of the molecule, and a single line labeled PyMOL> at the bottom which functions as a simple command line.
  • PyMOL Tcl/Tk GUI: This window contains the familiar File/Edit/etc. navigation bar, as well as a command line and controls for manipulating multi-frame structures (these will generally not be important for our purposes).
  • PyMOL Console: This window is optional, and not launched by default. It contains diagnostic information, error messages, and status output messages that result from commands entered in the Tcl/Tk window.

Downloading and opening a PDB file

Biomolecular structures are deposited in a public database called the Protein Data Bank (PDB). The standard file format for storing molecular coordinates in the PDB is called a pdb file and usually has the file extension .pdb. The vast majority of published molecular structures will be distributed in this format, which is a human-readable text-based format that you can open and read in any text editor, such as Microsoft Notepad.
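
Because the format is plain text with fixed-width columns, a single atom record can be read with ordinary string slicing. The following is a minimal Python sketch, not part of PyMOL: the function name parse_atom_record and the sample coordinates are invented for illustration, and only a few of the fields in an ATOM/HETATM record are extracted.

```python
# Sketch: read selected fixed-width fields from one PDB ATOM/HETATM record.
# The sample values are invented; the line is assembled piece by piece so
# that the columns occupied by each field are visible.
SAMPLE_LINE = (
    "ATOM  "      # columns 1-6: record type
    "    2"       # columns 7-11: atom serial number
    " "           # column 12: blank
    " CA "        # columns 13-16: atom name (here an alpha carbon)
    " "           # column 17: alternate location indicator
    "MET"         # columns 18-20: residue name
    " "           # column 21: blank
    "A"           # column 22: chain identifier
    "   1"        # columns 23-26: residue number
    "    "        # columns 27-30: insertion code and blanks
    "  26.266"    # columns 31-38: x coordinate in angstroms
    "  25.413"    # columns 39-46: y coordinate
    "   2.842"    # columns 47-54: z coordinate
)


def parse_atom_record(line):
    """Return a dict of selected fields, or None for other record types."""
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        return None
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "resn": line[17:20].strip(),
        "chain": line[21:22].strip(),
        "resi": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


atom = parse_atom_record(SAMPLE_LINE)
print(atom["name"], atom["resn"], atom["chain"], atom["resi"], atom["xyz"])
```

The dictionary keys name, resn, resi, and chain were chosen to match the selection keywords used later on this page.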

The usual place to download a molecular structure file is the PDB website, www.pdb.org, which allows you to search based on a four-character PDB ID or based on keyword descriptions of a particular structure. Once you have found the structure you're looking for, click the 'download files' link on the left side of the page, then click the first link labeled "PDB file" and download it to an appropriate location on your computer. You can open the file in PyMOL by clicking File > Open in the window labeled Tcl/Tk GUI and navigating to the location where you saved the file. Alternatively, you can type load <filename> in the Tcl/Tk window, where "filename" is the complete path to the file.

When the file is open, the molecule will be displayed in the Viewer window, although not in a particularly attractive representation.

Basic molecule manipulation

Manipulating the structure in the Viewer window is easiest with a three-button mouse, as the middle button is assigned unique functions; on some systems clicking the right and left buttons simultaneously will simulate a middle-click. PyMOL calls its manipulation interface the mouse matrix, which is displayed for reference at the bottom of the right-hand panel in the Viewer window. The mouse matrix has two main modes, viewing and editing; viewing is by far the more useful for our purposes. The important viewing functions are summarized below.

Keyboard      | Left                 | Middle                   | Right
(none)        | Rotate in 3D         | Translate                | Zoom/scale
Shift         | Grow a selection box | Reduce a selection box   | Move clipping plane (cuts off/shadows distant atoms)
Control+Shift | Select               | Reset origin of rotation | Move clipping plane

The editing and viewing modes can be toggled by clicking anywhere in the mouse matrix display. Right- or double-clicking anywhere in the display pane brings up a viewing menu with a simplified set of options, including a zoom function that automatically fits the molecule to the display size.

Selecting

Single-clicking anywhere on the molecule will make a selection. Below the mouse matrix in the right-hand panel, the word "selecting" is displayed, followed by the selection mode. Clicking this line cycles through selection modes. You can select individual atoms, residues (amino acids), or polypeptide chains automatically by toggling the selection mode.

Each separate selection appears as its own entry in the list of displayed representations in the top section of the right-hand panel. Clicking each box once will toggle the display of that selection or object. By default, the most recent selection is called (sele), but you can give new selections a name on the command line when you create them.

It is usually easier to define selections on the command line using the command sele rather than clicking on the molecule. You can type your commands into the Tcl/Tk window (which contains a scrolling view of the history of past commands) or directly onto the command line in the viewer window (what you type here will appear next to the PyMOL> prompt at the bottom of the window). The simplest selection is something like sele name ca, which selects all of the alpha carbons in a molecule. This will store all of the alpha carbons in the representation (sele). Note that the list of representations in the viewer window distinguishes between selections, whose names are shown in parentheses, and objects, whose names are shown in plain text. To create a new selection, you can use the command sele alpha, name ca, which will produce a new entry in the representations list called "alpha". (The buttons and actions one can perform on each representation are covered in the Representations section below.)

Most important tasks

Important tasks you can perform on selections include:

  • hide alpha - do not display alpha carbons.
  • show alpha - display alpha carbons.
  • show sphere, alpha - display the alpha carbons as spheres.
  • zoom alpha - reorient the viewer so that the set of alpha carbons fills the viewing window. This is most useful for selections that cover a localized region, for example a small-molecule ligand.
  • color blue, alpha - show alpha carbons in blue.

Naming your selections is not required; commands that take a selection-expression argument can also be used like this: color blue, name ca, which also colors alpha carbons blue without creating a separate entry in the representations list. Another useful command to know is count_atoms; count_atoms alpha will tell you how many alpha carbons are in the selection.

Common selections

PyMOL provides predefined shortcuts for commonly selected items. They are used as single identifiers, e.g. sele all or sele * to select all of the currently loaded atoms.

Shortcut | Abbreviation | Description
all      | *            | Select all loaded atoms
none     | none         | Select nothing
hydro    | h.           | Select all hydrogen atoms
hetatm   | het          | Select all heteroatoms, as defined by HETATM records in the PDB file

Other types of identifiers are used to select by atom, residue, or chain properties.

  • sele symbol c - selects all carbon atoms
  • sele name ca+cb - selects all alpha and beta carbons
  • sele resn gly - selects all glycine residues
  • sele resi 1-10 - selects residues 1 through 10
  • sele chain a - selects chain A (note: usually chain designations are single letters or numbers)
  • sele ss h+s - selects all atoms assigned alpha helix or beta sheet secondary structure (other options are l = loop, "" = unstructured)

As illustrated by the examples, numerical ranges are indicated by a dash, e.g. sele resi 20-22 for residues 20, 21, and 22. Concatenation is indicated with the + sign, e.g. sele resn asp+asn selects all residues that are either aspartate or asparagine. Parentheses are used for grouping.
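
To make the range and concatenation syntax concrete, the sketch below expands a residue-number list into the individual numbers it stands for. It is an illustration in plain Python, not PyMOL's own parser; the function name expand_resi is invented, and only positive residue numbers are handled.

```python
def expand_resi(spec):
    """Expand a residue-number list such as '1-3+10' into sorted integers.

    A dash marks an inclusive range and '+' joins alternatives, mirroring the
    syntax described above. Negative residue numbers are not handled.
    """
    numbers = set()
    for part in spec.split("+"):
        if "-" in part:
            first, last = part.split("-")
            numbers.update(range(int(first), int(last) + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)


print(expand_resi("20-22"))   # [20, 21, 22]
print(expand_resi("1-3+10"))  # [1, 2, 3, 10]
```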

Selection operators

Standard boolean operators work: sele not name ca selects everything except the alpha carbons, sele name ca and ss h selects all the alpha carbons in helices, and sele resn gly or resi 1-5 creates a selection containing all glycine residues and all atoms from residues 1-5. The same operators can be used to combine named selections into a new selection; e.g. if sele alpha, name ca and sele helix, ss h are already defined, then sele alpha_not_helix, alpha and not helix selects alpha carbons not in a helix.
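
One way to think about these operators is as set operations on atoms: "and" is an intersection, "or" is a union, and "not" is a complement. The sketch below illustrates this with Python sets over a small invented atom table; it does not call PyMOL, and the helper name select is hypothetical.

```python
# Toy atom table (invented values). "ss" holds a secondary-structure code:
# "H" for helix, "S" for sheet, "L" for loop.
ATOMS = [
    {"name": "N",  "resn": "GLY", "resi": 1, "ss": "L"},
    {"name": "CA", "resn": "GLY", "resi": 1, "ss": "L"},
    {"name": "N",  "resn": "ALA", "resi": 2, "ss": "H"},
    {"name": "CA", "resn": "ALA", "resi": 2, "ss": "H"},
    {"name": "CB", "resn": "ALA", "resi": 2, "ss": "H"},
    {"name": "CA", "resn": "GLY", "resi": 7, "ss": "S"},
]


def select(predicate):
    """Return the set of atom indices for which predicate(atom) is true."""
    return {i for i, atom in enumerate(ATOMS) if predicate(atom)}


everything = select(lambda a: True)
alpha = select(lambda a: a["name"] == "CA")   # like: name ca
helix = select(lambda a: a["ss"] == "H")      # like: ss h
gly = select(lambda a: a["resn"] == "GLY")    # like: resn gly
first5 = select(lambda a: 1 <= a["resi"] <= 5)  # like: resi 1-5

not_alpha = everything - alpha     # like: not name ca
alpha_in_helix = alpha & helix     # like: name ca and ss h
gly_or_first5 = gly | first5       # like: resn gly or resi 1-5
alpha_not_helix = alpha - helix    # like: alpha and not helix

print(sorted(alpha_in_helix), sorted(alpha_not_helix))
```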

There are also a number of selection operators used for creating local selections according to defined criteria. The most useful ones are summarized below.

  • sele alpha gap 5 - selects everything at least 5 angstroms away from an alpha carbon
  • sele alpha around 5 - selects everything within 5 angstroms of an alpha carbon
  • sele name cg within 5 of alpha - selects all gamma carbons within 5 angstroms of any alpha carbon
  • sele byres alpha around 5 - selects all atoms in all residues that have at least one atom within 5 angstroms of an alpha carbon. That is, if a single atom in a residue meets the selection criteria, the whole residue will be included in the new selection.
  • sele neighbor alpha - selects all atoms bonded to alpha carbons

Alpha carbons are used here as a simple example, but most of these commands are most useful in displaying an active site or the region around a bound ligand.
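
The distance-based operators can be pictured with a short calculation. The sketch below reimplements the idea behind around, within, and byres in plain Python on invented coordinates. It is not PyMOL code: the helper name near is hypothetical, and treating the cutoff as inclusive is an assumption (none of the toy atoms sit exactly on the boundary).

```python
import math

# Toy atoms (invented): (residue number, atom name, xyz in angstroms).
ATOMS = [
    (1, "CA", (0.0, 0.0, 0.0)),
    (1, "CB", (1.5, 0.0, 0.0)),
    (1, "CG", (3.0, 0.0, 0.0)),
    (2, "N",  (4.5, 0.0, 0.0)),
    (2, "CA", (12.0, 0.0, 0.0)),
    (2, "CG", (18.0, 0.0, 0.0)),
    (3, "O",  (30.0, 0.0, 0.0)),
]


def near(candidates, reference, cutoff):
    """Indices in candidates lying within cutoff of any atom in reference."""
    return {
        i for i in candidates
        if any(math.dist(ATOMS[i][2], ATOMS[j][2]) <= cutoff for j in reference)
    }


everything = set(range(len(ATOMS)))
alpha = {i for i in everything if ATOMS[i][1] == "CA"}
cg = {i for i in everything if ATOMS[i][1] == "CG"}

# like: alpha around 5 (atoms near an alpha carbon, excluding alpha itself)
around = near(everything - alpha, alpha, 5.0)

# like: name cg within 5 of alpha
cg_within = near(cg, alpha, 5.0)

# like: byres (alpha around 5), i.e. expand the hits to whole residues
hit_residues = {ATOMS[i][0] for i in around}
byres_around = {i for i in everything if ATOMS[i][0] in hit_residues}

print(sorted(around), sorted(cg_within), sorted(byres_around))
```

In this toy example only the nitrogen of residue 2 lies within 5 angstroms of an alpha carbon, yet the byres expansion pulls in every atom of residue 2, which is the behavior described for byres above.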

Representations

Each item in the list in the right-hand panel represents a selection or object. When you load a new PDB file, a new object is automatically created. Each item has five boxes to the left of its name, labeled A, S, H, L, and C for Actions, Show, Hide, Label, and Color. Clicking on a box will display a dropdown menu for that selection. Note the distinction between objects and selections; the latter are displayed with parentheses around the name and the former in plain text. You can create your own objects using the same selection syntax as above, with the command create replacing sele. The distinctions are subtle and largely irrelevant for simple purposes, but in general, objects are more flexible than selections. Deleting an object removes its contents (the atoms themselves) entirely, while deleting a selection removes only the named selection and leaves the underlying atoms in place.

The Show menu begins with a submenu called "as", which contains most of the same options available directly under the Show or Hide menus. The distinction is important: Show > Cartoon displays a cartoon representation of a selection in addition to what is already displayed for that selection, while Show > as > Cartoon replaces all other displays with a cartoon representation.

  • Actions menu: allows you to delete and rename selections and objects, and zoom and orient the view to emphasize a particular object. The "preset" list contains predefined simple macros for generating a basic display, e.g. a cartoon protein structure colored by residue index.
  • Show menu: display common representations such as cartoons, ribbons, sticks, spheres.
  • Hide menu: hide a particular representation (e.g., if both cartoons and ribbons are on, this looks ugly, so Hide > Ribbons will leave only cartoons displayed). Hide > Everything does exactly what it says on the tin.
  • Label menu: allows text labeling of atoms, residues, or chains. This can get messy.
  • Color menu: color options; see Color > By ss for options in coloring by secondary structure, or Color > Spectrum for options in rainbow coloration by residue index.

Display menus

Not all color and display options are accessible from the color menu associated with each entry in the object list. Most relevant other options are located under the Tcl/Tk window menus.

  • Display > Background - changes the background from the default black
  • Display > Depth Cue - toggles depth cuing fog (default on)
  • Display > Show valences - toggles display of double and triple bonds in line and stick views (default: show all bonds as a single line)
  • Settings > Cartoon - various options for the display of cartoon structures, including size of helices and sheets
  • Settings > Transparency - alter opacity and transparency for different representations; most useful for rendering a surface in partial transparency
  • Settings > Rendering - raytracer options; this is the place to look if rendering is very slow or causing memory errors

Rendering

PyMOL contains a fast built-in raytracing utility for the production of high-quality images. The raytracing utility is very easy to use; simply type the command ray in the Tcl/Tk window or at the viewer prompt. By default, raytracing is done at the current size of the window, so you can optimize your use of space by using the zoom command and/or sizing the protein manually using the right mouse button so that the area of interest fills the whole window. At the default size, your maximum realistic resolution will be slightly smaller than the resolution of your monitor; the ray command also accepts an explicit width and height in pixels (e.g. ray 1200, 900) if you need a larger image. After you type ray, your computer might appear to "freeze" or become unresponsive; this is normal, because the raytracing calculation will be using most of your CPU for a short amount of time. Except for very complex representations of surfaces, reasonably fast computers should not freeze for more than a few minutes at most.

After you have raytraced your image, do not try to rotate the molecule or otherwise alter the display. This will cause you to lose your raytraced image and return to the OpenGL-rendered version. The command png <filename> on the command line will save your image as a PNG file under the filename you specify; you can also include a path to a location other than the current directory. Alternatively, use the File > Save Image menu in the Tcl/Tk window and navigate to your preferred location.

Saving and restoring sessions

PyMOL allows you to save its current internal state, including all molecules you have loaded, all selections, objects, and representations, and all display settings in a PyMOL session file, which has the file extension .pse. This is very useful for saving intermediate states of complex manipulations, especially since PyMOL does not have a generalized "undo" function. Before you take an action that might damage your previous work, you should save your session.

Saving sessions is done in the Tcl/Tk menu under File > Save Session (or File > Save Session As...). Restoring is done through the File > Open command, which can open .pse files as well as plain PDB files. You can also save individual components of your session as separate PDB files using the File > Save Molecule function.

A related and useful command is File > Reinitialize, which restores PyMOL to the state in which it was originally launched. This will remove any molecules you've opened or selections you've made.

Recommended representations

There is no single best way to represent and display all proteins. Ideally, we would be able to show several images of each protein, illustrating its overall fold as well as its active site and any other important structural features. However, if little is known about the protein, or you are only creating a single image, the following parameters are generally recommended. These suggestions apply even if you are using another visualization program.

  • Background is preferably white rather than the default black, for readability and transferability purposes.
  • Representation is preferably cartoon or ribbon to illustrate the protein's secondary structure.
  • Color depends on the size of the molecule being illustrated. For a single polypeptide chain, coloring by secondary structure is usually best. For a multi-protein complex, coloring by individual polypeptide chain or by subunit type offers a clearer illustration of how the structure is assembled.
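
These recommendations can be collected into a short PyMOL command script so that an image can be regenerated later. The Python sketch below only builds the text of such a script; it does not run PyMOL itself. The function name build_script, the file names, the image size, and the particular colors are illustrative assumptions, while the commands it emits (load, bg_color, hide, show, color, ray, png) are ordinary PyMOL commands.

```python
def build_script(pdb_path, image_path, width=1200, height=900):
    """Return PyMOL command lines for a white-background cartoon image.

    The color choices are arbitrary examples of coloring by secondary
    structure for a single polypeptide chain.
    """
    return [
        "load {}".format(pdb_path),
        "bg_color white",                    # white background
        "hide everything",
        "show cartoon",                      # cartoon representation
        "color red, ss h",                   # helices
        "color yellow, ss s",                # sheets
        "color green, ss l+''",              # loops and unassigned residues
        "ray {}, {}".format(width, height),  # raytrace at an explicit size
        "png {}".format(image_path),         # save the raytraced image
    ]


# Hypothetical file names, used only for illustration.
lines = build_script("1abc.pdb", "1abc.png")
script_text = "\n".join(lines) + "\n"
print(script_text)
```

Saved to a text file (conventionally with a .pml extension), the output can be run from the PyMOL command line by typing @ followed by the script's filename, or the lines can be typed one at a time at the PyMOL> prompt.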

Cartoons in PyMol

Manipulating cartoon representations in PyMOL is a bit idiosyncratic; in particular, Show > Ribbon (or show ribbon, <object>) does not do what you might expect from other programs. Important manipulations of the cartoon representation are summarized here. To use these commands, you will most likely want to first click Show > as > Cartoon in the Show menu for a loaded PDB. By default, smoothing is enabled in cartoon representations so that helices and especially sheets are shown as roughly straight, rather than bent to follow the protein backbone.

  • To turn off all smoothing: set cartoon_flat_sheets, 0 and set cartoon_smooth_loops, 0. To turn smoothing back on, set both variables to 1.
    • When displaying local side chains along with a cartoon structure, it is best to set cartoon_flat_sheets to 0 so that the cartoon and sidechain representations appear to be connected.
  • cartoon automatic - reset cartoon representations to defaults.
  • cartoon loop - thin ribbons following the backbone, the way loops are shown in the default representation.
  • cartoon oval, cartoon arrow, cartoon rect, cartoon tube - various shapes for tracing out the backbone, of which tube is usually the clearest.

The best way to choose secondary structure color schemes besides the two in the Color > By ss object menu is to use selections. For example, to turn all helices blue, use color blue, ss h.

Because different proteins are best represented with different depictions, we can't give a thorough overview of all possible knobs to twiddle. See Settings > Edit all for a complete listing of parameters (listed alphabetically; all those related to cartoons begin with the word cartoon_) and experiment.

Finally, in line with the visual style of Wikipedia, it is best to set the background to white.

Representing NMR structures

Protein structures solved by X-ray crystallography usually contain only one set of coordinates, while structures solved by protein NMR usually contain multiple sets of coordinates corresponding to alternative models that satisfy the set of spatial restraints determined by the NMR experiments. NMR structures are deposited in the PDB as a single file containing multiple sets of coordinates. To view NMR structures, open the PDB file as usual; all coordinate sets will be loaded, but only one displayed by default.

  • To view the structures in succession, as frames of a movie, click "Play" in the Tcl/Tk window. The frame rate can be set in the menu Movie > Speed.
  • To view all the structures simultaneously, click Movie > Show all states. Because 'thick' representations like sticks and cartoons can look awkward with multiple overlaid structures, the ribbon and line representations are particularly useful for NMR structures. Be sure to look at the raytraced version of ribbon structures, as it is very different from the OpenGL-rendered realtime view.
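
In a multi-model PDB file, each coordinate set is bracketed by MODEL and ENDMDL records, so the number of models can be checked before loading the file. The following Python sketch counts them in a small invented fragment; the function name count_models is hypothetical, and the abbreviated ATOM lines stand in for full records.

```python
def count_models(lines):
    """Count MODEL records; a file with none holds a single coordinate set."""
    n_models = sum(1 for line in lines if line[0:6].strip() == "MODEL")
    return max(n_models, 1)


# Invented, heavily abbreviated fragment of a two-model NMR-style file.
SAMPLE = """\
MODEL        1
ATOM      1  N   MET A   1
ATOM      2  CA  MET A   1
ENDMDL
MODEL        2
ATOM      1  N   MET A   1
ATOM      2  CA  MET A   1
ENDMDL
END
"""

print(count_models(SAMPLE.splitlines()))  # 2
```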

Reconstructing biological molecules

X-ray crystallography structures are sometimes represented in the PDB as asymmetric units. This may include multiple copies of a single molecule, in which case selecting only one of the copies is a simple matter. A more complex procedure is required for multimeric structures where the asymmetric unit PDB coordinates do not correspond to a complete assembled complex. (This is usually because they contain monomers from multiple adjacent multimers.) Reconstructing the biological molecule requires the symexp command to generate symmetry-related copies of the molecule. (Alternatively, one or more biological unit files are usually available directly from the PDB, but come as gzipped files that can be quite large when unzipped.)

  • symexp takes the following arguments: symexp <prefix>,<molecule name>,(<object name>),<distance>. The prefix is used to name the new representations that will be generated by the command, the molecule name is the name of the loaded PDB, the object name is the name of the selection to be duplicated (usually the same as the molecule name), and the distance is the range in angstroms over which to construct copies. The command symexp dup,1G4A,(1G4A),5.0 generates copies of the 1G4A structure that pass within 5 angstroms of the original and names them dup01000000, dup02000000, etc.
  • The chosen cutoff must be large enough to generate at least one complete molecule, which will probably also generate extraneous fragments of related molecules. The command hide (not (1G4A expand 5)) hides objects more than 5 angstroms away from the original. Setting this size smaller than the size that generated the new copies will remove the extra fragments.
  • All symmetry-related fragments can be deleted with delete <prefix>*, as in delete dup*. This is useful because choosing an excessively large radius for the symexp command can generate a very large number of new objects.