
Protein Data Bank (file format)

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by M palmer45 (talk | contribs) at 21:10, 29 May 2008. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

pdb
Filename extension: .pdb, .ent, .brk
Internet media type: chemical/x-pdb
Type of format: chemical file format

The Protein Data Bank (pdb) file format is a textual file format describing the three-dimensional structures of molecules held in the Protein Data Bank. Most of the information in that database pertains to proteins, and the pdb format accordingly provides for rich description and annotation of protein properties. However, proteins are often crystallized in association with other molecules or ions, such as water, nucleic acids and drug molecules, which can therefore be described in the pdb format as well.

A typical pdb file describing a protein consists of hundreds to thousands of lines like the following (taken from a file describing the structure of a synthetic collagen-like peptide):

HEADER    EXTRACELLULAR MATRIX                    22-JAN-98   1A3I
TITLE     X-RAY CRYSTALLOGRAPHIC DETERMINATION OF A COLLAGEN-LIKE
TITLE    2 PEPTIDE WITH THE REPEATING SEQUENCE (PRO-PRO-GLY)
...
EXPDTA    X-RAY DIFFRACTION
AUTHOR    R.Z.KRAMER,L.VITAGLIANO,J.BELLA,R.BERISIO,L.MAZZARELLA,
AUTHOR   2 B.BRODSKY,A.ZAGARI,H.M.BERMAN
...
REMARK 350 BIOMOLECULE: 1                                                       
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C
REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000
REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000
...
SEQRES   1 A    9  PRO PRO GLY PRO PRO GLY PRO PRO GLY
SEQRES   1 B    6  PRO PRO GLY PRO PRO GLY
SEQRES   1 C    6  PRO PRO GLY PRO PRO GLY
...
ATOM      1  N   PRO A   1       8.316  21.206  21.530  1.00 17.44           N
ATOM      2  CA  PRO A   1       7.608  20.729  20.336  1.00 17.44           C
ATOM      3  C   PRO A   1       8.487  20.707  19.092  1.00 17.44           C
ATOM      4  O   PRO A   1       9.466  21.457  19.005  1.00 17.44           O
ATOM      5  CB  PRO A   1       6.460  21.723  20.211  1.00 22.26           C
...
HETATM  130  C   ACY   401       3.682  22.541  11.236  1.00 21.19           C  
HETATM  131  O   ACY   401       2.807  23.097  10.553  1.00 21.19           O  
HETATM  132  OXT ACY   401       4.306  23.101  12.291  1.00 21.19           O          
...

The ATOM records describe the coordinates of the atoms that are part of the protein. For example, the first ATOM line above describes the backbone nitrogen atom (N) of the first residue of peptide chain A, which is a proline residue; the first three floating point numbers are its x, y and z coordinates in ångströms, and the two numbers that follow are its occupancy and temperature factor. HETATM records describe the coordinates of hetero-atoms, that is, atoms which are not part of the protein molecule. The SEQRES records give the sequences of the three peptide chains (named A, B and C), which are very short in this example but usually span multiple lines. REMARK records can contain free-form annotation, but they also accommodate standardized information; for example, the REMARK 350 BIOMT records give the transformations that generate the coordinates of the experimentally observed multimer from the explicitly specified coordinates of a single repeating unit.
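
Because the fields of an ATOM or HETATM record sit at fixed column positions, such a line is normally read by slicing rather than by splitting on whitespace. The following Python sketch illustrates this for the excerpt above. The names `AtomRecord` and `parse_atom_line` are hypothetical, and the column ranges assume the standard fixed-column layout of coordinate records; it is an illustration, not a complete parser.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    """One ATOM or HETATM line (hypothetical container for this sketch)."""
    record: str        # "ATOM" or "HETATM"
    serial: int        # atom serial number
    name: str          # atom name, e.g. "N", "CA"
    res_name: str      # residue name, e.g. "PRO"
    chain_id: str      # chain identifier, may be empty
    res_seq: int       # residue sequence number
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str       # element symbol, may be empty in older files


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one coordinate record by slicing fixed columns (assumed layout)."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not a coordinate record: {record!r}")
    line = line.ljust(80)  # pad short lines so every slice is in range
    return AtomRecord(
        record=record,
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        temp_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


# The first ATOM line and the first HETATM line of the excerpt.
ATOM_LINE = "ATOM      1  N   PRO A   1       8.316  21.206  21.530  1.00 17.44           N"
HETATM_LINE = "HETATM  130  C   ACY   401       3.682  22.541  11.236  1.00 21.19           C"

first_atom = parse_atom_line(ATOM_LINE)
first_hetatm = parse_atom_line(HETATM_LINE)
print(first_atom)
print(first_hetatm)
```

Slicing matters here because a field may be blank (the HETATM lines above carry no chain identifier) and adjacent numeric fields may touch when their values are wide, so whitespace splitting can assign values to the wrong field.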

HEADER, TITLE and AUTHOR records identify the entry and the researchers who determined the structure; numerous other record types are available to provide other kinds of information.
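
Each line announces its record type in its first six columns, so a quick way to see which kinds of records a file contains is to tally those prefixes. The sketch below does this in Python; `record_name` and `count_records` are hypothetical helper names, and the input is assumed to be an iterable of lines such as an open text file.

```python
from collections import Counter
from typing import Iterable


def record_name(line: str) -> str:
    """Return the record type stored in the first six columns of a line."""
    return line[:6].strip()


def count_records(lines: Iterable[str]) -> Counter:
    """Count how many lines of each record type appear, skipping blank lines."""
    return Counter(record_name(line) for line in lines if line.strip())


# A few header lines from the excerpt above, used as sample input.
sample = [
    "HEADER    EXTRACELLULAR MATRIX                    22-JAN-98   1A3I",
    "TITLE     X-RAY CRYSTALLOGRAPHIC DETERMINATION OF A COLLAGEN-LIKE",
    "TITLE    2 PEPTIDE WITH THE REPEATING SEQUENCE (PRO-PRO-GLY)",
    "EXPDTA    X-RAY DIFFRACTION",
    "AUTHOR    R.Z.KRAMER,L.VITAGLIANO,J.BELLA,R.BERISIO,L.MAZZARELLA,",
    "AUTHOR   2 B.BRODSKY,A.ZAGARI,H.M.BERMAN",
]
counts = count_records(sample)
print(counts)
```

Continuation lines such as `TITLE    2` keep the same record name and carry a continuation number in the columns that follow, so they are counted together with the first line of the record.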

Over the years the file format has undergone many changes and revisions. Its original layout, with fixed-width lines of 80 columns, was dictated by the width of computer punch cards.

See also

Molecular visualization software capable of displaying pdb files: