mkdssp - Calculate secondary structure for proteins in a PDB file
mkdssp [OPTION] pdbfile [dsspfile]
The mkdssp program was originally designed by Wolfgang Kabsch and Chris
Sander to standardize secondary structure assignment. DSSP is a database of
secondary structure assignments (and much more) for all protein entries in the
Protein Data Bank (PDB) and mkdssp is the application that calculates
the DSSP entries from PDB entries. Please note that mkdssp does not
predict secondary structure.
If you invoke mkdssp with only one parameter, it is interpreted as
the PDB file to process and the output is sent to stdout. If a second
parameter is specified, it is interpreted as the name of the DSSP file to
create. Both the input and the output file names may end in .gz or .bz2,
in which case the file is read or written with the matching compression.
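A script that consumes mkdssp output can use the same extension rule when reading the result back. The following is a minimal sketch of such a reader, not part of mkdssp itself; the helper name open_dssp_text is hypothetical.

```python
import bz2
import gzip


def open_dssp_text(path):
    """Open a DSSP file as text, picking the decompressor from the extension.

    Hypothetical helper for downstream scripts: .gz is read with gzip,
    .bz2 with bzip2, and anything else as a plain text file.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")
```

The helper returns a text-mode file object in all three cases, so the calling code does not need to know whether the file was compressed.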
- -i, --input filename
- The file name of a PDB formatted file containing the protein
structure data. This file may be a file compressed by gzip or bzip2.
- -o, --output filename
- The file name of a DSSP file to create. If the filename ends in .gz
or .bz2 a compressed file is created.
- -v, --verbose
- Write out diagnostic information.
- Print the version number and exit.
- -h, --help
- Print the help message and exit.
The DSSP program works by calculating the most likely secondary structure
assignment given the 3D structure of a protein. It does this by reading the
position of the atoms in a protein (the ATOM records in a PDB file) followed
by calculation of the H-bond energy between all atoms. The best two H-bonds
for each atom are then used to determine the most likely class of secondary
structure for each residue in the protein.
This means you do need to have a full and valid 3D structure for a
protein to be able to calculate the secondary structure. There's no magic in
DSSP, so e.g. it cannot guess the secondary structure for a mutated protein
for which you don't have the 3D structure.
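The H-bond energy step described above can be sketched with the electrostatic model published by Kabsch and Sander in 1983. The constants (partial charges 0.42e and 0.20e, factor 332, cutoff -0.5 kcal/mol) come from that paper; that mkdssp uses exactly these values, and the function names below, are assumptions of this sketch.

```python
import math

# Constants from the Kabsch & Sander (1983) model. Assumption: this sketch
# follows the paper, not necessarily the exact values used inside mkdssp.
Q1Q2 = 0.42 * 0.20    # product of the partial charges, in units of e^2
F = 332.0             # converts e^2 / angstrom to kcal/mol
HBOND_CUTOFF = -0.5   # kcal/mol; lower (more negative) counts as an H-bond


def hbond_energy(c, o, n, h):
    """Electrostatic energy in kcal/mol between a C=O and an N-H group.

    Each argument is an (x, y, z) coordinate in angstrom.
    """
    return Q1Q2 * F * (
        1.0 / math.dist(o, n)
        + 1.0 / math.dist(c, h)
        - 1.0 / math.dist(o, h)
        - 1.0 / math.dist(c, n)
    )


def is_hbond(c, o, n, h):
    """True when the energy is below the cutoff."""
    return hbond_energy(c, o, n, h) < HBOND_CUTOFF


def best_two(candidates):
    """Keep the two lowest-energy entries from (partner, energy) pairs."""
    return sorted(candidates, key=lambda pair: pair[1])[:2]
```

With this model a well-aligned N-H...O=C pair at a typical N-O distance of about 2.9 angstrom comes out clearly below the cutoff, while groups far apart give an energy close to zero.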
DSSP FILE FORMAT
The header part of each DSSP file is self-explanatory: it contains some of the
information copied over from the PDB file, along with statistics
gathered while calculating the secondary structure.
The second half of the file contains the calculated secondary
structure information per residue. What follows is a brief explanation for
each column.
|#
||The residue number as counted by mkdssp.
|RESIDUE
||The residue number as specified by the PDB file followed by
a chain identifier.
|AA
||The one letter code for the amino acid. If this letter is
lower case, this is a cysteine that forms a sulfur bridge with
the other amino acid in this column with the same lower case letter.
|STRUCTURE
||This is a complex column containing multiple sub columns.
The first column contains a letter indicating the secondary structure
assigned to this residue. Valid values are H (alpha helix), B (isolated
beta bridge), E (extended strand), G (3-10 helix), I (pi helix),
T (hydrogen-bonded turn) and S (bend).
|What follows are three columns indicating for each of the
three helix types (3, 4 and 5) whether this residue is a candidate in
forming this helix. A > character indicates it starts a helix, a
number indicates it is inside such a helix and a < character
means it ends the helix.
|The next column contains an S character if this residue is a
possible bend.
|Then there's a column indicating the chirality and this can
either be positive or negative (i.e. the alpha torsion is either positive
or negative).
|The last two columns contain beta bridge labels. Lower case
here means parallel bridge and thus upper case means anti parallel.
|BP1 and BP2
||The first and second bridge pair candidate, this is followed
by a letter indicating the sheet.
|ACC
||The accessibility of this residue, this is the surface area
expressed in square Ångström that can be accessed by a water
molecule.
|N-H-->O and O-->H-N
||Four columns, they give for each residue the H-bond energy
with another residue where the current residue is either acceptor or
donor. Each column contains two numbers, the first is an offset from the
current residue to the partner residue in this H-bond (in DSSP numbering),
the second number is the calculated energy for this H-bond.
|TCO
||The cosine of the angle between C=O of the current residue
and C=O of previous residue. For alpha-helices, TCO is near +1, for
beta-sheets TCO is near -1. Not used for structure definition.
|KAPPA
||The virtual bond angle (bend angle) defined by the three
C-alpha atoms of the residues current - 2, current and current + 2. Used
to define bend (structure code 'S').
|PHI and PSI
||IUPAC peptide backbone torsion angles.
|X-CA, Y-CA and Z-CA
||The C-alpha coordinates
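The two geometric columns, TCO and KAPPA, can be reproduced from atom coordinates. The sketch below is an illustration only, with hypothetical function names. It assumes KAPPA is measured between the directions CA(i-2) to CA(i) and CA(i) to CA(i+2), so that a straight chain gives 0 degrees. The 70 degree bend threshold is taken from the Kabsch and Sander paper, not from this manual.

```python
import math

BEND_THRESHOLD_DEG = 70.0  # assumption: threshold from Kabsch & Sander (1983)


def _sub(a, b):
    """Vector from point b to point a."""
    return tuple(x - y for x, y in zip(a, b))


def _cos_angle(u, v):
    """Cosine of the angle between two 3D vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def tco(c_prev, o_prev, c_cur, o_cur):
    """Cosine of the angle between C=O of the current and previous residue."""
    return _cos_angle(_sub(o_cur, c_cur), _sub(o_prev, c_prev))


def kappa(ca_minus2, ca_cur, ca_plus2):
    """Bend angle in degrees at the current residue."""
    cosine = _cos_angle(_sub(ca_cur, ca_minus2), _sub(ca_plus2, ca_cur))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return math.degrees(math.acos(cosine))


def is_bend(ca_minus2, ca_cur, ca_plus2):
    """True when the bend angle exceeds the assumed threshold."""
    return kappa(ca_minus2, ca_cur, ca_plus2) > BEND_THRESHOLD_DEG
```

Parallel carbonyl groups give a TCO of +1 and antiparallel ones give -1, which matches the helix and sheet values described for the TCO column.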
The original DSSP application was written by Wolfgang Kabsch and Chris Sander in
Pascal. This version is a complete rewrite in C++ based on the original source
code. A few bugs have been fixed since and the algorithms have been tweaked
here and there.
The code desperately needs an update. The first thing that needs implementing is
the improved recognition of pi-helices. A second improvement would be to use
an angle-dependent H-bond energy calculation.
If you find any bugs, please let me know.
Maarten L. Hekkelman (m.hekkelman (at) cmbi.ru.nl)