SCUD Manual
Why use SCUD?
First, SCUD (Structure ClUstering of Decoys) is fast. In our test on 2,000 decoys, it was 9 times faster than the traditional pair-wise RMSD (p-RMSD) clustering method. A larger speed gain can be expected when a larger set of decoys is clustered.
Second, it provides an alternative way of clustering. In our test on 41 proteins, the average near-native-structure selection from SCUD is about the same as that from traditional p-RMSD clustering. However, the selection for individual proteins varies.
How to select clusterization parameters?
You can choose between two clusterization methods, SCUD or traditional pair-wise RMSD, to perform the clustering.
For each clusterization method, you can select whether it is size-based or energy-based. For energy-based clustering, the decoys are sorted by energy and the clusterization starts from the lowest-energy decoy. For size-based clustering (recommended), the decoys are sorted by the size of the cluster centered on each of them, and the clustering starts from the decoy at the center of the largest cluster.
The atoms used to calculate RMSD can be Cα atoms, backbone atoms, or all non-hydrogen atoms.
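The size-based and energy-based orderings can be illustrated with a minimal Python sketch. It is not the server's code. It assumes that a pair-wise RMSD matrix (dist) is already available, that cluster sizes for the size-based ordering are counted once over all decoys, and that each decoy joins the first center that reaches it. The function name greedy_cluster and its parameters are hypothetical.

```python
from typing import Dict, List, Sequence


def greedy_cluster(
    dist: Sequence[Sequence[float]],
    energies: Sequence[float],
    cutoff: float,
    size_based: bool = True,
) -> Dict[int, List[int]]:
    """Return {center index: member indices} for one cutoff (in the units of dist)."""
    n = len(dist)
    if size_based:
        # Assumed: cluster size of each decoy is counted over ALL decoys, once.
        counts = [sum(1 for j in range(n) if dist[i][j] <= cutoff) for i in range(n)]
        order = sorted(range(n), key=lambda i: -counts[i])
    else:
        # Energy-based: start from the lowest-energy decoy.
        order = sorted(range(n), key=lambda i: energies[i])

    assigned = [False] * n
    clusters: Dict[int, List[int]] = {}
    for center in order:
        if assigned[center]:
            continue  # already a member of an earlier cluster
        members = [j for j in range(n) if not assigned[j] and dist[center][j] <= cutoff]
        for j in members:
            assigned[j] = True
        clusters[center] = members
    return clusters
```

With size_based=False the energies decide the order of the centers. With size_based=True the energies are ignored, which matches the advice to give them an arbitrary value in that case.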
How to select the cluster cutoff?
You can select the cluster cutoff in two ways:
1. Use the self-determined-cutoff strategy (recommended) by setting Cluster Cutoff (%). The cutoff is determined by the percentage of decoys in the top 3 clusters relative to the total number of decoys. The recommended value is 5 (%) for a set of more than 1000 decoys. A larger value is needed for a smaller set of decoys so that a sufficient number of decoys is included in the top 3 largest clusters. Calculating the cluster cutoff requires searching over a set of clustering cutoffs (in Å), which is defined by the Start/End/Step of the cutoff searching region.
2. Use a fixed cutoff value in Å by setting Cutoff Searching (Å, optional) so that the search starts and ends at the same cutoff value.
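The two options can be pictured as one scan over the Start/End/Step region. The Python sketch below is only an illustration. It assumes that the selected cutoff is the first scanned value at which the top 3 clusters hold at least the requested percentage of decoys, which is a guess about the server's rule. The function name, the callback cluster_sizes_at and the default Start/End/Step values are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def self_determined_cutoff(
    cluster_sizes_at: Callable[[float], List[int]],
    n_decoys: int,
    target_percent: float = 5.0,
    start: float = 1.0,  # hypothetical default, in Angstrom
    end: float = 6.0,    # hypothetical default, in Angstrom
    step: float = 0.1,   # hypothetical default, in Angstrom
) -> Optional[Tuple[float, float]]:
    """Return (cutoff, top-3 percentage) for the first cutoff that reaches the target."""
    if step <= 0 or end < start or n_decoys <= 0:
        raise ValueError("need step > 0, end >= start and at least one decoy")
    n_steps = int(round((end - start) / step))
    for k in range(n_steps + 1):
        cutoff = round(start + k * step, 6)  # avoid accumulated float drift
        top3 = sorted(cluster_sizes_at(cutoff), reverse=True)[:3]
        percent = 100.0 * sum(top3) / n_decoys
        if percent >= target_percent:
            return cutoff, percent
    return None  # target never reached inside the searching region
```

Setting start equal to end reduces the scan to a single value, which corresponds to the fixed-cutoff option.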
What is the reference state and where to put it?
The reference state is the decoy that SCUD uses to remove the overall rotation of all the decoys. You should select a random decoy as the reference state to obtain an unbiased clustering result. Using the native structure as the reference is not recommended, because it may produce an artificially good near-native-structure selection.
The reference state is the first decoy in the Header file. It is only used when the SCUD clusterization method is selected.
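One way to follow this advice is to move a randomly chosen decoy to the top of the list before the Header file is written. The helper below is a small sketch, not part of SCUD. The function name and the (name, energy) tuple layout are assumptions made for the example.

```python
import random
from typing import List, Optional, Tuple

Entry = Tuple[str, float]  # (decoy file name, energy)


def put_random_reference_first(entries: List[Entry], seed: Optional[int] = None) -> List[Entry]:
    """Return a new list in which a randomly chosen decoy comes first."""
    if not entries:
        raise ValueError("no decoys given")
    rng = random.Random(seed)  # pass a seed to make the choice reproducible
    idx = rng.randrange(len(entries))
    # The other decoys keep their original relative order.
    return [entries[idx]] + entries[:idx] + entries[idx + 1:]
```

Passing a seed lets you resubmit the same job with the same reference state.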
How to make the Header file?
The Header file lists the decoys to be clustered, including the reference state.
The Header file has two columns and no comment lines. The first column is the decoy's full name. The second column is the energy associated with that decoy. The energy is only used when energy-based clusterization is selected. When using size-based clusterization, set the energy to an arbitrary value.
Example of the Header file:
1GABc1000_fm22.pdb   -1384.25
1GABc1001_fm22.pdb   -1480.27
1GABc1002_fm22.pdb   -1670.61
1GABc1003_fm22.pdb   -1409.63
..........
The first decoy, 1GABc1000_fm22.pdb, is used as the reference state in SCUD.
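If the decoy names and energies are already available in a script, the Header file can be produced and checked with a few lines of Python. This is a sketch only. It assumes that the two columns are separated by whitespace and that two decimals are enough for the energy. The function names are hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[str, float]  # (decoy file name, energy)


def format_header(entries: List[Entry]) -> str:
    """Build the Header file text: one 'name energy' line per decoy, no comments."""
    lines = []
    for name, energy in entries:
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"decoy name must be non-empty without spaces: {name!r}")
        lines.append(f"{name} {energy:.2f}")
    return "\n".join(lines) + "\n"


def parse_header(text: str) -> List[Entry]:
    """Read Header file text back and reject anything that is not two columns."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(fields)}")
        entries.append((fields[0], float(fields[1])))  # float() rejects non-numbers
    return entries
```

The first entry passed to format_header becomes the reference state, so put the chosen reference decoy at the front of the list.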
How to make the Structures Compressed File?
Currently, the SCUD server only accepts files in *.tar.gz format, that is, a compressed archive of all the decoys listed in the Header file. No directory is allowed in the compressed file. The original decoys must be in PDB format. Only coordinates in records marked "ATOM" are read, and reading stops when "END", "TER" or "ENDMDL" is met.
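To check which atoms of a decoy would be picked up under this rule, you can mimic it locally. The Python sketch below follows the fixed-column layout of the PDB format (atom name in columns 13-16, x/y/z in columns 31-54, element in columns 77-78). It is not the server's parser. The backbone definition (N, CA, C, O), the hydrogen test and the function names are assumptions.

```python
from typing import List, Tuple

BACKBONE_NAMES = {"N", "CA", "C", "O"}   # assumed backbone atom set
STOP_RECORDS = {"END", "TER", "ENDMDL"}  # reading stops at the first of these


def is_hydrogen(line: str, atom_name: str) -> bool:
    """Heuristic: use the element column if present, else the atom name."""
    element = line[76:78].strip().upper()
    if element:
        return element == "H"
    return atom_name.lstrip("0123456789").upper().startswith("H")


def read_atoms(pdb_text: str, atom_type: str = "CA") -> List[Tuple[float, float, float]]:
    """Collect coordinates of ATOM records; atom_type is 'CA', 'backbone' or 'heavy'."""
    if atom_type not in ("CA", "backbone", "heavy"):
        raise ValueError("atom_type must be 'CA', 'backbone' or 'heavy'")
    coords = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record in STOP_RECORDS:
            break
        if record != "ATOM":
            continue  # HETATM, REMARK, etc. are skipped
        name = line[12:16].strip()
        if atom_type == "CA" and name != "CA":
            continue
        if atom_type == "backbone" and name not in BACKBONE_NAMES:
            continue
        if atom_type == "heavy" and is_hydrogen(line, name):
            continue
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords
```

Because reading stops at the first "TER", a multi-chain decoy would contribute only its first chain under this sketch.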
Example of generating the *.tar.gz file on a Unix or Linux system:
tar -cvf scud.tar 1GABc1???_fm22.pdb
gzip scud.tar
The two commands generate a file named scud.tar.gz that is ready to be uploaded. After unpacking (tar -xzf scud.tar.gz), all the decoys should be in the current working directory.
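If the decoys are spread over several directories, or you are not on a Unix system, the same archive can be built with Python's standard tarfile module. The sketch below stores every file under its bare name so that no directory ends up in the archive, and then checks the result against the names in the Header file. The function names and message texts are hypothetical.

```python
import os
import tarfile
from typing import Iterable, List


def make_flat_targz(pdb_paths: Iterable[str], archive_path: str = "scud.tar.gz") -> List[str]:
    """Write a .tar.gz in which every decoy sits at the top level; return stored names."""
    names: List[str] = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in pdb_paths:
            name = os.path.basename(path)  # drop any directory part
            if name in names:
                raise ValueError(f"duplicate decoy name: {name}")
            tar.add(path, arcname=name)
            names.append(name)
    return names


def check_flat_targz(archive_path: str, header_names: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the archive looks acceptable."""
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
    problems = []
    for member in members:
        if not member.isfile() or "/" in member.name:
            problems.append(f"not a plain top-level file: {member.name}")
    stored = {member.name for member in members}
    for name in sorted(set(header_names) - stored):
        problems.append(f"missing from archive: {name}")
    return problems
```

Running check_flat_targz before uploading catches the two most likely mistakes: a directory inside the archive and a decoy that is listed in the Header file but was not packed.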
What is the use of my email address?
Your email address is used to send you the rough result and the location of the detailed results. A temporary web page will be set up for you to download the detailed clusterization output: http://sparks.informatics.iupui.edu/hli/YourEmailAddress_FileID_MBT, where the three digits M, B and T represent the selected clusterization parameters (M: cluster Method; B: cluster Base; T: atom Type).
What is the use of the file ID?
The file ID is used to distinguish the different jobs you have submitted.
Where's my result?
The detailed clusterization output files are posted temporarily (for 10 days) at http://sparks.informatics.iupui.edu/hli/YourEmailAddress_FileID_MBT/ for you to download, where the three digits M, B and T represent the selected clusterization parameters (M: cluster Method; B: cluster Base; T: atom Type). The rough result is FileID.out on the website.
The clusterization output file is named FileID_MBT_Cutoff.dat on the website, where Cutoff is the clusterization cutoff in Å. There are five columns in the output, arranged as the cluster-size, energy, rank of cluster size when all the decoys are counted, and the decoy name, respectively. The first column is the final cluster-size of the decoy. The overall results for the different cutoffs are saved in the file FileID_MBT.dat.
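The naming scheme can be summarized in a short Python helper, which may be handy when several jobs are downloaded by a script. It is a sketch under assumptions: the email address is taken to appear verbatim in the address, the cutoff is written with two decimals as in SCUD_223_2.80.dat, and the function name and dictionary keys are hypothetical.

```python
from typing import Dict

BASE_URL = "http://sparks.informatics.iupui.edu/hli/"


def result_names(email: str, file_id: str, mbt: str, cutoff: float) -> Dict[str, str]:
    """Compose the result location and file names from the job settings."""
    if len(mbt) != 3 or not mbt.isdigit():
        raise ValueError("mbt must be three digits (Method, Base, atom Type)")
    return {
        "url": f"{BASE_URL}{email}_{file_id}_{mbt}/",        # assumed: email used as-is
        "summary": f"{file_id}.out",                         # rough result
        "at_cutoff": f"{file_id}_{mbt}_{cutoff:.2f}.dat",    # output at one cutoff
        "all_cutoffs": f"{file_id}_{mbt}.dat",               # overall results
    }
```

For a job with file ID SCUD, digits 223 and a cutoff of 2.8 Å, the "at_cutoff" entry is SCUD_223_2.80.dat.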
The concise clusterization result can be found in FileID.out, for example:
Cluster cutoff = 2.80 A @ T3 Cluster-size = 5.15 %
Clusterization output at this selected cutoff is SCUD_223_2.80.dat
Top 5 largest clusters (decoyName ClusterSize) @ 2.80 Å cutoff:
1GABc1418_fm22.pdb 45
1GABc2018_fm22.pdb 35
1GABc1032_fm22.pdb 23
1GABc1728_fm22.pdb 20
1GABc1295_fm22.pdb 17
which shows the top 5 clusters and the cluster cut-off.
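A summary in this layout can be read back with a few regular expressions. The Python sketch below is written against the example shown here only. The real FileID.out may contain other lines, which the sketch simply skips. The function name and dictionary keys are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple


def parse_summary(text: str) -> Dict[str, object]:
    """Extract the cutoff, the top-3 cluster-size percentage and the listed clusters."""
    cutoff = re.search(r"Cluster cutoff\s*=\s*([0-9.]+)", text)
    percent = re.search(r"Cluster-size\s*=\s*([0-9.]+)\s*%", text)
    clusters: List[Tuple[str, int]] = []
    in_list = False
    for line in text.splitlines():
        if line.startswith("Top "):
            in_list = True  # cluster lines follow this heading
            continue
        if in_list:
            match = re.fullmatch(r"\s*(\S+)\s+(\d+)\s*", line)
            if match:  # anything that is not 'name size' is skipped
                clusters.append((match.group(1), int(match.group(2))))
    cutoff_value: Optional[float] = float(cutoff.group(1)) if cutoff else None
    percent_value: Optional[float] = float(percent.group(1)) if percent else None
    return {"cutoff": cutoff_value, "top3_percent": percent_value, "clusters": clusters}
```

The decoy names returned in "clusters" are the cluster centers, so they can be used directly to pick the corresponding PDB files from your local copy of the decoys.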
This service was established and maintained by Dr. Zhou's group.
*** The use of this server means that you have read and accepted Disclaims, Warranties, Legal Notices