SCUD Manual


Why use SCUD?

First, SCUD (Structure ClUstering of Decoys) is fast. In our test of 2,000 decoys, it was 9 times faster than the traditional pair-wise RMSD (p-RMSD) clustering method. A larger speed gain can be expected when a larger set of decoys is clustered.

Second, it provides an alternative for clusterization. In our test of 41 proteins, the average near-native-structure selection from SCUD is about the same as that from traditional p-RMSD clustering. However, the selection for individual proteins varies.

How to select clusterization parameters?

You can select one of two clusterization methods, SCUD or traditional pair-wise RMSD, to perform the clustering.

For each clusterization method, you can select whether it is size-based or energy-based. For energy-based clustering, the decoys are sorted by energy and the clusterization starts from the lowest-energy decoy. For size-based clustering (recommended), the decoys are sorted by the size of the cluster centered on each of them, and the clustering starts from the decoy at the center of the largest cluster.
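
The difference between the two orderings can be illustrated with a minimal sketch. It assumes a simple greedy scheme on a precomputed distance matrix: seeds are visited in sorted order and each seed claims the unassigned decoys within the cutoff. The manual does not spell out the server's exact procedure, so the function name cluster, its arguments, and the greedy assignment are illustrative assumptions only.

```python
def cluster(dist, cutoff, energies=None):
    """Greedy clustering on a symmetric distance matrix (list of lists, in Å).

    energies=None  -> size-based: seeds ordered by neighbor count, largest first.
    energies=[...] -> energy-based: seeds ordered by energy, lowest first.
    Returns a list of (center_index, member_indices) in the order clusters form.
    This is an assumed scheme, not the server's documented algorithm.
    """
    n = len(dist)
    # Neighbors of each decoy within the cutoff (a decoy is its own neighbor).
    neighbors = [[j for j in range(n) if dist[i][j] <= cutoff] for i in range(n)]
    if energies is None:
        order = sorted(range(n), key=lambda i: -len(neighbors[i]))
    else:
        order = sorted(range(n), key=lambda i: energies[i])
    assigned = set()
    clusters = []
    for center in order:
        if center in assigned:
            continue  # already a member of an earlier cluster
        members = [j for j in neighbors[center] if j not in assigned]
        assigned.update(members)
        clusters.append((center, members))
    return clusters


# Toy data: five "decoys" placed on a line, distance = absolute difference.
positions = [0.0, 1.0, 2.0, 10.0, 11.0]
dist = [[abs(a - b) for b in positions] for a in positions]

size_based = cluster(dist, 1.5)
energy_based = cluster(dist, 1.5, energies=[-5.0, -1.0, -2.0, -9.0, -3.0])
```

On this toy set the size-based run starts from decoy 1, which has the most neighbors, while the energy-based run starts from decoy 3, which has the lowest energy, so the two runs produce different clusters from the same distances.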

The atoms used to calculate the RMSD can be Cα atoms, backbone atoms, or all non-hydrogen atoms.
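
As a rough illustration of the three atom sets, the sketch below filters PDB ATOM lines by atom name. The labels "ca", "backbone" and "heavy", the helper names, and the choice of N, CA, C and O as the backbone set are assumptions made for this example. The manual does not list which atoms the server treats as backbone.

```python
BACKBONE_NAMES = {"N", "CA", "C", "O"}  # assumption: carbonyl O counts as backbone


def keep_atom(line, atom_type):
    """Return True if a PDB ATOM line belongs to the chosen atom set.

    atom_type is a hypothetical label: "ca", "backbone" or "heavy".
    """
    name = line[12:16].strip()              # atom name, PDB columns 13-16
    element = line[76:78].strip().upper()   # element symbol, columns 77-78 (may be blank)
    if atom_type == "ca":
        return name == "CA"
    if atom_type == "backbone":
        return name in BACKBONE_NAMES
    if atom_type == "heavy":
        if element:
            return element != "H"
        # No element column: fall back to the atom name, ignoring leading digits.
        return not name.lstrip("0123456789").startswith("H")
    raise ValueError("unknown atom type: %r" % atom_type)


def pdb_line(serial, name, element=""):
    """Build a fixed-column ATOM line for the demo (coordinates are dummies)."""
    return "ATOM  %5d %-4s ALA A   1    %8.3f%8.3f%8.3f%6.2f%6.2f          %2s" % (
        serial, name, 0.0, 0.0, 0.0, 1.0, 0.0, element)


atoms = [("N", "N"), ("CA", "C"), ("C", "C"), ("O", "O"),
         ("CB", "C"), ("HA", "H"), ("1HB", "")]
lines = [pdb_line(i + 1, name, element) for i, (name, element) in enumerate(atoms)]

selected = {
    atom_type: [line[12:16].strip() for line in lines if keep_atom(line, atom_type)]
    for atom_type in ("ca", "backbone", "heavy")
}
```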

How to select the cluster cutoff?

You can select the cluster cutoff in two ways:

1. Use the self-determined-cutoff strategy (recommended) by setting Cluster Cutoff (%). The cluster cutoff is determined from the percentage of all decoys that fall in the top 3 clusters. The recommended value is 5 (%) for a set of more than 1,000 decoys. A larger value is needed for a smaller set of decoys so that a sufficient number of decoys are included in the top 3 largest clusters. To calculate the cluster cutoff, the server searches a set of clustering cutoffs (in Å) defined by the Start/End/Step of the cutoff searching region.

2. Use a fixed cutoff value in Å by setting Cutoff Searching (Å, optional) so that the search starts and ends at the same cutoff value.
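
The self-determined-cutoff search might look roughly like the sketch below. It assumes the server scans the cutoffs from Start to End in increments of Step and keeps the first one at which the top 3 clusters hold at least the requested percentage of decoys. That selection rule, the function find_cutoff, and the stand-in function toy_percent are assumptions for illustration. The manual only states that a set of cutoffs is searched.

```python
def find_cutoff(top3_percent, start, end, step, target):
    """Scan cutoffs in [start, end] and return the first that reaches the target.

    top3_percent: callable mapping a cutoff (Å) to the percentage of decoys
                  in the top 3 clusters at that cutoff (hypothetical interface).
    Returns (cutoff, percent), or None if no scanned cutoff reaches the target.
    """
    n_steps = int(round((end - start) / step))
    for k in range(n_steps + 1):
        cutoff = round(start + k * step, 6)  # avoid floating-point drift
        percent = top3_percent(cutoff)
        if percent >= target:
            return cutoff, percent
    return None


def toy_percent(cutoff):
    """Stand-in for a real clustering run: the percentage grows with the cutoff."""
    return 2.0 * cutoff


found = find_cutoff(toy_percent, start=1.0, end=4.0, step=0.1, target=5.0)
not_found = find_cutoff(toy_percent, start=1.0, end=2.0, step=0.1, target=5.0)
```

In a real run, top3_percent would cluster the decoys at the given cutoff and report the share held by the three largest clusters. A looser cutoff merges more decoys into each cluster, which is why a wider search region or a larger target percentage may be needed for small decoy sets.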

What is the reference state and where to put it?

The reference state is the decoy that SCUD uses to remove the overall rotation of all the decoys. You should select a random decoy as the reference state to obtain an unbiased clustering result. Using the native structure as the reference is not recommended, because it may produce an artificially good near-native-structure selection.

The reference state is the first decoy in the Header file. It is only used when the SCUD clusterization method is selected.

How to make the Header file?

The Header file contains the information on the decoys to be clustered and the reference state.

The Header file has two columns and no comment lines. The first column is the decoy's full file name. The second column is the energy associated with that decoy. The energy is only used when energy-based clusterization is selected. If you use size-based clusterization, set the energy to an arbitrary value.

Example of the Header file:

1GABc1000_fm22.pdb   -1384.25
1GABc1001_fm22.pdb   -1480.27
1GABc1002_fm22.pdb   -1670.61
1GABc1003_fm22.pdb   -1409.63
..........

The first decoy, 1GABc1000_fm22.pdb, is used as the reference state in SCUD.
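
A Header file in this layout can be generated with a short script. The sketch below writes the two columns and places a chosen (or randomly picked) decoy on the first line so that it becomes the reference state. The function names write_header and read_header and the file name header.txt are hypothetical. Only the two-column layout and the first-line rule come from this manual.

```python
import os
import random
import tempfile


def write_header(path, decoys, reference=None, seed=None):
    """Write (file_name, energy) rows with the reference decoy on the first line.

    If reference is None, a random decoy is picked, as the manual recommends.
    Returns the name of the decoy used as the reference.
    """
    rows = list(decoys)
    if reference is None:
        reference = random.Random(seed).choice(rows)[0]
    # Stable sort: the reference moves to the front, the rest keep their order.
    rows.sort(key=lambda row: row[0] != reference)
    with open(path, "w") as handle:
        for name, energy in rows:
            handle.write("%s   %.2f\n" % (name, energy))
    return reference


def read_header(path):
    """Read the Header file back as a list of (file_name, energy) tuples."""
    rows = []
    with open(path) as handle:
        for line in handle:
            if line.strip():
                name, energy = line.split()
                rows.append((name, float(energy)))
    return rows


decoys = [
    ("1GABc1000_fm22.pdb", -1384.25),
    ("1GABc1001_fm22.pdb", -1480.27),
    ("1GABc1002_fm22.pdb", -1670.61),
    ("1GABc1003_fm22.pdb", -1409.63),
]

with tempfile.TemporaryDirectory() as tmp:
    header_path = os.path.join(tmp, "header.txt")
    used = write_header(header_path, decoys, reference="1GABc1002_fm22.pdb")
    rows = read_header(header_path)
```

For size-based clusterization the energy column can hold any placeholder value, for example 0.00 on every line.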

How to make the Structures Compressed File?

Currently, the SCUD server only accepts files in *.tar.gz format, i.e. a compressed archive of all the decoys listed in the Header file. The archive must not contain any directories. Each decoy must be in PDB format. Only the coordinates on lines marked "ATOM" are read, and reading stops when "END", "TER" or "ENDMDL" is met.
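
To check how a decoy file will be interpreted under this rule, the reading step can be imitated locally. The sketch below is not the server's parser. It assumes the standard fixed-column PDB layout (x, y, z in columns 31-54), and the function name read_atom_coords is hypothetical.

```python
def read_atom_coords(lines):
    """Collect (x, y, z) from ATOM records, stopping at END, TER or ENDMDL."""
    coords = []
    for line in lines:
        record = line[:6].strip()  # record name, PDB columns 1-6
        if record in ("END", "TER", "ENDMDL"):
            break
        if record == "ATOM":
            # x, y, z occupy columns 31-38, 39-46 and 47-54.
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


# Demo lines built with a fixed-column format string.
fmt = "ATOM  %5d  CA  ALA A%4d    %8.3f%8.3f%8.3f"
sample = [
    "REMARK demo decoy",
    fmt % (1, 1, 1.0, 2.0, 3.0),
    "HETATM" + (fmt % (2, 2, 7.0, 7.0, 7.0))[6:],  # not an ATOM record: skipped
    fmt % (3, 3, 4.5, 5.5, 6.5),
    "TER",
    fmt % (4, 4, 9.0, 9.0, 9.0),                   # after TER: never read
    "END",
]

coords = read_atom_coords(sample)
```

One consequence of this rule is that only the first chain or model of a multi-chain or multi-model file is read, because reading stops at the first "TER" or "ENDMDL".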

Example of generating the *.tar.gz file on a Unix or Linux system:

tar -cvf scud.tar 1GABc1???_fm22.pdb
gzip scud.tar

The two commands will generate a file named scud.tar.gz that is ready to be uploaded. After extraction (tar -xzf scud.tar.gz), all the decoys should be in the current working directory.
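
If the decoys are stored in a subdirectory, or tar is not available, the same archive can be produced with Python's standard tarfile module. The sketch below is an alternative to the two commands above, not part of the SCUD distribution. The helper names pack_decoys and member_names are hypothetical. Each file is stored under its base name so that the archive contains no directories.

```python
import os
import tarfile
import tempfile


def pack_decoys(archive_path, decoy_paths):
    """Write a gzip-compressed tar archive with every decoy at the top level."""
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in decoy_paths:
            # arcname drops the directory part, so no directory is stored.
            archive.add(path, arcname=os.path.basename(path))


def member_names(archive_path):
    """List the names stored in the archive, in the order they were added."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()


# Demo with two placeholder decoy files in a temporary subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    decoy_dir = os.path.join(tmp, "decoys")
    os.mkdir(decoy_dir)
    paths = []
    for name in ("1GABc1000_fm22.pdb", "1GABc1001_fm22.pdb"):
        path = os.path.join(decoy_dir, name)
        with open(path, "w") as handle:
            handle.write("END\n")
        paths.append(path)
    archive_path = os.path.join(tmp, "scud.tar.gz")
    pack_decoys(archive_path, paths)
    names = member_names(archive_path)
```

Listing the member names before uploading is a quick way to confirm that no path contains a directory separator.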


What's the use of my email address?

Your email address is used to send you the rough result and the location of the detailed results. A temporary website will be set up for you to download the detailed clusterization output: http://sparks.informatics.iupui.edu/hli/YourEmailAddress_FileID_MBT, where the three digits M, B and T represent the selected clusterization parameters (M: cluster Method; B: cluster Base; T: atom Type).


What's the use of file ID?

The file ID is used to distinguish the different jobs you have submitted.


Where's my result?

The detailed clusterization output files are posted temporarily (for 10 days) at http://sparks.informatics.iupui.edu/hli/YourEmailAddress_FileID_MBT/ for you to download, where the three digits M, B and T represent the selected clusterization parameters (M: cluster Method; B: cluster Base; T: atom Type). The rough result is the file FileID.out on the website.

The clusterization output file is named FileID_MBT_Cutoff.dat on the website, where Cutoff is the clusterization cutoff in Å. There are five columns in the output, arranged as the cluster-size, energy, rank of cluster size when all the decoys are counted, and the decoy name, respectively. The first column is the final cluster-size of the decoy. The overall results for the different cutoffs are saved in the file FileID_MBT.dat.

The concise clusterization result can be found in FileID.out, for example, as:
Cluster cutoff =  2.80  A @ T3 Cluster-size =  5.15  %
Clusterization output at this selected cutoff is SCUD_223_2.80.dat
Top 5 largest clusters (decoyName ClusterSize) @ 2.80 Å cutoff:

1GABc1418_fm22.pdb 45
1GABc2018_fm22.pdb 35
1GABc1032_fm22.pdb 23
1GABc1728_fm22.pdb 20
1GABc1295_fm22.pdb 17
which shows the top 5 clusters and the cluster cut-off.
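
For scripted post-processing, the summary can be parsed with a few regular expressions. The sketch below relies only on the layout of the example shown above. The exact format of FileID.out is not otherwise documented here, so the patterns and the function name parse_summary should be treated as assumptions.

```python
import re


def parse_summary(text):
    """Extract the cutoff, top-3 percentage and cluster list from a summary."""
    head = re.search(
        r"Cluster cutoff\s*=\s*([\d.]+)\s*A\s*@\s*T3 Cluster-size\s*=\s*([\d.]+)\s*%",
        text,
    )
    if head is None:
        raise ValueError("summary line with the cluster cutoff not found")
    # Cluster lines are assumed to hold exactly two fields: decoy name and size.
    pairs = re.findall(r"^(\S+)[ \t]+(\d+)[ \t]*$", text, flags=re.MULTILINE)
    return {
        "cutoff": float(head.group(1)),
        "top3_percent": float(head.group(2)),
        "clusters": [(name, int(size)) for name, size in pairs],
    }


example = """Cluster cutoff =  2.80  A @ T3 Cluster-size =  5.15  %
Clusterization output at this selected cutoff is SCUD_223_2.80.dat
Top 5 largest clusters (decoyName ClusterSize) @ 2.80 Å cutoff:

1GABc1418_fm22.pdb 45
1GABc2018_fm22.pdb 35
1GABc1032_fm22.pdb 23
1GABc1728_fm22.pdb 20
1GABc1295_fm22.pdb 17
"""

summary = parse_summary(example)
top3_total = sum(size for _, size in summary["clusters"][:3])
```

In this example the top 3 clusters hold 45 + 35 + 23 = 103 decoys. That equals 5.15 % of 2,000 decoys, although the manual does not state the size of the decoy set behind this example.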







This service was established and maintained by Dr. Zhou's group.

*** The use of this server means that you have read and accepted Disclaims, Warranties, Legal Notices