Highlights
SPARKS: Sequence, secondary structure Profiles And Residue-level Knowledge-based Score for fold recognition.

An elaborate knowledge-based energy function is designed for fold recognition. It is a residue-level single-body potential, so a highly efficient dynamic programming method can be used for alignment optimization. It contains a backbone torsion term, a buried surface term, and a contact-energy term. The energy score, combined with sequence profile and secondary structure information, leads to an algorithm called SPARKS (Sequence, secondary structure Profiles And Residue-level Knowledge-based energy Score) for fold recognition. Compared with the popular PSI-BLAST, SPARKS is 21% more accurate in sequence-sequence alignment in the ProSup benchmark and 10%, 25%, and 20% more sensitive in detecting family, superfamily, and fold similarities, respectively, in the Lindahl benchmark. Moreover, it is one of the best methods for sensitivity (the number of correctly recognized proteins), alignment accuracy (based on the MaxSub score), and specificity (the average number of correctly recognized proteins whose scores are higher than the first false positives) in LiveBench 7 among more than twenty servers of non-consensus methods. The simple algorithm used in SPARKS has the potential for further improvement. This highly efficient method can be used for fold recognition on genomic scales. [See Software/Service for the Server]
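
The reason a single-body score permits dynamic programming can be illustrated with a minimal sketch. It assumes that each scoring term (profile, secondary structure, energy) depends only on one query position and one template position, so the terms can be summed into a single matrix and aligned with a textbook global alignment. The function names, the weights, the linear gap penalty, and the choice of global alignment are illustrative assumptions, not the actual SPARKS implementation.

```python
def combine_scores(profile, sec_struct, energy, w_ss=1.0, w_energy=1.0):
    """Sum per-position terms into one query-by-template score matrix.

    All three inputs are hypothetical n x m matrices (lists of lists).
    Lower energy is assumed to be more favorable, so it is subtracted.
    """
    n, m = len(profile), len(profile[0])
    return [
        [profile[i][j] + w_ss * sec_struct[i][j] - w_energy * energy[i][j]
         for j in range(m)]
        for i in range(n)
    ]


def global_align_score(score, gap=-1.0):
    """Best global alignment score with a linear gap penalty (assumed)."""
    n, m = len(score), len(score[0])
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score[i - 1][j - 1],  # align i with j
                dp[i - 1][j] + gap,                      # gap in template
                dp[i][j - 1] + gap,                      # gap in query
            )
    return dp[n][m]
```

Because every entry of the combined matrix is independent of how the other positions are aligned, the optimum is found in time proportional to the product of the two sequence lengths. A pairwise (two-body) potential would break this property.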


Topology prediction of transmembrane proteins using mean burial propensity

Helices in membrane-spanning regions are more tightly packed than the helices in soluble proteins. Thus, we introduce a method that uses a simple scale of burial propensity and a new algorithm to predict transmembrane helical (TMH) segments, together with a positive-inside rule to predict N-terminal orientation. The method (the topology predictor of Transmembrane Helical proteins Using Mean BUrial Propensity, or THUMBUP) correctly predicted the topology of 55 out of 73 proteins (75%) with known three-dimensional structures (the 3D_helix database). This is the best accuracy achieved by any current state-of-the-art method. Moreover, we found that the 1D_helix database, because of its inaccuracy, should be avoided as either a training or a testing database.
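
The general idea of scanning a sequence with a mean propensity and then applying a positive-inside rule can be sketched as follows. This is a generic sliding-window illustration, not the THUMBUP algorithm itself. The scale `TOY_BURIAL` is a made-up placeholder (1.0 for a few hydrophobic residues, 0.0 otherwise), and the window size, threshold, and minimum segment length are hypothetical parameters.

```python
# Placeholder scale for illustration only; NOT the published burial propensity.
TOY_BURIAL = {aa: 1.0 for aa in "AVLIFMWC"}


def mean_propensity(seq, window=19):
    """Mean propensity in a window centered on each residue."""
    half = window // 2
    scores = []
    for k in range(len(seq)):
        lo, hi = max(0, k - half), min(len(seq), k + half + 1)
        scores.append(sum(TOY_BURIAL.get(a, 0.0) for a in seq[lo:hi]) / (hi - lo))
    return scores


def predict_segments(seq, window=19, threshold=0.7, min_len=15):
    """Return half-open (start, end) index ranges whose mean score stays high."""
    scores = mean_propensity(seq, window)
    segments, start = [], None
    for k, s in enumerate(scores + [0.0]):  # trailing sentinel closes a final run
        if s >= threshold and start is None:
            start = k
        elif s < threshold and start is not None:
            if k - start >= min_len:
                segments.append((start, k))
            start = None
    return segments


def n_terminus_inside(seq, segments):
    """Positive-inside rule: the side with more K/R in its loops is 'inside'."""
    bounds = [0] + [x for seg in segments for x in seg] + [len(seq)]
    loops = [seq[bounds[i]:bounds[i + 1]] for i in range(0, len(bounds), 2)]
    positives = [sum(loop.count(c) for c in "KR") for loop in loops]
    # Loops 0, 2, 4, ... lie on the same side of the membrane as the N-terminus.
    return sum(positives[0::2]) >= sum(positives[1::2])
```

For example, `predict_segments("MKRKK" + "L" * 20 + "DDEES", window=5)` finds one segment covering the leucine stretch, and the lysine- and arginine-rich N-terminal loop then places the N-terminus inside.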


Binding cooperativity of Scapharca dimeric hemoglobin

Cooperative binding of ligands to proteins is one of the methods that nature uses to increase binding efficiency and regulate binding activity. Understanding the mechanism of binding cooperativity is one of the central problems in molecular biology. Much of the current understanding of binding cooperativity is built on experimental and theoretical studies of human tetrameric hemoglobin. However, a detailed dynamic mechanism from one crystallographic endpoint to the other is still missing because of limited experimental information on partially liganded intermediates and the difficulty of simulating conformational changes on time scales longer than tens of nanoseconds. In this paper, we investigate partially liganded intermediate states of a small but strongly cooperative Scapharca dimeric hemoglobin using molecular dynamics simulation methods and reveal the direct role of water molecules in binding cooperativity. [See publication #54] [See a movie clip] [PDF]


Distance-scaled, finite ideal-gas reference (DFIRE) state improves structure-derived potentials of mean force for structure selection and stability prediction.

One of the bottlenecks for the solution of the protein folding problem is the lack of an accurate potential to describe the interactions among amino acid residues and the interactions between the amino acids and the aqueous solvent. This is a complex and challenging problem because it involves the interplay among several different types of interactions. The interaction potential that would yield a complete understanding of the folding phenomena should be derived from the laws of physics. However, the use of such physics-based potentials for ab initio folding studies is limited by available computing power. An alternative method is to extract the potential of mean force from known protein structures. This yields what are called knowledge-based statistical potentials. They are simpler and easier to use than physics-based potentials. The distance-dependent structure-derived potentials developed so far have all employed a reference state that can be characterized as a residue (atom)-averaged state. Here, we establish a new reference state based on the principles of statistical mechanics. Results show that the new method significantly improves structure-derived potentials of mean force for structure selection and stability prediction. [See publication #52] [PDF]
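
How a reference state enters a structure-derived potential can be shown with a small sketch. A potential of mean force is taken as the negative logarithm of the ratio between the observed number of pairs at distance r and the number expected in the reference state. In the sketch, the expected count is obtained by scaling the observed count at a cutoff distance by a power of r/r_cut, which is one way to express a distance-scaled, finite ideal-gas reference. The function name, the exponent `alpha` (left as a required argument), the thermal energy `rt`, and the handling of empty bins are all assumptions made for illustration; the exact form and parameter values are given in the publication.

```python
import math


def distance_scaled_potential(n_obs_r, n_obs_cut, r, r_cut, dr, dr_cut,
                              alpha, rt=0.593):
    """Pair potential for one atom-type pair in one distance bin.

    n_obs_r   : observed pair count in the bin at distance r
    n_obs_cut : observed pair count in the bin at the cutoff distance r_cut
    dr, dr_cut: bin widths at r and at r_cut
    alpha     : distance-scaling exponent of the reference state (assumed input)
    rt        : thermal energy in kcal/mol near room temperature (assumed)
    """
    if r >= r_cut or n_obs_r == 0:
        return 0.0  # assumption: no interaction beyond cutoff or in empty bins
    n_expected = (r / r_cut) ** alpha * (dr / dr_cut) * n_obs_cut
    return -rt * math.log(n_obs_r / n_expected)
```

A bin that is populated more than the reference state predicts gives a negative (favorable) energy, and a depleted bin gives a positive one.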


The dual role of a loop with low loop contact distance in folding and domain swapping

Helices, strands, and loops are the basic building blocks of protein structures. The folding kinetics of helices and strands have been investigated extensively. However, little is known about the formation of loops. Experimental studies show that for some proteins, the formation of a single loop is the rate-determining step for folding, while for others, a loop (or turn) can misfold to serve as the hinge loop region for domain-swapped species. These two seemingly opposite behaviors both appear in a single loop of a model three-helix bundle (fragment B of Staphylococcal protein A) in our all-atom folding simulations. To interpret the modeling result, we developed a simple structural parameter, the loop contact distance (LCD), defined as the sequence distance of contacting residues between a loop and the rest of the protein. The parameter is then applied to a number of other proteins, including SH3 domains and the prion protein. The results suggest that a locally interacting loop (low LCD) can either promote folding or serve as the hinge region for domain swapping. Thus, there is an intimate connection between folding and domain swapping, a possible cause of misfolding and aggregation. [See publication #50] [PDF]
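
A loop contact distance of this kind can be sketched in a few lines. The sketch assumes that contacts are given as pairs of residue numbers and that the LCD is the average sequence separation over contacts that link a loop residue to a residue outside the loop. The averaging, the function name, and the contact representation are assumptions; the exact definition is given in the publication.

```python
def loop_contact_distance(contacts, loop_start, loop_end):
    """Mean |i - j| over contacts between the loop and the rest of the protein.

    contacts   : iterable of (i, j) residue-number pairs in contact
    loop_start : first residue of the loop (inclusive)
    loop_end   : last residue of the loop (inclusive)
    """
    def in_loop(res):
        return loop_start <= res <= loop_end

    # Keep only contacts with exactly one partner inside the loop.
    separations = [abs(i - j) for i, j in contacts if in_loop(i) != in_loop(j)]
    if not separations:
        return 0.0  # assumption: a loop with no outside contacts gets 0
    return sum(separations) / len(separations)
```

A loop whose contacts are mostly with sequence neighbors gives a small value (a locally interacting loop), while a loop packed against distant parts of the chain gives a large one.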


Role of hydrophilic and hydrophobic contacts in folding of the second β-hairpin fragment of protein G: Molecular dynamics simulation studies of an all-atom model

Predicting the folding mechanism of the second β-hairpin fragment of the Ig-binding domain B of streptococcal protein G is unexpectedly challenging for simplified reduced models. The models developed so far indicate a folding mechanism different from the one suggested by high-temperature unfolding and equilibrium free-energy surface analysis based on established all-atom empirical force fields in explicit or implicit solvent. This happened despite the use of empirical residue-based interactions, multibody hydrophobic interactions, and the inclusion of hydrogen bonding effects in the simplified models. In this paper, a recently developed all-atom (except nonpolar hydrogens) model interacting with simple square-well potentials is employed to fold the peptide fragment by molecular dynamics simulation methods. Folding of the new all-atom model is found to be initiated by collapse prior to the formation of main-chain hydrogen bonds. This verifies the mechanism proposed from previous all-atom unfolding and equilibrium simulations. The new model further predicts that the collapse is initiated by two nucleation contacts (a hydrophilic contact between D46 and T49 and a hydrophobic contact between Y45 and F52), in agreement with recent NMR measurements. The results suggest that atomic packing and native contact interactions play a dominant role in the folding mechanism. [See publication #48] [PDF]


Thermodynamics of an all-atom off-lattice model of the fragment B of Staphylococcal protein A: Implication for the origin of the cooperativity of protein folding

It has been well established that the folding transition of many proteins, in particular of small globular proteins, is a first-order-like transition (i.e., it is a two-state transition with no detectable intermediates). Proteins can fold cooperatively either from a coil or from a molten globule state with variable secondary structural contents. The origin of cooperativity, however, is not fully understood. The proposed origins of the two-state behavior of proteins range from helix-coil transitions, heteropolymer collapse, and sidechain packing to the existence of elementary folding units. Although simplified models can exhibit first-order-like transitions, their interpretations vary. In sophisticated lattice models, the cooperativity arises from multibody interactions, while different mechanisms (collective orientational rearrangement versus cooperative native-contact formation) are suggested for lattice models with and without sidechains. Studies of Cα-based (without sidechains) off-lattice models, on the other hand, failed to produce a first-order coil-to-native folding transition even for highly optimized sequences. Instead, a strong transition to a molten globule state, followed by a weak folding transition, is observed. To better understand the folding thermodynamics as well as the kinetics, there is a need for a more accurate off-lattice model, which is reported in this paper. [See publication #47] [PDF]


Folding Rate Prediction Using Total Contact Distance

Linear regression analysis has found that both the contact order (CO) and the long-range order (LRO) parameters correlate significantly with the logarithms of folding rates. This suggests that sequence separation per contact and total number of contacts are both important in determining the rate of folding. Here, the two factors are incorporated into a new parameter, total contact distance (TCD). Significant improvement in correlation is observed. [See publication #44] [PDF]
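
The relationship among the three parameters can be made concrete with a short sketch. It assumes the commonly used forms: CO is the mean sequence separation per contact divided by chain length, LRO is the number of long-range contacts divided by chain length, and TCD is the summed sequence separation of all contacts divided by the square of the chain length, so that TCD equals CO multiplied by the number of contacts per residue. The normalizations, the long-range threshold `long_range_min`, and the function name are assumptions; the exact definitions and cutoffs are given in the publication.

```python
def contact_parameters(contacts, n_residues, long_range_min=12):
    """Compute CO, LRO, and TCD from a list of (i, j) residue contacts.

    contacts       : iterable of (i, j) residue-number pairs in contact
    n_residues     : chain length
    long_range_min : contacts with |i - j| above this count as long range
                     (assumed threshold)
    """
    separations = [abs(i - j) for i, j in contacts]
    n_contacts = len(separations)
    total = sum(separations)
    co = total / (n_residues * n_contacts) if n_contacts else 0.0
    lro = sum(1 for s in separations if s > long_range_min) / n_residues
    tcd = total / n_residues ** 2
    return {"CO": co, "LRO": lro, "TCD": tcd}
```

Written this way, TCD grows both when contacts are farther apart in sequence and when there are more of them, which is how a single parameter can carry the two factors named above.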