Disordered Proteins and Dynamic Interactions: Comparison
Please note this is a comparison between Version 1 by Jianhan Chen and Version 5 by Beatrix Zheng.

Intrinsically disordered proteins (IDPs) or regions (IDRs), in contrast to well-structured proteins, do not have stable tertiary structures under physiological conditions, and many remain dynamic even in specific complexes and functional assemblies. They are now recognized to be highly prevalent and to play important roles in biology and human diseases. Because their function involves many representative conformational states and potentially dynamic interactions, computer simulations are essential for deriving a molecular description of the disordered protein ensembles and dynamic interactions, which is needed for a mechanistic understanding of IDPs in biological function, diseases, and therapeutics.

  • conformational ensemble
  • enhanced sampling
  • generalized Born
  • Gō-model
  • implicit solvent
  • liquid-liquid phase transition
  • replica exchange
  • protein force fields

1. Introduction

Intrinsically disordered proteins (IDPs) or regions (IDRs), compared to well-structured proteins, do not have stable tertiary structures under physiological conditions. Nevertheless, IDPs or IDRs can be found in nearly a third of the proteins in the human proteome [1], and they play key roles in a variety of biological processes that underlie vital cellular functions ranging from signaling and regulation to transport [2,3]. The inherent thermodynamic instability of an IDP’s conformation allows it to respond sensitively to numerous stimuli, including binding, changes in cellular environments (e.g., pH), and post-translational modifications [4,5,6,7,8]. Such conformational plasticity arguably enables IDPs to interact with multiple signaling pathways and serve as scaffolds to form multi-protein complexes [9]. Importantly, IDPs and IDRs contain around 25% of disease-associated missense mutations [10]. They have been considered promising therapeutic targets for treating various diseases (such as chronic diseases) [11,12,13]. While many IDPs have been shown to undergo folding transitions upon specific binding [3], many examples are also emerging to demonstrate that IDPs can remain unstructured even in specific complexes and functional assemblies [14,15,16,17,18,19,20]. Such a dynamic mode of specific protein interactions seems much more prevalent than previously thought [21,22,23].

Experimental ensemble-averaged characterizations

It is very challenging to provide reliable descriptions of the conformational ensembles of IDPs and IDRs. A disordered state does not lend itself to traditional structure determination methods, which are geared toward describing a coherent set of similar structures. Biophysical techniques, such as NMR, SAXS, and FRET, can provide complementary information on various local and long-range structural organizations [7]. However, these ensemble-averaged measurements alone are not sufficient to unambiguously define the heterogeneous ensemble, due to the severely underdetermined nature of the structure calculation problem [8,24,25]. As a result, studies of IDPs have relied heavily on the traditional structure-function paradigm, by solving the folded structure of the bound state, analyzing coupled binding and folding mechanisms, or identifying putative pre-existing functional structures in the unbound state [3]. However, the disordered ensemble itself is arguably the central conduit of cellular signaling. The functional mechanism of an IDP is encoded in how the disordered ensemble as a whole responds to various stimuli, be it cooperative binding-induced folding or the redistribution of conformational sub-states in dynamic interactions. Multiple cellular signals can be naturally integrated through cooperative responses of the whole dynamic ensemble [26,27,28]. Therefore, there is a critical need for reliable characterization of disordered protein conformational ensembles, in both bound and unbound states, in order to establish the molecular basis of IDP and IDR function in various physiological and pathophysiological processes.

Computational characterizations and simulations

Given the fundamental challenges of characterizing disordered protein states based on ensemble-averaged measurements alone, molecular modeling and simulations have a crucial and unique role to play in mechanistic studies of IDPs and IDRs [29,30,31,32,33]. This is reflected in the continuously increasing number of research articles containing the keywords “intrinsically disordered” and “molecular dynamics” published in the last 10 years (Figure 1). A particularly attractive approach is to first generate the disordered ensemble using transferable, physics-based force fields without any experimental restraints and then use the experimental data for independent validation [7]. Such de novo simulations of disordered protein ensembles require both high force field accuracy and adequate sampling of the relevant conformational space, pushing the limits of these two central ingredients of molecular dynamics (MD) and Monte Carlo (MC) simulations. The challenges of simulating disordered proteins have driven significant interest in developing better protein force fields and advanced sampling methods (Figure 1). In particular, important advances have been made in state-of-the-art atomistic force fields for describing the conformational equilibria of ordered and disordered proteins [13]. Enhanced sampling techniques have played crucial roles in both the development and application of atomistic force fields, by allowing one to cross energy barriers faster and accelerate the conformational sampling of IDPs [34,35,36,37,38,39,40,41]. Nonetheless, atomistic simulations still have limited capability in describing large systems such as biological condensates [42]. For these systems, multi-scale approaches are necessary to bridge the gaps between experimental and computational time and length scales, including implicit solvent models, which remove the solvent degrees of freedom [8], and various coarse-grained models, which significantly reduce both the protein and solvent degrees of freedom [43].

2. Challenges of Simulating IDP Conformational Equilibria

Accurate force fields for a high-dimensional free energy landscape

Compared to globular proteins, which have one or a few well-defined global energy minima, the energy landscape of an IDP is flatter and generally includes many local energy minima separated by modest energy barriers [44][47]. IDPs and IDRs typically have fewer hydrophobic residues but a larger number of polar or charged residues, as well as disorder-promoting residues (such as glycine and proline) [45][44]. These sequence features hamper the formation of hydrophobic cores that drive protein folding and thus prevent the formation of stable tertiary structures. Instead, IDPs and IDRs favor an ensemble of unfolded or partially folded states. This presents a major challenge for simulation: success depends critically on the ability of the force field to accurately describe the energetics of all relevant conformational states, capturing both folded and unfolded states of an IDP. For example, one recent study tested eight force fields in atomistic simulations of IDPs and found marked differences in the resulting conformational ensembles, in particular in the secondary structure content [46][48]. Similar observations have been made in other benchmark studies, consistently showing that protein force fields previously optimized for folded proteins are not suitable for simulating disordered protein states, largely due to over-stabilization of protein–protein interactions [47][49]. These benchmark studies also suggested that the key to better protein force fields is to rebalance protein–protein, protein–water, and water–water interactions. Moreover, it can be anticipated that polarizable force fields will become increasingly important for simulating IDP structure and interactions, because they explicitly treat electronic polarization using various empirical models and thus provide a better description of charged and polar protein motifs in heterogeneous biomolecular environments [48].
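As a rough illustration of what rebalancing protein–water interactions can look like in practice, the sketch below scales the Lennard-Jones well depth of a protein–water atom pair. It is a minimal sketch under stated assumptions: the Lorentz-Berthelot combining rule, the per-atom parameters, and the scaling factor of 1.10 are illustrative choices made here and are not taken from any specific force field discussed in the text.

```python
import math


def combine_lj(eps_i, sigma_i, eps_j, sigma_j, scale=1.0):
    """Lorentz-Berthelot combining rule (assumed here).

    `scale` multiplies the pair well depth, mimicking a strengthened
    protein-water dispersion interaction when scale > 1.
    """
    eps_ij = scale * math.sqrt(eps_i * eps_j)   # geometric mean of well depths
    sigma_ij = 0.5 * (sigma_i + sigma_j)        # arithmetic mean of diameters
    return eps_ij, sigma_ij


def lj_energy(r, eps, sigma):
    """12-6 Lennard-Jones energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


# Hypothetical per-atom parameters: (epsilon in kcal/mol, sigma in Angstrom).
PROTEIN_CARBON = (0.10, 3.4)
WATER_OXYGEN = (0.15, 3.2)

if __name__ == "__main__":
    eps, sigma = combine_lj(*PROTEIN_CARBON, *WATER_OXYGEN)
    eps_s, sigma_s = combine_lj(*PROTEIN_CARBON, *WATER_OXYGEN, scale=1.10)
    r_min = 2.0 ** (1.0 / 6.0) * sigma  # location of the energy minimum
    print("unscaled minimum:", lj_energy(r_min, eps, sigma))
    print("scaled minimum:  ", lj_energy(r_min, eps_s, sigma_s))
```

Only the well depth of the cross interaction changes; the contact distance is untouched, so the protein–protein and water–water terms are left as they were.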

Advanced sampling for many relevant conformations

Besides accurate force fields, reliable simulation of IDPs also hinges on sufficient sampling of the many relevant conformational states within a reasonable simulation time (Figure 1). Standard MD simulations are generally insufficient to generate representative conformational ensembles, even using the most accurate protein force fields together with advances in GPU computing or specialized hardware such as the ANTON supercomputer [49][50]. For example, a recent reanalysis of a 30-μs ANTON trajectory of the 40-residue Aβ40 peptide in explicit solvent revealed very limited convergence even at the secondary structure level [13]. This can be attributed to the large and diverse conformational space accessible to an IDP and to the potentially high free energy barriers separating various sub-states, which require exponentially longer times to cross. Note that typical simulation times on conventional hardware (such as GPUs) are at least one order of magnitude shorter. There is thus great danger in relying on standard MD to calculate disordered protein conformational ensembles at the atomistic level. There is a critical need to develop and leverage so-called enhanced sampling techniques, which aim to generate statistically meaningful conformational ensembles with dramatically less computation.
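The exponential dependence of crossing time on barrier height can be made concrete with a short calculation. The sketch below assumes a simple Arrhenius-type relation with a unit prefactor, so it reports only relative waiting times; the barrier heights and temperatures are illustrative values chosen here, not results from the studies cited above.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def relative_crossing_time(barrier_kcal, temperature_k=300.0):
    """Arrhenius-type factor exp(dG / kB*T).

    Returns the mean waiting time relative to a barrierless process,
    assuming a unit prefactor (an assumption of this sketch).
    """
    return math.exp(barrier_kcal / (K_B * temperature_k))


if __name__ == "__main__":
    for barrier in (2.0, 4.0, 6.0, 8.0):  # illustrative barriers, kcal/mol
        cold = relative_crossing_time(barrier, 300.0)
        hot = relative_crossing_time(barrier, 450.0)
        print(f"barrier {barrier:.0f} kcal/mol: x{cold:.3g} at 300 K, x{hot:.3g} at 450 K")
```

Doubling the barrier squares the slowdown factor, while raising the effective temperature shrinks it, which is the basic reason tempering-based methods help.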

Representative enhanced sampling methods have been proposed that introduce bias potentials, change the effective temperature, or use replica exchange molecular dynamics (REMD) protocols. Importantly, all biased sampling strategies can be readily incorporated within the replica-exchange (REX) framework to benefit from both classes of enhanced sampling, including metadynamics (MTD) [50][51], accelerated MD (aMD) [52], umbrella sampling (US) [53][54], and integrated tempering sampling [55]. These sampling methods have been reported to accelerate the crossing of energy barriers and achieve better sampling efficiency, and they have been used to sample IDP conformations and dynamic interactions.

Figure 1. Simulating IDP conformational dynamics and interactions requires both state-of-the-art force fields and advanced sampling methods.

Additionally, key multi-scale approaches, namely implicit solvent and coarse-grained (CG) models, allow one to simulate longer time-scale processes of more complex systems within current computational capabilities. Implicit treatment of solvent is an effective approach to reducing the computational cost of atomistic IDP simulations. The basic idea is to directly estimate the solvation free energy to capture the mean effect of solvent on the thermodynamic properties of the solute [56]. Implicit solvent is essentially a multi-scale model, where the solvent is represented using a physical model while the atomistic details of the solute are retained. These models have emerged as attractive alternatives to explicit solvent for simulations of IDPs and their interactions. Several generalized Born (GB) models can be optimized to provide the balance between computational efficiency and accuracy desired for IDP simulations [57][58][59], through systematic optimization of key physical parameters such as atomic radii to balance solvation and intramolecular interactions. Applied to various model IDPs with extensive experimental data, implicit solvent simulations have provided insights into detailed conformational properties of the unbound state and how these properties may support function [32][33][60][61][62].

Coarse-graining has also remained an attractive and often effective strategy for extending the accessible time and length scales of simulations. By grouping multiple (protein) atoms into CG beads and using simplified potential energy functions, CG modeling not only reduces the system size, often by ~10-fold, but also allows much larger MD time steps of up to 20 fs. Together, many CG models can be several orders of magnitude more efficient than atomistic ones. Numerous CG models have achieved varying levels of success in studies of protein folding, binding, and assembly [43][63]. Nonetheless, there are important distinctions between the conformational properties of globular proteins and IDPs, as well as in the relative importance of electrostatic, hydrophobic, and hydrogen-bonding interactions in governing their conformational equilibria. Therefore, CG models optimized for folded proteins are generally not suitable for IDP simulations. It is often necessary to readjust the parameters of protein–protein and protein–solvent interactions or to add new terms for a more accurate description of IDP conformations.

Long-time simulations of IDP interactions and assembly

Computational studies of IDP interaction and assembly are even more demanding. The conformational equilibrium of an IDP can respond sensitively to specific and nonspecific binding, potentially shifting from a disordered state to a somewhat ordered or fully folded state. In principle, simulations could provide the much-needed spatial and time resolution to elucidate the kinetics and thermodynamics of coupled folding and binding processes and characterize their mechanistic features. However, the challenge is that coupled folding and binding is a complex reaction involving the formation of many noncovalent interactions, which requires extremely long simulations that are generally beyond current capabilities at the atomistic level. As such, coarse-grained models are generally required for computational studies of IDP interaction and assembly. For example, they have been used to study many biological processes, including liquid-liquid phase transitions (LLPS) that are frequently mediated by IDPs [29][45][64][65][87].

3. The State-of-the-Art Protein Force Fields for Describing IDP Conformations

Empirical protein force fields are potential energy functions that typically include physics-motivated bonded and non-bonded terms carefully parameterized based on a wide range of theoretical and experimental data [51]. These force fields can in principle be transferable between folded proteins and IDPs. To achieve this, it is also critical to develop suitable water models and better describe the water–protein interactions [52,53]. Two recent review articles have already provided comprehensive descriptions of the latest developments in protein force fields [51,54]. We therefore only briefly summarize the state of the art of nonpolarizable and polarizable force fields for IDP dynamics and interactions.

Many previous nonpolarizable force fields have significant shortcomings in describing unfolded or disordered proteins. For example, they typically provide a poor description of the secondary structure content of IDPs and tend to give conformations that are too compact compared with the experimentally measured dimensions of IDPs [48,55]. These problems were likely attributable to the unbalanced parameterization of dihedral torsion space and of the description of protein–protein and protein–water interactions [56]. As a result, most of the improved force fields managed to give more accurate secondary structure propensities by adjusting dihedral parameters or adding grid-based energy correction map (CMAP) parameters [54]. The over-compactness of disordered proteins can be alleviated by modifying protein–water van der Waals interactions or by combining the force field with refined water models [52]. Representative state-of-the-art force fields include the latest CHARMM36m/TIP3P* [57], ff19SB/OPC [58], and a99SB-disp/TIP4P-D [50]. Many benchmark studies have consistently demonstrated that these refined force fields do provide significant improvements in describing not only single folded and disordered proteins, but also multiprotein systems that are either soluble or aggregate in solution [55,59,60,61,62]. At the same time, these studies also identified significant remaining limitations in the description of noncovalent interactions in multiprotein systems [60]. Recognizing limitations in the ability of the a99SB-disp/TIP4P-D force field to accurately describe protein–protein interactions, a new force field, DES-Amber, was recently developed to provide more accurate simulations of protein–protein complexes while maintaining reliable descriptions of both ordered and disordered single-chain proteins [61]. However, DES-Amber is still limited in reproducing the experimental protein–protein association free energies of some protein complexes, in particular for systems with highly polar interfaces [61]. In the latter case, it was found that charged sidechains were buried at the protein–protein interface instead of being solvent-exposed. It was further suggested that nonpolarizable force fields are fundamentally limited in achieving a balanced description of charged groups that are either solvent-exposed or buried at a protein–protein interface.

Polarizable force fields explicitly consider electronic polarization using various empirical models to provide a better description of charged and polar protein motifs in heterogeneous biomolecular environments [63]. Exciting progress has been made in the last few years, and several polarizable force fields are now available for the stable simulation of proteins in both aqueous and membrane environments [64,65]. Simulations using the latest polarizable force fields have also shown a high level of consistency with experimental observations, particularly for ion solvation and binding thermodynamics, the permeation free energy of ions or small charged molecules into the cell membrane, and protein–ligand binding [63]. For example, the Drude-2013 polarizable force field, compared to the CHARMM36 force field, is more accurate in describing the folding cooperativity of the (AAQAA)3 peptide, which can be attributed to enhanced backbone dipole moments in the helical state [66]. Additional studies are still needed to show the necessity of polarizable force fields in IDP simulations, where the significantly higher computational cost adds to the challenge of generating converged ensembles [63]. Existing comparisons suggest that polarizable force fields, including the AMOEBA and Drude models, still frequently have problems in reproducing the native structures and folding of proteins [67,68,69]. For example, stronger protein–water interactions in polarizable force fields can destabilize the native protein structure, in contrast to observations from nonpolarizable force fields, where protein–water interactions have traditionally been underestimated [42]. Nonetheless, it can be anticipated that polarizable force fields will continue to be improved and become increasingly important for simulating IDP structure and interactions.

4. Enhanced Sampling Methods for Sampling IDP Conformational Ensembles

Enhanced sampling techniques generally accelerate the crossing of energy barriers to achieve better sampling efficiency, for example by introducing bias potentials, modifying the potential energy itself, or changing the effective temperature. These techniques have proven essential in atomistic simulations of IDPs [70,71], yielding levels of convergence that could not be achieved even with drastically longer standard constant-temperature MD simulations [13].

The central idea of biased MD simulations is similar to importance sampling in MC simulations: a bias potential is introduced to construct a flat free energy landscape along one or more collective variables of interest, such that many states can be readily sampled once the free energy barriers are removed. The replica-exchange (REX) class of sampling methods, particularly replica exchange molecular dynamics (REMD), has been one of the most popular for simulating protein conformations. Figure 2 shows the general scheme of REMD simulations. The key point is to first set up multiple replicas with different unitless unbiased or biased potentials, given as the energy over kBT (T is the temperature), and then use the Metropolis rule to exchange replicas through MC moves while maintaining detailed balance. A key advantage of using multiple replicas and maintaining detailed balance is that it avoids the reweighting generally required for biased simulations. Note that virtually all biased sampling strategies can be readily incorporated within the REX framework to benefit from both classes of enhanced sampling, including metadynamics (MTD) [72,73], accelerated MD (aMD) [74], umbrella sampling (US) [75,76], and integrated tempering sampling [77]. In practice, effective REMD protocols require a proper choice of (1) the number of replicas and the distribution of conditions, to ensure a uniform exchange acceptance rate and an efficient random walk in condition space, and (2) the unitless (biased) potentials, for effective conformational diffusion at each condition [78]. Here, we divide the various enhanced sampling strategies into two general groups depending on the need for collective variables and discuss their recent applications to IDP conformational sampling. These methods are summarized in Table 1.

MTD and its variants have been considered among the most important collective variable (CV)-based sampling methods for protein simulations [90]. MTD uses a history-dependent bias potential, generally a sum of Gaussians, to eventually construct a flat free energy landscape along the predetermined CV(s). Well-tempered MTD (WT-MTD) was later developed to improve convergence by gradually reducing the size of the Gaussians based on the total accumulated bias potential [72,73]. Furthermore, parallel tempering MTD (PT-MTD) and combinations with other biased sampling methods have also been developed to increase the sampling efficiency and convergence of free energy calculations [91,92]. Representative examples include PT-MTD, which combines WT-MTD with parallel tempering (PT), and bias-exchange MTD, which uses a different CV in each replica rather than exchanging temperatures. For example, PT-MTD and bias-exchange MTD have been employed to obtain the conformational ensembles and the coupled binding and folding of the disordered pKID and KID proteins, using the α-score of helical structures as CVs [79]. It has also been shown that REMD-based MTD, compared to conventional MTD or temperature REMD (T-REMD), can enhance the conformational sampling of N-glycans using dihedral angles as CVs to characterize the global motions [93]. The binding mechanism of two disordered peptides, NRF2 and PTMA, was simulated by WT-MTD, and the results showed that the method could provide converged free energy profiles with 1.5 μs of sampling time [94]. Together, these applications have shown that the MTD class of sampling methods can be effectively applied to IDP simulations.

Besides MTD, another important CV-based sampling strategy is the US method [76]. US is not strictly an enhanced sampling method like MTD. It typically uses multiple harmonic potentials to focus sampling on various states along the collective variables of interest. US is often combined with REMD in studies of IDPs, as illustrated in a recent 2D window-exchange US simulation of the coupled folding and binding mechanism of the HdeA homodimer [80]. The simulation was able to capture rare unfolding transitions of the dimer at neutral pH and provided a detailed description of the transition pathways.

Replica exchange with solute tempering (REST) has proven to be one of the most reliable choices for enhanced sampling of protein folding and particularly of disordered conformational ensembles [113,114]. Sugita and co-workers leveraged generalized REST (gREST) to target the dihedral-angle energy term and successfully sampled folding transitions of beta-hairpins and Trp-cage in explicit water, using fewer replicas but covering a wider conformational space compared to REST2 [84]. Walsh et al. applied REST to investigate the conformational ensembles of the disordered n16N peptide [115]. The conformations obtained via REST methods showed high consistency with NMR experimental data. Furthermore, REST is particularly appropriate for simulating IDRs, as the disordered region can be targeted without tempering the well-structured region (or water). Zhou and co-workers studied the disorder-to-order transition of the disordered loop of Staphylococcus aureus sortase A (SrtA) upon binding to calcium [116]. Chen and Liu characterized the interfacial conformational dynamics of Bcl-xL in explicit solvent [117]. Both works directly showed that REST covered broader conformational spaces for intrinsically disordered regions and led to faster convergence compared to either standard MD or T-REMD simulations. REST simulations have also been successfully integrated with experiments in recent years to study how cancer-associated mutations and drug molecules may modulate the disordered ensembles of p53-TAD and Aβ peptides [118,119,120,121].

Despite the success of REST for CV-free enhanced sampling, it does not benefit from targeted acceleration along specific CVs that are known to be rate limiting. For this reason, REST (or REX in general) has been combined with CV-based enhanced sampling to maximize the efficiency of sampling the complex, high-dimensional conformational space of proteins. Some of the examples are discussed in the sections above. Here, we note a couple of additional recent examples. By integrating free energy perturbation (FEP) and REST methods, Abel et al. obtained more thorough sampling of different ligand conformations around the active site and achieved relative binding affinity predictions [122]. Okamoto and co-workers have applied the two-dimensional replica-exchange umbrella sampling (REUS)/REST method to two protein–ligand complex systems, using REST to weaken the solute–solvent interactions and promote binding events, and REUS to enhance sampling along the reaction coordinate.
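The replica-exchange schemes discussed in this section all rest on a Metropolis test applied to pairs of replicas. The sketch below shows that test for plain temperature replica exchange, together with a geometric temperature ladder, which is one common (but not the only) way to space replicas. The function names, energies, and temperature range are hypothetical and chosen only for illustration; this is a minimal sketch, not the protocol of any study cited above.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures spaced by a constant ratio between t_min and t_max."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** i for i in range(n_replicas)]


def exchange_probability(e_i, t_i, e_j, t_j):
    """Metropolis acceptance for swapping the configurations of replicas i and j.

    p = min(1, exp[(beta_i - beta_j) * (E_i - E_j)]), which preserves
    detailed balance in the extended ensemble of all replicas.
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_j - e_i)
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def attempt_exchange(e_i, t_i, e_j, t_j, rng):
    """Return True if the swap is accepted, using the supplied random generator."""
    return rng.random() < exchange_probability(e_i, t_i, e_j, t_j)


if __name__ == "__main__":
    temps = geometric_ladder(300.0, 450.0, 8)
    rng = random.Random(0)
    # Hypothetical instantaneous potential energies (kcal/mol) of two neighbors.
    p = exchange_probability(-105.0, temps[0], -100.0, temps[1])
    print("ladder:", [round(t, 1) for t in temps])
    print("acceptance probability:", p)
    print("accepted:", attempt_exchange(-105.0, temps[0], -100.0, temps[1], rng))
```

For biased or solute-tempered variants, the same test applies once the energies are replaced by the corresponding unitless potentials evaluated under each replica's condition.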
