1. Introduction
The essence of Molecular Simulations (MS) is a statistical mechanics and numerical method governed by the Newtonian laws of motion
[1] for molecular properties, i.e., velocity, position, and energy, towards insights of molecular system while retaining macro-system physio-chemical properties. Two factors have promoted the increased application of molecular simulations over the years (
Figure 1). One is the growing availability of experimentally determined protein structures, such as membrane proteins (ion channels, neurotransmitters and GPCRs etc.)
[2][3], the other is the wide availability of graphics processing units (GPUs), which allows running simulations locally. MS typically analyses protein structure at a minimum of nano to micro-second time scale to reveal the dynamic nature of protein molecules covering a wide variety of biomolecular processes, such as conformational change, ligand binding and protein folding. Among the numerous approaches to MS, the Monte Carlo (MC) Simulation sampling method and the molecular Dynamics (MD) Simulation method are the two common methods. The basic concept of MCS is to generate an ensemble of conformation under specific thermodynamics conditions through stochastic approach; whereas the concept of MD Simulation is to iterate a time-dependent Newtonian equation of motions for hard sphere particles in a system
[4][5], which can provide an ensemble of thermodynamic properties.
Figure 1. The growing use of MD Simulation studies over the years as reflected by publication (1980–2021). Data was from Web of Science.
2. A Brief History of Molecular Simulations
MS was first introduced in 1949 by Metropolis et al. to study particle interaction
[6]. Metropolis proposed a probabilistic approach to approximate the “properties” of a set of particles
[6]. Instead of treating particles as individuals, simulation was applied to measure the interactions of all particles until they reach equilibrium by the governing laws. Its success inspired the development of MS by Alder and Wainwright in 1959
[7]. The early MS algorithm used a rudimentary electronic computer to iterate atom collision. Each atom was assigned an initial velocity and position. Based on the elastic collision, the MS algorithm was applied to simulate attraction and repulsion of particles. In 1964, Rahman et al. published the first study in using MS to analyze liquid Argon
[8]. Their work demonstrated that MS was indeed possible to analyze Lennard Jones potential for interactions between Argon atoms. In 1971, Rahman and Stilinger reported their MS study on modelling liquid water, a system composed of molecules not just atoms
[9]. Their work demonstrated that differing from its solid phases structure, liquid water consists of a random network of hydrogen bonds. In 1976, Warshel and Levitt expanded MS by integrating quantum mechanics and molecular mechanics (QM/MM) to study lysozyme reaction by proposing the exchange of the classical charge of atom
i and
j with quantum mechanics calculations
[10]. In 1977, Karplus and collaborators first used MS to study protein by using constraint method to freeze out fast-degree freedom to reach longer simulation time
[11][12]. Their study led to the Noble Prize in Chemistry awarded to Warshel, Levitt and Karplus in 2013 for the development of multiscale models for complex chemical systems
[10]. Anderson et al. in 1980 used MS to sample the isoenthalpic (constant pressure) ensemble. Anderson’s solution to achieve constant pressure in MD Simulation sampling was to extend dynamic variable by including volume
[13]. Parrinello and Rahman showed that the scheme can be generalized to include shape and volume fluctuations by using Lagrangian mechanics. This made it possible to study the issues such as crystallization and solid–solid phase transition
[14]. Their idea of extending the system dynamic variables was to assume that the system exchanges energy with a fictitious pressure or temperature reservoir. Their method took into consideration the dielectric effect caused by the atomic polarizability and increased the accuracy of the binding site. In 1985, Car and Parrinello pioneered a scheme of combining MS with direct calculation of electronic structure by means of Density Function Theory (DFT). This work was important as it indicated the possibility of combining finite temperature into simulation for electronic structure calculations, which was not possible before
[15]. During 1980s and 1990s, MS approach witnessed a rise in studies of condensed matter with growing access of enhanced computing power; further leading to the challenges of phase equilibria. Moreover, to address these challenges Panagiotopolus revised the MC algorithm, known as
Gibbs ensemble Monte Carlo, to distinguish the phase equilibria approach that only require to simulate the involved phases but by-pass the interface
[16]. Novel algorithms such as
blue moon ensemble [17] hyper-MD
[18] as well as advanced theoretical methods such as Nudged-Elastic Band
[19] and String
[20] were devised to address the challenges of time-scales (long-time dynamics of protein folding) and rare events. Further, the advancement in quantum programs outside chemistry field and the Noble prize in Chemistry 1998 being divided equally between Walter Kohn “for his development of density-function theory” and John A. Pople “for his development of computational methods in quantum chemistry” led to form a unified approach for molecular dynamics and density-function theory. Over the following years, time-dependent density-function theory (TDDFT) further enhanced the accuracy of large-scale simulations of excited state dynamics
[21][22][23]. TDDFT-MD coupled simulations to simulate excited state dynamics of biomolecules and other nanostructures achieves high accuracy through utilizing small number of basic function thereby significantly reduced the memory requirements and computation time compared to plane-wave and real-space grid bases
[24]. Furthermore, utilizing multiple computer processors in parallel for MD force calculations substantially enhanced with IBM’s Blue Matter code for its Blue Gene/L general-purpose supercomputer
[25], resulting in improved parallel performances for the widely used MD platforms NAMD
[26] GROMACS
[27] AMBER
[28]. Increasing innovation and with advent of GPU (Graphics processing units) and special-purpose processors such as Anton (parallel supercomputer to enable fast MD simulations) having computing power to perform up to 20 μs/day
[29] further accelerated the simulation study in different biochemical processes. However, long-timescale simulations requires stringent force field (discussed in following section) compared with short-timescale simulations. To conclude this brief history of MS, it would be appropriate to remark that MS has clearly established itself as a key scientific instrument driven by enhanced computing power, fast and efficient algorithms and force fields (FF) are demonstrated by growing number of publications utilizing both experiments and simulation tools. Major breakthroughs over the years in MS studies are shown in
Figure 2.
Figure 2. The Molecular Simulations timeline showing the breakthrough achievements in MD Simulation studies.
3. Basic Concept of Force Field
Currently, it is a routine to simulate proteins with hundreds of amino acid residues at 10–100 ns surrounded by water and salt
[30][31][32]. User-friendly platforms are widely available, i.e., GROMACS
[33], AMBER
[28], vCHARMM
[34], DL_POLY
[35], NAMD
[26], LAMMPS
[36] have been developed for MD Simulations analysis. The output of the platforms can be visualized and analyzed by external software, i.e., VMD
[37], Chimera
[38]. However, robust simulation requires appropriate parameters for studying a physical system. Force field, a set of mathematical expressions and parameters to describe the inter- and intra- molecular forces, are also essential to describe a physical system.
Three major molecular models have been developed: all-atom
[39][40], coarse grained (CG)
[41][42] and all-atom/coarse-grain mixed models
[43][44][45] (
Table 1). The all-atom force field for MD Simulation of lipid bilayers includes CHARMM, AMBER and OPLS-AA. GROMOS is an atomistic force field with an exception such as CH
n modelled as united-atoms
[46]. CHARMM (Chemistry at HARvard Macromolecular Mechanics) forcefield for lipids is widely used for simulating lipid bilayer and membrane proteins
[47][48]. CHARMM force field is continuously updating and improving with the most recent version of CHARMM36m
[49]. CHARMM36 lipid forcefield is parameterized for lipids
[39], CHARMM36 DNA and CHARMM36 RNA are parameterized for DNA and RNA
[50][51], CHARMM36m is parameterized for protein, and CHARMM General Force Field (CGenFF) is parameterized for drugs and general usage
[52]. AMBER (Assisted Model Building with Energy Refinement) forcefield was developed in parallel. It treats all hydrogen atoms explicitly as CHARMM
[53]. AMBER was designed and parameterized for specific biological systems: AMBER lipids 21 was parameterized for lipids
[54]; AMBERff19SB was parameterized for proteins
[55]; AMBER OL15 and AMBER OL3 were parameterized for DNA and RNA
[56][57]; General AMBER forcefield (GAFF) was parameterized for drugs and general usage
[58]; OPLS-AA (Optimized Parameters for Liquid Simulations All Atom)
[59] was initially designed for simulating thermo-dynamical properties of short-chain hydrocarbons alkanes and later expanded to include lipids through a parameter set called OPLS/L
[60], although the availability of lipids in the OPLS/L forcefield has not been as diverse as that of CHARMM and AMBER-compatible force fields. The latest improvement of OPLS-AA/M was its modification for peptides and protein torsional energetics
[61]. The GROningen Molecular Simulation (GROMOS) forcefield utilizes a different approach for simulating analysis by fitting the parameters against experimental thermo-dynamic data. Its forcefield was generalized into a single package. The latest version is GROMOS 54A8 package updated in 2012
[62].
Table 1. Atomistic and coarse-grained forcefield in MD Simulations.
Compared to all-atom models, coarse-grained models significantly reduce the computing time by decreasing the number of particles explicitly during simulations. Over the last decade, coarse-grained model has also been widely used in protein
[65] and nucleic acid studies
[66][67]. Different coarse-grained models have been developed to extend the timescale of the simulation, since the first model used the concept of coarse grain in 1975 by Levitt and Warshal
[68]. One of the most popular models is the MARTINI for membrane proteins
[42], in which several atoms in protein and lipid are approximated as a single bead and four water molecules are treated as a single particle (known as one bead 4:1 mapping) although the beads can differ by their polarity or hydrophilicity. For particular cases, smaller beads can also be used, such as 3:1 and 2:1 mapping
[69]. In MARTINI version 2.2, beads classified into 18 types are categorized into four groups: Q (charged), P (polar), N (intermediate) and C (apolar). In the latest version MARTINI 3, 29 beads have been sorted into seven groups with additional groups of halo-compounds (X), divalent ions (D) and water (W)
[70]. MARTINI ELNEDIN model modified by utilizing an elastic network, with the peptide backbone beads position on the Cα atoms and heavier bead mass, improves the conformation transition in simulation
[70]. MARTINI-Dry version provides an implicated solvation model
[71]. The Born model is another model where the effects of the solvent and membrane are included implicitly in the simulation
[72][73]. Implicit solvent forcefield is less used as it can cause significant errors due to it smoothen energy landscapes, which causes protein structure to deviate from the experimental crystal structure
[74][75]. Coarse-grained protein models have been mainly used for analyzing protein folding mechanism and protein structure prediction
[76][77]. Every alternate year, the CASP (Critical Assessment of Protein Structure Prediction) experiments provide an excellent platform to test the performance of coarse-grained models for predicting structures
[78]. Several coarse-grained protein models apart from MARTINI are as follows: UNRES (united residue)
[79], AWSEM (associated memory, water mediated, structure and energy model)
[80], OPEP (optimized potential for efficient protein structure prediction)
[81], SURPASS (Single United Residue per Pre-Averaged Secondary Structure fragment)
[82] and CABS (C-alpha, c-beta, side chain)
[83] models have been increasingly utilized for protein folding, structure prediction and interactions. PRIMO
[84] and Scorpion
[85] (solvated coarse-grained protein interaction) models are increasingly used in peptide and small protein structure prediction and protein–protein solvated complexes. The Rosetta centroid mode (CEN) model developed by Rohl et al. is also one of the widely used coarse-grained protein models in CASP protein structure prediction, de novo blind predictions, protein–protein and protein–ligand docking and modelling of protein-DNA interaction
[86]. Coarse-grained models have been further utilized in nucleic acid molecular dynamics to analyze the three dimensional (3D) structural models of RNA
[87][88][89]. Ding et al. introduced the discrete molecular dynamics (DMD) utilizing coarse-grained model to rapidly explore the conformational folding of RNA molecules
[90]. Recently, Jonikas et al. have developed a fully automated coarse-grained model NAST (the nucleic acid simulation tool) using statistical potential capable enough to ensemble over 10,000 RNA plausible (3D) structures
[91].
4. Molecular Simulations in Protein Study
The importance of MS arises from the fact that biomolecules such as proteins are under a dynamic state of motion, which is essential for the function of biomolecules. Although multiple experimental techniques can reveal the structural features of biomolecules, they are often incapable to show the dynamic features. MS provides a means to model the flexibility and conformational changes in the biomolecule at atomistic level, which is difficult to achieve by experimental approaches
[11]. MS is more effective when combined with experiments to validate and improve the accuracy of experimental results. A key feature of MS is its ability to mimic both the in vitro and in vivo conditions, for example, at different pH conditions, in the presence of water and ions, at different salt or ionic concentrations, and in the presence of a lipid bilayer and other cellular components
[92]. MS has been used to study multiple protein-related issues, such as protein-binding, protein–protein interaction and signaling
[93]. The followings are examples.