In the context of chemistry and molecular modelling, a force field is a computational method that is used to estimate the forces between atoms within molecules and also between molecules. More precisely, the force field refers to the functional form and parameter sets used to calculate the potential energy of a system of atoms or coarse-grained particles in molecular mechanics, molecular dynamics, or Monte Carlo simulations. The parameters for a chosen energy function may be derived from experiments in physics and chemistry, calculations in quantum mechanics, or both. Force fields are interatomic potentials and utilize the same concept as force fields in classical physics, with the difference that the force field parameters in chemistry describe the energy landscape, from which the acting forces on every particle are derived as a gradient of the potential energy with respect to the particle coordinates. All-atom force fields provide parameters for every type of atom in a system, including hydrogen, while united-atom interatomic potentials treat the hydrogen and carbon atoms in methyl groups and methylene bridges as one interaction center. Coarse-grained potentials, which are often used in long-time simulations of macromolecules such as proteins, nucleic acids, and multi-component complexes, sacrifice chemical details for higher computing efficiency.
The basic functional form of potential energy in molecular mechanics includes bonded terms for interactions of atoms that are linked by covalent bonds, and nonbonded (also termed noncovalent) terms that describe the long-range electrostatic and van der Waals forces. The specific decomposition of the terms depends on the force field, but a general form for the total energy in an additive force field can be written as [math]\displaystyle{ E_{\text{total}} = E_{\text{bonded}} + E_{\text{nonbonded}} }[/math] where the components of the covalent and noncovalent contributions are given by the following summations: [math]\displaystyle{ E_{\text{bonded}} = E_{\text{bond}} + E_{\text{angle}} + E_{\text{dihedral}} }[/math] [math]\displaystyle{ E_{\text{nonbonded}} = E_{\text{electrostatic}} + E_{\text{van der Waals}} }[/math]
The bond and angle terms are usually modeled by quadratic energy functions that do not allow bond breaking. A more realistic description of a covalent bond at higher stretching is provided by the more expensive Morse potential. The functional form for dihedral energy is variable from one force field to another. Additional, "improper torsional" terms may be added to enforce the planarity of aromatic rings and other conjugated systems, and "cross-terms" that describe the coupling of different internal variables, such as angles and bond lengths. Some force fields also include explicit terms for hydrogen bonds.
The nonbonded terms are computationally most intensive. A popular choice is to limit interactions to pairwise energies. The van der Waals term is usually computed with a Lennard-Jones potential and the electrostatic term with Coulomb's law. However, both can be buffered or scaled by a constant factor to account for electronic polarizability. Studies with this energy expression have focused on biomolecules since the 1970s and were generalized to compounds across the periodic table in the early 2000s, including metals, ceramics, minerals, and organic compounds.[1]
As it is rare for bonds to deviate significantly from their reference values, the most simplistic approaches utilize a Hooke's law formula: [math]\displaystyle{ E_{\text{bond}} = \frac{k_{ij}}{2} (l_{ij}-l_{0,ij})^2 }[/math] where [math]\displaystyle{ k_{ij} }[/math] is the force constant, [math]\displaystyle{ l_{ij} }[/math] is the bond length and [math]\displaystyle{ l_{0,ij} }[/math] is the value for the bond length between atoms [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math] when all other terms in the force field are set to 0. The term [math]\displaystyle{ l_{0,ij} }[/math] is often referred to as the equilibrium bond length, which may cause confusion. The equilibrium bond length is the value adopted in equilibrium at 298 K with all other force field terms and kinetic energy contributing. Therefore, [math]\displaystyle{ l_{0,ij} }[/math] is often a few percent different from the actual bond length in experiments at 298 K.[1]
The bond stretching constant [math]\displaystyle{ k_{ij} }[/math] can be determined from the experimental Infrared spectrum, Raman spectrum, or high-level quantum mechanical calculations. The constant [math]\displaystyle{ k_{ij} }[/math] determines vibrational frequencies in molecular dynamics simulations. The stronger the bond is between atoms, the higher is the value of the force constant, and the higher the wavenumber (energy) in the IR/Raman spectrum. The vibration spectrum according to a given force constant can be computed from short MD trajectories (5 ps) with ~1 fs time steps, calculation of the velocity autocorrelation function, and its Fourier transform.[2]
Though the formula of Hooke's law provides a reasonable level of accuracy at bond lengths near the equilibrium distance, it is less accurate as one moves away. In order to model the Morse curve better one could employ cubic and higher powers.[3][4] However, for most practical applications these differences are negligible and inaccuracies in predictions of bond lengths are on the order of the thousandth of an angstrom, which is also the limit of reliability for common force fields. A Morse potential can be employed instead to enable bond breaking and higher accuracy, even though it is less efficient to compute.
Electrostatic interactions are represented by a Coulomb energy, which utilizes atomic charges [math]\displaystyle{ q_i }[/math] to represent chemical bonding ranging from covalent to polar covalent and ionic bonding. The typical formula is the Coulomb law: [math]\displaystyle{ E_{\text{Coulomb}} = \frac{1}{4\pi\varepsilon_0}\frac{q_i q_j}{r_{ij}} }[/math] where [math]\displaystyle{ r_{ij} }[/math] is the distance between two atoms [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math]. The total Coulomb energy is a summation over all pairwise combinations of atoms and usually excludes 1, 2 bonded atoms, 1, 3 bonded atoms, as well as 1, 4 bonded atoms.[5][6][7]
Atomic charges can make dominant contributions to the potential energy, especially for polar molecules and ionic compounds, and are critical to simulate the geometry, interaction energy, as well as the reactivity. The assignment of atomic charges often still follows empirical and unreliable quantum mechanical protocols, which often lead to several 100% uncertainty relative to physically justified values in agreement with experimental dipole moments and theory.[8][9][10] Reproducible atomic charges for force fields based on experimental data for electron deformation densities, internal dipole moments, and an Extended Born model have been developed.[1][10] Uncertainties <10%, or ±0.1e, enable a consistent representation of chemical bonding and up to hundred times higher accuracy in computed structures and energies along with physical interpretation of other parameters in the force field.
In addition to the functional form of the potentials, force fields define a set of parameters for different types of atoms, chemical bonds, dihedral angles, out-of-plane interactions, nonbond interactions, and possible other terms.[1] Many parameter sets are empirical and some force fields use extensive fitting terms that are difficult to assign a physical interpretation.[11] Atom types are defined for different elements as well as for the same elements in sufficiently different chemical environments. For example, oxygen atoms in water and an oxygen atoms in a carbonyl functional group are classified as different force field types.[12] Typical force field parameter sets include values for atomic mass, atomic charge, Lennard-Jones parameters for every atom type, as well as equilibrium values of bond lengths, bond angles, and dihedral angles.[13] The bonded terms refer to pairs, triplets, and quadruplets of bonded atoms, and include values for the effective spring constant for each potential. Most current force fields parameters use a fixed-charge model by which each atom is assigned one value for the atomic charge that is not affected by the local electrostatic environment.[10][14]
Force field parameterizations for simulations with maximum accuracy and transferability, e.g., IFF, follow a well-defined protocol.[1] The workflow may involve (1) retrieving an x-ray crystal structure or chemical formula, (2) defining atom types, (3) obtaining atomic charges, (4) assigning initial Lennard-Jones and bonded parameters, (5) computational tests of density and geometry relative to experimental reference data, (6) computational tests of energetic properties (surface energy,[15] hydration energy[16]) relative to experimental reference data, (7) secondary validation and refinement (thermal, mechanical, and diffusion properties).[17] Major iterative loops occur between steps (5) and (4), as well as between (6) and (4)/(3). The chemical interpretation of the parameters and reliable experimental reference data play a critical role.
The parameters for molecular simulations of biological macromolecules such as proteins, DNA, and RNA were often derived from observations for small organic molecules, which are more accessible for experimental studies and quantum calculations. Thereby, multiple issues arise, such as (1) unreliable atomic charges from quantum calculations may affect all computed properties and internal consistency, (2) data different derived from quantum mechanics for molecules in the gas phase may not be transferable for simulations in the condensed phase, (3) use of data for small molecules and application to larger polymeric structures involves uncertainty, (4) dissimilar experimental data with variation in accuracy and reference states (e.g. temperature) can cause deviations. As a result, divergent force field parameters have been reported for biological molecules. Experimental reference data included, for example, the enthalpy of vaporization (OPLS), enthalpy of sublimation, dipole moments, and various spectroscopic parameters.[4][12][18] Inconsistencies can be overcome by interpretation of all force field parameters and choosing a consistent reference state, for example, room temperature and atmospheric pressure.[1]
Several force fields also include no clear chemical rationale, parameterization protocol, incomplete validation of key properties (structures and energies), lack of interpretation of parameters, and of a discussion of uncertainties.[19] In these cases, large, random deviations of computed properties have been reported.
Some force fields include explicit models for polarizability, where a particle's effective charge can be influenced by electrostatic interactions with its neighbors. Core-shell models are common, which consist of a positively charged core particle, representing the polarizable atom, and a negatively charged particle attached to the core atom through a springlike harmonic oscillator potential.[20][21][22] Recent examples include polarizable models with virtual electrons that reproduce image charges in metals[23] and polarizable biomolecular force fields.[24] By adding such degrees of freedom for polarizability, the interpretation of the parameters becomes more difficult and increases the risk towards arbitrary fit parameters and decreased compatibility. The computational expense increases due to the need to repeatedly calculate the local electrostatic field.
Polarizable models perform well when it captures essential chemical features and the net atomic charge is relatively accurate (within ±10%).[1][25] In recent times, such models have been erroneously called "Drude Oscillator potentials".[26] An appropriate term for these models is "Lorentz oscillator models" since Lorentz[27] rather than Drude[28] proposed some form of attachment of electrons to nuclei.[23] Drude models assume unrestricted motion of the electrons, e.g., a free electron gas in metals.[28]
Historically, many approaches to parameterization of a forcefield have been employed. Numerous classical forcefields relied on relatively intransparent parameterization protocols, for example, using approximate quantum mechanical calculations, often in the gas phase, with the expectation of some correlation with condensed phase properties and empirical modifications of potentials to match experimental observables.[29][30][31] The protocols may not be reproducible and semi-automation often played a role to generate parameters, optimizing for speedy parameter generation and wide coverage, and not for chemical consistency, interpretability, reliability, and sustainability.
Similar, even more automated tools have become recently available to parameterize new force fields and assist users to develop their own parameter sets for chemistries which are not parameterized to date.[32][33] Efforts to provide open source codes and methods include openMM and openMD. The use of semi-automation or full automation, without input from chemical knowledge, is likely to increase inconsistencies at the level of atomic charges, for the assignment of remaining parameters, and likely to dilute the interpretability and performance of parameters.
The Interface force field (IFF) assumes one single energy expression for all compounds across the periodic (with 9-6 and 12-6 LJ options) and utilizes rigorous validation with standardized simulation protocols that enable full interpretability and compatibility of the parameters, as well as high accuracy and access to unlimited combinations of compounds.[1]
Functional forms and parameter sets have been defined by the developers of interatomic potentials and feature variable degrees of self-consistency and transferability. When functional forms of the potential terms vary, the parameters from one interatomic potential function can typically not be used together with another interatomic potential function.[17] In some cases, modifications can be made with minor effort, for example, between 9-6 Lennard-Jones potentials to 12-6 Lennard-Jones potentials.[7] Transfers from Buckingham potentials to harmonic potentials, or from Embedded Atom Models to harmonic potentials, on the contrary, would require many additional assumptions and may not be possible.
All interatomic potentials are based on approximations and experimental data, therefore often termed empirical. The performance varies from higher accuracy than density functional theory calculations, with access to million times larger systems and time scales, to random guesses depending on the force field.[34] The use of accurate representations of chemical bonding, combined with reproducible experimental data and validation, can lead to lasting interatomic potentials of high quality with much less parameters and assumptions in comparison to DFT-level quantum methods.[35][36]
Possible limitations include atomic charges, also called point charges. Most force fields rely on point charges to reproduce the electrostatic potential around molecules, which works less well for anisotropic charge distributions.[37] The remedy is that point charges have a clear interpretation,[10] and virtual electrons can be added to capture essential features of the electronic structure, such additional polarizability in metallic systems to describe the image potential, internal multipole moments in π-conjugated systems, and lone pairs in water.[38][39][40] Electronic polarization of the environment may be better included by using polarizable force fields[41][42] or using a macroscopic dielectric constant. However, application of one value of dielectric constant is a coarse approximation in the highly heterogeneous environments of proteins, biological membranes, minerals, or electrolytes.[43]
All types of van der Waals forces are also strongly environment-dependent because these forces originate from interactions of induced and "instantaneous" dipoles (see Intermolecular force). The original Fritz London theory of these forces applies only in a vacuum. A more general theory of van der Waals forces in condensed media was developed by A. D. McLachlan in 1963 and included the original London's approach as a special case.[44] The McLachlan theory predicts that van der Waals attractions in media are weaker than in vacuum and follow the like dissolves like rule, which means that different types of atoms interact more weakly than identical types of atoms.[45] This is in contrast to combinatorial rules or Slater-Kirkwood equation applied for development of the classical force fields. The combinatorial rules state that the interaction energy of two dissimilar atoms (e.g., C...N) is an average of the interaction energies of corresponding identical atom pairs (i.e., C...C and N...N). According to McLachlan's theory, the interactions of particles in media can even be fully repulsive, as observed for liquid helium,[44] however, the lack of vaporization and presence of a freezing point contradicts a theory of purely repulsive interactions. Measurements of attractive forces between different materials (Hamaker constant) have been explained by Jacob Israelachvili.[44] For example, "the interaction between hydrocarbons across water is about 10% of that across vacuum".[44] Such effects are represented in molecular dynamics through pairwise interactions that are spatially more dense in the condensed phase relative to the gas phase and reproduced once the parameters for all phases are validated to reproduce chemical bonding, density, and cohesive/surface energy.
Limitations have been strongly felt in protein structure refinement. The major underlying challenge is the huge conformation space of polymeric molecules, which grows beyond current computational feasibility when containing more than ~20 monomers.[46] Participants in Critical Assessment of protein Structure Prediction (CASP) did not try to refine their models to avoid "a central embarrassment of molecular mechanics, namely that energy minimization or molecular dynamics generally leads to a model that is less like the experimental structure".[47] Force fields have been applied successfully for protein structure refinement in different X-ray crystallography and NMR spectroscopy applications, especially using program XPLOR.[48] However, the refinement is driven mainly by a set of experimental constraints and the interatomic potentials serve mainly to remove interatomic hindrances. The results of calculations were practically the same with rigid sphere potentials implemented in program DYANA[49] (calculations from NMR data), or with programs for crystallographic refinement that use no energy functions at all. These shortcomings are related to interatomic potentials and to the inability to sample the conformation space of large molecules effectively.[50] Thereby also the development of parameters to tackle such large-scale problems requires new approaches. A specific problem area is homology modeling of proteins.[51] Meanwhile, alternative empirical scoring functions have been developed for ligand docking,[52] protein folding,[53][54][55] homology model refinement,[56] computational protein design,[57][58][59] and modeling of proteins in membranes.[60]
It was also argued that some protein force fields operate with energies that are irrelevant to protein folding or ligand binding.[41] The parameters of proteins force fields reproduce the enthalpy of sublimation, i.e., energy of evaporation of molecular crystals. However, protein folding and ligand binding are thermodynamically closer to crystallization, or liquid-solid transitions as these processes represent freezing of mobile molecules in condensed media.[61][62][63] Thus, free energy changes during protein folding or ligand binding are expected to represent a combination of an energy similar to heat of fusion (energy absorbed during melting of molecular crystals), a conformational entropy contribution, and solvation free energy. The heat of fusion is significantly smaller than enthalpy of sublimation.[44] Hence, the potentials describing protein folding or ligand binding need more consistent parameterization protocols, e.g., as described for IFF. Indeed, the energies of H-bonds in proteins are ~ -1.5 kcal/mol when estimated from protein engineering or alpha helix to coil transition data,[64][65] but the same energies estimated from sublimation enthalpy of molecular crystals were -4 to -6 kcal/mol,[66] which is related to re-forming existing hydrogen bonds and not forming hydrogen bonds from scratch. The depths of modified Lennard-Jones potentials derived from protein engineering data were also smaller than in typical potential parameters and followed the like dissolves like rule, as predicted by McLachlan theory.[41]
Different force fields are designed for different purposes. All are implemented in various computers software.
MM2 was developed by Norman Allinger mainly for conformational analysis of hydrocarbons and other small organic molecules. It is designed to reproduce the equilibrium covalent geometry of molecules as precisely as possible. It implements a large set of parameters that is continuously refined and updated for many different classes of organic compounds (MM3 and MM4).[67][68][69][70][71]
CFF was developed by Arieh Warshel, Lifson, and coworkers as a general method for unifying studies of energies, structures, and vibration of general molecules and molecular crystals. The CFF program, developed by Levitt and Warshel, is based on the Cartesian representation of all the atoms, and it served as the basis for many subsequent simulation programs.
ECEPP was developed specifically for the modeling of peptides and proteins. It uses fixed geometries of amino acid residues to simplify the potential energy surface. Thus, the energy minimization is conducted in the space of protein torsion angles. Both MM2 and ECEPP include potentials for H-bonds and torsion potentials for describing rotations around single bonds. ECEPP/3 was implemented (with some modifications) in Internal Coordinate Mechanics and FANTOM.[72]
AMBER, CHARMM, and GROMOS have been developed mainly for molecular dynamics of macromolecules, although they are also commonly used for energy minimizing. Thus, the coordinates of all atoms are considered as free variables.
Interface Force Field (IFF)[73] was developed as the first consistent force field for compounds across the periodic table. It overcomes the known limitations of assigning consistent charges, utilizes standard conditions as a reference state, reproduces structures, energies, and energy derivatives, and quantifies limitations for all included compounds.[1][74] It is compatible with multiple force fields to simulate hybrid materials (CHARMM, AMBER, OPLS-AA, CFF, CVFF, GROMOS).
The set of parameters used to model water or aqueous solutions (basically a force field for water) is called a water model. Water has attracted a great deal of attention due to its unusual properties and its importance as a solvent. Many water models have been proposed; some examples are TIP3P, TIP4P,[118] SPC, flexible simple point charge water model (flexible SPC), ST2, and mW.[119] Other solvents and methods of solvent representation are also applied within computational chemistry and physics some examples are given on page Solvent model. Recently, novel methods for generating water models have been published.[120]