The integration of experimental and computational methods can assist and enrich the interpretation of experimental results, providing a detailed new molecular understanding of the systems under study.
One of the main aims in molecular biochemistry is to obtain mechanistic insights into the function of biomolecules. To accomplish this, researchers must design experiments that provide new information about the molecule in question using a variety of biochemical and biophysical techniques. Subsequently, the experimental data have to be correlated with the specific characteristics of the molecule under study. This process is sometimes a straightforward interpretation, but in many other cases, it is difficult to decipher the molecular meaning of the data. Consequently, one of the main roles of an experimentalist is to interpret the results to obtain new information on a specific molecular mechanism.
With the advent of new computational methods, experimentalists increasingly wish to incorporate their data into a detailed representation of the different mechanisms, using in silico modeling to assist and enrich the interpretation. This conjunction could provide researchers with a new, detailed molecular understanding and allow for the proposal of more complete mechanisms.
The combination of computational methods and experiments has a long history: computational approaches have relied on experimental data to calibrate force fields, while experiments have used computing power to process and analyze data. This review, however, focuses on the use of computational methods to assist in the interpretation of experimental results.
This combination of methods can follow four major strategies:
Experimental and computational protocols are performed independently, and then the results of both methods are compared. The first step in a molecular simulation consists of sampling different conformations, which can be performed using a detailed atomic representation or a less detailed coarse-grained one. The sampling protocol can be molecular dynamics (MD), Monte Carlo (MC) simulation, or any other sampling technique. In the best-case scenario, the computational models and experimental data correlate and complement each other. However, on some occasions the biomolecular process under investigation is a “rare” event, and successfully sampling it with a simulation technique therefore requires a global search of the entire conformational space, which can be challenging. To address this problem, several enhanced-sampling variations, such as replica exchange molecular dynamics, metadynamics, and accelerated MD, have been developed. Even with these advanced techniques, however, the sampling and the accuracy of the generated structures are still bound by the limits of the force field and the theoretical model used, and sometimes the experimental data and the simulation do not correlate.
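To make the sampling step concrete, the following is a minimal sketch of Metropolis Monte Carlo sampling on a hypothetical one-dimensional double-well energy landscape. The toy potential and all parameter values are illustrative stand-ins for a real force field and a real conformational space; production simulations use full molecular representations.

```python
import math
import random

def toy_energy(x):
    """Toy double-well potential (minima at x = -1 and x = +1),
    standing in for a real molecular force field."""
    return (x**2 - 1.0)**2

def metropolis_mc(n_steps, step_size=0.5, kT=0.6, seed=1):
    """Sample 'conformations' (here, a single coordinate) with the
    Metropolis acceptance criterion."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step_size, step_size)
        delta_e = toy_energy(x_new) - toy_energy(x)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-dE / kT).
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / kT):
            x = x_new
        samples.append(x)
    return samples

samples = metropolis_mc(20000)
# Fraction of samples near the two energy minima.
in_wells = sum(1 for s in samples if 0.4 < abs(s) < 1.6) / len(samples)
```

The same accept/reject loop underlies MC sampling of real biomolecules; enhanced-sampling methods such as replica exchange modify how temperatures or biases are handled around this core step.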
Although this independent approach has been by far the most widely explored and is extremely powerful, here we are more interested in more integrated approaches.
In a guided simulation, data obtained through experiments are used to effectively guide the sampling of three-dimensional conformations in the computational method. This is usually done by adding external energy terms related to the experimental data (restraints) into the computational protocol. Each restraint has a target value (the experimental distribution) against which the back-calculated values are compared at each simulation step. Since guided methods involve evaluating the models during the simulation, the restraints need to be implemented directly in the software. This type of guided simulation has been used in several programs.
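The restraint idea can be sketched as follows: a harmonic penalty on the difference between a back-calculated observable and its experimental target is added to the physical energy, biasing the sampling toward conformations consistent with the data. The potential, the identity back-calculation, and all constants below are hypothetical toy choices, not the scheme of any particular program.

```python
import math
import random

def physical_energy(x):
    """Toy double-well potential with minima at x = -1 and x = +1."""
    return (x**2 - 1.0)**2

def back_calculate(x):
    """Hypothetical back-calculation of an observable from a conformation;
    here simply the coordinate itself."""
    return x

def restrained_energy(x, target, k_restraint=10.0):
    """Physical energy plus a harmonic restraint toward the experimental target."""
    return physical_energy(x) + k_restraint * (back_calculate(x) - target)**2

def guided_mc(target, n_steps=5000, step_size=0.3, kT=0.6, seed=2):
    """Metropolis sampling on the restrained energy surface."""
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step_size, step_size)
        delta_e = restrained_energy(x_new, target) - restrained_energy(x, target)
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / kT):
            x = x_new
        samples.append(x)
    return samples

target = 1.0  # value "measured" experimentally (illustrative)
samples = guided_mc(target)
# After a burn-in, the sampled observable should cluster near the target,
# even though the unrestrained potential has a second, symmetric minimum.
mean_obs = sum(back_calculate(x) for x in samples[1000:]) / len(samples[1000:])
```

Note how the restraint breaks the symmetry of the double well: only the minimum compatible with the experimental value is sampled efficiently, which is exactly the advantage (and the bias) of guided simulations.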
In a conceptually different strategy, the computational method is performed first to generate a large number of different molecular conformations (a large ensemble), and the experimental data are then used to filter (search and select) the results. Only conformations that correlate with the experimental data are selected. The initial pool of conformations can be generated by any of the simulation sampling techniques already mentioned. Sometimes, even less computationally demanding protocols are used, such as generating a large pool of random conformations or simulated annealing. Then, different protocols based on maximum entropy or maximum parsimony are used to select the conformations that fit the data.
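A minimal sketch of the search-and-select idea, assuming a pre-generated pool of conformations each carrying back-calculated observables: a greedy loop grows the sub-ensemble whose averages best fit the experimental values, as measured by a chi-square-like discrepancy. This simple greedy scheme is only a stand-in for real maximum-parsimony or maximum-entropy selection protocols.

```python
import random

def chi_square(ensemble, experimental):
    """Discrepancy between ensemble-averaged observables and the experiment."""
    n = len(ensemble)
    avg = [sum(conf[i] for conf in ensemble) / n for i in range(len(experimental))]
    return sum((a - e)**2 for a, e in zip(avg, experimental))

def greedy_select(pool, experimental, max_size=10):
    """Greedily add the conformation that most improves the fit of the
    sub-ensemble averages to the experimental data."""
    selected = []
    for _ in range(max_size):
        best = min(pool, key=lambda conf: chi_square(selected + [conf], experimental))
        selected.append(best)
    return selected

rng = random.Random(3)
# Pool of 200 pre-generated "conformations", each with two
# back-calculated observables (illustrative random values).
pool = [(rng.random(), rng.random()) for _ in range(200)]
experimental = (0.3, 0.7)  # averaged experimental measurements (illustrative)
selected = greedy_select(pool, experimental)
fit = chi_square(selected, experimental)
```

Because the pool is generated once and only the selection step touches the data, new experimental restraints can be added later simply by rerunning the (cheap) selection, which is the practical advantage discussed below.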
A different category of computational method is molecular docking, which refers to methodologies that predict the final structure of a complex starting from the structures of the two free molecules. Docking protocols are composed of two basic steps: a sampling algorithm that generates different binding conformations (poses) and a scoring process that assesses the quality of each pose. In guided docking, experimental data are used to help define the binding sites. In principle, the experimental data can be used in either the sampling or the scoring process.
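As a sketch of using experimental data in the scoring step, the following toy function penalizes poses that fail to contact residues experimentally implicated in binding (for example, by mutagenesis or chemical-shift perturbation). The residue names, base scores, and penalty weight are all hypothetical; real docking programs use far richer scoring functions.

```python
def guided_score(base_score, pose_contacts, experimental_site, penalty=5.0):
    """Docking score augmented with experimental binding-site information:
    each experimentally implicated residue missing from the pose's
    contact set adds a penalty (lower scores are better)."""
    missing = set(experimental_site) - set(pose_contacts)
    return base_score + penalty * len(missing)

# Residues implicated in binding by the (hypothetical) experiments.
experimental_site = {"ARG45", "TYR102"}

poses = {
    "pose_A": {"base_score": -8.2, "contacts": {"LEU12", "GLY33"}},            # good energy, wrong site
    "pose_B": {"base_score": -7.5, "contacts": {"ARG45", "TYR102", "ASP48"}},  # matches the data
}

ranked = sorted(poses, key=lambda p: guided_score(poses[p]["base_score"],
                                                  poses[p]["contacts"],
                                                  experimental_site))
best_pose = ranked[0]
```

Here the energetically better pose_A is demoted because it ignores the experimentally mapped site, illustrating how experimental restraints can override a purely physics-based ranking.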
It should be noted that many of the experimental biophysical techniques report average values over many molecules and long periods of time. Consequently, a better correlation has often been observed with back-calculated data from an ensemble of conformations than with data from just a single conformer. All of the strategies listed above can be used to obtain an ensemble of conformations that are compatible with the set of experimental average values. Hence, a large number of programs have been created to select ensembles that fit the experimental data, differing in the way in which the initial ensemble is generated as well as in the algorithm used to search and select the final ensemble.
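The averaging argument above can be shown with a deliberately simple numerical example: for an observable that the experiment reports as a time and ensemble average, the ensemble-averaged back-calculated value can match the measurement exactly even when no single conformer does. The "distance" observable and its values are purely illustrative.

```python
def back_calc(conformer):
    """Hypothetical back-calculation: the observable is a distance
    stored in the conformer model."""
    return conformer["distance"]

# Two rapidly interconverting conformers; the experiment reports their average.
ensemble = [{"distance": 2.0}, {"distance": 6.0}]
experimental = 4.0  # time- and ensemble-averaged measurement (illustrative)

# Discrepancy of each single conformer versus the measurement.
single_errors = [abs(back_calc(c) - experimental) for c in ensemble]

# Discrepancy of the ensemble average versus the measurement.
ensemble_average = sum(back_calc(c) for c in ensemble) / len(ensemble)
ensemble_error = abs(ensemble_average - experimental)
```

Neither conformer alone reproduces the measured value, yet their average does, which is why fitting an ensemble rather than a single structure often gives a better correlation with averaged biophysical data.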
The choice of one strategy over the others depends on the specific characteristics of each study. However, we can list some advantages and disadvantages that make one approach more suitable than another. The computational sampling in the independent approach is not restricted to a specific region of the conformational space and can therefore provide information on “unexpected” conformations. Additionally, if one is interested in the specific sequential pathway of a process, unbiased sampling can provide a plausible pathway based on the physical model on which the computational method is built. On the other hand, one of the main advantages of the guided-simulation approach is that the restraints considerably limit the conformational space, so that, in principle, the “experimentally observed” conformations are sampled more efficiently. The main disadvantage of this approach is that the experimental data have to be implemented as restraints during the sampling, which can be a difficult task and in most cases requires a certain level of computational expertise.
In the search-and-select approach, the sampling process is uncoupled from, and performed independently of, the experimental data; consequently, integrating different methodologies and more than one experimental restraint is simpler. Furthermore, it is possible to incorporate new experimental data without the need to generate a new conformational ensemble. One drawback is that the initial pool must contain the “correct” conformations, so this approach also requires extensive sampling of the conformational space; however, several programs that easily generate a large pool of structures have been developed. Finally, if the goal is to understand the formation of a complex, the best approach is probably guided docking.
To integrate the experimental results into any of these approaches, the experimental data must be compared with values back-calculated from the computational method.
Since the conformations of biomolecules vary with time and functional state, providing a detailed molecular description that incorporates these changes based solely on experimental results is a difficult task.
The integration of experimental data with computational techniques allows us to obtain a detailed interpretation of the results that would not be achievable using only experimental methods.
We are certain that the integration of experimental techniques with computational methods will continue, and we anticipate new developments and integration with additional experimental techniques.