AI-Based Nano-Scale Material Property Prediction for Li-Ion Batteries: Comparison
Please note this is a comparison between Version 1 by Mohit Anil Lal and Version 2 by Peter Tang.

The techniques used to discover new materials for better battery technologies have progressed from expensive and time-consuming empirical trial-and-error methods to first-principles approaches based on quantum mechanics (QM), Monte Carlo simulations and molecular dynamics (MD). QM calculations evaluate electron–electron interactions by solving the complex Schrödinger equation, thereby enabling accurate results for a wide variety of properties. The emergence of machine learning (ML), deep learning (DL) and artificial intelligence (AI) has helped alleviate the bottlenecks posed by QM and MD simulations and has made it possible to expand the scope of the search for novel materials in the chemical compound space (CCS).

  • quantum mechanics
  • molecular dynamics
  • artificial intelligence
  • machine learning
  • nanotechnology

1. Introduction

The chemical compound space (CCS) is the theoretical space consisting of every possible compound known (and unknown) to us [1,2]. Even some of our largest databases, consisting of approximately 10^8 known substances, are a mere drop in the ocean compared with an estimated 10^180 substances that possibly make up the CCS [3,4]. Needless to say, the next big discovery of a compound that can revolutionize energy storage devices of the future is far from trivial.
The techniques used to discover new materials for better battery technologies have progressed from expensive and time-consuming empirical trial-and-error methods to the more recent first-principles approach of using quantum mechanics (QM) [5,6,7,8,9], Monte Carlo simulations and molecular dynamics (MD) [10,11,12,13,14]. QM calculations evaluate electron–electron interactions by solving the complex Schrödinger equation, thereby enabling accurate results for a wide variety of properties. However, the computational cost is a bottleneck for molecules larger than a couple hundred atoms. Hence, for multi-component or multi-layer structures such as the solid electrolyte interface layer, QM is not a feasible approach. Additionally, many battery components, including ionic and polymer electrolytes, crystal structures and electrode–electrolyte interactions [11,15,16,17,18], are better analyzed on larger length and time scales that are inaccessible with QM. MD simulations simplify particle–particle interactions to five main types, namely nonbonded, bonded, angle, dihedral and improper interactions. These interactions, each of which can be evaluated using a simple algebraic equation, reduce the computational cost significantly and make MD applicable to systems almost 10^6 times larger. To analyze ion migration in perovskite nickelate with 200 atoms, QM techniques, even using the density functional theory (DFT) approximation to reduce computational costs, required about 10^5 core-hours of computational time for a simulation in the picosecond range. On the other hand, MD simulations with 10^5 atoms required only 10^4 core-hours of computational time [19]. Thus, MD simulations enable the analysis of a wide variety of properties and behavior of materials at the atomic scale, such as the crystal structure, thermal properties and mechanical properties, which are often too complex to model using QM calculations. In a recent review, Sun et al. [20] presented the use of MD simulations to optimize lithium metal batteries, investigating the transport structure of Li ions, the electrochemical process at the electronic, atomic or molecular level, the Li+ transport mechanism and the Li deposition behavior in detail.
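To illustrate why the MD interaction terms mentioned above are cheap to evaluate, the following is a minimal sketch of three of them written as plain algebraic functions. The functional forms (harmonic bond, harmonic angle and 12-6 Lennard-Jones) are common textbook choices assumed here for illustration; they are not taken from the cited works, and all parameter values are hypothetical.

```python
import math

def harmonic_bond(r, k, r0):
    """Bonded term (assumed harmonic form): 0.5 * k * (r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2

def harmonic_angle(theta, k_theta, theta0):
    """Angle term (assumed harmonic form), angles in radians."""
    return 0.5 * k_theta * (theta - theta0) ** 2

def lennard_jones(r, epsilon, sigma):
    """Nonbonded term (assumed 12-6 Lennard-Jones form)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# Hypothetical parameters in arbitrary units, chosen only for illustration.
bond_energy = harmonic_bond(r=1.60, k=300.0, r0=1.53)
angle_energy = harmonic_angle(theta=math.radians(112.0), k_theta=60.0,
                              theta0=math.radians(109.5))
pair_energy = lennard_jones(r=3.8, epsilon=0.1, sigma=3.4)

print(f"bond:  {bond_energy:.4f}")
print(f"angle: {angle_energy:.4f}")
print(f"pair:  {pair_energy:.4f}")
```

Each term costs only a handful of arithmetic operations per interaction, which is consistent with the large gap in core-hours between MD and QM reported above.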
Though MD simulations are widely used to investigate the properties of materials at the atomic level, these simulations rely on experimentally derived interatomic potential parameters that determine the forces between particles [21]. This dependence on prior experimental data poses a challenge in using MD to design novel materials. To address this issue, Lanjan et al. [22] recently proposed a computational framework that couples QM calculations with MD simulations. The framework generates a wide range of crystal structures by varying a single system parameter (e.g., bond length) while keeping the other parameters relaxed at their minimum energy level. The QM calculations are then used to evaluate the system’s energy as a function of these changes, and the resulting data points are used to fit the interaction equations and thereby estimate the potential parameters for each type of particle–particle interaction. Employing this framework enables the study of crystal structures with the accuracy of QM calculations but at the speed and system sizes permissible by MD techniques. While this framework enhances nano-based computational methods, the QM calculations still need massive amounts of computational power, which can be significantly reduced with the AI-based technique proposed in this work.
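The parameter-fitting step described above can be sketched in a few lines. This is a minimal illustration, not the implementation of Lanjan et al.: it assumes a harmonic bond term E(r) = E0 + 0.5 k (r - r0)^2, and the "QM" energies are synthetic values generated from hypothetical parameters so that the fit can be checked. All names are illustrative.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the 3x3 normal equations."""
    s = [sum(x ** p for x in xs) for p in range(5)]                  # sums of x^p
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]  # sums of y*x^p
    # Augmented matrix for the unknowns (a, b, c).
    A = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    n = 3
    # Gaussian elimination with partial pivoting.
    for i in range(n):
        pivot = max(range(i, n), key=lambda row: abs(A[row][i]))
        A[i], A[pivot] = A[pivot], A[i]
        for row in range(i + 1, n):
            factor = A[row][i] / A[i][i]
            for col in range(i, n + 1):
                A[row][col] -= factor * A[i][col]
    # Back substitution.
    sol = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(A[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (A[i][n] - rest) / A[i][i]
    return sol[0], sol[1], sol[2]

# Synthetic stand-in for a QM energy scan over one bond length
# (hypothetical parameters, arbitrary units; not real QM data).
K_TRUE, R0_TRUE, E0_TRUE = 300.0, 1.53, -10.0
bond_lengths = [1.33 + 0.05 * i for i in range(9)]
energies = [0.5 * K_TRUE * (r - R0_TRUE) ** 2 + E0_TRUE for r in bond_lengths]

# Centre the scan coordinate to keep the normal equations well conditioned.
r_mean = sum(bond_lengths) / len(bond_lengths)
shifted = [r - r_mean for r in bond_lengths]
a, b, c = fit_quadratic(shifted, energies)

# Map the quadratic coefficients back to the harmonic parameters.
k_fit = 2.0 * a
r0_fit = r_mean - b / (2.0 * a)
e0_fit = c - b * b / (4.0 * a)

print(f"k = {k_fit:.3f}, r0 = {r0_fit:.4f}, E0 = {e0_fit:.3f}")
```

In a real workflow the energies would come from QM calculations at each scanned geometry, and the same idea would be repeated for each interaction type with its own functional form.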
The emergence of machine learning (ML), deep learning (DL) and artificial intelligence (AI) has helped alleviate the bottlenecks posed by QM and MD simulations and has made it possible to expand the scope of the search for novel materials in the CCS. ML and DL algorithms are orders of magnitude faster than ab initio techniques. Unlike QM-based simulations, which can take days to complete, ML algorithms can produce results within seconds. The use of AI has brought a paradigm shift in research related to improving battery technology as well as molecular property prediction and material discovery in general. For example, Sandhu et al. [23] used DL to examine the optimal crystal structures of doped cathode materials in lithium manganese oxide (LMO) batteries. Failed or unsuccessful synthesis data were used to predict the reaction success rate for the crystallization of templated vanadium selenites [24]. Using QM and ML techniques, Lu et al. [25] developed a method to predict undiscovered hybrid organic-inorganic perovskites (HOIPs) for photovoltaics. Their screening technique was able to shortlist six HOIPs with ideal band gaps and thermal stabilities from 5158 unexplored candidates. To identify material compositions with suitable properties, Meredig et al. [26] built an ML model trained on thousands of ground state crystal structures and used this model to scan roughly 1.6 million candidate compositions of novel ternary compounds, producing a ranked list of 4500 stable ternary compositions that possibly represent undiscovered materials.
The broad approach employed when using AI-based property prediction models consists of three overarching components: a reference database consisting of relevant quantum mechanical data which is used to fit the AI model; a mathematical representation that not only uniquely describes the attributes of the reference materials but also enables effective model training; and finally a suitable AI model that can accomplish the learning task itself.

2. Database

The fundamental premise of AI is the ability to draw inferences from patterns in data and enable an accurate prediction in unknown domains. Hence, the data, which make up the training examples for the learning task, become a critical aspect for successful prediction. With the introduction of the Materials Genome Initiative in 2011 [27], the United States signaled the importance of unifying the infrastructure for material innovation and harnessing the power of material data. In pursuit of the same goal, various materials databases have emerged, such as the Inorganic Crystal Structure Database (ICSD) [28], the Open Quantum Materials Database (OQMD) [29], the Cambridge Structural Database [30], the Harvard Clean Energy Project [31], the Materials Project [32] and AFLOWLIB [33]. Specifically, the size of the training set, the diversity of the dataset and the degrees of freedom all contribute to how effective the learning task for a specific objective can be [34]. In predicting properties such as the band gap energy and glass-forming ability for crystalline and amorphous materials, Ward et al. [35] methodically selected a chemically diverse set of attributes taken from the OQMD. Similarly, for electronic-structure problems, Schütt et al. [36] noted that the density of states at the Fermi energy is the critical property of concern. In predicting this property, around 7000 crystal structures from the ICSD were used; the authors observed higher predicted variance for certain configurations and the need to extend the training set in these specific areas. The process of material discovery is complex and diverse, and it is not surprising that there is no one-size-fits-all database that can support accurate property prediction for all materials. The physical and chemical characteristics of materials vary widely, requiring different methods and techniques for precise analysis and prediction.
Moreover, the current methodologies rely on the availability of well-curated data or the ability to manually generate such data, which is a daunting and often infeasible task, especially for new and unexplored materials. Thus, there is a need to develop generalizable and adaptable approaches that can efficiently handle a diverse range of materials, properties and configurations without the need for extensive data generation or curation.

3. Molecular Representation

ML algorithms draw inferences from data to establish a relationship between the atomic structure and the properties of a system. To enable the best possible structure-property approximation, a good representation of the material (also referred to as the ‘fingerprint’ or ‘descriptor’) is crucial. The first Hohenberg–Kohn theorem of DFT proves that the electron density of a system contains all the information needed to describe its ground state properties, and it is a ‘universal descriptor’ that can be used to predict these properties without knowledge of the details of the interactions between the electrons [37]. Crucially, for ML, a good molecular representation is invariant to rotation and translation of the system as well as permutation of atomic indices [38]. The electron density expressed in a fixed coordinate frame does not have these invariances, so unfortunately it is not a universally suitable representation of a system. Additionally, a good descriptor must be unique, continuous, compact and computationally cheap [38]. Often, there are multiple molecular geometries that possess similar values for a property. Hence, there is no single universal representation for all properties, leading to hundreds of molecular descriptors that are suitable only for a small subset of the CCS and a small subset of properties [39]. A commonly used molecular representation that satisfies the above-mentioned criteria of a good representation is the ‘Coulomb matrix’. It uses the same parameters that constitute the Hamiltonian for any given system, namely the set of Cartesian coordinates R_I and nuclear charges Z_I [40]. While the Coulomb matrix representation has shown tremendous success for property prediction in finite systems, it is unable to do the same for infinite periodic crystal structures [36]. Hansen et al. [41] proposed a new descriptor called ‘bag-of-bonds’ that performed better due to incorporating the many-body interactions of a system.
In fact, the use of different descriptors in an ML endeavor for material property prediction is so common that there are open-source software packages that provide implementations for a myriad of different descriptors [38]. Unfortunately, a lack of clarity on the right descriptor makes the use of AI inaccessible to researchers who possess domain expertise but lack the needed knowledge of AI. Additionally, the lack of generalizability of a chosen descriptor makes the current AI-based techniques inaccurate and narrow in scope.
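As a concrete illustration of the Coulomb matrix mentioned above, the following minimal sketch builds it from nuclear charges Z_I and Cartesian coordinates R_I using the commonly quoted definition: 0.5 Z_I^2.4 on the diagonal and Z_I Z_J / |R_I - R_J| off the diagonal. The water-like geometry is approximate and purely illustrative, units are left unspecified, and summarizing the matrix by its sorted row norms is one common way (assumed here) of removing the dependence on atom ordering.

```python
import math

def coulomb_matrix(charges, coords):
    """Coulomb matrix: 0.5*Z_i^2.4 on the diagonal, Z_i*Z_j/|R_i - R_j| elsewhere."""
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                dist = math.sqrt(sum((a - b) ** 2
                                     for a, b in zip(coords[i], coords[j])))
                matrix[i][j] = charges[i] * charges[j] / dist
    return matrix

def sorted_row_norms(matrix):
    """Order-independent summary: row norms sorted in descending order."""
    return sorted((math.sqrt(sum(v * v for v in row)) for row in matrix),
                  reverse=True)

# Approximate, illustrative water-like geometry (O, H, H).
charges = [8, 1, 1]
coords = [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)]

cm = coulomb_matrix(charges, coords)

# Translating every atom by the same vector leaves the matrix unchanged.
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in coords]
cm_shifted = coulomb_matrix(charges, shifted)

# Reordering the atoms permutes rows and columns; sorted row norms are unaffected.
cm_permuted = coulomb_matrix([1, 8, 1], [coords[1], coords[0], coords[2]])

for row in cm:
    print(["%.3f" % v for v in row])
```

Because the entries depend only on interatomic distances and nuclear charges, translation and rotation invariance come for free, while permutation invariance needs an extra step such as the sorting shown here.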

4. AI Model

In addition to an appropriate database and a precise molecular representation, a critical aspect in the material property prediction process is the choice of the AI algorithm. AI algorithms can be categorized into supervised learning, unsupervised learning and reinforcement learning. Supervised learning uses a standard fitting procedure that attempts to determine a mapping function between the known input features and the corresponding output labels. The goal is to make accurate predictions for new, unseen data. In contrast, unsupervised learning does not have prior knowledge of the desired output, and the goal is to find patterns and structures in the unlabeled data. Reinforcement learning uses an iterative trial-and-error process where the actions are determined based on reinforcement in the form of a reward-penalty system. The goal here is to maximize the cumulative reward over time. Supervised learning is the most widespread category of learning used in materials research. Different models may be better suited for certain types of materials or properties, and the choice of model often depends on the available data and the specific goals of the prediction task. Akbarpour et al. [42] found that artificial neural networks (ANNs) performed better in predicting the synthesis conditions of nano-porous anodic aluminum oxide at the interpore distance in comparison with both multiple linear regression and experimental studies. On the other hand, for the modeling of zeolite synthesis, Manuel Serra et al. [43] found that support vector regression (SVR) outperformed ANNs and decision trees. Fang et al. [44] proposed a novel hybrid methodology for forecasting the atmospheric corrosion of metallic materials where the optimal hyperparameters for an SVR model were automatically determined using a genetic algorithm.
These examples highlight the need for AI expertise when choosing the right algorithm for a given application, which can be a barrier to making AI methods accessible for materials-based research.
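The observation that different supervised models suit different problems can be illustrated with a small hold-out comparison. This is a minimal sketch on synthetic data; the two models (a least-squares line and a one-nearest-neighbor predictor) are simple stand-ins chosen for brevity and are not the ANN, SVR or decision tree models used in the cited studies.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares line; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_nearest_neighbor(xs, ys):
    """One-nearest-neighbor predictor; returns a prediction function."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def holdout_mse(fit, xs, ys):
    """Train on even-indexed samples, report mean squared error on odd-indexed ones."""
    model = fit(xs[0::2], ys[0::2])
    test_x, test_y = xs[1::2], ys[1::2]
    return sum((model(x) - y) ** 2 for x, y in zip(test_x, test_y)) / len(test_x)

# Synthetic inputs on [-2, 2] with two hypothetical target functions.
xs = [0.25 * i - 2.0 for i in range(17)]
targets = {
    "linear_target": [3.0 * x + 1.0 for x in xs],
    "curved_target": [x * x for x in xs],
}

results = {
    name: {
        "linear": holdout_mse(fit_linear, xs, ys),
        "nearest": holdout_mse(fit_nearest_neighbor, xs, ys),
    }
    for name, ys in targets.items()
}

for name, scores in results.items():
    print(name, {model: round(err, 4) for model, err in scores.items()})
```

On the linear target the least-squares line has the lower hold-out error, while on the curved target the nearest-neighbor predictor does, so the ranking of the two models depends on the data.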