Battery fires have become more common owing to the increased use of lithium-ion batteries. Because battery fires cause significant damage to systems, monitoring technology is required to detect battery anomalies. We used Mahalanobis distance (MD) and independent component analysis (ICA) to detect early battery faults in a real-world energy storage system (ESS). The fault data comprised historical battery overvoltage and humidity anomaly alarms, which the system management program generated based on thresholds. These anomalies are typical preliminary symptoms of thermal runaway, the leading cause of lithium-ion battery fires. If a fire occurs in an ESS, the humidity inside the ESS will increase very quickly, which means that threshold-based alarm generation methods can be risky. In addition, industrial datasets contain many outliers for various reasons, including measurement and communication errors in sensors. These outliers can lead to biased training results for models.
1. Introduction
In recent years, the need for power plants utilizing renewable energy sources such as solar and wind power has emerged due to environmental concerns. However, the operation of a power plant demands a significant amount of energy. Even if the maximum generation capacity of the power plant decreases, the required power consumption does not decrease proportionally. For instance, producing sufficient power during periods of high demand can be challenging due to environmental and operational conditions. Conversely, energy waste may occur during periods of low demand and high generation. Energy storage systems (ESSs) have been designed to address these energy management challenges. In most power plants, batteries are integrated into ESSs to operate efficiently and maintain a balance between energy demand and supply.
There are four main types of energy storage technologies: mechanical, thermal, chemical, and electrical [1,2]. A mechanical storage system stores energy in two forms: potential and kinetic [3]. Potential energy is stored by pumping water from a lower reservoir to an elevated reservoir via pumped hydroelectric storage (PHS). Kinetic energy storage includes the flywheel, which accelerates a rotating mass around a fixed axis. Thermal energy storage (TES) is a technology for the efficient management of thermal energy. TES technologies applied in various fields are comprehensively summarized in [4]. In [5], the authors conducted numerical simulations of the melting of an organic phase change material in a TES system. Chemical energy storage comprises batteries such as lithium-ion, lead–acid, and nickel–metal hydride. Details about this technology will be covered later. Electrical energy storage (EES) systems are a crucial component in building sustainable energy technologies. Capacitors and magnets are the most popular forms of electrical energy storage. A traditional capacitor stores energy by removing electrons from one metal plate and depositing them onto another metal plate. A representative example of energy storage technology using magnets is superconducting magnetic energy storage (SMES). In SMES, energy is stored in a magnetic field generated by a direct current flowing through a superconducting coil cooled to ultra-low temperatures [5].
ESSs mainly consist of a battery for energy storage, a battery management system (BMS), a power conversion system (PCS), and an energy management system (EMS). The BMS monitors the battery and controls the charging and discharging of power. The PCS converts electrical characteristics (AC/DC, voltage, frequency) so that power from a source can be stored in the ESS battery or discharged to the power grid. The EMS monitors and controls the status of the battery and PCS, serving as the operational system of the ESS. The batteries for ESSs are modularized and stacked inside racks. These racks are combined according to the required capacity and installed in a container. Typically, a single ESS container has a capacity of 1 to 5 MWh. Due to the modular nature of the batteries, they can be configured based on the required ESS scale, allowing for easy management down to the cell level. The configured battery is managed through the BMS.
Lead–acid and nickel–metal hydride batteries have been widely used in the past. Lead–acid batteries are secondary batteries that utilize the electrochemical reaction between lead and sulfuric acid, making them more economical than other secondary batteries. However, they are heavy relative to their cell capacity, and the lead used to fabricate them raises environmental concerns. Nickel–metal hydride batteries have the advantage of high capacity owing to their higher energy density per unit volume compared to lead–acid batteries. Additionally, they are less likely to pollute the environment than lead–acid batteries. However, nickel–metal hydride batteries suffer from a memory effect that reduces their capacity when charged without full discharge. Memory effects directly affect economics and battery performance. Lithium-ion batteries (LIBs) were developed to address these problems. They have a high energy density and significantly reduced memory effects. LIBs are widely used in portable electronic devices such as smartphones because they are less likely to self-discharge when not in use. Moreover, they are increasingly used in defense, automation, and aerospace industries owing to their high energy density.
LIBs can develop unexpected internal or external faults that potentially lead to thermal runaway. Thermal runaway occurs when the abuse or failure of a lithium-ion battery causes a chain reaction in battery temperature and internal pressure cycling, resulting in uncontrollably high temperatures inside the cell or pack [6]. If thermal runaway continues, the battery can explode and cause a fire, damaging the entire system. In recent years, the continued demand for smaller and lighter electronic devices has led to LIBs being designed to have higher energy densities, which can potentially lead to more destructive accidents [7]. In particular, the thermal runaway of lithium-ion batteries installed in power plants can result in large-scale fires. Thus, technologies are required to monitor the status of batteries in real-time and detect abnormalities (faults) in advance.
2. A Brief Review of Fault Detection Approaches for LIBs
Much effort has been made to prevent and mitigate catastrophic consequences by mathematically modeling battery failures to diagnose thermal runaway [8,9,10,11,12]. These efforts should be accompanied by specialized knowledge to understand not only the behavior of the battery but also the complex mechanisms of its failures. In addition, the mathematical modeling of battery failure has limitations owing to the different specifications and operating environments of industrial systems. Consequently, many scholars have used data-driven approaches to detect anomalies (faults) in batteries [13,14,15].
Data-driven approaches for fault detection aim to identify abnormal patterns by learning historically observed data from a target system. This requires only a large number of quality observations and does not require knowledge of the target system. Typical early warning signs of LIB thermal runaway include increased temperature and pressure inside and outside the battery owing to extreme charging and discharging and increased humidity around the battery owing to off-gassing [16]. For the early detection of battery thermal runaway, ambient environmental data such as the temperature and humidity of the battery are utilized, and electrical data such as the current and voltage measured during battery charging and discharging are often used. In many studies utilizing real-world battery data, researchers have developed experimental environments to obtain observations. Ma et al. [17] installed a small battery management system (BMS) and analyzed the measured data using the statistical method PCA-KPCA to detect faults. In [18], the authors proposed an entropy-based fault detection method for connecting lithium-ion cells. Xiong et al. [19] developed a probabilistic rule-based method to detect over-discharge faults. An integrative application of the interleaved voltage measurement topology and improved correlation coefficient was presented to diagnose various faults [20]. For studies that utilized data generated through simulations, Ma et al. [21] proposed an improved Z-score test to detect cell connection faults.
3. Preliminary
The models used in data-driven approaches for fault detection are primarily multivariate statistical, machine learning, and distance-based models. Statistical methods are the traditional fault detection methods. They perform fault detection by statistically analyzing the characteristics of multivariate datasets and summarizing them in a monitoring statistic. Principal component analysis (PCA) is the most traditional and popular multivariate statistical approach. The squared prediction error (SPE) and Hotelling’s T² are often used as monitoring statistics [17,22,23] and are plotted as monitoring charts to indicate a system’s status. Although PCA is a powerful method, it is based on the statistical assumption that latent variables follow a Gaussian distribution. This assumption can be a limitation of PCA because the hidden variables in real industrial process data often follow a non-Gaussian distribution. Several machine learning algorithms have been applied to classify battery cell imbalance and damage, including logistic regression, artificial neural networks (ANNs) [24], and kernel support vector machines (SVMs) [25]. Classical regression techniques such as Gaussian process regression [26], as well as deep learning approaches [27,28], have also gained considerable attention. A convolutional neural network (CNN) capable of extracting image features can be utilized [29]. Network models require a large amount of training data for effective learning owing to the characteristics of the model structure. However, in actual industries, it takes time to acquire adequate data for training. If insufficient data are used to train a network, the model may perform poorly. The fundamental principle of distance-based models for fault detection involves employing the difference in the distance between normal and abnormal samples. Mahalanobis distance (MD) [30] and the local outlier factor (LOF) [31] are commonly used in distance-based models. These methods may exhibit low performance when the difference between healthy and faulty data is small.
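As a rough illustration of how the two PCA monitoring statistics mentioned above are computed, the following Python sketch fits PCA to normal-operation data, then evaluates Hotelling's T² in the retained principal subspace and SPE in the residual subspace. The synthetic data, the function names (`fit_pca_monitor`, `monitoring_statistics`), the number of retained components, and the empirical 99th-percentile control limits are all assumptions made for this example and are not taken from the study.

```python
import numpy as np

def fit_pca_monitor(X_train, n_components):
    """Fit a PCA monitoring model on normal-operation data (rows = samples)."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    # Eigen-decomposition of the covariance of the standardized data.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return {
        "mean": mean,
        "std": std,
        "P": eigvecs[:, :n_components],        # loading matrix
        "lam": eigvals[:n_components],         # variances of retained scores
    }

def monitoring_statistics(model, X):
    """Return (T2, SPE) for each row of X."""
    Z = (X - model["mean"]) / model["std"]
    scores = Z @ model["P"]
    # T2: variance-scaled squared distance inside the principal subspace.
    t2 = np.sum(scores**2 / model["lam"], axis=1)
    # SPE: squared norm of what the retained components fail to reconstruct.
    residual = Z - scores @ model["P"].T
    spe = np.sum(residual**2, axis=1)
    return t2, spe

# Synthetic "normal" data (assumption): 4 variables driven by 2 latent factors.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5, -0.3, 0.8],
              [0.2, -1.0, 0.7, 0.4]])
X_normal = rng.normal(size=(500, 2)) @ A + 0.05 * rng.normal(size=(500, 4))

model = fit_pca_monitor(X_normal, n_components=2)
t2_train, spe_train = monitoring_statistics(model, X_normal)

# Empirical control limits (assumption; parametric limits are also common).
t2_limit = np.quantile(t2_train, 0.99)
spe_limit = np.quantile(spe_train, 0.99)

# Hypothetical fault: an offset on one variable that breaks the correlation.
X_fault = X_normal[:5].copy()
X_fault[:, 0] += 3.0
t2_fault, spe_fault = monitoring_statistics(model, X_fault)
print("SPE limit:", spe_limit, "SPE under fault:", spe_fault)
```

In this toy setting the offset violates the correlation structure learned from normal data, so it shows up mainly in SPE. The Gaussian assumption discussed above matters when parametric control limits are derived for these statistics, which is one reason an empirical quantile is used here instead.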
In this study, some outliers were removed from the dataset by MD, and independent component analysis (ICA) was used to detect battery anomalies in the ESS. In general, outliers can occur when collecting random samples. In particular, outliers can be contained in datasets measured in real-world industrial processes for various reasons, such as measurement and communication errors of the sensors, because many sensors are installed to monitor the status of a target system. If a model is trained on data containing outliers, it may be less efficient, leading to poor performance. To reduce outliers, the communication status of the sensors and each facility should be periodically inspected. Moreover, system administrators should continuously verify that real-time observations are normal. MD is a simple, distance-based method that can consider the statistical properties of a dataset, and it can effectively identify outliers in normal samples with low variation. In other words, when MD is applied to real-world ESSs, the outliers are automatically identified and removed. ICA is a multivariate statistical technique used to identify hidden independent components (ICs) underlying observations, signals, or random variables. As mentioned above, PCA-based fault detection techniques implicitly assume that the observations at one time are statistically independent of previous observations and that the latent variables follow a Gaussian distribution. However, these assumptions are invalid for actual industrial processes owing to their dynamic and nonlinear properties. Therefore, monitoring results based on PCA tend to result in false alarms and poor detectability [32]. Martin and Morris [33] used multivariate normality tests on scores and found that the latent variables in many industrial processes rarely have multivariate Gaussian distributions. On the other hand, ICA assumes that the latent variables follow a non-Gaussian distribution; it can decompose multivariate data into statistical ICs with less information loss.
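The MD-based outlier removal step described above can be sketched as follows. For a sample x, the Mahalanobis distance to a dataset with mean μ and covariance Σ is sqrt((x − μ)ᵀ Σ⁻¹ (x − μ)), so it accounts for both the scale of each variable and the correlations between them. This is only a minimal sketch: the function names, the synthetic voltage and temperature values, and the quantile-based cutoff are assumptions for illustration, and the study may have chosen its threshold differently.

```python
import numpy as np

def mahalanobis_distances(X, mean, cov):
    """Mahalanobis distance of each row of X from `mean` under `cov`."""
    diff = X - mean
    # Solve cov @ y = diff.T instead of forming the explicit inverse.
    solved = np.linalg.solve(cov, diff.T).T
    return np.sqrt(np.sum(diff * solved, axis=1))

def remove_outliers_md(X, quantile=0.99):
    """Drop samples whose MD exceeds an empirical quantile (assumed cutoff)."""
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    md = mahalanobis_distances(X, mean, cov)
    cutoff = np.quantile(md, quantile)
    keep = md <= cutoff
    return X[keep], keep, md

# Synthetic two-variable data (assumption): cell voltage [V], temperature [degC].
rng = np.random.default_rng(1)
normal = rng.multivariate_normal(
    mean=[3.7, 25.0],
    cov=[[0.01, 0.02], [0.02, 0.25]],
    size=500,
)
# Hypothetical gross errors, e.g. sensor dropouts or communication glitches.
glitches = np.array([
    [0.0, 25.0],
    [0.0, 24.5],
    [7.9, 25.2],
    [3.7, 80.0],
    [3.7, -40.0],
])
X = np.vstack([normal, glitches])

X_clean, keep, md = remove_outliers_md(X, quantile=0.99)
print("removed", int((~keep).sum()), "of", len(X), "samples")
```

Because the mean and covariance here are estimated from data that still contain the outliers, very heavy contamination can mask them; a robust covariance estimate would be one way to reduce that effect.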
To demonstrate the performance of the proposed method, measured historical fault data from the battery of an ESS connected to a solar power plant were utilized. The fault types were battery overvoltage and humidity anomaly alarms generated by a system management program. When battery overvoltage occurs, the temperature and pressure inside the battery increase, which can cause thermal runaway. In the case of humidity abnormalities, as gases are released before thermal runaway occurs, the humidity around the battery increases. Therefore, both types of anomalies are closely related to battery fires.
This entry is adapted from the peer-reviewed paper 10.3390/en17020535