Data-Driven Methods in Power Grids: Comparison
Please note this is a comparison between Version 1 by Volker Hoffmann and Version 2 by Camila Xu.

Applications of data-driven methods in power grids are motivated by the need to predict and mitigate intermittency in a (future) grid that is expected to lean heavily on renewables. This article briefly reviews some of the existing work on (i) equipment degradation, (ii) forecasting (and control) of production and demand, and (iii) grid-scale power quality.

  • machine learning
  • power systems
  • harmonic distortion

1. Background

The introduction of ever-increasing amounts of intermittent renewable generation, coupled with the increasing electrification of societies, puts growing strain on the power grid and its operation [1,2,3]. To maintain high security of supply, it is paramount to evolve the tools used for power systems operations [4]. One such tool would be the ability to predict undesired events with a prediction horizon long enough to allow mitigating actions [5,6,7].
The development of such tools is encouraged by recent advancements in data-driven techniques, machine learning (ML), available data volumes, and computational resources [8,9,10]. ML algorithms can derive insights from data without being explicitly told what to look for in the vast data streams [11,12], which is particularly beneficial in the domain of power system fault prediction. Explicit, detailed modeling of the power system is cumbersome and would not capture fault-inducing conditions that the modeler does not know about, such as icing on transmission lines, faults in critical components, or recurring abnormalities.
Data-driven methods are only as good as the data they rely on, and can only predict situations that the model has been trained on [13,14,15]. In a best-case scenario, the models are trained on a large and complete dataset and can rely on the automatic tuning of model parameters [16,17]. This is, however, often not the case in real-world applications. In power system fault prediction, faults occur very rarely compared to normal operating conditions [18].
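The consequence of this class imbalance can be illustrated with a small worked example. The sketch below assumes a hypothetical dataset of 1000 operating windows, of which 10 precede a fault (these numbers are illustrative and not taken from the cited works), and evaluates a trivial model that always predicts normal operation.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 marks a fault."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


# Hypothetical labels: 10 fault windows among 1000 (assumed for illustration).
y_true = [1] * 10 + [0] * 990
# A trivial "model" that never predicts a fault.
always_normal = [0] * len(y_true)

tp, fp, fn, tn = confusion_counts(y_true, always_normal)
accuracy = (tp + tn) / len(y_true)
# Recall: fraction of actual faults that were predicted.
recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

Under these assumptions the trivial model is right 99% of the time yet finds none of the faults, which is why accuracy alone is a poor measure for rare-event prediction.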
To achieve high performance with data-driven methods, the analyst must therefore pre-process the data, essentially guiding the algorithms in selecting their focus. This type of pre-processing includes dimensionality reduction, feature selection, feature engineering, and rescaling of features and prediction targets [19,20]. While there are aspects of an art (or, more precisely, intuition based on experience and domain knowledge) to these activities, they depend on an understanding of the behaviour of the underlying power system.
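Two of the pre-processing steps named above, feature selection and rescaling, can be sketched as follows. This is a minimal illustration, not the procedure of any cited work: the feature names, the values, and the rule of dropping constant features are all assumptions, and z-score standardization stands in for rescaling in general.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a list of numbers to zero mean and unit (population) variance."""
    mu = mean(column)
    sigma = pstdev(column)
    return [(x - mu) / sigma for x in column]


def select_and_rescale(features, min_std=1e-9):
    """Drop (near-)constant features, then standardize the remaining ones.

    `features` maps a feature name to its list of measurements.
    """
    processed = {}
    for name, column in features.items():
        if pstdev(column) <= min_std:
            continue  # a constant feature carries no information
        processed[name] = standardize(column)
    return processed


# Hypothetical measurements (assumed for illustration only).
raw = {
    "voltage_v": [229.0, 231.0, 230.0, 232.0, 228.0],
    "current_a": [10.0, 12.0, 11.0, 15.0, 9.0],
    "nominal_freq_hz": [50.0, 50.0, 50.0, 50.0, 50.0],
}
processed = select_and_rescale(raw)
print(sorted(processed))
```

Here the constant nominal frequency is removed and the remaining features end up on a common scale. In practice the choice of which features to keep or construct rests on the domain knowledge discussed above.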

2. Data-Driven Methods in Power Grids

Applications of data-driven methods in power grids are motivated by the need to predict and mitigate intermittency in a grid that leans heavily on renewables [21,22]. Works tend to focus on: (i) equipment degradation; (ii) forecasting (and control) of demand and production; or (iii) grid-scale power quality (PQ) and continuity of supply.
For equipment degradation, the focus is either on individual assets (usually with the aim of predictive maintenance) or on their interaction with the grid at large. The most relevant assets are wind turbines, hydroelectric power plants, photovoltaic power plants, and distribution transformers. Focusing on key assets (and their subcomponents), refs. [23,24] used event and state logs from wind-turbine control systems to train supervised learning algorithms (neural networks, boosted trees, and support vector machines). They report successful prediction of fault states with lead times on the order of five minutes to an hour. In a similar vein, refs. [25,26] use monitoring data from sub-components (e.g., compressors, generators, turbines) to detect and predict anomalous behaviour in hydro power stations. They demonstrate implementations of self-organizing maps and neural networks within the control loops, but unfortunately do not report on model performance.
For photovoltaic systems, forecasting of faults appears to be less advanced, and the literature focuses on fault detection and characterization. For example, ref. [27] integrates system data (currents, voltages, temperature) and uses neural networks to detect and classify abnormal operating conditions. Based on multispectral drone imagery, ref. [28] deploys convolutional neural networks (CNNs) to detect various types of panel damage. Overall, there is significant potential in machine learning approaches to predicting the condition of photovoltaic systems due to the large number of non-correlated data sources (weather, system data, and imagery); see also [29,30].
Finally, multiple works attempt to predict failure of distribution transformers by combining event logs with data from outgassing of insulating oil. While ref. [31] deploys a fairly complicated scheme involving agents, neural networks, and evolutionary methods, ref. [32] uses gradient boosted trees and claims superior performance compared to the literature it reviews. The state of the art in the use of machine learning to predict transformer failures is reviewed in [33].
On the production side, data-driven forecasting methods for wind and photovoltaic systems are mainly concerned with: (i) using (and improving upon) numerical weather prediction models; and (ii) relating the weather conditions to actual power output. For example, ref. [34] uses neural networks to accelerate wind-field computation for a complicated topography, while ref. [35] uses model ensembles (k-nearest neighbours, support vector regression, and decision trees) to relate local wind-speed measurements to turbine power output. For solar forecasting, ref. [36] compares 68 machine learning-based forecasting models and finds that (a) tree-based methods perform best but (b) there is significant variation between the performance of different models in space and time. See also [37,38] for reviews. Hydro power forecasting, on the other hand, is more often cast as a scheduling problem. For example, ref. [39] feeds climate data, expected demand curves, and market conditions into a reinforcement learning system for optimal (most profitable) long-term scheduling. See also [40] for a recent review.
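To make the idea of relating wind-speed measurements to turbine power output concrete, the following is a minimal sketch of k-nearest-neighbours regression, one of the model families mentioned above. It is a single model rather than an ensemble, it does not reproduce the method of ref. [35], and the data points are hypothetical values chosen for illustration.

```python
def knn_predict(history, query_speed, k=3):
    """Estimate power output as the mean over the k nearest wind speeds.

    `history` is a list of (wind_speed_m_s, power_kw) pairs.
    """
    nearest = sorted(history, key=lambda pair: abs(pair[0] - query_speed))[:k]
    return sum(power for _, power in nearest) / len(nearest)


# Hypothetical (wind speed in m/s, power in kW) observations for one turbine.
history = [
    (3.0, 50.0),
    (5.0, 300.0),
    (7.0, 800.0),
    (9.0, 1500.0),
    (11.0, 2000.0),
    (13.0, 2000.0),
]

# The two nearest observations to 8 m/s are those at 7 and 9 m/s.
estimate = knn_predict(history, 8.0, k=2)
print(estimate)
```

A nearest-neighbour estimate needs no explicit power-curve model, but it can only interpolate within the range of conditions present in the historical data.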
Research on demand forecasting is frequently coupled to control schemes for residential and commercial smart buildings [41,42] or vehicle-to-grid technologies [43,44]. In addition, there is a sprawling literature on customer segmentation [45,46], building performance assessments [47], and residential-level demand forecasting [48,49].
With a focus on components and their impact on the remainder of the grid, ref. [50] uses the recurrent incidence of minor events to predict major outages, ref. [51] couples event logs from distribution transformers to meteorological data, and ref. [52] connects meteorological data to component states to predict the impact of extreme weather. Focusing on power quality alone, refs. [53,54] detect and identify PQ anomalies using neural networks and decision trees, extensive feature engineering, or semi-supervised learning approaches. Finally, ref. [55] includes anomaly prediction and, by using random forests, obtains inherently explainable models. Similarly, the researchers' own recent works have also focused on predicting PQ disturbances using a variety of data sources, methods, and features [56,57,58,59,60,61]. Unfortunately, most works (including the researchers' own) omit describing the underlying data, and instead jump straight to feature engineering and machine learning.
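One simple way to turn the recurrent incidence of minor events into a model input is a rolling event count. The sketch below illustrates that feature only; the text does not describe how ref. [50] constructs its predictors, and the timestamps, the one-hour window, and the function name are assumptions.

```python
from datetime import datetime, timedelta


def count_recent_events(event_times, at_time, window):
    """Count events with at_time - window <= timestamp < at_time."""
    start = at_time - window
    return sum(1 for ts in event_times if start <= ts < at_time)


# Hypothetical timestamps of minor events (e.g., short voltage dips) on one feeder.
minor_events = [
    datetime(2023, 1, 1, 8, 0),
    datetime(2023, 1, 1, 9, 30),
    datetime(2023, 1, 1, 9, 45),
    datetime(2023, 1, 1, 9, 55),
]
window = timedelta(hours=1)

# Feature value at two points in time: a quiet hour and a busy hour.
quiet = count_recent_events(minor_events, datetime(2023, 1, 1, 9, 0), window)
busy = count_recent_events(minor_events, datetime(2023, 1, 1, 10, 0), window)
print(quiet, busy)
```

A rising count of this kind could then be supplied to a classifier alongside other features. Whether it has predictive value depends on the data, which is why a description of the underlying data matters.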