Due to the high intermittency of renewable energy sources and the growing interdependences among formerly separate energy pathways, adequate energy system modeling is crucial for evaluating existing energy systems and for forecasting viable future ones. However, this comes at the cost of rising complexity of energy system models (ESMs) and often results in computationally intractable programs. To overcome this problem, time series aggregation is frequently used to reduce ESM complexity. As these methods aim to reduce the input data while preserving the main information of the time series, but are not based on mathematically equivalent transformations, the performance of each method depends on the justifiability of its assumptions.
The following section deals with the general concept of TSA. For the mathematical examinations of the following section, the nomenclature of Table 1 is used.
Table 1. Nomenclature for the mathematical examinations in the following section.
The input data D usually consist of one time series for each attribute, i.e., D=A×S. The set of attributes A describes all types of parameters that are known ex-ante for the energy system, such as the capacity factors of certain technologies at certain locations or demands for heat and electricity that must be satisfied. The set of time steps S describes the shape of the time series itself, i.e., a set of discrete values that represent finite time intervals, e.g., 8760 time steps of hourly data to describe a year. For all methods presented in the following, it is crucial that the time series of all attributes have identical lengths and TR. The possible shape of this highly resolved input data is shown in the upper left field of Figure 2.
Figure 2. Methods of time series aggregation (TSA) for energy system models (ESMs).
One approach to aggregating the input time series is to merge multiple time series of attributes with a similar pattern. However, this can only be done for attributes describing similar units (e.g., the capacity factors of similar wind turbines) or similar customer profiles (e.g., the electricity demand profiles of residential buildings). As this approach is often chosen to merge spatially distributed but similar technologies, it is not considered TSA in the narrow sense, but rather spatial or technological aggregation, as the number of time steps is not reduced in these cases. This is illustrated in the upper right field of Figure 2.
TSA, as it is understood in this review, is the aggregation of redundant information within each time series, i.e., in the case of discrete time steps, the reduction of the overall number of time steps. This can be done in several ways. One way of reducing the number of time steps, as shown in the central field of Figure 2, is the merging of adjacent time steps. Here, it needs to be highlighted that the periods shown in this field are for illustrative purposes only: The merging of adjacent time steps can be performed for full-length time series or for time periods of time series only. Moreover, the merging of adjacent time steps can either be done in a regular manner, e.g., every two time steps are represented by one larger time step (downsampling), or in an irregular manner according to, e.g., the gradients of the time series (segmentation). A third possible approach is to vary the temporal resolution of each attribute individually, i.e., using multiple time grids, which can also be done in an irregular manner, as pointed out by Renaldi et al. ^{[31]}.
Another approach to TSA is based on the fact that many time series exhibit a fairly periodic pattern, e.g., time series of solar irradiance have a strong daily pattern. In the case of perfect periodicity, a time series could thus be represented by one period and its cardinality without the loss of any information. Based on this idea, time series are often divided into periods, as already shown in the middle of Figure 2. As the periods are usually not constant throughout a year (e.g., the solar irradiance is higher in the summer than in the winter), the periods can be merged either based on their position in the calendar (time slices and averaging) or based on their similarity (clustering), as shown at the bottom of Figure 2.
In the following, methods that merge time steps or periods in a regular manner, i.e., based on their position in the time series only, will be referred to as time-based methods, whereas aggregation based on the time steps’ and periods’ values will be called feature-based. In this context, features refer not only to statistical features as defined by Nanopoulos et al. ^{[32]}, but in a broader sense to information inherent to the time series, regardless of whether the values or the extreme values of the time series themselves or their statistical moments are used ^{[33]}.
2.1. Resolution Variation
The simplest and most intuitive method for reducing the data volume of time series for ESMs is the variation of the TR. Here, three different procedures can be distinguished that have been commonly used in the literature:
2.1.1. Downsampling
Downsampling is a straightforward method for reducing the TR by representing a number of consecutive discrete time steps by only one (longer) time step; e.g., a time series of one year of hourly data is sampled down to a time series consisting of 6 h time steps. Thus, the number of time steps that must be considered in the optimization is reduced to one sixth, as demonstrated by Pfenninger et al. ^{[34]}. As the averaging of consecutive time steps leads to an underestimation of the intra-time step variability, capacities for RES tend to be underestimated because their intermittency is especially weakly represented ^{[34]}. Figure 3 shows the impact of downsampling the PV profile from hourly resolution to 6 h time steps, resulting in one sixth of the original number of time steps. In comparison to the original time series, the underestimation of extreme periods is remarkable. This phenomenon also holds true for sub-hourly time steps ^{[35]}^{[36]}^{[37]}; for instance, in the case of an ESM containing a PV cell and a battery for a residential building, this has an impact not only on the built capacities, but also on the self-consumption rate ^{[35]}^{[37]}. For wind, the impact is comparable ^{[36]}. As highlighted by Figure 2, downsampling can also be applied to typical periods. To the best of our knowledge, this was first evaluated by Yokoyama et al. ^{[38]}, with the result that it can be a crucial step for solving a highly complex problem at least close to optimality. The general tendency of downsampling to underestimate the objective function was shown in a subsequent work by Yokoyama et al. ^{[39]}, and the fact that this is not necessarily the case when combined with other methods in a third publication ^{[40]}.
Figure 3. The time series of photovoltaic capacity factors downsampled to 1460 6 h time steps.
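The downsampling step described above can be sketched as follows. This is a hedged toy illustration, not code from the cited works; the function name and profile values are hypothetical.

```python
# Hypothetical helper illustrating downsampling: every n consecutive time
# steps are represented by their mean value.
def downsample(series, n):
    """Average every n consecutive time steps into one coarser step."""
    assert len(series) % n == 0, "series length must be divisible by n"
    return [sum(series[i:i + n]) / n for i in range(0, len(series), n)]

# Toy hourly capacity-factor profile reduced to a single 6 h step:
hourly = [0.0, 0.1, 0.4, 0.7, 0.4, 0.1]
print(downsample(hourly, 6))
```

Applied to 8760 hourly values with n = 6, this yields the 1460 coarser steps shown in Figure 3; the midday peak of 0.7 is flattened into the 6 h mean, which is exactly the underestimation of extremes discussed above.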
2.1.2. Segmentation
In contrast to downsampling, segmentation is a feature-based method of decreasing the TR of time series that allows arbitrary time step lengths. To the best of our knowledge, Mavrotas et al. ^{[41]} were the first to present an algorithm for segmenting time series into coarser time steps based on ordering the gradients between time steps and merging the smallest ones. Fazlollahi et al. ^{[42]} then introduced a segmentation algorithm based on k-means clustering in which extreme time steps were added in a second step. In both works, the segmentation methods were applied to typical periods, which will be explained in the following chapters. Bungener et al. ^{[43]} used evolutionary algorithms to iteratively merge the heat profiles of different units in an industrial cluster and evaluated the different solutions obtained by the algorithm in terms of the preserved variance of the time series and the sum of zero-flow-rate time steps, which indicated that a unit was not active. Deml et al. ^{[44]} used a similar, but not feature-based, approach to that of Mavrotas et al. and Fazlollahi et al. ^{[41]}^{[42]} for the optimization of a dispatch model. In this approach, the TR of the economic dispatch model was reduced further the farther the time steps lay in the future, following a discretized exponential function. Moreover, they compared the results of this approach to those of a perfect-foresight approach for the fully resolved time horizon and a model-predictive control, and demonstrated the superiority of the approach, as it preserved the chronology of time steps. This was also pointed out in comparison to a typical-periods approach by Pineda et al. ^{[45]}, who used the centroid-based hierarchical Ward's algorithm ^{[46]} with the side condition of only merging adjacent time steps. Bahl et al. ^{[47]}, meanwhile, introduced an algorithm similar to that of Fazlollahi et al. ^{[42]}, inspired by Lloyd's algorithm and the partitioning around medoids algorithm ^{[48]}^{[49]} with multiple initializations.
This approach was also utilized in succeeding publications ^{[50]}^{[51]}. In contrast to the approach of Bahl et al. ^{[47]}, Stein et al. ^{[52]} did not use a hierarchical approach, but formulated an MILP in which extreme periods could not only be excluded beforehand, but the grouping of too many adjacent time steps with a relatively small but monotone gradient could also be avoided. The objective function relies on the minimization of the gradient error, similar to the method of Mavrotas et al. ^{[41]}. Recently, Savvidis et al. ^{[53]} investigated the effect of increasing the TR at times of the zero-crossing effect, i.e., at times when the energy system switches from filling storages to withdrawing from them and vice versa. This was compared to the opposite approach, which increases the resolution at times without zero crossings. They also arrived at the conclusion that the use of irregular time steps is effective for decreasing the computational load without losing substantial information. Figure 4 shows the advantages of the hierarchical method proposed by Pineda et al. ^{[45]} compared to the simple downsampling in Figure 3. The inter-daily variations of the PV profile are preserved much more accurately with 1460 irregular time steps than with simple downsampling to the same number of time steps.
Figure 4. The time series of photovoltaic capacity factors segmented to 1460 time intervals using hierarchical merging of adjacent time steps based on centroids as proposed by Pineda et al. ^{[45]}.
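A minimal sketch of adjacency-constrained merging in the spirit of the hierarchical approach of Pineda et al.: in each step, the pair of adjacent segments with the closest centroids is merged. It is simplified to one attribute with plain centroid distances; the function name and data are illustrative, not the authors' implementation.

```python
# Hypothetical segmentation helper: greedily merge the adjacent pair of
# segments whose centroids are closest, until n_segments remain.
def segment(series, n_segments):
    """Each segment is a (centroid, n_steps) tuple; merges are length-weighted."""
    segs = [(v, 1) for v in series]
    while len(segs) > n_segments:
        # index of the adjacent pair with the smallest centroid distance
        i = min(range(len(segs) - 1),
                key=lambda j: abs(segs[j][0] - segs[j + 1][0]))
        (c1, w1), (c2, w2) = segs[i], segs[i + 1]
        merged = ((c1 * w1 + c2 * w2) / (w1 + w2), w1 + w2)
        segs[i:i + 2] = [merged]
    return segs

print(segment([0.0, 0.0, 0.5, 1.0, 1.0, 0.4], 3))
```

In contrast to downsampling, the resulting segments have irregular lengths: flat stretches collapse into long segments, while steep gradients keep fine resolution.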
2.1.3. Multiple Time Grids
The idea of using multiple time grids takes into account that different components that link time steps to each other, such as storage systems, operate on different time scales ^{[14]}^{[15]}^{[54]}. For instance, batteries often exhibit daily storage behavior, whereas hydrogen technologies ^{[14]}^{[15]} or some thermal storage units ^{[54]}^{[55]} show seasonal behavior. Because of this, seasonal storage can be expected to be modeled accurately with a smaller number of coarser time steps. Renaldi et al. ^{[31]} applied this principle to a solar district heating model consisting of a solar thermal collector, a backup heat boiler, and a long- and a short-term thermal storage system, and achieved the optimal tradeoff between computational load and accuracy by modeling the long-term thermal storage with 6 h time steps and the remaining components with hourly time steps. It is important to highlight that the linking of the different time grids was achieved by applying the operational state of the long-term storage to each time step of the other components that lay within the larger time steps of the long-term storage. This especially reduced the number of binary variables of the long-term storage (because it could not charge and discharge at the same time). However, increasing the step size even further led to an increase in calculation time again, as the operational flexibility of the long-term storage became too stiff and the benefit of reducing the number of its variables decreased. Thus, this method requires prior knowledge about the characteristics of each technology. Reducing the TR of single components is a highly demanding task and is left to future research.
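The coupling of a coarse storage grid to the fine component grid can be sketched as follows: a hedged toy in which the coarse-grid operational value is simply repeated for every fine step it covers, loosely following the linking idea described for the model of Renaldi et al.; the helper name is hypothetical.

```python
# Hypothetical helper linking two time grids: one value per coarse step is
# applied to all `ratio` fine steps that lie within it.
def coarse_to_fine(coarse_values, ratio):
    """Expand one value per coarse step to `ratio` fine steps."""
    return [v for v in coarse_values for _ in range(ratio)]

# e.g., a 6 h seasonal-storage dispatch applied to hourly component steps
print(coarse_to_fine([2.0, -1.0], 6))
```

The coarse component thus contributes far fewer variables to the optimization, while the fine-grid components still see a consistent storage state in every hour.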
2.2. Typical Periods
2.2.1. Time-Based Merging
Time-based approaches to selecting typical periods rely on the modeler’s knowledge of the model. This means that characteristics are included that are expected to have an impact on the overall design and operation of the ESM. As will be shown in the following, this has most frequently been done for TDs, although similar approaches for typical weeks ^{[63]} or typical hours (i.e., TTSs) ^{[71]} exist. As pointed out by Schütz et al. ^{[72]}, the time-based selection of typical periods can be divided into month-based and season-based methods, i.e., selecting a number of typical periods from either each month or each season. However, we divide the time-based methods into those using consecutive typical periods and those using non-consecutive typical periods that are repeated as a subset with a fixed order within a pre-defined time interval.
2.2.1.1. Averaging
The method that is referred to as averaging in the following, as per Kotzur et al. ^{[62]}, focuses on aggregating consecutive periods into one period. To the best of our knowledge, this idea was first introduced by Marton et al. ^{[73]}, who also introduced a clustering algorithm indicating whether a stretch of consecutive typical periods of Ontario’s electricity demand had ended or not. In this way, the method was capable of preserving information about the order of TDs. However, it was not applied to a specific ESM. In contrast to that method, one TD for each month at hourly resolution, resulting in 288 time steps, was used by Mavrotas et al. ^{[41]}, Lozano et al. ^{[74]}, Schütz et al. ^{[75]}, and Harb et al. ^{[76]}. Although thermal storage systems have been considered in the literature ^{[74]}^{[75]}^{[76]} (as well as a battery storage by Schütz et al. ^{[75]}), they were constrained to the same state of charge at the beginning and end of each day. The same holds true for the work of Kotzur et al. ^{[62]}. Here, thermal storage, batteries, and hydrogen storage were considered, and the evaluation was repeated for different numbers of averaged days. Buoro et al. ^{[63]} used one typical week per month to simulate operation cycles on a longer time scale. Kools et al. ^{[77]}, in turn, aggregated eight consecutive weeks in each season into one TD with 10 min resolution, which was then further down-sampled to 1 h time steps. The same was done by Harb et al. ^{[76]}, who compared twelve TDs of hourly resolution to time series with 10 min time steps and time series down-sampled to 1 h time steps. This illustrates that both methods, downsampling and averaging, can be combined. Voll et al. ^{[78]} aggregated the energy profiles even further with only one time step per month, which can also be interpreted as one TD per month down-sampled to one time step.
To account for the significant underestimation of peak loads, the winter and summer peak loads were included as additional time steps. Figure 5 illustrates the impact of representing the original series by twelve monthly averaged consecutive typical days, i.e., 288 time steps instead of 8760.
Figure 5. The time series of photovoltaic capacity factors represented by twelve monthly averaged periods as used in other studies ^{[41]}^{[74]}^{[75]} and reproduced by Kotzur et al. ^{[62]} using the python package tsam ^{[62]} (i.e., 288 different time steps).
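The averaging idea can be sketched as follows, assuming periods of equal length and groups of consecutive periods; the toy numbers stand in for 8760 hourly values, and the function name is hypothetical.

```python
# Hypothetical averaging helper: groups of consecutive periods are averaged
# element-wise into one typical period (e.g., one typical day per month).
def average_periods(series, steps_per_period, periods_per_group):
    n = steps_per_period
    periods = [series[i:i + n] for i in range(0, len(series), n)]
    typical = []
    for g in range(0, len(periods), periods_per_group):
        group = periods[g:g + periods_per_group]
        typical.append([sum(day[t] for day in group) / len(group)
                        for t in range(n)])
    return typical

# Two toy "months" of two 3-step "days" each -> two typical days
series = [1, 2, 3, 3, 4, 5, 10, 10, 10, 20, 20, 20]
print(average_periods(series, 3, 2))
```

With hourly data, `steps_per_period=24` and one group per calendar month would reproduce the twelve monthly averaged TDs (288 time steps) shown in Figure 5.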
2.2.1.2. Time Slices
To the best of our knowledge, the idea of time slices (TSs) was first introduced by the MESSAGE model ^{[9]}^{[79]} and the expression was reused in other models, such as THEA ^{[80]}, LEAP ^{[81]}, OSeMOSYS ^{[82]}, Syn-E-Sys ^{[83]}, and TIMES ^{[84]}^{[85]}. The basic idea is comparable to that of averaging, but is not based on aggregating consecutive periods. Instead, TSs can be interpreted as the general case of time-based grouping of periods. Given the fact that electricity demand in particular depends not only on the season, but also on the weekday, numerous publications have used the TS method for differentiating between seasons and amongst days. In the following, this approach is referred to as time slicing, although not all of the cited publications explicitly refer to the method thus. Instead, the method is sometimes simply called “representative day” ^{[38]}^{[39]}^{[86]}^{[87]}^{[88]}^{[89]}^{[90]}^{[91]}, “TD” ^{[41]}^{[92]}^{[93]}^{[94]}^{[95]}^{[96]}^{[97]}^{[98]}^{[99]}, “typical daily profiles” ^{[16]}^{[17]}, “typical segment” ^{[100]}, “time slot” ^{[101]}, or “time band” ^{[102]}. However, the term “TS” is used by the majority of authors ^{[103]}^{[104]}^{[79]}^{[80]}^{[81]}^{[82]}^{[83]}^{[105]}^{[106]}^{[107]}^{[108]}^{[109]}. The most frequent distinction is made between the four seasons ^{[16]}^{[17]}^{[103]}^{[104]}^{[80]}^{[81]}^{[82]}^{[83]}^{[93]}^{[99]}^{[105]}^{[107]}^{[108]}^{[109]} or between summer, winter, and mid-season ^{[110]}^{[79]}^{[41]}^{[38]}^{[39]}^{[65]}^{[86]}^{[87]}^{[88]}^{[90]}^{[95]}^{[96]}^{[98]}^{[102]}^{[111]}^{[112]}, but other distinctions, such as monthly, bi-monthly, or bi-weekly, among others ^{[110]}^{[79]}^{[41]}^{[89]}^{[91]}^{[92]}^{[94]}^{[97]}^{[100]}^{[106]}, can also be found.
Within this macro distinction, a subordinate distinction between weekdays and weekend days ^{[16]}^{[17]}^{[79]}^{[82]}^{[83]}^{[89]}^{[91]}^{[94]}^{[99]}^{[100]}^{[102]}; weekdays, Saturdays, and Sundays ^{[93]}^{[105]}^{[107]}; Wednesdays, Saturdays, and Sundays ^{[80]}^{[81]}; or others, such as seasonal, median, and peak ^{[110]}, can be found. In contrast to normal averaging, each TS does not follow the previous one, but is repeated in a certain order a certain number of times (e.g., five spring workdays are followed 13 times by two spring weekend days before the summer periods follow). As a visual inspection of Figure 5 and Figure 6 shows, the TS method relying on the distinction between weekdays and seasons is not always superior to a monthly distinction. The reason for this is that some input data, such as the PV profile from the example, have no weekly pattern, and spacing the typical periods equidistantly is the better choice in this case if no other input time series (such as, e.g., electricity profiles) must be taken into account. Thus, the choice of the aggregation method should refer to the pattern of the time series considered especially important for the ESM. For instance, the difference between weekdays and weekend days is likely more important to an electricity system based on fossil fuels and without storage technologies, whereas an energy system based on a high share of RES, combined heat and power technologies, and storage units is more affected by seasonality.
Figure 6. The time series of photovoltaic capacity factors represented by twelve time slices (TSs) (average Wednesday, Saturday and Sunday for each season) as used by Nicolosi et al. and Haydt et al. ^{[80]}^{[81]} (i.e., 288 different time steps).
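A hedged sketch of time slicing: daily profiles are grouped by a calendar key and each group is averaged into one slice. The crude season/weekend key below is illustrative only; the cited works use other distinctions, e.g., Wednesday/Saturday/Sunday per season.

```python
# Hypothetical time-slicing helper: average all daily profiles sharing the
# same (season, is_weekend) calendar key into one slice.
from datetime import date, timedelta

def time_slices(daily_profiles, start):
    groups = {}
    for i, day in enumerate(daily_profiles):
        d = start + timedelta(days=i)
        key = ((d.month % 12) // 3, d.weekday() >= 5)  # crude season, weekend flag
        groups.setdefault(key, []).append(day)
    # element-wise average of each group's daily profiles
    return {k: [sum(col) / len(col) for col in zip(*v)] for k, v in groups.items()}

# Fri/Sat/Sun toy example: the Friday forms one slice, the weekend another
print(time_slices([[1, 2], [3, 4], [5, 6]], date(2024, 1, 5)))
```

Each slice is then repeated in its fixed calendar order when the aggregated year is reassembled, which is what distinguishes time slicing from plain averaging of consecutive periods.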
2.2.1.3. Time Slices/Averaging + Downsampling/Segmentation
2.2.2. Feature-Based Merging
In contrast to representing time series with typical periods based on a time-based method, typical periods can also be chosen on the basis of features. In this section, the clustering procedure is explained both conceptually and mathematically. To the best of our knowledge, one of the first and most frequently cited works, by Domínguez-Muñoz et al. ^{[113]}, used this approach to determine typical demand days for a CHP optimization, i.e., an energy system optimization model with discrete time steps, even though it was not applied to a concrete model in this work. For this purpose, all time series are first normalized to counter the problem of diverse attribute scales. Then, all time series are split into periods P, which are compared to each other by transforming each period, with one value x for each attribute a at each time step t within the period, into a hyper-dimensional data point. Those data points with low distances to each other are grouped into clusters and represented by a (synthesized or existing) point within each cluster that is considered a “typical” or “representative” period. Additionally, a number of clustering algorithms are not centroid-based, i.e., they do not preserve the average value of the time series ^{[72]}, which could, e.g., lead to a wrong estimate of the overall energy amount provided by an energy system across a year. To overcome this problem, time series are commonly rescaled in an additional step. This means that time series clustering includes five fundamental aspects: the normalization of the time series, their division into periods, the choice of a distance metric, the clustering algorithm and the representation of each cluster, and the rescaling of the representatives. As the clustered data are usually relatively sparse, while the number of dimensions increases with the number of attributes, the curse of dimensionality may lead to unintuitive results incorporating distance metrics ^{[115]}, such as the Euclidean distance ^{[114]}^{[116]}^{[117]}^{[118]}. Therefore, a dimensionality reduction might be used in advance ^{[119]}^{[120]}^{[121]}, but is not further investigated in this work for the sake of brevity.
In the following, each of the aspects named above will be explained with respect to its application in TSA for ESMs. The distance metric, clustering method, and representation are treated together, because the number of clustering methods used for ESMs is small. Figure 6 shows the mandatory steps for time series clustering used for ESMs, which are presented in the following. Figure 7 shows the time series of photovoltaic capacity factors represented by 12 typical days (TDs) using k-means clustering and the python package tsam ^{[62]}.
Figure 6. Steps for clustering time series for energy system models (ESMs).
Figure 7. The time series of photovoltaic capacity factors represented by twelve typical days (TDs) using k-means clustering and the python package tsam ^{[62]} (i.e., 288 different time steps).
2.2.2.1. Preprocessing and Normalization
Clustering normally starts with preprocessing the time series, which includes a normalization step, an optional dimensionality reduction, and an alignment step. Because of the diversity of scales and units amongst different attributes, they must be normalized before applying clustering algorithms to them. Otherwise, distance measures used in the clustering algorithm would focus on large-scaled attributes, and other attributes would not be properly represented by the cluster centers. For example, capacity factors are defined as having values between zero and one, whereas electricity demands can easily reach multiple gigawatts. Although a number of normalization methods exist, the min-max normalization is used in the majority of publications ^{[14]}^{[18]}^{[104]}^{[62]}^{[72]}^{[42]}^{[66]}^{[67]}^{[119]}^{[122]}^{[123]}^{[124]}^{[125]}. For the time series of an attribute a∈A={1,…,N_a} consisting of s∈S={1,…,N_s} time steps, the normalization of the values assigned to a in time step s is calculated as follows:

\hat{x}_{a,s} = \frac{x_{a,s} - \min_{s' \in S} x_{a,s'}}{\max_{s' \in S} x_{a,s'} - \min_{s' \in S} x_{a,s'}}
In cases in which the natural lower limit is zero, such as time series for electricity demands, this is sometimes ^{[34]}^{[59]}^{[61]}^{[68]}^{[126]}^{[127]}^{[128]}^{[129]} reduced to:

\hat{x}_{a,s} = \frac{x_{a,s}}{\max_{s' \in S} x_{a,s'}}
Another normalization that can be found in the literature ^{[130]}^{[71]}^{[131]}^{[132]}^{[133]} is the z-normalization, which directly accounts for the standard deviation rather than for the maximum and minimum outliers, implying a normal distribution with different spreads amongst different attributes:

\hat{x}_{a,s} = \frac{x_{a,s} - \mu_a}{\sigma_a},

where \mu_a and \sigma_a denote the mean and standard deviation of attribute a.
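The min-max and z-normalizations described above can be sketched as follows, using the standard library's `statistics` module on toy values; the min-max variant assumes the series is not constant.

```python
# Per-attribute normalization sketches matching the definitions above.
from statistics import mean, pstdev

def min_max(series):
    """Min-max normalization: map the series onto [0, 1]."""
    lo, hi = min(series), max(series)
    return [(x - lo) / (hi - lo) for x in series]

def z_norm(series):
    """Z-normalization: zero mean, unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma for x in series]

print(min_max([2, 4, 6]))
print(z_norm([2, 4, 6]))
```

In a multi-attribute setting, each attribute's time series would be normalized independently before the periods are concatenated into hyper-dimensional data points.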
In the following, the issue of dimensionality reduction will not be considered, as it is used in only a small number of publications ^{[119]}^{[120]}^{[121]} and transforms the data into eigenvectors to tackle the non-trivial behavior of distance measures used for clustering in hyper-dimensional spaces ^{[117]}.
A time series can further be divided into a set of periods P and a set of time steps within each period T, i.e., S=P×T. The periods are clustered into non-overlapping subsets P_C, each of which is then represented by a representative period. A representative period consists of at least one discrete time step and, depending on the number and duration of time steps, is often referred to as a typical hour, snapshot, or system state; a typical or representative day; or a typical week. The data D=A×P×T can thus be rearranged so that each period is represented by a row vector in which all inner-period time steps of all attributes are concatenated, i.e.,

x_p = \left( x_{p,1,1}, \ldots, x_{p,1,N_t}, \; x_{p,2,1}, \ldots, x_{p,N_a,N_t} \right)
The row vectors of D_arr are then grouped with respect to their similarity. Last but not least, it must be highlighted that the inner-period time step values can also be sorted in descending order; in this case, the duration curves of the periods are clustered, as done in other studies ^{[18]}^{[124]}^{[134]}^{[135]}. This can reduce the averaging effect when clustering time series without periodic patterns, such as wind time series.
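Rearranging S = P × T into row vectors, with optional descending sorting for duration-curve clustering, can be sketched as follows for a single attribute; the function name and toy data are illustrative.

```python
# Hypothetical helper: split a flat series into one row vector per period;
# optionally sort each row descending so that duration curves are clustered.
def to_period_matrix(series, steps_per_period, duration_curve=False):
    rows = [series[i:i + steps_per_period]
            for i in range(0, len(series), steps_per_period)]
    if duration_curve:
        rows = [sorted(r, reverse=True) for r in rows]
    return rows

print(to_period_matrix([3, 1, 2, 9, 7, 8], 3, duration_curve=True))
```

With several attributes, the per-attribute rows of each period would additionally be concatenated into one hyper-dimensional vector before clustering.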
2.2.2.2. Algorithms, Distance Metrics, Representation
Although a vast number of different clustering algorithms exist ^{[70]}^{[136]} and have been used for time series clustering in general ^{[114]}, only a relatively small number of standard clustering algorithms have been used for clustering input data for energy system optimization problems; these will be presented in the following. Apart from that, a number of modified clustering methods have been implemented in order to account for certain properties of the time series. The goal of all clustering methods is to meaningfully group data based on their similarity, which means minimizing the intra-cluster difference (homogeneity), maximizing the inter-cluster difference (separability), or a combination of the two ^{[137]}. However, this depends on how the differences are defined. To begin with, the clustering algorithms can be separated into partitional and deterministic hierarchical algorithms.
Partitional Clustering
One of the most common partitional clustering algorithms used in energy system optimization is the k-means algorithm, which has been used in a variety of studies ^{[14]}^{[15]}^{[24]}^{[34]}^{[62]}^{[72]}^{[33]}^{[42]}^{[47]}^{[51]}^{[56]}^{[57]}^{[58]}^{[59]}^{[60]}^{[71]}^{[121]}^{[122]}^{[123]}^{[125]}^{[126]}^{[129]}^{[131]}^{[132]}^{[133]}^{[138]}^{[139]}^{[140]}^{[141]}^{[142]}^{[143]}^{[144]}^{[145]}^{[146]}. The objective of the k-means algorithm is to minimize the sum of the squared distances between all cluster members of all clusters and the corresponding cluster centers, i.e.,

\min \sum_{k=1}^{K} \sum_{x_p \in C_k} d(x_p, c_k)^2

The distance metric in this case is the Euclidean distance between the hyper-dimensional period vectors with dimension dim(vec(T×A)) and their cluster centers c_k, i.e.,

d(x_p, c_k) = \sqrt{\sum_{i=1}^{\dim(vec(T \times A))} \left( x_{p,i} - c_{k,i} \right)^2},

where the cluster centers are defined as the centroid of each cluster, i.e.:

c_k = \frac{1}{|C_k|} \sum_{x_p \in C_k} x_p
This NP-hard problem is generally solved by an adapted version ^{[49]} of Lloyd’s algorithm ^{[48]}, a greedy algorithm that heuristically converges to a local minimum. As multiple runs are performed in order to improve the local optimum, improved methods for setting the initial cluster centers (such as k-means++) have also been proposed in the literature ^{[147]}.
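A minimal sketch of Lloyd's algorithm on 1-D toy data, with fixed initial centers instead of the multiple random initializations mentioned above, and scalar points instead of period vectors; the function name is hypothetical.

```python
# Hypothetical k-means sketch in the spirit of Lloyd's algorithm: assign
# points to the nearest center, recompute centroids, repeat until stable.
def k_means(points, centers, iters=100):
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:  # assignment step: nearest center by squared distance
            i = min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            clusters[i].append(p)
        new = [sum(c) / len(c) if c else centers[i]
               for i, c in enumerate(clusters)]  # update step: centroids
        if new == centers:
            break
        centers = new
    return centers, clusters

centers, clusters = k_means([1.0, 1.2, 5.0, 5.2, 9.0], [0.0, 4.0])
print(centers, clusters)
```

Because each run only converges to a local minimum, practical implementations restart from several initial center sets and keep the best result.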
The only difference regarding the k-medoids algorithm is that the cluster centers are defined as samples from the dataset that minimize the sum of the intra-cluster distances, i.e., that are closest to the clusters’ centroids.
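Selecting the medoid of a cluster can be sketched as follows: among the cluster members, pick the sample that minimizes the summed distance to all other members (1-D toy data, squared Euclidean distance; the helper name is illustrative).

```python
# Hypothetical medoid helper: the representative is an actual cluster member,
# namely the one minimizing the summed squared distance to all members.
def medoid(cluster):
    return min(cluster, key=lambda m: sum((m - x) ** 2 for x in cluster))

print(medoid([1.0, 2.0, 2.5, 10.0]))
```

Unlike a centroid, the medoid is always a real period from the input data, which is why it preserves intra-period variance better but may shift the cluster mean.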
This clustering algorithm was used by numerous authors ^{[14]}^{[19]}^{[130]}^{[113]}^{[62]}^{[72]}^{[47]}^{[51]}^{[59]}^{[123]}^{[125]}^{[134]}^{[135]}^{[144]}^{[148]}^{[149]}^{[150]}^{[151]}^{[152]}, either by using the partitioning around medoids (PAM) algorithm introduced by Kaufman et al. ^{[153]} or by using an MILP formulation introduced by Vinod et al. ^{[154]} and used in several studies ^{[14]}^{[130]}^{[113]}^{[62]}^{[123]}^{[144]}^{[149]}. The MILP can be formulated as follows:

\min \sum_{p \in P} \sum_{q \in P} d_{pq} \, z_{pq}

Subject to:

\sum_{q \in P} z_{pq} = 1 \quad \forall \, p \in P

z_{pq} \le y_q \quad \forall \, p, q \in P

\sum_{q \in P} y_q = k

y_q, z_{pq} \in \{0, 1\} \quad \forall \, p, q \in P,

where d_{pq} denotes the distance between periods p and q, the binary variable y_q indicates whether period q is chosen as a medoid, and z_{pq} indicates whether period p is assigned to medoid q.
In a number of publications ^{[130]}^{[62]}^{[72]}^{[59]}^{[123]}^{[125]}^{[144]}, k-medoids clustering was directly compared to k-means clustering. The general observation is that k-medoids clustering is more capable of preserving the intra-period variance, while k-means clustering underestimates extreme events more severely. Nevertheless, the medoids lead to higher root mean squared errors with respect to the original time series. This leads to the phenomenon that k-medoids outperforms k-means in the case of energy systems sensitive to high variance, such as self-sufficient buildings, as shown by Kotzur et al. ^{[62]} and Schütz et al. ^{[123]}. In contrast, k-means outperforms k-medoids clustering in the case of smooth demand time series and non-rescaled medoids that do not match the overall annual demand, as shown by Zatti et al. ^{[125]} for the energy system of a university campus.
Agglomerative Clustering
In contrast to partitional clustering algorithms that determine a set of k clusters in each iteration step, agglomerative clustering algorithms such as Ward’s hierarchical algorithm ^{[46]} stepwise merge clusters with the aim of minimizing the increase in intra-cluster variance

\Delta(C_a, C_b) = \frac{|C_a| \, |C_b|}{|C_a| + |C_b|} \, \lVert c_a - c_b \rVert_2^2

in each merging step until the data are agglomerated into k clusters. The algorithm is thus deterministic and does not require multiple random starting-point initializations. Analogously to k-means and k-medoids, the cluster centers can be represented either by their centroids ^{[130]}^{[144]} or by their medoids ^{[18]}^{[34]}^{[130]}^{[62]}^{[45]}^{[59]}^{[119]}^{[124]}^{[127]}^{[128]}^{[133]}^{[144]}. The general property that centroids underestimate the intra-period variance more severely due to the averaging effect is equivalent to the findings when using k-means instead of k-medoids.
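A hedged sketch of Ward-style agglomerative merging on 1-D toy data, using a common centroid-based form of the increase in intra-cluster variance as merge cost; this is a brute-force pair search for illustration, not an efficient implementation, and the function name is hypothetical.

```python
# Hypothetical agglomerative helper: repeatedly merge the two clusters whose
# merge increases the total intra-cluster variance least (Ward-style cost
# |A||B|/(|A|+|B|) * (centroid distance)^2 for 1-D points).
from itertools import combinations

def ward(points, k):
    clusters = [[p] for p in points]
    while len(clusters) > k:
        def cost(pair):
            a, b = pair
            ca, cb = sum(a) / len(a), sum(b) / len(b)
            return len(a) * len(b) / (len(a) + len(b)) * (ca - cb) ** 2
        a, b = min(combinations(clusters, 2), key=cost)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters

print(ward([1.0, 1.1, 4.0, 9.0, 9.1], 2))
```

Since every merge decision is deterministic, repeated runs on the same data always yield the same hierarchy, in contrast to randomly initialized k-means.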
Rarely Used Clustering Algorithms
Apart from the clustering algorithms frequently used in the literature, two more clustering algorithms have been used in the context of determining typical periods based on unsorted time intervals of consistent lengths.
K-medians clustering is another partitional clustering algorithm that is closely related to the k-means algorithm and has been used in other studies ^{[72]}^{[123]}. Taking into account that the Euclidean distance is only the special case γ=2 of the Minkowski distance ^{[155]}

d(x_p, c_k) = \left( \sum_{i} \left| x_{p,i} - c_{k,i} \right|^{\gamma} \right)^{1/\gamma},

k-medians generally tries to minimize the sum of the distances of all data points to their cluster centers in the Manhattan norm, i.e., for γ=1, with the objective function ^{[156]}^{[157]}:

\min \sum_{k=1}^{K} \sum_{x_p \in C_k} \lVert x_p - c_k \rVert_1,

where the cluster center c_k is given by the component-wise median of the cluster members.
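The Minkowski distance and its two special cases can be sketched as follows; the function name is illustrative.

```python
# Hypothetical Minkowski-distance helper: gamma = 2 yields the Euclidean
# distance used by k-means, gamma = 1 the Manhattan distance used by k-medians.
def minkowski(x, y, gamma):
    return sum(abs(a - b) ** gamma for a, b in zip(x, y)) ** (1 / gamma)

print(minkowski([0, 0], [3, 4], 2))  # Euclidean
print(minkowski([0, 0], [3, 4], 1))  # Manhattan
```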
Time Shift-Tolerant Clustering Algorithms
The last group of clustering algorithms applied for TSA in ESMs is time shift-tolerant clustering algorithms. These algorithms not only compare the values of different time series at single time steps (pointwise), but also compare values shifted along the time axis with those of other time series (pairwise). In the literature ^{[130]}^{[144]}, dynamic time warping (DTW) and the k-shape algorithm are used, both of which are based on distance measures that, unlike the Euclidean distance, are not sensitive to phase shifts within a typical period. The dynamic time-warping distance is defined as:

d_{DTW}(x_p, c_k) = \min_{w} \sum_{(i,j) \in w} \left| x_{p,i} - c_{k,j} \right|,
where w describes the so-called warping path, which is the path of minimal deviations across the matrix of cross-deviations between any entry of x_p and any entry of c_k ^{[130]}^{[159]}. The cluster centers c_k are determined using DTW barycenter averaging, i.e., the centroid of all time series values (within an allowed warping window) assigned to each time step ^{[160]}. Moreover, a warping window ^{[130]}^{[144]} can be defined that limits the assignment of entries across the time steps. Shape-based clustering uses a similar algorithm and tries to maximize the cross-correlation amongst the periods. Here, the distance measure to be minimized is based on the cross-correlation, and the period vectors are uniformly shifted against each other to maximize it ^{[130]}^{[144]}^{[159]}^{[161]}. It must be highlighted that both the dynamic time-warping and shape-based distances have only been applied to the clustering of electricity prices, i.e., to only one attribute ^{[130]}^{[144]}. Liu et al. ^{[133]}, however, applied dynamic time warping to demand, solar, and wind capacity factors simultaneously, although it is unclear how it was guaranteed that different attributes were not compared to each other within the warping window, which remains a field of future research. Furthermore, a band distance, which is also a pairwise rather than a pointwise distance measure, was used in a k-medoids algorithm by Tupper et al. ^{[152]}, leading to significantly less loss of load when deriving operational decisions for the next day using a stochastic optimization model.
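The dynamic time-warping distance can be sketched with the classic dynamic-programming recurrence, using absolute deviations and no warping window; this is a toy illustration, not one of the cited implementations.

```python
# Hypothetical DTW helper: d[i][j] is the minimal accumulated deviation for
# aligning the first i entries of x with the first j entries of y.
def dtw(x, y):
    inf = float("inf")
    d = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # step pattern: insertion, deletion, or match
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(x)][len(y)]

# A phase-shifted but otherwise identical profile aligns at zero cost:
print(dtw([0, 1, 2], [0, 0, 1, 2]))
```

This shift tolerance is exactly what the Euclidean distance lacks: the same two series compared pointwise would accumulate a nonzero deviation at every shifted step.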
Because not all of the methods represent each cluster by its centroid (i.e., the mean in each dimension), these typical periods, weighted by their number of occurrences, do not necessarily match the overall average value and must therefore be rescaled. The same holds true for the consideration of extreme periods, which is explained in the following sections; accordingly, this section is referred back to whenever rescaling is part of the implementation of extreme periods. To the best of our knowledge, the first work that used clustering not based on centroids was that of Domínguez-Muñoz et al. ^{[113]}, in which the exact k-medoids approach of Vinod et al. ^{[154]} was chosen. Here, each attribute (time series) of each TD was rescaled to the respective cluster’s mean, i.e., each value of the representative period was multiplied by the ratio of the cluster’s mean to the representative’s own mean.
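A minimal sketch of this cluster-wise rescaling for a single attribute might look as follows; the function name and array layout are assumptions for illustration, not the original implementation.

```python
import numpy as np

def rescale_to_cluster_mean(representative, cluster_members):
    """Rescale a representative period (e.g., a medoid) so that it reproduces
    its cluster's mean value.
    representative: array of shape (T,) for one attribute;
    cluster_members: array of shape (n_periods, T), all periods of the cluster."""
    rep_mean = representative.mean()
    if rep_mean == 0.0:
        return representative.copy()  # an all-zero profile cannot be rescaled
    # multiply by the ratio of the cluster mean to the representative's own mean
    return representative * (cluster_members.mean() / rep_mean)
```

After this step, the rescaled medoid has exactly the average value of the periods it represents, which a medoid, unlike a centroid, does not guarantee by construction.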
Furthermore, Domínguez-Muñoz et al. ^{[113]} excluded the manually added extreme values from the rescaling procedure. A similar procedure, applied for each time series but not for each TD, was introduced by Nahmmacher et al. ^{[127]}, who used hierarchical clustering based on Ward’s algorithm ^{[46]} and chose medoids as representatives; this approach was later adopted in a number of other studies ^{[14]}^{[18]}^{[130]}^{[62]}^{[124]}^{[144]}. Here, all representative days were rescaled so that, multiplied by their cardinality and summed up, they fit the overall yearly average, but not the averages of their respective clusters, i.e., each time series was multiplied by the ratio of the overall annual mean to the cardinality-weighted mean of the representative days.
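In the same spirit, a sketch of this time series-wise variant could be (again with an assumed, illustrative interface):

```python
import numpy as np

def rescale_to_overall_mean(representatives, weights, original):
    """Scale all representative days of one time series by a single common
    factor so that their cardinality-weighted average reproduces the overall
    annual average of the original time series.
    representatives: array (k, T) of k representative days;
    weights: cluster cardinalities, shape (k,); original: full series, (N,)."""
    weighted_mean = np.average(representatives.mean(axis=1), weights=weights)
    if weighted_mean == 0.0:
        return representatives.copy()
    return representatives * (original.mean() / weighted_mean)
```

Note the difference to the cluster-wise variant above: one factor is applied to all representatives of a time series, so each individual cluster's mean may still be missed.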
Schütz et al. ^{[72]}^{[123]}, Bahl et al. ^{[47]}, and Marquant et al. ^{[134]}^{[135]} refer to the method of Domínguez-Muñoz et al. ^{[113]}, but some applied it time series-wise rather than cluster- and time series-wise. Schütz et al. ^{[72]}^{[123]} were the first to highlight that both approaches are possible. It also needs to be highlighted that these are not the only methods: Zatti et al. ^{[125]}, for instance, presented a method to choose medoids within the optimization problem without violating a predefined maximum deviation from the original data; for the sake of simplicity, however, this section focuses on the most frequently used post-processing approaches. Additionally, other early publications, such as that by Schiefelbein et al. ^{[148]}, did not use rescaling at all. Finally, and importantly, rescaling combined with min-max normalization can lead to values above one. Accordingly, in several studies ^{[14]}^{[18]}^{[62]}^{[124]}^{[127]}, these values were reset to one so as not to overestimate the maximum values, and the rescaling process was re-run. In contrast, Teichgräber et al. ^{[130]}^{[144]} used the z-normalization with rescaling in accordance with Nahmmacher et al. ^{[127]}, but did not ensure that the original extreme values were not overestimated by rescaling.
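The cap-and-rescale loop described for min-max-normalized data can be sketched as follows; the convergence loop and its parameters are assumptions for illustration, not taken from the cited studies.

```python
import numpy as np

def rescale_and_cap(representatives, weights, target_mean, max_iter=50, tol=1e-9):
    """Iteratively rescale min-max-normalized representative profiles so their
    cardinality-weighted mean matches target_mean, resetting any value above
    one (so the original maxima are not overestimated) and re-running the
    rescaling until the weighted mean converges."""
    scaled = representatives.astype(float).copy()
    for _ in range(max_iter):
        weighted_mean = np.average(scaled.mean(axis=1), weights=weights)
        if weighted_mean == 0.0 or abs(weighted_mean - target_mean) < tol:
            break
        scaled *= target_mean / weighted_mean   # rescale toward the target mean
        scaled = np.minimum(scaled, 1.0)        # cap values above the normalized maximum
    return scaled
```

Each capping step lowers the mean again, which is why the rescaling has to be repeated; for achievable targets the loop converges quickly.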
2.2.3. Modified Feature-Based Merging
2.2.4. Linking Typical Periods
2.3. Random Sampling
Another, smaller group of publications uses TSA based on random sampling. This means that time steps or periods are randomly chosen from the original time series and considered representative of the entire time series. Most of the methods presented in the following deal with single time steps instead of periods, which is an acceptable simplification when the impact of storage capacity or other intertemporal constraints on the system design can be neglected ^{[151]}.
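The basic idea can be sketched in a few lines: draw a subset of time steps and give each one a weight so that the sample still represents the full duration of the original series. The function name and weighting scheme are illustrative assumptions.

```python
import numpy as np

def sample_time_steps(series, n_samples, seed=0):
    """Draw n_samples random time steps (without replacement) from the
    original series; each sampled step then represents
    len(series) / n_samples original time steps (its weight)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(series), size=n_samples, replace=False)
    weight = len(series) / n_samples
    return series[idx], weight
```

For 24 samples out of 8760 hourly values, each sample would thus stand for 365 hours of the year.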
2.3.1. Unsupervised
In the years since 2012, however, these methods have been superseded by supervised random sampling methods.
2.3.2. Supervised
Munoz et al. ^{[169]} applied supervised random sampling with 1 to 300 daily samples drawn from a seven-year dataset, which were then benchmarked against the k-means clustering of typical hours. A similar method was used by Frew et al. ^{[170]}, who took two extreme days and eight random days from the dataset and weighted each day such that the sum of squared errors with respect to the original wind, solar, and load distributions was minimized. This procedure was repeated for ten different sets of days, and the optimization outcomes were averaged at the end. With respect to single time steps, Härtel et al. ^{[59]} either systematically determined samples by taking every nth element of the time series or drew 10,000 random samples from the original dataset and selected the one that minimized the deviation from the original dataset with respect to its moments (e.g., correlation, mean, and standard deviation). Another, more complex algorithm for representing seasonal or monthly wind time series was proposed by Neniškis et al. ^{[79]} and tested in the MESSAGE model. This approach used a random sampling process while taking into account both the output distribution (duration curve) of a TD and the inter-daily variance, neither of which was allowed to deviate by more than a predefined tolerance. However, only the typical days for wind were calculated in this way, whereas the other time series (electricity and heat) were chosen using TS. Recently, Hilbers et al. ^{[151]} used the sampling method twice with different numbers of random initial samples drawn from 36 years: from a first run, the 60 most expensive random samples were taken and included in a second run with the same number of samples.
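The supervised selection step, drawing many candidate samples and keeping the one that best matches the statistics of the original data, can be sketched as follows. This is a simplified illustration in the spirit of the moment-matching approach of Härtel et al. ^{[59]}, comparing only mean and standard deviation; the function name, candidate count, and error measure are assumptions.

```python
import numpy as np

def best_moment_matching_sample(series, n_samples, n_candidates=1000, seed=0):
    """Draw n_candidates random subsets of time steps and return the one whose
    mean and standard deviation deviate least (in squared error) from those
    of the full original series."""
    rng = np.random.default_rng(seed)
    target = np.array([series.mean(), series.std()])  # moments to reproduce
    best, best_err = None, np.inf
    for _ in range(n_candidates):
        idx = rng.choice(len(series), size=n_samples, replace=False)
        sample = series[idx]
        err = np.sum((np.array([sample.mean(), sample.std()]) - target) ** 2)
        if err < best_err:
            best, best_err = sample, err
    return best
```

This is what distinguishes supervised from unsupervised sampling: the random draw is kept only if it scores well against the original data.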
These methods are fairly comparable to the method of clustering TTSs. However, the initial selection of samples is based on random choice.
2.4. Miscellaneous Methods
Apart from the random sampling methods that cannot be systematically categorized with the scheme in Figure 1, an even smaller number of publications cannot be grouped in any way with respect to their TSA methods. For the sake of completeness, however, they are presented in the following.
To summarize, special methods that defy categorization appear only sporadically in the literature, but they can have important implications for the improvement of preexisting methods.