Please note this is a comparison between Version 1 by Zhengwei Qu and Version 2 by Conner Chen.

False data injection attacks (FDIAs), as a covert cyber-attack method, pose a huge challenge to the safe and stable operation of smart grids by illegally hacking into power systems to tamper with measurement data and thus undermine data integrity.

- false data injection attacks (FDIAs)

In recent years, sensing, communication, and control technologies have been seamlessly integrated into smart grids. As a result, the physical and network domains of the power system are deeply coupled, forming a cyber-physical system [1]. After measurement data are collected through remote terminal units (RTUs), the smart grid relies on a state estimation algorithm for its regulation. The main purpose of a cyber-attack is therefore to undermine or mislead the state estimation mechanism, leading to incorrect decision-making in the energy management system (EMS). In a highly complex and automated environment, a cyber-attack may propagate through the entire system and trigger grid paralysis or mass outages, such as the massive power outage that occurred in Venezuela on 7 March 2019 [2].

False data injection attacks (FDIAs), as a covert cyber-attack method, pose a huge challenge to the safe and stable operation of smart grids by illegally hacking into power systems to tamper with measurement data and thus undermine data integrity [3,4,5]. In [6], Liu et al. first proposed the concept of FDIAs and pointed out that attackers can use power system topology and parameter information to construct a well-designed attack vector that bypasses traditional bad data detection (BDD) and destroys the integrity of smart grid information. Since attackers can also construct highly stealthy FDIAs without relying on system configuration information, it is difficult for traditional model-based detection methods and boundary protection systems to handle such FDIAs. To keep the power grid running safely and stably, an effective FDIA detection scheme is needed, and this problem has been studied intensively by many researchers.

From the defender's perspective, some FDIA detection methods improve on the state estimation algorithm. These improved state estimation methods mainly include the residual detection method [7], the measurement transformation detection method [8], and detection methods based on Kalman filters [9,10]. FDIA detection based on state estimation is mainly used for static analysis and detection of attacks at specific moments. When the power system fluctuates, it is prone to missed detections and false detections [11].

The growing deployment of wide-area measurement systems provides massive data for power system analysis. As a result, artificial intelligence techniques are increasingly used in FDIA detection, mainly including the support vector machine [12], extreme learning machine [13], fuzzy c-means clustering [14], deep learning [15,16], and ensemble learning [17]. The advantages of such methods are that they do not need to solve complex power system time-domain equations and that they compute quickly. The disadvantage is that the detection results depend heavily on the training process of the model: improper selection of training samples can directly degrade detection performance.

Since the power system is in continuous dynamic operation and a space-time correlation exists between different measurement data or state variables, most attacks are continuous. Therefore, it is feasible to consider using historical data for trajectory prediction analysis to detect FDIAs, which mainly includes statistical consistency detection, sequence consistency detection and sensor trajectory prediction.

Kurt et al. [18] used the generalized cumulative sum (CUSUM) algorithm for quickest detection of FDIAs. This method is robust to time-varying states, attacks, and attacked instruments in both centralized and distributed environments. Similarly, Li et al. [19] proposed a sequence detector based on a broad analogy for sequential detection of FDIAs in the smart grid. This detector is significantly superior to the first-order CUSUM detector in terms of robustness and average detection delay. In [20], Malhotra et al. proposed a stacked LSTM prediction network to detect time series anomalies or failures, modeling the prediction error as a multivariate Gaussian distribution to evaluate abnormal behavior. By analyzing and learning the original measurement data, the methods in [21,22] detect abnormal data that do not conform to the historical measurement distribution, but they fail to detect false data that match the historical measurement distribution. Khalid et al. [23] proposed multi-sensor track fusion-based model prediction for malicious attacks on PMUs, which uses a smoothing algorithm based on a Kalman particle filter to detect attacks at each monitoring node. An online FDIA detection process for hybrid SCADA and PMU measurements is proposed in [24], which can effectively find spatially hidden FDIAs based on multi-matching state prediction. However, when conditions such as sudden load changes or equipment failure occur, the state prediction results deviate severely and thus degrade the detection results.

Considering the time correlation of node measurements, Zhao et al. [25] compared predicted data with collected data using a short-term state prediction method, and further built a detection index in combination with traditional measurement residual analysis. To solve the problem that attacks similar to historical data cannot be detected, Gu et al. [26] considered the variation characteristics of the measured data and proposed a detection method based on the Kullback-Leibler distance (KLD). However, the method failed to detect attacks on some nodes. A real-time FDIA detection scheme based on joint transformation is proposed in [27], but its detection accuracy drops when the attack is less intense.

The detection method based on trajectory prediction analysis is mainly used to predict the distribution of state variables according to the operation law of the system state and of the historical database. By comparing the running track, various types of FDIAs can be detected effectively. However, there are two problems when the probability density function is used to represent the data running track. One is the problem of overlapping distributions, and the other is the difficulty of detecting historical data replay attacks.

Assume that the grid has $N+1$ nodes and $M$ measurement devices. Based on the common linear DC model, the measurement equation and state equation of the discrete linear power system are given as follows:

$$z_t = h(x_t) + e_t \tag{1}$$

$$x_t = f(x_{t-1}) + v_t \tag{2}$$

where $h(\cdot)$ is the measurement function; $z_t=[z_{1,t}^T, z_{2,t}^T, \ldots, z_{M,t}^T]^T$ is the measurement vector at time $t$; $z_{M,t}=[z_{M,t,1}, z_{M,t,2}, \ldots, z_{M,t,\lambda}]^T$ is the measurement vector of the $M$-th measurement device; $e_t=[e_{1,t}^T, e_{2,t}^T, \ldots, e_{M,t}^T]^T \sim N(0, \sigma_e^2 I_{M\lambda})$ is the measurement noise vector; $f(\cdot)$ is the transfer function applied to the state vector $x$ at time $t-1$; $x_t=[x_{1,t}, x_{2,t}, \ldots, x_{N,t}]^T$ is the state vector; $v_t=[v_{1,t}, v_{2,t}, \ldots, v_{N,t}]^T \sim N(0, \sigma_v^2 I_N)$ is the process noise vector; $I_N$ is the identity matrix of order $N$.

In (1) and (2), the number of samples $\lambda \in \{1,2,3,\ldots\}$ that each device collects between times $t-1$ and $t$ is usually small. Therefore, the measurement data collected between time $t-1$ and $t$ are processed at time $t$.
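
The measurement and state equations (1) and (2) can be simulated in a few lines. The following is only a minimal sketch under stated assumptions: the measurement function is taken as linear, $h(x)=Hx$, the transfer function $f$ is taken as the identity (a random walk), and the state is held constant within one interval. The function name `simulate_measurements`, the matrix `H`, and all numeric values are hypothetical and chosen for illustration.

```python
import numpy as np

def simulate_measurements(H, x0, steps, lam, sigma_e, sigma_v, seed=0):
    """Simulate z_t = H x_t + e_t and x_t = x_{t-1} + v_t (assumed forms of h and f)."""
    rng = np.random.default_rng(seed)
    # Each of the M devices reports lam samples per interval, so every row
    # of H is repeated lam times to build the stacked measurement matrix.
    H_stacked = np.repeat(H, lam, axis=0)
    x = np.asarray(x0, dtype=float)
    states, measurements = [], []
    for _ in range(steps):
        x = x + rng.normal(0.0, sigma_v, size=x.shape)         # process noise v_t
        e = rng.normal(0.0, sigma_e, size=H_stacked.shape[0])  # measurement noise e_t
        states.append(x)
        measurements.append(H_stacked @ x + e)
    return np.array(states), np.array(measurements)

# Hypothetical system: M = 4 devices, 2 state variables, lam = 3 samples per interval.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
X, Z = simulate_measurements(H, x0=[1.0, 0.5], steps=5, lam=3,
                             sigma_e=0.1, sigma_v=0.01)
```

Each row of `Z` is one stacked measurement vector $z_t$ of length $M\lambda$, matching the dimension of the noise covariance $\sigma_e^2 I_{M\lambda}$.
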
## 3. Bad Data Detection and Identification

A system with data acquisition and monitoring control can collect real-time measurement data and perform state estimation. To eliminate errors caused by non-human factors [28] and ensure the reliability of the state estimation results, the EMS has a built-in BDD scheme for bad data detection and identification. The traditional methods of detecting and identifying bad data are essentially residual methods: the residual vector $r$ is first calculated, and then different detection criteria are applied to it. In other words, bad data can be detected by calculating $r$ as follows:

$$r = z - \hat{z} = h(x) + e - \left(h(x) + H(\hat{x} - x)\right) = e - H(H^T R^{-1} H)^{-1} H^T R^{-1} e = \left(I - H(H^T R^{-1} H)^{-1} H^T R^{-1}\right) e = S e$$

where $r$ is the residual vector; $\hat{x}$ is the estimated state and $\hat{z}$ is the corresponding estimated measurement vector; $H$ is the Jacobian matrix of $h(\cdot)$; $I$ is the identity matrix; $e$ is the measurement error; $R^{-1}$ is the weight matrix; $S = I - H(H^T R^{-1} H)^{-1} H^T R^{-1}$ is the residual sensitivity matrix of order $m \times m$.
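
The relation between the residual and the measurement error can be checked numerically. The following is a minimal sketch, assuming a linear measurement model $z = Hx + e$ and equal-variance measurement noise. The matrix `H`, the noise level `sigma`, and the state `x_true` are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical 4-measurement, 2-state linear model (illustrative values).
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
sigma = 0.1
R = sigma**2 * np.eye(4)            # measurement error covariance
W = np.linalg.inv(R)                # weight matrix R^{-1}

G = H.T @ W @ H                     # gain matrix H^T R^{-1} H
K = np.linalg.solve(G, H.T @ W)     # maps z to the weighted least squares estimate
S = np.eye(4) - H @ K               # residual sensitivity matrix

rng = np.random.default_rng(0)
x_true = np.array([1.0, 0.5])
e = rng.normal(0.0, sigma, size=4)  # measurement error
z = H @ x_true + e

x_hat = K @ z                       # estimated state
r = z - H @ x_hat                   # residual vector

# The residual depends only on the measurement error: r = S e.
residual_matches = np.allclose(r, S @ e)
```

In this sketch `S @ H` is numerically zero, which is why the true state drops out of the residual and only the error term remains.
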

Take the extremum detection method of the objective function [29] as an example. The extremum of the objective function built from the residual vector is as follows:

$$J(\hat{x}) = [z - h(\hat{x})]^T R^{-1} [z - h(\hat{x})] = r^T R^{-1} r$$

where $J(\hat{x})$ approximately follows a $\chi^2$ distribution with $m-n$ degrees of freedom. Given the detection confidence level $p$, bad data are declared present when the detection indicator exceeds the threshold $\gamma_0$, where $\gamma_0 = \chi^2_{(m-n),p}$ and $p = \Pr\left(J(\hat{x}) \le \chi^2_{(m-n),p}\right)$. Define the objective function detector:

$$D_{J(\hat{x})}(z) = \begin{cases} 1, & J(\hat{x}) > \gamma_0 \quad \text{(bad data)} \\ 0, & J(\hat{x}) \le \gamma_0 \quad \text{(no bad data)} \end{cases}$$
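
The detector can be written as a short function. The following is a minimal sketch, not a reference implementation: the function name `chi_square_detector` and all numeric values are hypothetical. The example assumes $m-n=2$, because for two degrees of freedom the $\chi^2$ quantile has the closed form $-2\ln(1-p)$. For other degrees of freedom the threshold would come from a $\chi^2$ table or a statistics library.

```python
import math
import numpy as np

def chi_square_detector(r, R, gamma0):
    """Return (J, flag) with J = r^T R^{-1} r and flag = 1 if J > gamma0, else 0."""
    r = np.asarray(r, dtype=float)
    J = float(r @ np.linalg.solve(R, r))
    return J, int(J > gamma0)

# Illustrative setting: m = 4 measurements, n = 2 states, so m - n = 2.
p = 0.95
gamma0 = -2.0 * math.log(1.0 - p)    # chi-square quantile for 2 degrees of freedom

sigma = 0.1
R = sigma**2 * np.eye(4)

r_clean = np.array([0.05, -0.08, 0.02, 0.10])   # residuals of noise size
r_bad = np.array([0.60, -0.08, -0.30, -0.30])   # residuals after a gross error

J_clean, flag_clean = chi_square_detector(r_clean, R, gamma0)
J_bad, flag_bad = chi_square_detector(r_bad, R, gamma0)
```

With these numbers `J_clean` stays below the threshold and `J_bad` exceeds it, so only the second residual vector is flagged as containing bad data.
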

In order to further eliminate bad data by identifying them, the generally adopted criterion is the “*3σ*” principle. When the system has bad data, the measurement corresponding to the maximum residual should be corrected and the above detection process should be repeated until all elements in the residual vector are within the threshold.
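
The identification loop described in this paragraph can be sketched as follows. This is a minimal sketch under several assumptions that are not in the source: the model is linear, all measurements share the same standard deviation `sigma` (so weighted least squares reduces to ordinary least squares), the raw residuals are compared against `3 * sigma`, and the offending measurement is dropped rather than corrected. The function name `identify_bad_data` and the numeric values are hypothetical.

```python
import numpy as np

def identify_bad_data(H, z, sigma, k=3.0):
    """Drop the measurement with the largest residual until all |r_i| <= k * sigma."""
    active = list(range(len(z)))   # indices of measurements still in use
    removed = []                   # indices identified as bad data
    while True:
        Ha, za = H[active], z[active]
        x_hat = np.linalg.lstsq(Ha, za, rcond=None)[0]
        r = za - Ha @ x_hat
        worst = int(np.argmax(np.abs(r)))
        # Stop when every residual is within the threshold
        # or when no measurement redundancy is left.
        if abs(r[worst]) <= k * sigma or len(active) <= H.shape[1]:
            return x_hat, removed
        removed.append(active.pop(worst))

# Hypothetical example: a gross error of 2.0 on the first measurement, no noise.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
x_true = np.array([1.0, 0.5])
z = H @ x_true
z[0] += 2.0

x_hat, removed = identify_bad_data(H, z, sigma=0.1)
```

In this example the first measurement has the largest residual in the first pass and is removed, after which the remaining measurements are consistent and the loop stops. Note that the largest raw residual does not always sit on the corrupted measurement, which is why practical schemes often work with normalized residuals.
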
## 4. Principle of False Data Injection Attack

An attacker can successfully inject false data into the measurements by constructing an effective attack vector. Traditional FDIAs are typically expressed as follows:

$$z_a = z + a = h(x) + a + e$$

where $a$ is the injected false data attack vector; $z_a$ is the attacked measurement vector; $x$ is the state vector estimated from the original measurement vector $z$ without attack.

If $z$ can bypass the traditional residual-based bad data detector, then $z_a$ can also bypass BDD as long as $a$ satisfies the following equation:

$$a = Hc$$

where $c=[c_1, c_2, \cdots, c_n]^T$ is an arbitrary non-zero $n \times 1$ vector representing the deviation of the system state vector caused by the FDIA, and $x_a = (H^T R^{-1} H)^{-1} H^T R^{-1} z_a = x + c$ is the $n \times 1$ state estimate obtained from $z_a$. The purpose of FDIAs is to mislead the system operator into taking $x_a$ as the state vector, so the expressions for $z_a$ and the residual $r_a$ are respectively as follows:

$$z_a = Hx + Hc + e = H(x + c) + e = Hx_a + e$$

$$r_a = z_a - Hx_a = z + a - H(x + c) = z - Hx$$

Because the residual under attack equals the attack-free residual, the traditional method of bad data detection and identification fails against FDIAs, which allows the attacker to tamper with the measurement data at will.
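
The effect of an attack vector of the form $a = Hc$ on the residual can be verified numerically. The following is a minimal sketch, assuming a linear model with equal-variance noise and a weighted least squares estimator. The matrix `H`, the state deviation `c`, and the comparison vector `a_naive` are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical 4-measurement, 2-state linear model (illustrative values).
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
sigma = 0.1
W = np.eye(4) / sigma**2            # weight matrix R^{-1}

def wls_estimate(z):
    """Weighted least squares estimate (H^T W H)^{-1} H^T W z."""
    return np.linalg.solve(H.T @ W @ H, H.T @ W @ z)

rng = np.random.default_rng(1)
x_true = np.array([1.0, 0.5])
z = H @ x_true + rng.normal(0.0, sigma, size=4)

# Stealthy attack: a lies in the column space of H.
c = np.array([0.3, -0.2])           # state deviation chosen by the attacker
a = H @ c
z_a = z + a

x_hat = wls_estimate(z)
x_hat_a = wls_estimate(z_a)
r = z - H @ x_hat                   # residual without attack
r_a = z_a - H @ x_hat_a             # residual under the stealthy attack

# Naive attack for comparison: tamper with one measurement only.
a_naive = np.array([0.5, 0.0, 0.0, 0.0])
z_n = z + a_naive
r_n = z_n - H @ wls_estimate(z_n)

stealthy_unchanged = np.allclose(r_a, r)
naive_unchanged = np.allclose(r_n, r)
```

The stealthy attack shifts the estimate by exactly `c` while leaving the residual untouched, whereas the naive single-measurement attack changes the residual and can therefore be caught by a residual-based detector.
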