To overcome the problems of the LOF in stream environments, we have designed a new methodology for detecting local outliers. The methodology has two phases: (1) the detection phase and (2) the summarization phase. In the detection phase, the ILOF is used with a skipping scheme [75, 77]. In the summarization phase, the Genetic Density Summarization (GDS) algorithm, which is based on the genetic algorithm (GA), is used to summarize the dataset. The framework of our methodology, named Genetic-based Incremental Local Outlier Factor (GILOF) [115], works as follows. First, the maximum size of the window (W) is fixed as W-size. Next, the LOF threshold θ is applied to detect local outliers, and the GDS is used to summarize the data points. When a new data point arrives, the GILOF uses the ILOF together with the skipping scheme to decide whether it is a local outlier. The skipping scheme serves to detect sequences of outliers, that is, runs of outlier data points that are attempting to form a new class. The GILOF continues to detect outliers and to compute the LOF value of every new data point until the window reaches W-size. When the window is full, the GDS algorithm summarizes the older 50%W of the data points in the window by choosing the fittest 25%W of data points to represent them. The GILOF then deletes the older 50%W of the data points from the window, and the selected 25%W of data points are transferred to the window, where they merge with the remaining 50%W. These 75%W of data points are then joined by newly incoming data points. When the window reaches W-size again, the GDS repeats the same process. The video in [116] shows a simulation of the GILOF process. Figure 7 shows the overall design and workflow of the methodology.
Figure 7. Diagram of the overall design and workflow for the methodology.
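The window bookkeeping described above (summarize the older 50%W into 25%W, keep the newer 50%W, continue with 75%W) can be sketched as follows. This is a minimal illustration and not the GILOF implementation: the function names `pick_evenly`, `summarize_window`, and `run_stream` are hypothetical, and `pick_evenly` is only a placeholder that stands in for the GA-based GDS selection, whose fitness function is not reproduced here. The ILOF update and the skipping scheme are also omitted.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]
Selector = Callable[[List[Point], int], List[Point]]


def pick_evenly(points: List[Point], n: int) -> List[Point]:
    """Placeholder selector (NOT the GDS): return n evenly spaced points."""
    if n <= 0 or not points:
        return []
    step = len(points) / n
    return [points[int(i * step)] for i in range(n)]


def summarize_window(window: List[Point], w_size: int,
                     select: Selector = pick_evenly) -> List[Point]:
    """Replace the older 50%W with 25%W representatives; keep the newer 50%W."""
    half = w_size // 2
    quarter = w_size // 4
    older, newer = window[:half], window[half:]
    return select(older, quarter) + newer


def run_stream(stream: List[Point], w_size: int,
               select: Selector = pick_evenly) -> List[Point]:
    """Feed points into the window, summarizing whenever it reaches W-size."""
    window: List[Point] = []
    for point in stream:
        window.append(point)
        # The ILOF update and the threshold test for `point` would happen here.
        if len(window) >= w_size:
            window = summarize_window(window, w_size, select)
    return window
```

With W-size set to 8, the window shrinks to 6 points (75%W) each time it fills, so memory use stays bounded regardless of the stream length. A GA-based selector could be passed in through the `select` parameter without changing the rest of the sketch.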
The Local Outlier Factor by Reachability Distance (LOFR) is similar to the LOF but uses a different calculation method that does not require the local reachability density [117]. To calculate the outlierness score, the LOFR uses the k-distance, the k-nearest neighbors, and the reachability distance Rd. The reachability distance Rd of a data point p is divided by the average Rd of the neighbors of p. This calculation method can produce a lower outlierness score than the LOF, and it can produce a more accurate outlierness score on various datasets. The LOFR score is calculated using Equation (6).
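The ratio described above can be illustrated with a small brute-force sketch. It rests on assumptions that are not taken from the source: the Rd of a single point is read here as the mean reachability distance from that point to its k nearest neighbors, with reach-dist(p, o) = max(k-distance(o), d(p, o)) as in the LOF; ties are broken by index so that each point has exactly k neighbors; and the helper names (`knn`, `k_distance`, `mean_reach_dist`, `lofr`) are hypothetical. The exact form of Equation (6) in [117] may differ.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn(points: List[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbors of points[i] (ties broken by index)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return others[:k]


def k_distance(points: List[Point], i: int, k: int) -> float:
    """Distance from points[i] to its k-th nearest neighbor."""
    return math.dist(points[i], points[knn(points, i, k)[-1]])


def mean_reach_dist(points: List[Point], i: int, k: int) -> float:
    """Assumed per-point Rd: mean reach-dist from points[i] to its k neighbors."""
    neighbors = knn(points, i, k)
    total = sum(
        max(k_distance(points, o, k), math.dist(points[i], points[o]))
        for o in neighbors
    )
    return total / len(neighbors)


def lofr(points: List[Point], i: int, k: int) -> float:
    """Rd of points[i] divided by the average Rd of its neighbors.

    Assumes the neighbors are not all duplicates, so the average is non-zero.
    """
    neighbors = knn(points, i, k)
    avg_neighbor_rd = sum(mean_reach_dist(points, o, k) for o in neighbors) / len(neighbors)
    return mean_reach_dist(points, i, k) / avg_neighbor_rd
```

For example, with four points on the unit square plus a distant point at (10, 10) and k = 2, the square's corners score 1.0 under this reading, while the distant point scores well above 1. Note that no local reachability density is computed at any step.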
The GILOF algorithm is discussed extensively in [115]. To enhance the effectiveness of the GILOF algorithm, we proposed another calculation method for the LOF, named LOFR. The resulting algorithm for data streams is named Genetic-based Incremental Local Outlier Factor by Reachability Distance (GILOFR) and is discussed extensively in [117]. As future work, other traditional local outlier detection algorithms, such as COF, LoOP, LOCI, and INFLO, can be adapted to work in a data stream. To execute these traditional algorithms in a data stream, the mechanisms of the above-mentioned methods, such as the GILOF and DILOF algorithms, should be applied.