To address the limitations of the LOF in data streams, new methods should be developed. Any such method should be able to compute the LOF score under all of the following conditions: (1) only a small part of the dataset is kept in memory; (2) the algorithm has no prior knowledge of the data distribution; (3) for an incoming data point pt, the algorithm must decide whether it is an outlier or an inlier at the current time T; and (4) when the algorithm detects an outlier, it has no knowledge of future data points.
To overcome the problems of the LOF in stream environments, we have designed a new methodology to detect local outliers. This methodology contains two phases: (1) the detection phase and (2) the summarization phase. For the detection phase, the ILOF is used with a skipping scheme [75,77]. For the summarization phase, the Genetic Density Summarization algorithm (GDS), based on the genetic algorithm (GA), is used to summarize the dataset. The framework of our methodology, named Genetic-based Incremental Local Outlier Factor (GILOF) [115], works as follows. First, the maximum size of the window (W) is set to W-size. Next, the LOF threshold θ is used to detect local outliers, and the GDS is used to summarize the data points. When an incoming data point arrives, the GILOF uses the ILOF together with the skipping scheme to detect whether it is a local outlier. The skipping scheme is used to detect a sequence of outliers, that is, a run of outlier data points that are trying to build a new class. The GILOF continues to detect outliers and to compute the LOF value of every new data point until the window reaches W-size. When the window is full, the GDS algorithm is applied to summarize the older 50%W of the data points in the window; it does so by choosing the fittest 25%W of the data points to represent that older 50%W. The GILOF then deletes the older 50%W of the data points from the window, and the selected 25%W is transferred to the window and merged with the remaining 50%W. The resulting 75%W of data points is then joined by newly incoming data points in the window. When the window reaches W-size again, the GDS repeats the same process. The video in [116] displays a simulation of the GILOF system process. The accompanying figure shows the overall design and workflow of the methodology.
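The window bookkeeping described above can be illustrated with a minimal Python sketch. It covers only the 50%W / 25%W / 75%W arithmetic; the ILOF scoring and the skipping scheme are omitted. The names `insert_point` and `keep_every_other` are hypothetical, and `keep_every_other` is only a placeholder for the GDS, which selects the fittest points with a genetic algorithm instead.

```python
def keep_every_other(older, n):
    """Placeholder summarizer (NOT the GDS): keep n points from the older half."""
    return older[::2][:n]


def insert_point(window, point, w_size, summarize=keep_every_other):
    """Append a point; when the window is full, compress its older half.

    The older 50% of the window is replaced by a 25%-sized summary, so the
    window drops to 75% of w_size and can accept new points again.
    """
    window.append(point)
    if len(window) >= w_size:
        half, quarter = w_size // 2, w_size // 4
        older, newer = window[:half], window[half:]
        # In GILOF this step is performed by the GDS; here it is a stand-in.
        window[:] = summarize(older, quarter) + newer
    return window


window = []
for t in range(8):           # stream of 8 points with W-size = 8
    insert_point(window, t, w_size=8)
print(len(window))           # 6, i.e. 75% of W-size
```

Passing the summarizer as an argument keeps the window logic separate from the choice of summarization method, so a GA-based selection could be substituted without changing the rest of the sketch.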
Local Outlier Factor by Reachability Distance (LOFR) is similar to the LOF but uses a different calculation method, which does not need the local reachability density [117]. To calculate the outlierness score, the LOFR uses the k-distance, the k-nearest neighbors, and the reachability distance Rd. The reachability distance Rd of data point p is then divided by the average Rd of the neighbors of p. This calculation method for local outlier detection can produce a lower outlierness score than the LOF, and the LOFR can produce a more accurate outlierness score in various datasets. The LOFR score is calculated by using Equation (6).
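As an illustration of the ratio described above, the following Python sketch computes an LOFR-style score by brute force. It rests on an assumption that is not stated in the text: the Rd of a point is taken to be the mean reachability distance from that point to its k nearest neighbors, with the reachability distance defined as in the LOF, max(k-distance(o), d(p, o)). The function names are hypothetical, ties in the k-distance are ignored, and duplicate points (which would give a zero denominator) are not handled.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding i itself."""
    dists = sorted((math.dist(points[i], q), j)
                   for j, q in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def k_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbor."""
    return math.dist(points[i], points[knn(points, i, k)[-1]])


def mean_reach_dist(points, i, k):
    """Assumed Rd(p): mean of max(k-distance(o), d(p, o)) over neighbors o."""
    nbrs = knn(points, i, k)
    total = sum(max(k_distance(points, j, k), math.dist(points[i], points[j]))
                for j in nbrs)
    return total / len(nbrs)


def lofr(points, i, k):
    """Rd of the point divided by the average Rd of its k nearest neighbors."""
    nbrs = knn(points, i, k)
    avg_nbr_rd = sum(mean_reach_dist(points, j, k) for j in nbrs) / len(nbrs)
    return mean_reach_dist(points, i, k) / avg_nbr_rd


points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
print(lofr(points, 0, k=2))  # a point inside the cluster: score of 1.0
print(lofr(points, 4, k=2))  # the isolated point: score well above 1
```

Under this assumed definition, the score is the point's Rd times the reciprocal of the arithmetic mean of the neighbors' Rd, whereas the LOF is the point's Rd times the mean of the reciprocals; the former is never larger, which is consistent with the lower outlierness scores mentioned above.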