1. Introduction
Over the past few years, Denial of Service (DoS), Operating System (OS) fingerprinting, and Domain Name System (DNS) botnet techniques have undergone significant advancements, becoming increasingly complex and challenging to identify
[1,2,3,4,5,6,7,8]. While these activities may not always constitute an attack in the traditional sense, they span a spectrum that includes reconnaissance and denial of service. The distinction lies in the intent and context of such actions.
Among all the types of cyber attacks, DoS attacks are considered the most harmful, as they can entirely disconnect an organization from the Internet or severely impede network links, resulting in a significant disruption of packet delivery
[8,9,10]. OS fingerprinting, in turn, lets cybercriminals identify the OS of a target system, knowledge that can be exploited to launch further attacks or gain unauthorized access. Unlike DoS attacks, which aim to disrupt the target system, OS fingerprinting is a component of reconnaissance rather than a standalone attack. In this stage, attackers gather data about the target system and any potential security weaknesses that could be exploited, such as its network layout, interdependencies of services, and OSes in use
[11,12]. While OS fingerprinting is not typically considered an attack in itself, it plays a significant role in the broader context of cyber activities and may form part of an attack as described within the cyber kill chain framework. A tool widely used for reconnaissance is NMAP. The output of NMAP provides information such as device type, OS family and generation, Common Platform Enumeration (CPE) representation, OS details, uptime guess, network distance, Transmission Control Protocol (TCP) sequence prediction, and Internet Protocol (IP) Identifier (ID) sequence generation
[13]. To perform fingerprinting, NMAP sends out a series of probes, analyzes the responses received, and compares them against a database of known OS characteristics. By analyzing these responses, NMAP attempts to determine the OS running on the target device
[14]. However, it is worth noting that the accuracy of NMAP’s OS identification in real-world networks and on the Internet remains a topic of debate, and there is limited research available on this subject to serve as a reference
[15].
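The probe-and-compare idea described above can be illustrated with a minimal Python sketch. It is not NMAP's actual matching algorithm: the database `FINGERPRINT_DB`, the attribute names (`ttl`, `window`, `df`), the example values, and the 0.6 threshold are all hypothetical and chosen only for illustration. The sketch scores an observed response against each reference fingerprint and returns the best match, or no match when the score is too low.

```python
from typing import Dict, Optional, Tuple

# Hypothetical, heavily simplified fingerprint database. Real NMAP
# fingerprints contain many more tests; these values are illustrative only.
FINGERPRINT_DB: Dict[str, Dict[str, object]] = {
    "Linux-like": {"ttl": 64, "window": 29200, "df": True},
    "Windows-like": {"ttl": 128, "window": 8192, "df": True},
    "Network-device-like": {"ttl": 255, "window": 4128, "df": False},
}


def match_score(observed: Dict[str, object], reference: Dict[str, object]) -> float:
    """Return the fraction of reference attributes matched by the observation."""
    hits = sum(1 for key, value in reference.items() if observed.get(key) == value)
    return hits / len(reference)


def best_match(observed: Dict[str, object],
               threshold: float = 0.6) -> Tuple[Optional[str], float]:
    """Return (label, score) of the closest fingerprint, or (None, score)
    when even the closest fingerprint scores below the threshold."""
    best_label: Optional[str] = None
    best_score = 0.0
    for label, reference in FINGERPRINT_DB.items():
        score = match_score(observed, reference)
        if score > best_score:  # first entry wins on ties
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


if __name__ == "__main__":
    print(best_match({"ttl": 128, "window": 8192, "df": True}))
    print(best_match({"ttl": 64, "window": 1234, "df": True}))
```

Returning no match below a threshold reflects the accuracy concern raised above: a partial match is reported only when enough attributes agree.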
Denial of Service (DoS) attacks have been increasingly disruptive and pose a significant challenge to the stability and security of networked systems. These attacks have been utilized for political purposes, encompassing cyberwarfare, hacktivist actions, and acts of terrorism
[16]. Over time, the landscape of cyber threats has evolved, and the prevalence and impact of DoS and Distributed DoS (DDoS) attacks have continued to escalate. As such, these attacks now stand as formidable adversaries that security systems must confront
[8,17,18]. The DoS GoldenEye and HTTP Unbearable Load King (HULK) attacks are malicious techniques aimed at overwhelming a target web server’s resources, rendering it inaccessible to legitimate users. GoldenEye operates by bombarding the target server with an extensive barrage of HyperText Transfer Protocol (HTTP) GET or POST requests, exploiting the HTTP protocol’s statelessness. This flood of requests forces the server to allocate resources to each incoming request, leading to resource exhaustion and leaving the server unable to handle legitimate traffic. Similarly, the HULK attack generates numerous HTTP requests rapidly, aiming to exhaust the web server’s resources, especially its network bandwidth and computing capacity. Both attacks share the goal of causing service unavailability, but they employ different methods to achieve this disruptive outcome, making them potent tools in the arsenal of cyber attackers
[8,19]. Attackers are adopting more intricate strategies, utilizing botnets composed of compromised devices to launch DDoS attacks. These botnets can execute coordinated, large-scale attacks, making detection and mitigation more challenging. Attack vectors and techniques are also diversifying, including reflection and amplification attacks that exploit vulnerable servers to magnify the assault’s impact. Moreover, attackers increasingly leverage encryption to obfuscate their activities and make attribution more complex. In response to these advancements, defenders are developing more resilient mitigation measures and leveraging machine learning, Artificial Intelligence (AI), and advanced traffic analysis to detect and thwart attacks in real time. As the arms race between attackers and defenders continues, the evolution of DoS attacks remains a critical concern in cybersecurity
[20,21,22].
Amidst this ever-evolving landscape, a particular threat has gained prominence as well. As an insidious strain of malware, DNS botnets have significantly intensified their presence on the digital landscape and caused several negative economic impacts
[7]. Their operational methods are remarkably disruptive, utilizing an extensive network of compromised devices to coordinate and execute synchronized attacks. The primary impact of DNS botnets is their potential to overwhelm and flood their target with malicious DNS traffic, rendering the targeted system virtually inaccessible. This assault can paralyze essential online services and disrupt business operations, thereby incurring significant economic losses. These economic disruptions can affect organizations ranging from corporations and financial institutions to e-commerce platforms, healthcare, and governmental agencies. As a result, DNS botnets have emerged as a formidable threat in the cybersecurity arena. The harm inflicted by DNS botnets goes beyond mere financial losses. The consequences of these attacks extend to an erosion of trust and reputation for the targeted entities. Downtime resulting from a DNS botnet assault can leave a lasting impact on the user experience, potentially driving customers away and causing long-term damage to an organization’s brand
[6,7,23,24]. Additionally, service disruptions can result in significant data loss, posing a severe threat to the confidentiality and integrity of sensitive information. The consequences become dangerous when these attacks are directed towards vital infrastructure. They can potentially disrupt essential services such as healthcare, transportation, and emergency response systems, as noted in various studies
[25,26]. It is worth highlighting that the increasing integration of Internet of Things (IoT) devices into critical infrastructure renders them particularly susceptible to exploitation. These connected devices often lack robust security measures, making them attractive targets for malicious actors seeking to amplify the impact of their attacks
[27,28].
2. Network Scanning Incident Classification
In
[29][44], the authors present a passive Operating System (OS) fingerprinting approach to detect unauthorized OSes in an enterprise network. They use Transmission Control Protocol (TCP) and IP header fields such as Time to live (TTL), total length, and window size to identify the OSes generating the packets. Their methodology involves analyzing network traffic and comparing it to a database of known OS fingerprints. The results show a high accuracy in identifying and detecting unauthorized OSes. Another paper pursues a similar goal but focuses on detecting the popular network scanning tool NMAP
[11,12]. Their paper discusses the importance of detecting NMAP scanning behavior to protect hosts from malicious attacks. They argue that traditional defense methods like firewalls are less effective at detecting NMAP scanning, whereas Intrusion Detection Systems (IDS) can monitor network security events and alert when abnormalities appear. The authors propose a Comprehensive NMAP Detection Rules (CNDR) set based on the Suricata system that also accounts for IDS evasion. The CNDR achieves a detection rate of 100% for regular NMAP scanning and a detection accuracy of 91.7% for NMAP scanning with IDS evasion on the authors’ designed dataset.
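As a rough illustration of passive fingerprinting from header fields, the following Python sketch infers a sender's likely initial TTL from the TTL observed on the wire and looks up the (initial TTL, window size) pair in a small signature table. It is not the cited authors' method: the table `SIGNATURES`, its values, and the allow-list are hypothetical, and the total length field is omitted for brevity. The sketch assumes that senders start from one of a few common initial TTL values and that each hop decrements the TTL by one.

```python
from typing import Dict, FrozenSet, Tuple

# Common initial TTL values assumed by this sketch.
COMMON_INITIAL_TTLS: Tuple[int, ...] = (32, 64, 128, 255)

# Hypothetical signature table: (initial TTL, TCP window size) -> OS label.
SIGNATURES: Dict[Tuple[int, int], str] = {
    (64, 29200): "Linux-like",
    (128, 8192): "Windows-like",
    (128, 65535): "Windows-like",
}


def estimate_initial_ttl(observed_ttl: int) -> int:
    """Round the observed TTL up to the nearest common initial value."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be in the range 1..255")
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial
    raise AssertionError("unreachable: 255 is the largest possible TTL")


def classify_packet(observed_ttl: int, window_size: int) -> str:
    """Return the OS label for a packet, or 'unknown' if no signature matches."""
    key = (estimate_initial_ttl(observed_ttl), window_size)
    return SIGNATURES.get(key, "unknown")


def is_unauthorized(observed_ttl: int, window_size: int,
                    allowed: FrozenSet[str] = frozenset({"Linux-like"})) -> bool:
    """Flag packets whose inferred OS is not on the (hypothetical) allow-list."""
    return classify_packet(observed_ttl, window_size) not in allowed


if __name__ == "__main__":
    print(classify_packet(57, 29200))   # TTL 57 is rounded up to 64
    print(is_unauthorized(117, 8192))   # TTL 117 is rounded up to 128
```

Because this sketch treats unknown signatures as unauthorized, an incomplete signature table would raise false alarms; a real deployment would need to decide how to handle unmatched traffic.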
3. DDoS Attack Incident Classification
Considerable efforts have been dedicated to classifying and detecting Denial of Service (DoS) attacks. Many papers employ similar approaches, which aim to detect and classify DoS attacks
[30][31][32][33][34][45,46,47,48,49]. The Distributed Reflection Denial-of-Service attack (DRDoS) is covered extensively in
[30][45]. Their study identifies the differences between TCP-based and User Datagram Protocol (UDP)-based DRDoS attacks. They find that the attack exploits vulnerabilities in the UDP protocol to flood a target with traffic: UDP is vulnerable because it allows the amplification of responses and does not verify source Internet Protocol (IP) addresses. A proposed solution is IEWA, which combines increased expenses and weak authentication to protect the Network Time Protocol (NTP). Another paper in which DRDoS is examined is
[31][46]. The authors evaluate the susceptibility of popular UDP-based protocols to DRDoS attacks, finding 14 protocols vulnerable, with traffic multiplied up to a factor of 4670. They further identify millions of potential amplifiers for six vulnerable protocols, and evaluate countermeasures against DRDoS attacks, showing that poorly designed rate-limiting solutions are evaded by some attacks, and packet-based filtering techniques are also evaded. They propose a threat model that analyzes P2P botnets that use amplification attacks to understand the potential severity of such attacks. The key focus is that an attacker aims to consume all available bandwidth of a victim by using systems that reflect the attack traffic to the victim. Another work allows sharing attack data and anomaly profiles with other parties without disclosing data
[32][47]. Recent works have explored FL methods on the CIC-DDoS2019
[35][50] dataset, including LwResnet, FLDDoS, and FIDS. However, some solutions rely on the vanilla FEDAVG algorithm, which can increase the convergence time on unbalanced non-i.i.d. attack data and may jeopardize clients’ privacy with data-sharing mechanisms
[32][47]. In response to these challenges, the authors suggest an adaptive Federated Learning (FL) approach named FLAD. FLAD facilitates the collaborative training of deep learning models using distributed profiles of cyber threats while preserving the confidentiality of training data. The proposed solution manages the FL process by dynamically allocating additional computation resources to members with more intricate attack profiles, all without the need to share test data. The study showcases that FLAD surpasses the performance of the original FL algorithm in terms of convergence time and accuracy across various unbalanced datasets featuring heterogeneous Distributed Denial of Service (DDoS) attacks. Meanwhile, other approaches aim to more closely fingerprint at the packet level
[33][48]. The authors use an approach to classify traffic patterns based on their statistical properties, including packet length, packet inter-arrival time, and time-to-live. The proposed system generates application fingerprints based on transport layer packet-level and flow-level features. These fingerprints identify Distributed Denial of Service (DDoS) attacks by analyzing statistical information collected at the flow level. Paper
[34][49] discusses how easily DoS attacks can be launched, while detection and response are often manual and slow. Existing methods that rely on packet headers are also vulnerable to spoofing. The proposed framework relies on header content, transient ramp-up behavior, and spectral analysis, making it more challenging to spoof. By evaluating a regional Internet Service Provider (ISP)’s access links, they detected 80 live attacks. The framework has several applications, including aiding in the rapid response to attacks, developing realistic models of DoS traffic, and estimating the level of DoS activity on the Internet.
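The vanilla FEDAVG aggregation step mentioned in the federated learning discussion above can be sketched in a few lines of Python. FEDAVG combines client models by averaging their parameters, weighting each client by its number of local training samples. The sketch below assumes, for simplicity, that each model is a flat list of floats; the function name `fedavg` is a hypothetical choice, and the adaptive allocation that FLAD adds on top is not shown.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Sample-size-weighted average of client parameter vectors.

    client_weights[k] is the flattened parameter vector of client k and
    client_sizes[k] is the number of local training samples on client k.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must share the same model shape")
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # The second client holds three times as much data as the first,
    # so the global model is pulled towards its parameters.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

Because each client's contribution is fixed by its sample count, clients with small but difficult attack profiles have little influence per round, which is one way to read the convergence concern on unbalanced non-i.i.d. data noted above.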
4. Automated Attack Classification and Intrusion Detection
Several studies have been conducted on classifying and detecting network attacks using machine learning algorithms. The following papers
[36][37][38][39][40][41][42][51,52,53,54,55,56,57] share similar methodologies but differ in the datasets and problems they tackle. Bayu Adhi et al. discuss using deep neural networks (DNNs) for classifying attacks in the transport layer of Internet of Things (IoT) networks in their paper
[36][51]. They discuss how anomaly detection is considered one of the most demanding tasks in intrusion detection systems (IDSs), and the authors propose a robust DNN classifier model that can intelligently detect different kinds of attacks. The proposed method is evaluated on three benchmark datasets (UNSW-NB15, CIDDS-001, and GPRS) in wired and wireless network environments. The authors use a grid search strategy to obtain the best parameter settings for each dataset, and their experimental results show that the DNN approach is practical regarding accuracy, precision, recall, and false alarm rate. Another recent paper looked into preventing and detecting cyber attacks on IoT devices
[37][52]. They evaluate various machine learning techniques, including k-nearest neighbor, support vector machine, decision tree, naive Bayes, random forest, artificial neural network, and logistic regression, for both binary and multi-class classification. The authors use the Bot-IoT dataset to compare and evaluate the algorithms based on accuracy, precision, recall, and log loss metrics. Their results demonstrate that random forest outperforms the other algorithms in binary classification, while k-nearest neighbor (kNN) performs best in multi-class classification. The paper also provides an overview of the increasing number of IoT devices, the associated risks of cyber-attacks, and the limitations of traditional intrusion detection systems in detecting these attacks. The authors of
[38][53] provide a method of identifying DDoS attacks using a semi-supervised machine learning approach. The approach involves obtaining clusters of network traffic data using unsupervised methods and then labeling them through a voting method to mark normal, DDoS, and suspicious traffic. The dataset used consists of three features extracted using Principal Component Analysis (PCA), and three machine learning algorithms are applied—kNN, Support Vector Machine (SVM), and Random Forest (RF)—to classify the labeled traffic data. The emergence of Software-Defined Networking (SDN) has given rise to a new form of networking, bringing about new types of attacks
[39][54]. Ahuja et al. propose a machine learning-based solution to distinguish benign traffic from DDoS attack traffic by using novel features for DDoS attack detection. They create a dataset of SDN traffic logs and use a hybrid machine learning model of Support Vector Classifier with Random Forest (SVC-RF) to classify traffic. The authors highlight the security issues of SDN and explain how DDoS attacks can occur at different architectural planes. Regarding threat classification, two papers provide an excellent framework
[41][42][56,57]. The two papers use a similar approach of proposing a new method to detect anomalies. The
[41][56] paper proposes an algorithm that involves three steps: identifying related packets/flow records, deriving metrics related to the anomaly, and classifying the anomaly using a signature-based approach. In contrast, ref.
[42][57] proposes developing machine-learning models to classify HyperText Transfer Protocol (HTTP) requests as normal or malicious to detect web application attacks. They both validate their proposed methods on datasets while highlighting the importance of automated classification.
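Several of the studies above report accuracy, precision, recall, and false alarm rate. As a reminder of how these relate, the following Python sketch computes them for a binary detector from true and predicted labels. The function name `detection_metrics` and the label convention (1 for attack, 0 for benign) are assumptions made for illustration; the cited papers may define or aggregate these metrics differently, particularly in the multi-class case.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                      positive: int = 1) -> Dict[str, float]:
    """Compute binary detection metrics; `positive` marks the attack class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal in length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1   # attack correctly flagged
            else:
                fp += 1   # benign traffic flagged as attack (false alarm)
        else:
            if truth == positive:
                fn += 1   # missed attack
            else:
                tn += 1   # benign traffic correctly passed
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # False alarm rate: share of benign samples flagged as attacks.
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    preds = [1, 1, 0, 1, 0, 0, 0, 0]
    print(detection_metrics(truth, preds))
```

In this example the detector catches two of three attacks and raises one false alarm on five benign samples, so accuracy is 0.75 while the false alarm rate is 0.2.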
5. DNS-Based Botnet Detection
There have also been many studies in the field of Domain Name System (DNS) botnet detection. Singh et al. discuss the limitations of existing surveys, which either lack in-depth comparisons or do not cover the full spectrum of DNS-based botnet detection techniques. To address this gap, their work focuses on botnet detection methods that use the DNS protocol. The contributions of the study include categorizing DNS-based botnet detection techniques, providing an analysis of each technique within these categories, and proposing essential attributes for an innovative DNS-based botnet detection system
[6,7]. In a related context, another work proposes a system called “Notos” for dynamic DNS reputation scoring, which has shown promise in identifying malicious domains with high accuracy and low false positives in a large ISP’s network
[23]. A more active approach was introduced by Ma et al., who proposed probing DNS query characteristics by leveraging the Time to live (TTL)-based caching mechanism of R-DNS servers. By probing the cache of these servers, the monitoring system can observe cache behavior and, in turn, estimate DNS query activities. This approach significantly reduces management costs and privacy concerns
[24]. Other research also leverages DNS query data to detect malicious DNS traffic
[6][43][44][45][6,58,59,60]. Another study adopts a comparable methodology, employing Genetic Algorithms (GA) to enhance the feature selection capabilities of an Intrusion Detection System (IDS). This approach aims to optimize the system’s performance within resource-constrained environments such as the Internet of Things (IoT)
[46][42]. The collective body of research in DNS botnet detection highlights the diverse strategies employed to address this critical cybersecurity concern, from passive monitoring to active probing, all with the common aim of enhancing network security and safeguarding against malicious DNS traffic.
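The TTL arithmetic behind cache probing, as described for the active approach above, can be illustrated with a short Python sketch. It is a simplified illustration and not Ma et al.'s algorithm. It assumes the prober knows the authoritative TTL of a domain and receives, for each probe, either the remaining TTL of the cached record or nothing on a cache miss. The function names, the `tolerance` parameter, and the input format are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def cache_insert_time(probe_time: float, remaining_ttl: float,
                      authoritative_ttl: float) -> float:
    """Estimate when a record entered the cache.

    A cached record counts down from the authoritative TTL, so the time
    already spent in the cache is authoritative_ttl - remaining_ttl.
    """
    if not 0 <= remaining_ttl <= authoritative_ttl:
        raise ValueError("remaining TTL must lie between 0 and the authoritative TTL")
    return probe_time - (authoritative_ttl - remaining_ttl)


def distinct_cache_fills(probes: Iterable[Tuple[float, Optional[float]]],
                         authoritative_ttl: float,
                         tolerance: float = 1.0) -> List[float]:
    """Return estimated insertion times of distinct cache fills.

    `probes` holds (probe_time, remaining_ttl) pairs in time order, with
    remaining_ttl set to None on a cache miss. Each distinct fill implies
    that at least one client queried the domain through this resolver.
    """
    fills: List[float] = []
    for probe_time, remaining in probes:
        if remaining is None:
            continue  # cache miss: nothing to infer from this probe
        inserted = cache_insert_time(probe_time, remaining, authoritative_ttl)
        # Probes that hit the same cached record yield the same insert time.
        if not fills or abs(inserted - fills[-1]) > tolerance:
            fills.append(inserted)
    return fills


if __name__ == "__main__":
    observations = [(1000, 250), (1100, 150), (1400, None), (1500, 280)]
    print(distinct_cache_fills(observations, authoritative_ttl=300))  # [950, 1480]
```

The number of distinct fills is only a lower bound on query activity, since queries answered from the cache leave no trace in the remaining TTL.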
6. Feature Selection Approaches: Recursive Feature Elimination (RFE) and Autoencoder Analysis
While the researchers’ approach covers only the use of Genetic Algorithms (GA) to find the most prominent features in packet header data, other approaches aim to do the same. Two such approaches are Recursive Feature Elimination (RFE) and Autoencoders. Several studies focus on using RFE to perform intrusion detection
[47][48][49][35,40,61]. The Recursive Feature Elimination (RFE) algorithm systematically removes features, initially evaluating classifier performance with the entire feature set and progressively generating subsets by eliminating features. This iterative process determines the most effective subset
[47][35]. Awad et al. combine cross-validation with feature elimination, further refining the feature count and enhancing model performance
[48][40]. Similarly, other methodologies such as Autoencoders are also viable for feature selection. The authors of
[50][62] introduce a novel approach leveraging Autoencoder (AE) technology to discern behavioral patterns in IoT attacks. Their method innovatively constructs features by autonomously learning semantic similarities between command-derived data, providing improved clustering and a deeper understanding of attack behavioral patterns compared to traditional approaches. Another approach looks specifically at SQL injection attacks. Thalji et al. proposed leveraging an Autoencoder network (AE-Net) to automatically engineer features for detecting SQL injection attacks. By extracting deep features from SQL textual data, the AE-Net facilitates the creation of a more efficient data representation. Their method, integrated with the extreme gradient boosting classifier, achieved a k-fold accuracy score of 0.99, surpassing existing approaches. Employing techniques like hyperparameter tuning and validation via k-fold cross-validation ensured robust performance evaluation
[51][38].
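The iterative elimination loop described for RFE above can be sketched in plain Python. This is a simplified backward-elimination variant and not the implementation used in the cited studies: it scores each candidate subset by the training accuracy of a nearest-centroid classifier, whereas library implementations of RFE typically rank features by model-derived weights or importances. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence

Row = Sequence[float]


def centroid_accuracy(X: Sequence[Row], y: Sequence[int],
                      features: Sequence[int]) -> float:
    """Training accuracy of a nearest-centroid classifier on `features`."""
    labels = sorted(set(y))
    centroids = {}
    for label in labels:
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, t in zip(X, y):
        point = [x[f] for f in features]
        # Predict the label whose centroid is closest (squared Euclidean).
        predicted = min(
            labels,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == t
    return correct / len(y)


def eliminate_features(X: Sequence[Row], y: Sequence[int], n_keep: int) -> List[int]:
    """Start from all features and repeatedly drop the one whose removal
    leaves the highest score, until n_keep feature indices remain."""
    if not 1 <= n_keep <= len(X[0]):
        raise ValueError("n_keep must be between 1 and the number of features")
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        scored = [
            (centroid_accuracy(X, y, [f for f in remaining if f != drop]), drop)
            for drop in remaining
        ]
        _, drop = max(scored)  # ties are broken towards the highest index
        remaining.remove(drop)
    return remaining


if __name__ == "__main__":
    # Toy data: only feature 0 separates the two classes.
    X = [[0.0, 5.0, 1.0], [0.2, -5.0, 0.0], [0.1, 4.0, 1.0], [0.3, -4.0, 0.0],
         [1.0, 5.0, 0.0], [1.2, -5.0, 1.0], [1.1, 4.0, 1.0], [0.9, -4.0, 0.0]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(eliminate_features(X, y, n_keep=1))  # [0]
```

Scoring on the training data keeps the sketch short; combining the loop with cross-validation, as Awad et al. do, would give a less optimistic estimate of each subset's performance.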