As mentioned before, the amount of data collected by IoT devices and sensors is immense and contains valuable forensic evidence. This data can help identify and prevent unauthorised access within smart environments. The authors of [5] designed a framework known as IoTDots to help protect the data collected by various smart devices and applications. IoTDots has two main components: the IoTDots analyser and the IoTDots modifier. The former scans the source code of the applications and detects forensically relevant information. The latter automatically inserts tracking logs and reports the results. One of the most discussed challenges in DFI is the growing volume of data. Since the majority of file artefacts on seized devices are usually irrelevant to the investigation, manually retrieving the suspicious files that are relevant is very difficult. To reduce the amount of manual analysis required in DFI, ref. [16] proposed a methodology for automatically prioritising suspicious file artefacts. Rather than providing the final analysis results, this methodology aims to predict and recommend the artefacts that are likely to be suspicious. It uses a supervised machine-learning approach that draws on previously processed case results. In support of DF, "intelligent methods" have been proposed; the term covers the ability of computers to learn a specific task from data, data mining, machine learning, soft computing, and traditional artificial intelligence. It is commonly used to describe ways of automating problem solving in DF, and two main intelligent approaches are utilised, namely rule-based and anomaly-based [17].
The authors of [18] introduced VERITAS, a novel and practical DF capability for smart environments, since current smart platforms lack any digital forensic capability for identifying, tracing, storing, or analysing data generated in these environments. VERITAS has two main components: the collector and the analyser. The collector employs mechanisms to automatically collect forensically relevant data from the smart environment. The analyser then uses a first-order Markov chain model to extract valuable and usable forensic evidence from the collected data for the purposes of a forensic investigation. Discovering and declaring the presence of adversaries requires intensive data analysis, such as retrieving and confirming system logs and evaluating blockchain information. Accordingly, ref. [19] proposed a blockchain-assisted shared audit framework to analyse DF data in an IoT environment. The framework was created to identify the sources and causes of data scavenging attacks in virtualised resources, and it uses blockchain technology to manage access logs and controls. Access-log data is examined with logistic regression and cross-validation to check the consistency of adversary event detection.
The number of cases needing DF competence and the volume of data to be processed have overburdened digital forensic investigators. Automated evidence processing based on artificial intelligence techniques holds considerable potential for speeding up the digital forensic analysis process while improving case-processing capacity [4]. In DFI, automation uses ML techniques for classification. By exploiting existing knowledge of digital evidence processing, ML techniques can obtain important information for investigations more efficiently. Additionally, digital-evidence triage was developed for the prompt detection, processing, and interpretation of digital evidence. Currently, with the aid of AI techniques, the investigator determines the priority in which devices are gathered and processed at a crime scene [4]. Furthermore, ref. [20] proposed an intelligent framework based on clustering and classification. The model learns from past crimes; when a new crime is registered, the investigator enters some of the crime information, such as the crime type, location, and time. The clustering process then automatically groups the new crime with previous similar crimes in the system using the k-nearest neighbour and crime-matching classification algorithms. In this way, the investigator can gain insights during the pre-investigation process by exploring the new crime alongside the similar previous crimes with which it has been clustered. Moreover, with the growth of cybercrime that targets minors, chat logs can be examined to detect harmful behaviour and report it to law enforcement authorities. This can make a significant difference in protecting minors on social media platforms from being abused by cyber predators. Since DFI is done primarily by hand, the enormous volume and variety of data make the task of DF investigators difficult. Ref. [21] therefore suggested an approach that uses a DF process model backed by ML methodologies to enable the automatic detection of harmful conversations in chat logs.
One of the most fundamental characteristics of any smart device in an IoT network is its ability to acquire large volumes of data and then send the obtained data to the destination server through the internet. IoT-based networks are therefore particularly vulnerable to both simple and sophisticated attacks, which must be discovered early in the data transmission process in order to protect the network. The authors of [22] developed an intelligent intrusion detection system that utilises machine-learning models to discover attacks in the IoT network. The adaptability of IoT devices raises the probability of continual attacks on them. Due to the low processing power and memory of IoT devices, security researchers have found it challenging to preserve records of the diverse attacks performed on these devices during a DFI. The authors of [23] proposed an intelligent forensic analysis mechanism, based on the machine-to-machine framework, to automate the detection of attacks on IoT devices. The proposed mechanism combines several ML techniques and different forensic analysis tools to detect different types of attacks. Furthermore, by providing a third-party logging server, it overcomes the problem of evidence gathering. To assess the effects and types of attacks and violations, forensic analysis is performed on the logs using a forensic server. In addition, ref. [24] proposed a framework addressing cyber-attacks against smart satellite networks and indicated that ML and deep-learning algorithms are effective for cyber-attack discovery, identification, and tracing.
IoT forensics and smart environments, with their recognised challenges, also provide a great opportunity to develop new forensic tools for acquiring, preserving, and analysing forensic data, making the task of forensic investigators easier. The authors of [25] proposed a user-friendly tool for smart devices that support WiFi and used smart-environment scenarios to give forensic investigators, network administrators, and data scientists access to various features of network traffic in a few simple steps. The proposed tool allows network traffic features to be computed in real time on any WiFi access point running the OpenWrt firmware, avoiding the time-consuming tasks of dumping network traffic and implementing the procedures needed to analyse the captured traffic. On the other hand, due to the lack of examination and available data, ref. [26] selected a smart fridge as the IoT device to be examined and investigated. The dataset was examined using two ML algorithms, Bayes net and decision stump, each of which represents a distinct idea. A decision stump is a simplified, one-level form of the decision-tree ML technique. A Bayes net is useful for estimating, given that an event has occurred, the likelihood that any one of several known causes was the contributing factor. The validation results indicate that the Bayes net algorithm is more accurate than the decision stump.
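To make the notion of a decision stump concrete, the following is a minimal sketch of a one-level decision tree fitted by exhaustive search over a single feature and threshold. It is an illustration only, not the implementation or dataset used in [26]: the feature names (packets per minute, door-open events), the sample values, and the function names `fit_stump` and `predict` are all hypothetical.

```python
from collections import Counter


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y):
    """Fit a decision stump: the single (feature, threshold) split
    with the fewest training misclassifications."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_label, right_label = majority(left), majority(right)
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best["errors"]:
                best = {"feature": f, "threshold": t, "left": left_label,
                        "right": right_label, "errors": errors}
    return best


def predict(stump, row):
    """Classify one sample using the single split stored in the stump."""
    if row[stump["feature"]] <= stump["threshold"]:
        return stump["left"]
    return stump["right"]


# Hypothetical smart-fridge records: [packets per minute, door-open events].
X = [[12, 3], [15, 4], [14, 2], [90, 3], [110, 5], [95, 2]]
y = ["normal", "normal", "normal", "attack", "attack", "attack"]

stump = fit_stump(X, y)
print(stump)
print(predict(stump, [100, 1]))
```

Because a stump can use only one feature and one threshold, it cannot capture interactions between features. This is one plausible reason why a model that relates several variables, such as a Bayes net, may achieve higher accuracy on the same data.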
The reviewed research shows that the main issues facing DF investigators in the smart environment are the large volume of data and the detection of attacks and violations. The proposed solutions are summarised in Figure 1 and Figure 2. The summary is split into two separate figures because two main themes emerged across the existing solutions: the first theme involves MLF solutions for large amounts of data, while the second involves MLF solutions for attack and violation detection.