Image-Based Malware Detection: Comparison
Please note this is a comparison between Version 1 by Stavros Shiaeles and Version 2 by Mona Zou.

Image conversion of malicious binaries, or binary visualisation, is a relevant approach in the security community. It has exceeded the role of a single-file malware analysis tool and has become a part of Intrusion Detection Systems (IDSs) thanks to the adoption of Convolutional Neural Networks (CNNs).

  • image-based malware detection
  • convolutional neural networks
  • space-filling curves
  • machine learning

1. Introduction

Image conversion of malicious binaries, or binary visualisation, is a relevant approach in malware detection and analysis. Recent advancements in Deep Learning (DL) and computer vision have allowed security researchers to successfully incorporate image processing techniques in their arsenal [1]. According to this approach, 2D visualisations of malicious and benign files can be used to train Machine Learning (ML)-based classifiers to detect the existence of malware in new entities. In this regard, binary visualisation is fast becoming a key component for Intrusion Detection Systems (IDSs) that wish to overcome the limitations of traditional signature-based techniques. For example, one such limitation is high signature generation sensitivity imposed by even minor differences in code among malware variants [2][3][2,3]. Another restriction that is forced by a signature-based detection scheme is the need for constant (and often manual) database updating [2].
Image-based IDSs will also play a vital role in the future as malware detection tools will have to comply with the EU’s General Data Protection Regulation’s (GDPR) “privacy by design” approach. According to this, malware detection software should avoid any personal data processing, or employ encryption/pseudoanonimisation techniques in any other case [4][5][4,5]. For the moment, GDPR, which came into force in 2018, does not impose specific technical directions for applications that stand between security and personal data processing. At the same time, it releases any data that is processed to a state of anonymity from complying with the policy [5]. In that respect, image conversion of personal data (such as network traffic) can act as a form of masking user credentials and facilitate the development of GDPR-compliant security tools.
Image conversion of malware binaries is therefore a promising tool for the security community. Since the first attempts of visualising binaries [6][7][6,7] and the first image-based malware classification system [8], there has been a large volume of published studies experimenting with image generation or training classifiers. For example, some studies have been focused on texture analysis [8][9][10][8,9,10], or DL [2][3][11][12][2,3,11,12], all of which demonstrated positive results towards malware detection.

2. Malware/Benign File Detection

Ref. [13][17] proposed a system based on binary visualisation with a fuzzy set-based colouring scheme. The images were then passed to several CNN classifiers to detect malicious behaviour. Ref. [14][23] presented a pattern-based approach for insider threat classification, where they used image-based feature representation to convert log data into images for Wavelet CNN classification. Ref. [15][24] proposed a Windows malware detection system using CNN and AlexNet learning models. Image conversion of Windows PE files was also the subject of [16][25], who performed an extensive experiment of a malware detection technique based on RGB images. More specifically, they tested 15 fine-tuned DL models for feature extraction and 12 ML algorithms as the final classifier, from which the winning models were RegNetY320 and SVM. In their research, they also employed data augmentation and transfer learning. Even though [17][26] did not achieve high accuracy scores when applying CNN classification to their visualisations, they presented an analysis of the interpretability of the malicious patterns within the images. Ref. [18][27] proposed a visualisable malware detection method based on multi-dimension dynamic behaviours. However, their analysis was carried out on a limited malware dataset. Ref. [19][28] introduced an ensemble architecture for malware detection that is adaptable to different types of malware. The authors developed a combination of multiple ML algorithms and used an ensemble technique to improve the accuracy of the model. Ref. [20][29] also published malware detection using deep CNN in combination with image-based representations of malware samples.

3. Malware Detection and Family Classification

Ref. [21][15] proposed a hybrid framework for image-based malware classification based on space-filling curves. The authors argued that their approach could effectively capture the relationships between binary code and its associated graphical representation. Their approach used traditional ML algorithms to achieve high accuracy and efficiency, outperforming existing malware classification methods. Ref. [2] presented an enhancement for image-based malware classification using ML with low-dimension normalised input images to reduce the dimensionality of the data. However, as they mentioned, the normalisation process may not be suitable for all types of malware. Ref. [22][30], who had previously used CNN for malware feature extraction in [23][31], incorporated fuzzy logic in their approach by using convolutional fuzzy neural networks based on feature fusion and the Taguchi method. Ref. [24][32] presented a malware detection model that is based on Multi-Feature Fusion (MFF) and Histogram of Dynamic Binary Analysis (HDBA). The authors experimentally demonstrated its ability to detect malware variants and its high accuracy in comparison to existing methods with a detection time of 9.63 ms.
To compensate for “the problem of unpredictable truncation” caused by the use of different image widths in malware visualisation, Ref. [25][33] proposed a malware classification method based on a fusion of Efficient-Net and 1D-CNN. Their method exhibited high accuracy in detecting malware and the ability to classify different malware families. Ref. [26][34] introduced S-DCNN, an effective DL model which used image representation of binaries and combined transfer and ensemble learning. The model consisted of three CNNs (ResNet50, Xception, and EfficientNet-B4) and a multilayered perceptron as the final classifier. In their work, Ref. [27][35] presented a triplet neural network (NN) approach to learning similarities between vectors in the latent space. Their method used evolutionary optimisation and an ensemble network to return results by separating entangled representations. The technique demonstrated the segregation of Trojan-based structural features and demonstrated the lower segregation of Simda-typed data in the latent space. Ref. [28][36], who employed Generative Adversarial Networks (GANs) in their experiment, presented Iron-Dome, a multi-modal malware detection system. Iron-Dome secures IoT networked systems at runtime by classifying executable-based greyscale images.
The authors in [29][30][37,38] proposed a new method for malware detection that combined Markov images and transfer learning. The experiments were successful in terms of accuracy and speed but at the same time, they may require more computational resources compared to traditional ML approaches. Ref. [31][39] proposed the use of visualisation techniques along with the B2IMG and Gabor Filters as image augmentation methods and various CNN architectures to improve malware detection results. Their results were optimal for the B2IMG technique when combined with coloured images. Ref. [32][40] discussed the use of transfer learning in a EfficientNet3 CNN to classify malware efficiently. The authors utilised pre-trained deep NNs as a base model and fine-tuned them on malware data to achieve high accuracy. According to them, transfer learning proved that it can significantly reduce the training time and computational resources required. Ref. [33][41] concentrated on reducing the effort needed for malware data labelling. The authors suggested a technique for choosing a subset of data to serve as prototypes for representing the complete dataset and recommended the use of VGG16.
Ref. [34][42] presented a new approach to malware detection using greyscale binary visualisation and a hyperparameterised CNN. Their method achieved a high accuracy score, while they also provided the code for the training process. Ref. [35][43] proposed a multilayer DL approach for malware classification in 5G-enabled Industrial Internet of Things (IIoT) systems. The approach used a combination of CNN to detect malware, even in 5G systems, where malware attacks are becoming increasingly common, offering real-time connectivity. Ref. [36][44] focused on Windows malware detection by exploring the effectiveness and efficiency of the LightGBM algorithm, a gradient-boosting framework that uses tree-based learning algorithms. The authors claim that their approach was effective and fast, both in detecting and classifying malware. In their article, Ref. [37][45] experimented with various colour models in their visualisation technique to perform malware classification methods. All colour model techniques exhibited equally high performance when they were paired with feature extraction and the SVM model. Ref. [38][46] presented BinImg2Vec a method that transformed the binary code of malware samples into images and then used the data2vec framework to embed the images in a low-dimensional space. The embedded images were then used as inputs to a CNN for classification.
Ref. [39][47] proposed MalCNN, a new enhancement for malicious image classification using NNs. Even though the authors claim that their approach outperformed existing methods in terms of accuracy and efficiency, they did not provide a comprehensive comparison with other methods to validate this claim. Additionally, the article did not mention any limitations or potential drawbacks of the proposed method. Ref. [40][48] presented a static malware classification approach using greyscale image representation and lightweight CNNs that is suitable for IoT environments. Ref. [41][49] proposed a new method using a combination of automated transmutation and CNNs. However, these articles did not provide a detailed comparison of their proposed methods with other existing methods. Ref. [42][50] presented a lightweight CNN for image-based malware classification on embedded systems. The authors claim that their approach is effective in classifying malware on embedded systems because of its limited demands on computational resources. Ref. [43][51] used a Multi-Layer Perceptron (MLP) model. The authors evaluated their method on a dataset of malware images to show the efficiency of MLP, which was highly hyperparameterised. Ref. [44][52] attempted a malware classification method based on a lightweight architecture of CNN called MalShuffleNet, which is designed to reduce the computational cost of the model without sacrificing accuracy.
There were also several studies that incorporated the use of GANs in their methodology. Ref. [45][53] converted malware executables into images, which were then used to train a GAN for family classification. The authors used a variety of techniques for image generation and ML classification. The results favoured the combination of colourmap images and AC-GAN. Ref. [46][54] used Auxiliary-classifier GAN (AC-GAN) to generate malware images and then evaluate them. The authors found that fake malware images did not impose a threat to adversarial attacks; however, the models were able to successfully classify generated and real malware images. In their study, Ref. [47][55] used an augmentation model to boost the classification results of malicious images based on their method B1IMG. The CNN-based achieved superior results using RGB photos as opposed to grey images. Ref. [48][56] also used GANs to enrich samples from the same family and performed classification with CNN. Ref. [49][57] presented an analysis of the robustness of image-based malware analysis. More specifically, they compared gist descriptors to CNN that were trained directly on malware images to test their malware obfuscation resistance. According to their experiment, they discovered that gist descriptors were more reliable than CNNs.

4. Network Traffic Malware Detection/Classification

Ref. [50][58] proposed a DL-based model for malware traffic classification using PCAP (packet capture) data. The model converts PCAP data into images and uses image-based NN models to classify the malware traffic. The authors used vision transformers and CNN. Ref. [51][59] proposed a method for detecting IIoT malware. More specifically, they proposed an edge computing-based malware detection system that identifies malware by sending massive amounts of IIoT traffic data from smart factories to edge servers. By extracting relevant features from the network traffic images, they used DL techniques to identify malicious behaviour. Ref. [52][60] proposed an ML-based intrusion detection and response system. The system focused on network profiling and trained ML algorithms on historical attack data to detect intrusions in real time. Ref. [53][61] proposed a method that framed network flows as images and used image recognition algorithms to detect anomalies. The authors used federated learning to train the algorithms and evaluated the performance of their system on DDoS attack data.
Video Production Service