Malware Detection Method with ViT Attention Mechanism

This entry is adapted from the peer-reviewed paper 10.3390/app13116839

Artificial intelligence (AI) is increasingly being utilized in cybersecurity, particularly for detecting malicious applications. However, the black-box nature of AI models presents a significant challenge: the lack of transparency makes it difficult to understand and trust their results. Addressing this requires incorporating explainability into the detection model, yet little research explains why an application is detected as malicious or describes its behavior.

Keywords: explainable artificial intelligence (XAI); deep learning; cybersecurity; mobile malware

1. Introduction

Mobile security threats that target Android devices are constantly evolving and becoming more sophisticated. Using Android malware, cybercriminals can steal sensitive information, disrupt device use, and compromise user privacy ^[1].

Among the many efforts to detect malicious applications (apps), many studies have demonstrated the effectiveness of deep learning methods ^[2]. Recently, studies using image-based malware detection models have been increasing ^[3]. Representing an application's binary as an image allows advanced image-processing techniques to be applied, which enables more accurate detection ^[4]. Additionally, because this approach requires no feature engineering, training data can be generated quickly. With these advantages, a more accurate and efficient malicious application detection method can be built.
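
The idea of representing a binary as an image can be sketched as follows. This is a minimal illustration, not the procedure of any cited study: the function name `bytes_to_grayscale`, the row width of 256, and the zero-padding of the final row are all assumptions made for the example, and synthetic bytes stand in for a real file.

```python
import numpy as np

def bytes_to_grayscale(data: bytes, width: int = 256) -> np.ndarray:
    """Reshape a raw byte string into a 2-D uint8 array, one byte per pixel."""
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = np.frombuffer(data, dtype=np.uint8)
    # Ceiling division gives the number of rows needed; keep at least one row.
    height = max(1, -(-len(pixels) // width))
    # Zero-pad the tail so the byte count is a multiple of the row width (assumption).
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(pixels)] = pixels
    return padded.reshape(height, width)

# Synthetic bytes standing in for the contents of a real binary.
sample = bytes(range(256)) * 3 + b"\x01\x02"
image = bytes_to_grayscale(sample, width=256)
```

No hand-crafted features are extracted here; each pixel intensity is simply the value of one byte, which is why this kind of preprocessing is fast.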

However, a deep learning-based malware detection model does not explain the reason for detecting an application as malicious. This poses severe problems when integrating artificial intelligence (AI) into cybersecurity ^[5], reducing human users’ trust in the model and making it difficult for users to understand the process behind the results ^[6]. To address these issues, some studies interpret the model’s decision basis as images or strings ^[7]^[8]^[9]. Unfortunately, in many cases, a model is designed for interpretability rather than detection accuracy, and interpretation methods are usually complex. As malware evolves rapidly, a methodology that provides both accuracy and interpretability while requiring minimal updates or modifications to models or input data is needed to continuously respond to malware ^[10]. A method with a simple structure is needed for this purpose. In addition, a means of explaining the interpretation of these detection models should be designed with the needs and preferences of users in mind. That is, it should be provided in a form that is easy for users to understand, such as in text format ^[11].

2. Malware Detection

Android malware detection has been extensively studied, and existing approaches are broadly divided into signature-based and artificial intelligence-based methods ^[12].

Zhang et al. ^[13] obtained features through a static analysis of the AndroidManifest.xml and Android Dalvik executable (DEX) file. They generated four different feature sets: permission, intent filter, API call, and string, and proposed a convolutional neural network (CNN)-based model for malicious app detection by creating a vector of features through a feature embedding model.

Wang et al. ^[14] created a hybrid model using a deep autoencoder and convolutional neural network to detect malicious applications. They used seven categories of static features, including requested permissions, intent, restricted API calls, hardware functions, code-related patterns, and suspicious API calls. In total, 34,570 individual features were extracted, of which 413 were used after filtering. Two variant CNN-based models, CNN-S and CNN-P, were used to detect malicious apps.

Ren et al. ^[15] presented two methods for processing classes.dex files into fixed-size sequences and using them as input to a deep learning model. This method does not limit the input file size, does not require feature engineering, and consumes few resources.
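
This entry does not describe the two methods of Ren et al., so the following is only a generic sketch of how a variable-length byte stream might be turned into a fixed-size sequence. The two strategies shown (padding or truncating, and evenly spaced resampling) and the function names are assumptions for illustration and should not be read as the authors' procedure.

```python
import numpy as np

def pad_or_truncate(data: bytes, length: int) -> np.ndarray:
    """Keep the first `length` bytes and zero-pad shorter inputs."""
    seq = np.frombuffer(data, dtype=np.uint8)[:length]
    out = np.zeros(length, dtype=np.uint8)
    out[: len(seq)] = seq
    return out

def resample(data: bytes, length: int) -> np.ndarray:
    """Pick `length` evenly spaced bytes so the whole file contributes."""
    seq = np.frombuffer(data, dtype=np.uint8)
    if len(seq) == 0:
        return np.zeros(length, dtype=np.uint8)
    # Evenly spaced positions from the first byte to the last byte.
    idx = np.linspace(0, len(seq) - 1, num=length).round().astype(int)
    return seq[idx]

# Synthetic bytes standing in for a classes.dex file.
data = bytes(range(10))
fixed_a = pad_or_truncate(data, 4)
fixed_b = resample(data, 5)
```

Padding or truncating preserves local byte order but discards the tail of large files, whereas resampling covers the entire file at a coarser resolution.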

Hsien-De Huang and Kao ^[16] mapped the bytecode of classes.dex to RGB color values to create fixed-size color images that revealed visual patterns shared by malware from the same family. An Inception-v3 model detected malware with high accuracy, and grayscale images were as effective as color images.

Daoudi et al. ^[17] used grayscale images from DEX file bytecodes to detect malware with a CNN model, achieving high accuracy. Image size did not significantly impact performance, and obfuscated apps were also effectively detected.

Freitas et al. ^[4] constructed MALNET-IMAGE, a dataset consisting of over one million malicious application images, providing a valuable resource for research into malicious apps. Using this MalNet dataset, detection performance was evaluated using CNN-based models such as ResNet, DenseNet, and MobileNet.

Yadav et al. ^[18] presented an EfficientNet-B4 CNN-based method for Android malicious app detection, wherein the DEX file was transformed into an image and used as model input. This approach demonstrated superior malicious app detection performance compared to pre-trained models such as ResNet, InceptionV3, and DenseNet.

These influential studies in the field of Android malicious app detection each employ unique approaches, ranging from static analysis and feature extraction to complex deep learning models. Studies focusing on image-based malware detection have demonstrated impressive performance, leveraging the latest CNN-based models.

3. Malware Detection Interpretation

XMal, proposed by Wu et al. ^[9], is a method for detecting malicious applications and generating descriptions of malicious behavior using an attention mechanism based on a multi-layer perceptron (MLP). Their model generated human-readable descriptions of malware behavior using API calls and permissions as features. It included an attention layer and MLP and used a pre-built semantic database of highly impactful features for detection. However, because XMal prioritizes highly weighted features, it may not cover all malicious behavior, and its focus on interpretability may compromise detection accuracy.

Deep learning techniques can visualize important image features, making them helpful in interpreting the results of image-based malware detection models.

Iadarola et al. ^[7] used images to identify common patterns among malware of the same type. They used gradient-weighted class activation mapping (Grad-CAM) to visually present the model’s results to security professionals. They used average Euclidean distance to compare heatmap images of similar malware types, finding similar shapes and enabling security experts to identify patterns in these types without prior knowledge of the samples. One area of improvement is that the interpretation provided to security experts is a heatmap of a binary image; thus, it is not an image that humans can easily understand.
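
Comparing heatmaps by average Euclidean distance can be sketched as below. The exact procedure of Iadarola et al. is not given in this entry, so this is one plausible reading: each heatmap is treated as a flat vector and the distance is averaged over all pairs in a group. The function names and the toy 2x2 heatmaps are assumptions for illustration.

```python
import numpy as np

def heatmap_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two equally sized heatmaps."""
    return float(np.linalg.norm(a.astype(float) - b.astype(float)))

def average_pairwise_distance(heatmaps: list) -> float:
    """Mean Euclidean distance over all unordered pairs of heatmaps."""
    n = len(heatmaps)
    if n < 2:
        raise ValueError("need at least two heatmaps")
    dists = [
        heatmap_distance(heatmaps[i], heatmaps[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return float(np.mean(dists))

# Toy heatmaps: two identical maps and one that differs everywhere.
maps = [np.zeros((2, 2)), np.ones((2, 2)), np.ones((2, 2))]
avg = average_pairwise_distance(maps)
```

A small average distance within a group would indicate that the model attends to similar regions for those samples.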

Yakura et al. ^[19] proposed a method of extracting essential byte sequences from malware to make manual analysis more efficient. Using attention mechanisms and CNNs, they showed that applying attention maps to binary data makes it possible to identify the features or locations in the data that characterize the type of malware.

4. Vision Transformer

A Vision Transformer (ViT) is a Transformer encoder-based model for image classification that is highly scalable and performs well on large datasets with fewer training resources than CNN-based models ^[20]. Self-attention is the essential mechanism of ViT, enabling it to learn from large image datasets accurately and effectively ^[21]. One of the most valuable properties of ViT is that self-attention makes it easy to see where in the input data the model is focusing. This interpretability is a crucial advantage of ViT compared to other deep learning models and is particularly useful in applications where transparency and explainability are essential ^[22]. Overall, self-attention allows ViT to achieve high accuracy in various computer vision tasks while providing a transparent and intuitive way to interpret its inner workings ^[23].
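
The self-attention step can be sketched in a few lines of NumPy. This is a single-head, scaled dot-product illustration with random projection weights; the token count, embedding sizes, and names such as `self_attention` and `w_q` are assumptions for the example rather than details of any specific ViT implementation.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    # Each row holds one token's similarity to every token, scaled by sqrt(d_k).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))  # 5 patch tokens with 8-dimensional embeddings
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)
```

The `weights` matrix is what makes the model inspectable: row i shows how strongly token i attends to each of the other tokens.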

One of the methods that can be used to compute the attention map is Attention Rollout ^[24]. The Attention Rollout method can be applied to a ViT to generate a heatmap showing the areas identified as critical in the ViT model.
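
A minimal NumPy sketch of Attention Rollout follows. It assumes per-layer attention matrices are already available, averages the heads, adds the identity matrix to account for residual connections, re-normalizes each row, and multiplies the layers together. The function names, the head-averaging choice, and the assumption that token 0 is the class token are illustrative.

```python
import numpy as np

def attention_rollout(attentions: list) -> np.ndarray:
    """attentions: per-layer arrays of shape (heads, tokens, tokens), first layer first."""
    n = attentions[0].shape[-1]
    rollout = np.eye(n)
    for layer in attentions:
        a = layer.mean(axis=0)                 # fuse heads by averaging (assumption)
        a = a + np.eye(n)                      # account for the residual connection
        a = a / a.sum(axis=-1, keepdims=True)  # keep each row summing to 1
        rollout = a @ rollout                  # propagate attention through this layer
    return rollout

def cls_heatmap(rollout: np.ndarray, grid: tuple) -> np.ndarray:
    """Attention from the class token (index 0) to each patch, scaled to a peak of 1."""
    h, w = grid
    mask = rollout[0, 1:].reshape(h, w)
    return mask / mask.max()

rng = np.random.default_rng(0)
raw = rng.random(size=(2, 3, 5, 5))           # 2 layers, 3 heads, 1 class token + 4 patches
attn = raw / raw.sum(axis=-1, keepdims=True)  # make each attention row sum to 1
rollout = attention_rollout(list(attn))
heatmap = cls_heatmap(rollout, grid=(2, 2))
```

The resulting `heatmap` can be upsampled to the input image size and overlaid on it to show which regions the model treated as critical.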

In CNN-based models, Grad-CAM is often used to generate heatmaps. Grad-CAM improves on the traditional class activation map (CAM) method and has the advantage that it can be applied for visualization without modifying the model ^[25]. The CAM method relies on the last convolutional layer and a Global Average Pooling layer to produce a heatmap highlighting critical regions ^[26]; Grad-CAM instead weights the feature maps of the last convolutional layer by gradient values.
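
The weighting step of Grad-CAM can be sketched in NumPy, assuming the feature maps of the last convolutional layer and the gradients of the class score with respect to them have already been obtained from a framework. The function name `grad_cam`, the array shapes, and the toy values are assumptions for illustration.

```python
import numpy as np

def grad_cam(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """feature_maps, gradients: arrays of shape (channels, height, width)."""
    # Channel importance: global-average-pool the gradients over space.
    alphas = gradients.mean(axis=(1, 2))
    # Weighted sum of feature maps over channels gives a (height, width) map.
    cam = np.tensordot(alphas, feature_maps, axes=1)
    # Keep only regions that push the class score up.
    cam = np.maximum(cam, 0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy input: channel 0 supports the class, channel 1 opposes it.
features = np.array([[[1.0, 0.0], [0.0, 0.0]],
                     [[0.0, 0.0], [0.0, 2.0]]])
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
cam = grad_cam(features, grads)
```

Because only activations and gradients are needed, the network itself does not have to be modified or retrained to produce the heatmap.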

Some research suggests that Attention Rollout may explain ViT decisions more effectively than earlier XAI techniques, such as Grad-CAM for CNNs. Both Attention Rollout and Grad-CAM aim to provide insight into the decision-making process of a deep neural network, and Attention Rollout is reported to provide a more accurate and detailed visual description of ViT’s predictions ^[27]. However, the effectiveness of these visualization methods depends on a number of factors, such as the complexity of a given dataset and the specific task.

5. Android DEX File

The Dalvik Virtual Machine (DVM) runs code that has been converted to the DEX file format. A DEX file contains the information needed to run an Android app, including data derived from the application's source code, but it is not human-readable. It has sections such as header, string_ids, and type_ids, and its data section contains bytecode and string data stored in a format specific to each element. DEX decompiler tools recover Java source code by reorganizing the data in a DEX file into Java code format; such tools include jadx ^[28], dex2jar ^[29], and apktool ^[30]. Even without a decompiler tool, the source code and related information can be obtained by parsing the DEX file according to Google’s dex format documentation.
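
Parsing a DEX file directly can start with its fixed 112-byte header. The sketch below reads the header fields with Python's `struct` module, following the field order of the published DEX format and assuming the standard little-endian layout. The function name `parse_dex_header` is an assumption, and a synthetic header is used in place of a real file.

```python
import struct

# Little-endian: 8-byte magic, uint32 checksum, 20-byte SHA-1 signature, 20 uint32 fields.
HEADER_FORMAT = "<8sI20s20I"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 112 bytes
FIELD_NAMES = (
    "file_size", "header_size", "endian_tag", "link_size", "link_off", "map_off",
    "string_ids_size", "string_ids_off", "type_ids_size", "type_ids_off",
    "proto_ids_size", "proto_ids_off", "field_ids_size", "field_ids_off",
    "method_ids_size", "method_ids_off", "class_defs_size", "class_defs_off",
    "data_size", "data_off",
)

def parse_dex_header(data: bytes) -> dict:
    """Return the DEX header fields as a dictionary."""
    if len(data) < HEADER_SIZE:
        raise ValueError("input is shorter than a DEX header")
    magic, checksum, signature, *fields = struct.unpack_from(HEADER_FORMAT, data)
    if not magic.startswith(b"dex\n"):
        raise ValueError("not a DEX file")
    header = dict(zip(FIELD_NAMES, fields))
    header["version"] = magic[4:7].decode("ascii")  # e.g. "035"
    header["checksum"] = checksum
    header["signature"] = signature.hex()
    return header

# Synthetic header for illustration only; a real one would come from classes.dex.
fake = struct.pack(HEADER_FORMAT, b"dex\n035\x00", 0, b"\x00" * 20,
                   112, 112, 0x12345678, *([0] * 17))
header = parse_dex_header(fake)
```

The size and offset pairs in the header (for example `string_ids_size` and `string_ids_off`) tell a parser where each subsequent section begins, which is the starting point for extracting strings, types, and method references without a decompiler.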

References

Šembera, V.; Paquet-Clouston, M.; Garcia, S.; Erquiaga, M.J. Cybercrime specialization: An exposé of a malicious Android Obfuscation-as-a-Service. In Proceedings of the 2021 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), Vienna, Austria, 6–10 September 2021; pp. 213–226.
Liu, K.; Xu, S.; Xu, G.; Zhang, M.; Sun, D.; Liu, H. A review of android malware detection approaches based on machine learning. IEEE Access 2020, 8, 124579–124607.
Liu, Y.; Tantithamthavorn, C.; Li, L.; Liu, Y. Deep learning for android malware defenses: A systematic literature review. ACM Comput. Surv. 2022, 55, 1–36.
Freitas, S.; Duggal, R.; Chau, D.H. MalNet: A Large-Scale Image Database of Malicious Software. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 3948–3952.
Gerlings, J.; Shollo, A.; Constantiou, I. Reviewing the need for explainable artificial intelligence (xAI). arXiv 2020, arXiv:2012.01007.
Perarasi, T.; Vidhya, S.; Ramya, P. Malicious vehicles identifying and trust management algorithm for enhance the security in 5G-VANET. In Proceedings of the 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 15–17 July 2020; pp. 269–275.
Iadarola, G.; Martinelli, F.; Mercaldo, F.; Santone, A. Towards an interpretable deep learning model for mobile malware detection and family identification. Comput. Secur. 2021, 105, 102198.
Kinkead, M.; Millar, S.; McLaughlin, N.; O’Kane, P. Towards explainable CNNs for Android malware detection. Procedia Comput. Sci. 2021, 184, 959–965.
Wu, B.; Chen, S.; Gao, C.; Fan, L.; Liu, Y.; Wen, W.; Lyu, M.R. Why an android app is classified as malware: Toward malware classification interpretation. ACM Trans. Softw. Eng. Methodol. TOSEM 2021, 30, 1–29.
Zhang, Z.; Hamadi, H.A.; Damiani, E.; Yeun, C.Y.; Taher, F. Explainable Artificial Intelligence Applications in Cyber Security: State-of-the-Art in Research. arXiv 2022, arXiv:2208.14937.
Liu, H.; Yin, Q.; Wang, W.Y. Towards explainable NLP: A generative explanation framework for text classification. arXiv 2018, arXiv:1811.00196.
Alzahrani, N.; Alghazzawi, D. A review on android ransomware detection using deep learning techniques. In Proceedings of the 11th International Conference on Management of Digital EcoSystems, Limassol, Cyprus, 12–14 November 2019; pp. 330–335.
Zhang, Y.; Yang, Y.; Wang, X. A novel android malware detection approach based on convolutional neural network. In Proceedings of the 2nd International Conference on Cryptography, Security and Privacy, Guiyang, China, 16–18 March 2018; pp. 144–149.
Wang, W.; Zhao, M.; Wang, J. Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network. J. Ambient. Intell. Humaniz. Comput. 2019, 10, 3035–3043.
Ren, Z.; Wu, H.; Ning, Q.; Hussain, I.; Chen, B. End-to-end malware detection for android IoT devices using deep learning. Ad Hoc Netw. 2020, 101, 102098.
Hsien-De Huang, T.; Kao, H.Y. R2-d2: Color-inspired convolutional neural network (cnn)-based android malware detections. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 2633–2642.
Daoudi, N.; Samhi, J.; Kabore, A.K.; Allix, K.; Bissyandé, T.F.; Klein, J. Dexray: A simple, yet effective deep learning approach to android malware detection based on image representation of bytecode. In Proceedings of the Deployable Machine Learning for Security Defense: Second International Workshop, MLHat 2021, Virtual, 15 August 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 81–106.
Yadav, P.; Menon, N.; Ravi, V.; Vishvanathan, S.; Pham, T. EfficientNet convolutional neural networks-based Android malware detection. Comput. Secur. 2022, 115, 102622.
Yakura, H.; Shinozaki, S.; Nishimura, R.; Oyama, Y.; Sakuma, J. Malware analysis of imaged binary samples by convolutional neural network with attention mechanism. In Proceedings of the Eighth ACM Conference on Data and Application Security and Privacy, Tempe, AZ, USA, 19–21 March 2018; pp. 127–134.
Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in vision: A survey. ACM Comput. Surv. CSUR 2022, 54, 1–41.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
Zhao, H.; Jia, J.; Koltun, V. Exploring self-attention for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
Abnar, S.; Zuidema, W. Quantifying attention flow in transformers. arXiv 2020, arXiv:2005.00928.
Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626.
Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929.
Chefer, H.; Gur, S.; Wolf, L. Transformer interpretability beyond attention visualization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 782–791.
JADX. Available online: https://github.com/skylot/jadx (accessed on 7 January 2023).
dex2jar. Available online: https://github.com/pxb1988/dex2jar (accessed on 7 January 2023).
Apktool. Available online: https://ibotpeaches.github.io/Apktool/ (accessed on 7 January 2023).

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.

Information

Subjects: Computer Science, Artificial Intelligence

Contributors:

Jeonggeun Jo

Jaeik Cho

Jongsub Moon

Update Date: 24 Jul 2023
