This systematic review discussed ML-based Android malware detection techniques. It critically evaluated 106 carefully selected articles and highlighted their strengths and weaknesses as well as potential improvements. ML-based methods for detecting source code vulnerabilities were also discussed, because it can be more difficult to add security after an app is deployed. The review thus aimed to help researchers acquire in-depth knowledge of the field and identify potential directions for future research and development.
Malware detection in Android can be performed in two ways: signature-based detection and behaviour-based detection [39]. The signature-based method is simple and efficient, and it produces few false positives. The binary code of the application is compared against the signatures in a database of known malware. However, this method cannot detect unknown malware. Therefore, the behaviour-based (anomaly-based) detection method is the most commonly used. This method usually borrows techniques from machine learning and data science. Many research studies have been conducted to detect Android malware using traditional ML-based methods such as Decision Trees (DT) and Support Vector Machines (SVM), and novel DL-based models such as the Deep Convolutional Neural Network (Deep-CNN) [40] and Generative Adversarial Networks [41]. These studies have shown that ML can be effectively utilised for malware detection in Android [9].
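The limitation of signature-based detection described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a signature is the SHA-256 digest of the whole binary; the database contents and the function name `is_known_malware` are hypothetical, and real signature schemes are usually more elaborate.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known-malicious binaries.
KNOWN_MALWARE_SHA256 = {
    hashlib.sha256(b"known malicious payload").hexdigest(),
}


def is_known_malware(app_bytes: bytes, signatures=KNOWN_MALWARE_SHA256) -> bool:
    """Return True only if the binary's digest is already in the database."""
    return hashlib.sha256(app_bytes).hexdigest() in signatures


# An exact copy of a known sample is flagged ...
print(is_known_malware(b"known malicious payload"))     # True
# ... but a slightly modified (unknown) variant is missed.
print(is_known_malware(b"known malicious payload v2"))  # False
```

The second call shows why unknown or modified malware escapes this kind of check, which motivates the behaviour-based methods surveyed in the tables that follow.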
Year | Study | Detection Approach | Feature Extraction Method | Used Datasets | ML Algorithms/Models | Selected ML Algorithms/Models | Model Accuracy | Strengths | Limitations/Drawbacks |
---|---|---|---|---|---|---|---|---|---|
2018 | [68] | Developing a three-level data pruning method and applying ML models with SigPID | Manifest Analysis for Permissions | Google Play | NB, DT, SVM | SVM | 90% | High effectiveness and accuracy | Considered only permission analysis, which may omit other important analysis aspects |
2021 | [69] | Analysing permissions and training the model with the identified ML algorithm | Manifest Analysis for Permissions | Google Play, AndroZoo, AppChina | RF, SVM, Gaussian NB, K-Means | RF | 81.5% | The model was trained with comparatively different datasets | Did not consider other static analysis features such as OpCode, API calls, etc. |
2021 | [70] | Generating a reduced-dimension vector and performing malware detection on it using ML models | Manifest Analysis for permissions | AMD, APKPure | MLP, NB, Linear Regression, KNN, C4.5, RF, SMO | MLP | 96% | Efficiency, applicability and understandability are ensured | Hyper-parameter selection was not performed |
2021 | [71] | Selecting features using dimensionality reduction algorithms and the Info Gain method | Manifest Analysis for permissions and intents | Drebin, Google Play | RF, NB, GB, AB | RF, NB, AB | RF-98%, NB-92%, AB-97% | Analysed the features as individual components and not as a whole | Did not consider other features such as API calls, Opcode, etc. |
2021 | [72] | Feature weighting with joint optimisation of weight mapping using the proposed JOWMDroid framework | Manifest Analysis for permission, Intents, Activities and Services | Drebin, AMD, Google Play, APKPure | RF, SVM, LR, KNN | JOWM-IO method with SVM and LR | 96% | Improved accuracy and efficiency | Correlations between features were not considered |
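The manifest-based studies above all start by turning declared permissions into a feature vector. The following is a minimal sketch of that step, assuming the manifest has already been decoded to plain-text XML (inside an APK it is stored in a binary format). The vocabulary `PERMISSION_VOCAB`, the sample manifest, and the function names are illustrative assumptions and are not taken from any of the cited studies.

```python
import xml.etree.ElementTree as ET

# Namespace used for android:* attributes in AndroidManifest.xml.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical, deliberately tiny permission vocabulary.
PERMISSION_VOCAB = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.CAMERA",
]


def extract_permissions(manifest_xml: str) -> set:
    """Collect the android:name of every <uses-permission> element."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    return {
        el.get(name_attr)
        for el in root.iter("uses-permission")
        if el.get(name_attr)
    }


def to_feature_vector(permissions, vocab=PERMISSION_VOCAB):
    """Binary vector: 1 if the app requests the permission, else 0."""
    return [1 if p in permissions else 0 for p in vocab]


# Assumed sample: a decoded manifest requesting two permissions.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>"""

vector = to_feature_vector(extract_permissions(SAMPLE_MANIFEST))
print(vector)  # [1, 1, 0, 0]
```

Vectors of this form are what classifiers such as RF or SVM would then be trained on; the choice of vocabulary and any feature weighting differ between the studies.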
Year | Study | Detection Approach | Feature Extraction Method | Used Datasets | ML Algorithms/Models | Selected ML Algorithms/Models | Model Accuracy | Strengths | Limitations/Drawbacks |
---|---|---|---|---|---|---|---|---|---|
2016 | [78] | Transforming the malware detection problem into a matrix model using the Wxshall algorithm, extracting Smali code, and generating the API call graph using Androguard | Code analysis for API Calls and code instrumentation for network traffic | MalGenome | Custom-built ML-based Wxshall algorithm, Wxshall extended algorithm | Wxshall extended algorithm | 87.75% | Few false alarms | The behaviour model needs to be expanded and the efficiency improved |
2017 | [74] | Using combinations of system functions to describe application behaviours, constructing eigenvectors, and then using Androidetect | Code analysis for API calls and Opcodes | Google Play | NB, J48 DT, Application functions decision algorithm | Application functions decision algorithm | 90% | Can identify instantaneous attacks. Can judge the source of the detected abnormal behaviour. High performance in model execution | Did not consider some important static analysis features such as OpCode, API calls, etc. |
2018 | [39] | Using the TinyDroid framework and n-gram methods on the Opcode sequence obtained from .smali files after decompiling .dex | Code Analysis for Opcode | Drebin | NLP, SVM, KNN, NB, RF, AP | RF and AP with TinyDroid | 87.6% | Lightweight static detection system. High performance in classification and detection | Malware samples were taken from only a few research studies and organisations, and lack metamorphic malware samples |
2018 | [73] | Analysing package-level information extracted from API calls using decompiled Smali files | Code Analysis for API calls and Information flow | Drebin, Contagio, Google Play | DT, RF, KNN, NB | RF | 86.89% | Model performs well even when the length of the sequence is short | Other information contained in operands was not considered, which affects the overall model |
2016 | [77] | Using Deterministic Symbolic Automaton and Semantic Modelling of Android Attack | Code Analysis for Opcode/Byte code | Drebin | AB, C4.5, NB, LinearSVM, RF | RF | 97% | Uses a combined approach of ML and DSA | Unable to detect new malware patterns since it does not perform complete static analysis |
2017 | [80] | Training Hidden Markov Models and comparing detection rates for models based on static data, dynamic data, and hybrid approaches | Code analysis for API calls and Opcode in static analysis and System call analysis | Harebot, Security Shield, Smart HDD, Winwebsec, Zbot, ZeroAccess | HMM | HMM | 90.51% | Checks the different approaches available to detect malware | Did not consider other ML algorithms or other important features |
2019 | [75] | Modelling the app's call graph as a Markov chain, then obtaining API call sequences and using ML models with MaMaDroid | Code Analysis for API calls | Drebin, oldbenign | RF, KNN, SVM | RF | 94% | The system is trained on older samples and evaluated on newer ones | Requires a large amount of memory to perform classification |
2019 | [76] | Calculating the confidence of association rules between abstracted API calls, which provides the behavioural semantics of the app | Code Analysis for API calls | Drebin, AMD | SVM, KNN, RF | RF | 96% | Efficient feature extraction process. Better stability of the system | Did not address cases such as dynamic loading, native code, encryption, etc. |
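Several rows above model API call sequences statistically, for example as a Markov chain in [75]. A minimal sketch of estimating first-order transition probabilities from a sequence of abstracted calls is given below. It is a simplification for illustration only and not the MaMaDroid implementation; the package names in `calls` and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(call_sequence):
    """Estimate P(next call | current call) from consecutive pairs."""
    counts = defaultdict(Counter)
    for src, dst in zip(call_sequence, call_sequence[1:]):
        counts[src][dst] += 1
    probs = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        probs[src] = {dst: n / total for dst, n in dsts.items()}
    return probs


# Hypothetical sequence of API calls abstracted to package level.
calls = [
    "java.io", "android.net", "java.io",
    "android.telephony", "java.io", "android.net",
]
probs = transition_probabilities(calls)

# "java.io" is followed by "android.net" twice and "android.telephony" once.
print(probs["java.io"])
```

Flattening such a transition table into a fixed-length vector gives one possible input for the RF, KNN, or SVM classifiers listed in the table.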
Year | Study | Detection Approach | Feature Extraction Method | Used Datasets | ML Algorithms/Models | Selected ML Algorithms/Models | Model Accuracy | Strengths | Limitations/Drawbacks |
---|---|---|---|---|---|---|---|---|---|
2017 | [81] | Using a customised method named WaffleDetector | Manifest Analysis for Sensitive permissions and API calls | Tencent, YingYongBao, Contagio | DT, Neural Network, SVM, NB, ELM | ELM | 97.06% | Fast learning speed and minimal human intervention | Combination of permissions and API calls is not refined |
2017 | [82] | Using a code-heterogeneity-analysis framework to classify Android repackaged malware by Smali code intermediate representation | Manifest Analysis for Intents, Permissions and API calls | Genome, Virus-Share, Benign App | RF, KNN, DT, SVM | RF with custom model proposed | FNR-0.35%, FPR-2.96% | Provides in-depth and fine-grained behavioural analysis and classification of programs | Detection can fail when the malware uses coding techniques such as reflection, and encryption techniques used in DEX cannot be handled |
2018 | [84] | Extracting features, transforming them into binary vectors, and training using ML with the RanDroid framework | Manifest Analysis for Permissions, Code Analysis for API calls, opcode and native calls | Drebin | SVM, DT, RF, NB | DT | 97.7% | Highly accurate in analysing permissions, API calls, opcode and native calls for malware detection | Broadcast receivers, filtered intents, Control Flow Graph analysis, and deep native code analysis were not considered |
2018 | [86] | Creating the binary vector, applying ML models, and evaluating the performance of the features and their ensemble using DroidEnsemble | Manifest analysis for permissions, code analysis for API calls and system calls analysis | Google Play, AnZhi, LenovoMM, Wandoujia | SVM, KNN, RF | SVM | 98.4% | Characterises the static behaviours of apps with an ensemble of string and structural features | Mechanism will fail if the malware contains encryption, anti-disassembly, or kernel-level features to evade detection |
2019 | [83] | Extracting application features from the manifest while decompiling classes.dex into a jar file and applying ML models | Manifest Analysis for permissions, activities and Code Analysis for Opcode | Drebin, playstore, Genome | KNN, SVM, BayesNet, NB, LR, J48, RT, RF, AB | RF with 1000 decision trees | 98.7% | High efficiency, lightweight analysis and fully automated approach | Did not consider API calls and other important features when analysing the DEX |
2019 | [85] | Using FlowDroid for static analysis and proposing the TFDroid framework to detect malware using sensitive data flow analysis | Manifest Analysis for permission and Code Analysis for information flow | Drebin, Google Play | SVM | SVM | 93.7% | Analysed the functions of applications by their descriptions to check the data flow | Did not consider improving the clustering techniques or the applicability of other ML models |
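Most rows report a single accuracy figure, while [82] reports a false negative rate (FNR) and false positive rate (FPR) instead. As a reminder of how these standard measures relate, the sketch below computes them from confusion-matrix counts. The counts in the example are made up for illustration and do not come from any cited study.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification measures, with malware as the positive class."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,  # share of all apps classified correctly
        "fpr": fp / (fp + tn),          # benign apps wrongly flagged as malware
        "fnr": fn / (fn + tp),          # malware wrongly passed as benign
    }


# Hypothetical counts: 100 malware samples and 100 benign samples.
m = detection_metrics(tp=90, fp=5, tn=95, fn=10)
print(m)  # {'accuracy': 0.925, 'fpr': 0.05, 'fnr': 0.1}
```

Because accuracy depends on the balance of benign and malicious samples in each dataset, figures from different rows are not directly comparable without also knowing the FPR and FNR.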
Year | Study | Detection Approach | Feature Extraction Method | Used Datasets | ML Algorithms/Models | Selected ML Algorithms/Models | Model Accuracy | Strengths | Limitations/Drawbacks |
---|---|---|---|---|---|---|---|---|---|
2017 | [87] | Extracting the DNS, HTTP, TCP, and origin-based features of the network used by apps | Network traffic analysis for network protocols | Genome | DT, LR, KNN, Bayes Network, RF | RF | 98.7% | Works with different OS versions; detects unknown malware and infected apps | Cannot detect malware properly if the malicious apps use encrypted traffic |
2017 | [88] | Using a Markov Chain-based detection technique to compute the state transitions and to build the transition matrix with 6thSense | System resources analysis for process reports and sensors | Google Play | Markov Chain, NB, LMT | LMT | 95% | Highly effective and efficient at detecting sensor-based attacks while yielding minimal overhead | Tradeoffs such as frequency accuracy and battery frequency, which can affect the malware detection accuracy, are not discussed |
2017 | [89] | Using dynamic permission analysis at run time, detecting malware with ML, and calculating the accuracy | Code instrumentation analysis for Java classes and dynamic permissions | Pvsingh, Android Botnet, DroidKin | NB, RF, Simple Logistic, DT, K-Star | Simple Logistic | 99.7% | High accuracy | Need to address the app crashing issue in the selected emulators in dynamic analysis |
2019 | [90] | Dynamically tracking the execution behaviours of applications using the ServiceMonitor framework | System call analysis | AndroZoo, Drebin and Malware Genome | RF, KNN, SVM | RF | 96.7% | High accuracy and high efficiency | Does not detect differences in some system calls of malware and benign apps, since signature-based verification was not applied |
2020 | [91] | Extracting the features and permissions from the Android app, performing feature selection, and proceeding to classification with DATDroid | System call analysis, Code instrumentation for network traffic analysis and System resources analysis | APKPure, Genome | RF, SVM | RF | 91.7% | High efficiency | Impact of features such as HTTP, DNS, and TCP/IP patterns is not considered |
2021 | [92] | Using decompilation, model discovery, integration and transformation, analysis and transformation, event production | Code instrumentation for Java classes, intents | AMD | ML algorithms used in MEGDroid, Monkey, Droidbot | MEGDroid | 91.6% | Considerably increases the number of triggered malicious payloads and execution code coverage | System calls are not monitored |
Year | Study | Detection Approach | Feature Extraction Method | Used Datasets | ML Algorithms/Models | Selected ML Algorithms/Models | Model Accuracy | Strengths | Limitations/Drawbacks |
---|---|---|---|---|---|---|---|---|---|
2017 | [96] | Using a set of Python and Bash scripts that automate the analysis of the Android data | Manifest analysis for permissions and System call analysis for dynamic analysis | Andrototal | NB, DT | DT | 80% | Model execution is efficient | Considers system call appearance rather than frequency; a small number of samples was used for training |
2018 | [95] | Creating binary feature vector and permission vector datasets using the analysis techniques and using them with the ML algorithms | Manifest analysis for permissions and system call analysis | Drebin | RF, J48, NB, Simple Logistic, BayesNet TAN, BayesNet K2, SMO PolyKernel, IBK, SMO NPolyKernel | RF | Static-96%, Dynamic-88% | Compared with several ML algorithms | Accuracy depends on the third-party tool (Monkey runner) used to collect features |
2019 | [94] | Preparing a JSON file after reverse engineering, decompiling, and analysing the APK by running it in a sandbox environment, then extracting the key features and applying ML | Manifest analysis for permissions, code analysis for API calls and System call analysis | MalGenome, Kaggle, Androguard [79] | SVM, LR, KNN, RF | LR for static analysis and RF for dynamic analysis | Static-81.03%, Dynamic-93% | The dynamic analysis performed better than the static analysis approach in terms of detection accuracy | Did not perform a proper hybrid analysis approach to increase the overall accuracy |
2017 | [99] | Using import term extraction, clustering and applying a genetic algorithm with MOCDroid | Code analysis for API calls and information flow and system call analysis | VirusTotal, Google Play | Genetic algorithm, Multiobjective evolutionary algorithm | Multiobjective evolutionary classifier | 95.15% | Possible to avoid the effects of the concealment strategies | Did not consider other clustering methods |
2020 | [97] | Extracting 261 combined features for hybrid analysis from the datasets and applying ML/DL models | Manifest analysis for permissions and system call analysis | MalGenome, Drebin, CICMalDroid | SVM, KNN, RF, DT, NB, MLP, GB | GB | 99.36% | Hybrid analysis has higher accuracy than static analysis or dynamic analysis individually | Runtime environment and configuration are not considered |
2020 | [98] | Using conditional dependencies among relevant static and dynamic features, then training ridge-regularised LR classifiers and modelling their output relationships as a TAN | Manifest analysis for permissions, code analysis for API calls and system call analysis | Drebin, AMD, AZ, Github, GP | TAN | TAN | 97% | Highly accurate | Some malware may remain undetected |
2021 | [100] | Exploiting static, dynamic, and visual features of apps to predict malicious apps using information fusion and applying Case Based Reasoning (CBR) | Manifest analysis for permissions and System call analysis | Drebin | CBR, SVM, DT | CBR | 95% | Requires limited memory and processing capabilities | Needs to present the knowledge representation to address some limitations |
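The hybrid studies above combine static features such as permissions with dynamic features such as system calls, and the first row notes the difference between recording whether a system call appears and how often it occurs. The sketch below illustrates both encodings and a simple concatenation into one hybrid vector. The vocabulary `SYSCALL_VOCAB`, the example trace, and the function names are assumptions for illustration, and concatenation is only one of several possible fusion strategies.

```python
from collections import Counter

# Hypothetical system call vocabulary.
SYSCALL_VOCAB = ["open", "read", "write", "sendto", "connect"]


def syscall_features(trace, vocab=SYSCALL_VOCAB, binary=False):
    """Encode a trace as per-call frequencies, or as 0/1 appearance flags."""
    counts = Counter(trace)
    if binary:
        return [1 if counts[s] > 0 else 0 for s in vocab]
    return [counts[s] for s in vocab]


def hybrid_vector(static_vec, dynamic_vec):
    """Concatenate static and dynamic features into one vector."""
    return list(static_vec) + list(dynamic_vec)


# Hypothetical trace recorded while the app ran in a sandbox.
trace = ["open", "read", "read", "connect", "sendto", "sendto", "sendto"]

freq = syscall_features(trace)                   # [1, 2, 0, 3, 1]
appearance = syscall_features(trace, binary=True)  # [1, 1, 0, 1, 1]

# Assumed static permission vector for the same app.
hybrid = hybrid_vector([1, 1, 0, 0], freq)
print(hybrid)  # [1, 1, 0, 0, 1, 2, 0, 3, 1]
```

The frequency encoding keeps information that the appearance encoding discards, here the repeated `sendto` calls, which is the distinction raised as a limitation of [96].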
Year | Study | Detection Approach | Feature Extraction Method | Used Datasets | ML/DL Algorithms/Models | Selected DL Algorithms/Models | Model Accuracy | Strengths | Limitations/Drawbacks |
---|---|---|---|---|---|---|---|---|---|
2017 | [104] | Using n-gram methods on the Opcode sequence obtained from .smali files after disassembling the .apk | Code Analysis for Opcodes | Genome, IntelSecurity, McAfee, Google Play | CNN, NLP | Deep CNN | 87% | Automatically learns the features indicative of malware without hand engineering | Assumes that all APKs in the Google Play dataset are benign and all APKs in the malware dataset are malicious |
2021 | [108] | Using a DL-based method that applies a Convolutional Neural Network to analyse features | Code Analysis for API calls, Opcode and Manifest Analysis for Permission | Drebin, AMD | CNN | CNN | 91% and 81% on two datasets | Reduces overfitting and can be trained to detect new malware just by collecting more sample apps | Did not compare with other ML/DL methods |
2018 | [102] | Applying LSTM on semantic structure of bytecode with 2 layers of detection and validating with DeepRefiner | Code Analysis for Opcode/bytecode | Google Play, VirusShare, MassVet | RNN, LSTM | LSTM | 97.4% | High efficiency with average of 0.22 s to the 1st layer and 2.42 s to the 2nd layer detection | Need to train the model regularly to update the training model on new malware |
2020 | [105] | Detecting malware attributes from vectorised opcodes extracted from the bytecode of the APKs, with one-hot encoding applied before the DL techniques | Code Analysis for Opcode | Drebin, AMD, VirusShare | BiLSTM, RNN, LSTM, Neural Networks, Deep ConvNets, Diabolo Network model | BiLSTMs | 99.9% | Very high accuracy. Able to detect zero-day malware families without the overhead of previous training | Did not analyse complete byte code |
2020 | [106] | Using DynaLog to select and extract features from log files and using DL-Droid to perform feature ranking and apply DL | Code instrumentation analysis for Java classes, intents, and system calls | Intel Security | NB, SL, SVM, J48, PART, RF, DL | DL | 99.6% | Experiments were performed on real devices. High accuracy | Could also have implemented intrusion detection to make it a more comprehensive malware detection tool |
2021 | [101] | Selecting features gained by feature selection approaches. Applying ML/DL models to detect malware | Code instrumentation for Java classes, permissions, and API calls at run time | Android Permissions Dataset, Computer and security dataset | farthest first clustering, Y-MLP, nonlinear ensemble decision tree forest, DL | DL with methods in MLDroid | 98.8% | High accuracy and easy to retrain the model to identify new malware | Human interaction would be required in some cases. Can contain issues in the datasets |
2021 | [107] | Characterising apps and treating them as images, then constructing the adjacency matrix and applying a CNN to identify malware with the AdMat framework | Code Analysis for API calls, Information flow, and Opcode | Drebin, AMD | CNN | CNN | 98.2% | High accuracy and efficiency | Performance depends on the number of features used |
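Several of the opcode-based studies in the tables build n-gram features from the opcode sequence of the disassembled app. A minimal sketch of counting opcode n-grams follows. The short opcode list is a made-up example in Dalvik style, the function name is hypothetical, and the cited studies may tokenise and select n-grams differently.

```python
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Count every run of n consecutive opcodes in the sequence."""
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )


# Hypothetical opcode sequence taken from one disassembled method.
ops = [
    "const/4", "invoke-virtual", "move-result",
    "invoke-virtual", "move-result", "return-void",
]

bigrams = opcode_ngrams(ops, n=2)
print(bigrams.most_common(1))  # [(('invoke-virtual', 'move-result'), 2)]
```

The resulting counts can be mapped onto a fixed vocabulary of n-grams to give an input vector for the ML and DL models listed above.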
Year | Study | Code Analysis Method | Approach | Used ML/DL Methods/Frameworks | Accuracy of the Model |
---|---|---|---|---|---|
2017 | [127] | Dynamic Analysis | Collected 9872 sequences of function calls as features. Performed dynamic analysis with DL methods | CNN-LSTM | 83.6% |
2017 | [133] | Hybrid Analysis | Decompiled the apk file. Performed static analysis of the manifest file to obtain the components/permissions. Dynamic analysis and fuzzy testing were conducted and obtained system status. | AB and DT | 77% |
2019 | [115] | Hybrid Analysis | Reverse engineered the APK, Decoded the manifest files & codes and extracted meta data from it. Performed dynamic analysis to identify intent crashing and insecure network connections for API calls. Generated the report. | AndroShield | 84% |
2020 | [124] | Hybrid Analysis | Performed intelligent analysis of the generated AST. Checked whether ML can differentiate vulnerable from non-vulnerable code. | MLP and a customised model | 70.1% |
2017 | [113] | Static Analysis | Generated the AST, navigated it, and computed detection rules. Identified smells when training with manually created dataset. | ADOCTOR framework | 98% |
2017 | [128] | Static Analysis | Combined N-gram analysis and statistical feature selection for constructing features. Evaluated the performance of the proposed technique based on a number of Java Android programs. | Deep Neural Network | 92.87% |
2019 | [129] | Hybrid Analysis | Decompiled the APK and selected the features and executed the APK and generated log files with system calls. Generated the vector space and trained with ML algorithms as parallel classifiers. | MLP, SVM, PART, RIDOR, MaxProb, ProdProb | 98.37% |
2020 | [121] | Hybrid Analysis | In static analysis, vulnerabilities of SSL/TLS certification were identified. Results from static analysis about user interfaces were analysed to confirm SSL/TLS misuse in dynamic analysis. | DCDroid | 99.39% |
2021 | [122] | Static Analysis | 32 supervised ML algorithms were considered for 3 common vulnerabilities: LawOfDemeter, BeanMembersShouldSerialize, and LocalVariableCouldBeFinal | J48 | 96% |
2021 | [123] | Static Analysis | Classified malicious code using its PE structure | CNN | 98.77% |
This entry is adapted from the peer-reviewed paper 10.3390/electronics10131606