Efficient Fingerprinting Attack on Web Applications

Website fingerprinting is valuable for many security solutions as it provides insights into applications that are active on the network.

  • network traffic
  • adaptive symbolization
  • PHMM

1. Introduction

Both network administrators and attackers want to automatically associate a portion of network traffic with a particular web application. With the growing use of end-to-end encryption protocols (such as SSL/TLS), attackers cannot inspect the content of communications. However, traditional encryption obscures only the content; it does not hide information such as traffic volume and direction. This allows an attacker to exploit side-channel information such as packet length, timing, and order.
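As a minimal sketch of what this side channel exposes, the Python snippet below summarizes a trace using only packet sizes, directions, and timestamps. The trace format (a timestamp in seconds paired with a signed length, positive for outgoing packets) and the function name are assumptions made for illustration; they are not taken from any of the cited works.

```python
from statistics import mean

def side_channel_features(trace):
    """Summarize a trace of (timestamp_seconds, signed_length) packets.

    Positive lengths are outgoing (client to server), negative are incoming.
    Nothing here requires decrypting the payload.
    """
    times = [t for t, _ in trace]
    out_bytes = sum(n for _, n in trace if n > 0)
    in_bytes = sum(-n for _, n in trace if n < 0)
    # Inter-arrival times between consecutive packets.
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        "packets": len(trace),
        "out_bytes": out_bytes,
        "in_bytes": in_bytes,
        "mean_gap": mean(gaps) if gaps else 0.0,
        "directions": [1 if n > 0 else -1 for _, n in trace],
    }

# Hypothetical four-packet trace.
trace = [(0.00, 517), (0.05, -1460), (0.06, -1460), (0.30, 120)]
features = side_channel_features(trace)
```

Features of this kind remain observable even when the payload is encrypted.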
Recent studies have proposed a number of approaches to analyzing encrypted traffic. One proposed framework [1] monitors network traffic between users and network resources to identify the associated web application. Many machine learning algorithms (e.g., random forest) and deep learning methods, such as convolutional neural networks (CNNs), have been used to uncover which applications are running on users’ smartphones [2][3][4] or which webpages/websites users are visiting [5][6][7]. Among these, the technique that identifies webpages/websites from network traffic is referred to as a Website Fingerprinting Attack (WFA).
Although many WFA methods have been proposed, previous studies have focused primarily on fingerprinting individual webpages (most existing methods simply treat the homepage as the representative webpage) to identify whether users have accessed a monitored website. They usually ignore sequential visits, such as webpage transitions made by clicking hyperlinks. For most websites, however, users follow hyperlinks to carry out their actions. For example, users follow hyperlinks to read or post blogs on a social forum.
Identifying a web application via interaction patterns is of practical significance. Creating a web application is not difficult today because many ready-made templates are available. For example, the website builder Wix [8] provides many types of templates (e-business, album, social forum, and so on) and advertises that customers can create a website in just four steps without any coding skills. According to a news report [9], a police officer provided source code seized during the investigation of a case to other criminals so that they could create a new gambling website and illegally obtain huge profits. In practice, even if a gambling website is targeted by law enforcement officers, criminals can easily modify its appearance (e.g., the website title and pictures) and build a new one. To block these slightly modified illegal websites, an approach that can detect template-based web applications is needed.
Intuitively, web applications derived from the same template may share similar functional logic. A web application is usually designed to provide users with certain capabilities, i.e., actions that users can perform. Criminals might modify the appearance of an illegal website to avoid punishment, but they cannot change those capabilities.

2. Website Fingerprinting Attack

The purpose of a Website Fingerprinting Attack (WFA) is to infer which websites/webpages users visit. This type of analysis can reveal private information about a user (e.g., interests, habits, sexual and political orientations). WFA was first carried out by Cheng and Avnur [10] in 1998, who demonstrated that the SSL protocol does not protect against traffic analysis attacks. WFA has become an active research topic in recent years, and many machine-learning techniques have proven very effective.
A 2012 study [11] was the first to demonstrate that application-level defenses, such as HTTPOS and randomized pipelining, are not secure. The authors modeled websites using Hidden Markov Models (HMMs), where each state corresponds to a page or a class of pages of the site. To simplify the model, they used states corresponding to page templates rather than individual pages. With this approach, an attacker can construct an HMM for each target website and use the forward algorithm to compute the log-likelihood that a given packet trace was generated by a user visiting the target website. However, building an HMM for a website is not trivial.
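The forward algorithm mentioned above can be sketched in a few lines of Python. This is a generic log-space implementation for a discrete HMM, not the model from [11]; the two states, the two observation symbols, and all probabilities in the example are made-up values for illustration.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits symbol o
    """
    def log(p):
        return math.log(p) if p > 0 else -math.inf

    n = len(start)
    # Initialization: first observation from each possible start state.
    alpha = [log(start[i]) + log(emit[i][obs[0]]) for i in range(n)]
    # Recursion: sum over all predecessor states at every step.
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log(trans[i][j]) for i in range(n)])
            + log(emit[j][o])
            for j in range(n)
        ]
    # Termination: sum over the final states.
    return logsumexp(alpha)

# Hypothetical model: state 0 = "index template", state 1 = "article template";
# symbol 0 = "small page load", symbol 1 = "large page load".
start = [0.8, 0.2]
trans = [[0.3, 0.7], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
log_likelihood = forward_log_likelihood([0, 1], start, trans, emit)
```

Under these assumptions, an attacker with one model per monitored site could score a trace against each model and compare the resulting log-likelihoods.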
Hayes and Danezis [12] performed a systematic analysis of feature importance, filling a notable gap in the website fingerprinting literature. They proposed the k-fingerprinting attack, which is based on random decision forests and enables attackers to infer which web page a client is browsing over encrypted or anonymized network connections. They demonstrated that Tor hidden services are easily distinguished from standard web pages, rendering them vulnerable to Website Fingerprinting Attacks.
FLOWPRINT [4] is a semi-supervised mobile-app fingerprinting prototype. The authors observe that mobile apps are composed of different modules that often communicate with a relatively invariable set of network destinations. This property is leveraged to discover patterns in the network traffic. Fingerprints are created based on temporal correlations among network flows between monitored devices and their destinations.
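To make the idea of temporal correlation among destinations concrete, the following Python sketch counts how often two destinations are contacted within the same time window. It is only an illustration of the intuition and not FLOWPRINT's actual algorithm; the window length, the flow format, and the destination names are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def destination_cooccurrence(flows, window=30.0):
    """Count how often two destinations are active in the same time window.

    flows  : iterable of (timestamp_seconds, destination) pairs
    window : window length in seconds (an illustrative value)
    """
    # Collect the set of destinations seen in each window.
    active = defaultdict(set)
    for ts, dest in flows:
        active[int(ts // window)].add(dest)
    # Count every unordered pair of destinations sharing a window.
    counts = defaultdict(int)
    for dests in active.values():
        for a, b in combinations(sorted(dests), 2):
            counts[(a, b)] += 1
    return dict(counts)

# Hypothetical flows observed from one monitored device.
flows = [
    (1.0, "api.example.com:443"),
    (2.5, "cdn.example.com:443"),
    (40.0, "api.example.com:443"),
    (41.0, "cdn.example.com:443"),
    (45.0, "ads.example.net:443"),
]
pairs = destination_cooccurrence(flows)
```

Destination pairs that repeatedly appear together are candidates for belonging to the same app, which is the kind of pattern a fingerprint can be built from.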
Zhuo et al. [13] proposed a website-modeling method based on the Profile Hidden Markov Model (PHMM). They exploited the hidden relationship between the first tab and the second tab to improve the accuracy of identifying a particular website, rather than identifying web pages separately.

3. User Action Identification

User action identification has been studied extensively in the domain of personal mobile devices. Apps use the Wi-Fi and cellular networks of mobile devices to send and receive data. Users perform actions while interacting with apps, and these actions generate data transmissions. The network traffic sequence of a given action typically follows a pattern that depends on the nature of the user–app interaction for that action. These patterns can be used to recognize specific user actions related to a particular app of interest in generic network traces [14].
Conti et al. [15] proposed a framework to infer which particular actions a user executes on apps installed on their mobile phone. Dynamic Time Warping was used to measure the similarity between traffic sequences, and Random Forest was used to classify unseen traffic traces. The authors assessed their approach on seven popular apps with different purposes from the official Android market and showed that the accuracy and precision were higher than 95%.
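Dynamic Time Warping, the similarity measure named above, can be sketched as follows. This is the textbook dynamic-programming formulation with absolute difference as the local cost, not the exact configuration used in [15]; the signed packet-length sequences in the example are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical signed packet lengths (positive = outgoing, negative = incoming).
send_message = [300, -1400, -1400, 80]
send_message_again = [300, -1400, -1400, -1400, 80]  # one extra incoming packet
open_profile = [120, -200]

same_action = dtw_distance(send_message, send_message_again)
different_action = dtw_distance(send_message, open_profile)
```

Because DTW allows elements to be stretched or repeated, two traces of the same action that differ by a repeated packet can still align at zero cost, while traces of different actions remain far apart.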
Similar to [15], Fu et al. investigated how to exploit encrypted Internet traffic to classify in-app usages. They developed a system named CUMMA for classifying usages of mobile messaging apps by jointly modeling user behavioral patterns, network traffic characteristics (packet length and time delay), and temporal dependencies [16]. In their work, traffic flows were segmented into sessions consisting of a number of dialogs; the dialogs were then classified as single-type usages or outliers. A clustering method based on Hidden Markov Models was used to detect mixed dialogs among the outliers and segment them into sub-dialogs of single-type usage. Experiments on WhatsApp and WeChat demonstrated the effectiveness and efficiency of their proposed method.

4. Other Related Works

A few previous papers are notable for applying different techniques to similar problems. He et al. [17] selected features such as burst volumes and directions to represent application behaviors and used PHMM to model different types of applications (Web, FTP, P2P, and IM) on Tor. Their experimental results demonstrated that PHMM models network traffic well.
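Burst features of this kind can be illustrated with a short Python sketch. Here a burst is taken to be a maximal run of consecutive packets in the same direction, and its volume is the total number of bytes in that run; this definition and the signed-length convention are assumptions for illustration and may differ from the exact definitions in [17].

```python
from itertools import groupby

def bursts(signed_lengths):
    """Collapse a packet sequence into (direction, volume) bursts.

    signed_lengths : packet sizes in bytes, positive for outgoing and
                     negative for incoming (a convention assumed here)
    """
    result = []
    # groupby yields maximal runs of consecutive packets sharing a direction.
    for direction, group in groupby(signed_lengths, key=lambda n: 1 if n > 0 else -1):
        result.append((direction, sum(abs(n) for n in group)))
    return result

# Hypothetical trace: two outgoing packets, three incoming, one outgoing.
packets = [200, 150, -1460, -1460, -800, 90]
burst_sequence = bursts(packets)
# burst_sequence == [(1, 350), (-1, 3720), (1, 90)]
```

The resulting sequence of burst directions and volumes is much shorter than the raw packet sequence, which makes it a convenient observation sequence for a sequence model.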
Network traffic analysis has also been applied to smart home devices. PINGPONG [18] automatically extracts fingerprints from the network traffic generated by smart home devices and recognizes their actions (such as turning a light on or off). Similarly, HoMonit [19] analyzes the network traffic generated by smart home devices to determine the actions performed by the home device applications. Li et al. [20] proposed generating fine-grained fingerprints based on the subtle differences between the file systems of various firmware images. They applied natural language processing techniques to process the file content and used the document object model to obtain the firmware fingerprint. Using this fingerprinting approach, they were able to recognize firmware on the Internet. However, their approach has to interact actively with the firmware and is therefore easy to detect.
Network traffic analysis has also been extended to intelligent software testing. In [21], an automated penetration-testing framework was built to detect vulnerabilities through traffic analysis. Pyshark is used to capture the traffic of IoT devices in four different states (booting, mobile application interaction, firmware mode, and offline mode). Then, ‘tshark’ is used to read the .pcap files and check for vulnerabilities such as insecure firmware, lack of transport encryption, and insecure network services. Similar to [20], this approach also interacts actively with the firmware.

This entry is adapted from the peer-reviewed paper 10.3390/electronics12132948

References

  1. Ionescu, P.; Keirstead, J.; Onut, I.; Wilson, D. Automatic Traffic Classification of Web Applications and Services Based on Dynamic analysis. U.S. Patent No. 10,542,025, 21 January 2020.
  2. Taylor, V.F.; Conti, R.; Martinovic, I. Appscanner: Automatic fingerprinting of smartphone Apps from encrypted network traffic. In Proceedings of the 1st IEEE European Symposium on Security and Privacy, Saarbruecken, Germany, 21–24 March 2016; pp. 439–454.
  3. Faik, A.H.; Jasleen, K. Can Android applications be identified using only TCP/IP headers of their launch time traffic. In Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and Mobile Networks, Darmstadt, Germany, 18–20 July 2016; pp. 61–66.
  4. van Ede, T.; Bortolameotti, R.; Continella, A.; Ren, J.; Dubois, D.J.; Lindorfer, M.; Choffnes, D.; van Steen, M.; Peter, A. FLOWPRINT: Semi-supervised mobile-app fingerprinting on encrypted network traffic. In Proceedings of the 27th Network and Distributed Systems Security (NDSS) Symposium, San Diego, CA, USA, 23–26 February 2020; pp. 1–18.
  5. Wang, T.; Cai, X.; Nithyanand, R.; Johnson, R.; Goldberg, I. Effective attacks and provable defenses for website fingerprinting. In Proceedings of the 23rd USENIX Security Symposium, San Diego, CA, USA, 20–22 August 2014; pp. 143–157.
  6. Shen, M.; Liu, Y.; Zhu, L.; Du, X.; Hu, J. Fine-Grained Webpage Fingerprinting Using Only Packet Length Information of Encrypted Traffic. IEEE Trans. Inf. Forensics Secur. 2021, 16, 2046–2059.
  7. Sirinam, P.; Imani, M.; Juarez, M.; Wright, M. Deep fingerprinting: Undermining website fingerprinting defenses with deep learning. In Proceedings of the ACM Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 1928–1943.
  8. Wix. Available online: https://www.wix.com/ (accessed on 1 March 2021).
  9. Sina News Report. Available online: https://tech.sina.com.cn/i/2018-10-16/doc-ihmhafir8971738.shtml (accessed on 1 March 2021).
  10. Cheng, H.; Avnur, R. Traffic Analysis of SSL Encrypted Web Browsing; University of Berkeley: Berkeley, CA, USA, 1998; pp. 1–12.
  11. Cai, X.; Zhang, X.; Joshi, B.; Johnson, R. Touching from a distance: Website fingerprinting attacks and defenses. In Proceedings of the 2012 ACM Conference on Computer and Communications Security, Raleigh, NC, USA, 16–18 October 2012; pp. 605–616.
  12. Hayes, J.; Danezis, G. K-fingerprinting: A robust scalable website fingerprinting technique. In Proceedings of the 25th USENIX Security Symposium, Austin, TX, USA, 10–12 August 2016; pp. 1187–1203.
  13. Zhuo, Z.; Zhang, Y.; Zhang, Z.-L.; Zhang, X.; Zhang, J. Website Fingerprinting Attack on Anonymity Networks Based on Profile Hidden Markov Model. IEEE Trans. Inf. Forensics Secur. 2018, 13, 1081–1095.
  14. Conti, M.; Li, Q.Q.; Maragno, A.; Spolaor, R. The Dark Side(-Channel) of Mobile Devices: A Survey on Network Traffic Analysis. IEEE Commun. Surv. Tutor. 2018, 20, 2658–2713.
  15. Conti, M.; Mancini, L.V.; Spolaor, R.; Verde, N.V. Analyzing Android encrypted network traffic to identify user actions. IEEE Trans. Inf. Forensics Secur. 2016, 11, 1556–6021.
  16. Fu, Y.; Xiong, H.; Lu, X.; Yang, J.; Chen, C. Service Usage Classification with Encrypted Internet Traffic in Mobile Messaging Apps. IEEE Trans. Mob. Comput. 2016, 15, 2851–2864.
  17. He, G.; Yang, M.; Luo, J.; Gu, X. A novel application classification attack against Tor. Concurr. Comput. Pract. Exp. 2015, 27, 5640–5661.
  18. Trimananda, R.; Varmaken, J.; Markopoulou, A. Packet-level signatures for smart home devices. In Proceedings of the 26th Network and Distributed System Security Symposium, San Diego, CA, USA, 24–27 February 2019; pp. 1–18.
  19. Zhang, W.; Meng, Y.; Liu, Y. Homonit: Monitoring smart home apps from encrypted traffic. In Proceedings of the 25th ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15 October 2018; pp. 1074–1088.
  20. Li, Q.; Feng, X.; Wang, R.; Li, Z.; Sun, L. Towards fine-grained fingerprinting of firmware in online embedded devices. In Proceedings of the IEEE Conference on Computer Communications, Honolulu, HI, USA, 16–19 April 2018; pp. 2537–2545.
  21. Akhilesh, R.; Bills, O.; Chilamkurti, N.; Chowdhury, M.J.M. Automated Penetration Testing Framework for Smart-Home-Based IoT Devices. Futur. Internet 2022, 14, 276.