Ransomware-Resilient Self-Healing XML Documents: Comparison
Please note this is a comparison between Version 1 by Ahmed S. Shatnawi and Version 2 by Amina Yu.

Ransomware is a cybersecurity threat that causes substantial financial losses and wasted time for affected organizations and users. A great deal of research across academia and industry has sought to combat this threat and mitigate its danger, and these ongoing endeavors have resulted in several detection and prevention schemes. The self-healing version-aware ransomware recovery (SH-VARR) framework for XML documents is based on the novel idea of using the link concept to maintain file versions in a distributed manner while applying access-control mechanisms to protect these versions from being encrypted or deleted.

  • ransomware
  • XML documents
  • secure document engineering
  • self-healing

1. Introduction

The progression of cybercrime and the development and adoption of new techniques to jeopardize sensitive information and inflict damage across the Internet present an alarming threat to businesses, governments, and nations. Recent cybersecurity research (e.g., the works in [1][2][3][4][5][6]) confirms cybercriminals’ determination to develop newer techniques for achieving their malicious objectives. Ransomware is one of the methods cybercriminals have recently used to achieve financial gain in return for releasing ransomware-encrypted files to their rightful owners. Ransomware attacks represent a real security threat to users’ data files and to the various network resources that may contain backup files. A conservative estimate is that ransomware criminals received USD 412 million in payments in 2020 [7]. Ransomware attacks impact individuals and organizations in the public and private sectors, including the health sector, e-commerce, educational institutions, government agencies, and businesses, in a manner that leads to economic and moral loss. In 2017, the WannaCry ransomware [8], a recent massive ransomware attack, impacted up to 300,000 users in 150 countries worldwide, preventing them from accessing their devices and demanding Bitcoin payments in exchange for unlocking the affected files.
With an ever-increasing rate of storing and sharing data, document security is becoming one of the biggest challenges facing both individuals and organizations. Digital documents are represented in many formats, one of the most popular of which is the Extensible Markup Language (XML). When ransomware attacks a victim’s machine, it seeks to lock or encrypt the user’s crucial files and documents, including XML-based documents such as “.docx” and “.odt” files.
Since 2010, the rate of ransomware infection has increased significantly, and this growing threat has received considerable attention from both academia and industry. Many research studies have analyzed ransomware and developed new techniques to detect it, in some cases also considering backups. A significant portion of the proposed detection techniques claim a high detection success rate. Nonetheless, most detection and protection systems in use have several limitations.
This work addresses the problem of recovering XML documents once a ransomware attack has taken place. To achieve this goal, a self-healing version-aware XML recovery framework is proposed to combat ransomware. The proposed framework takes advantage of the structure of XML documents and combines link-based version control with well-known access-control mechanisms.
The Version-Control System (VCS) manages all the changes made to documents, including tracking and storing versioning data. This work taps into VCS through a novel approach directed at recovering ransomware-infected XML-based files and documents. Version-aware XML-based documents form a distributed version-control system that does not rely on a central repository but uses the document file itself to track each subsequent version of a document.

2. Ransomware

Ransomware is defined as a form of malware that prevents users from accessing their resources and files, either by encryption or by blocking, until a ransom is paid to restore access to the infected files. It provides a means of money-based extortion that affects both individuals and organizations [9][11]. It is a piece of software designed and implemented by cybercriminals to gain access to legitimate users’ systems without their knowledge and to perform malicious activities such as stealing sensitive data and asking for a ransom. Lacking the technical background to preserve their data, and often without the necessary file backups, some users, especially naive ones, end up paying the ransom to restore access to their files. This ultimately brings cybercriminals and attackers greater revenues and helps turn ransomware into a thriving business [10][12]. The first ransomware attack was reported in 1989, when floppy disks infected with the AIDS Trojan were distributed amongst biologists. The malware encrypted the victims’ system files and demanded a ransom of USD 189 to undo the damage. These earliest variants of ransomware date from the late 1980s [11][13], and the ransom was paid via postal mail. Today, ransomware authors demand that payment be rendered via credit cards or cryptocurrency such as Bitcoin [12][14]. In recent years, there has been an increasing proliferation of ransomware families that spread like worms and involve advanced recovery-prevention schemes. This impacts home users, organizations, and the infrastructures of vital governmental establishments around the world [9][11]. WannaCry and Petya [8] are examples of recent ransomware that spread through insecure or compromised websites, exploiting weaknesses inherent in Microsoft Windows. On 12 May 2017, WannaCry was first observed as part of massive attacks across multiple countries [13][15].
These attacks affected many vital sectors, including government organizations and the healthcare and telecommunications sectors. WannaCry is an example of crypto ransomware based on public-key cryptography, which is rather challenging to mitigate or recover from because the encryption keys are stored on a remote command-and-control (C&C) server. The following subsections explain the ransomware lifecycle and the main ransomware categories.

2.1. Ransomware Lifecycle

The authors of [14][16] analyzed 25 ransomware families and found that they all possess similar dynamics. The details differ somewhat from one ransomware version to another, but all exhibit a similar high-level pattern. In general, the ransomware lifecycle spans the following six steps [14][16]:
  • Ransomware distribution: Like other malicious software programs, ransomware uses social-engineering strategies to lure victims into clicking links that lead to malicious content or into downloading a malicious dropper or payload that causes infection.
  • Infection: The malicious code is downloaded at this stage, and its execution begins. A victim’s machine will have been compromised by the ransomware, but the underlying files are not yet encrypted. Encryption is a reversible process involving highly CPU-intensive operations. It does not happen immediately in a typical ransomware attack, as the malware needs time to evaluate the data and determine the scope of encryption. Once this stage becomes active, all the automatic detection systems will have been stopped. The firewall, proxy, antivirus, and intrusion detection programs will have been compromised to allow all malicious communications to take place, ultimately putting the ransomware in total control.
  • C2 communications: At this stage, the malicious code maintains access to its command-and-control (C2) server. The attacker manages the C2 server and begins to send commands to the compromised system. The primary objective of C2 communications in a ransomware attack is the acquisition of an encryption key. Once that is complete, various system settings are changed and persistence is established.
  • File search and scanning: This is when things start to slow down a bit. The malware first searches the computer for files to encrypt. It also scans for cloud data that are synced through folders and shown as local data. Then it starts searching for file shares. This may take time, depending on how much activity there is across the network. The goal is to examine the available information and determine the victim’s level of permissions (e.g., list, write, delete).
  • Encryption: Encryption starts once all data have been inventoried. Encrypting local files may take minutes, but encrypting files on the network may take several hours, because in most ransomware attacks data on network file shares are copied locally and encrypted there. The encrypted files are then uploaded and the original ones removed, which adds extra time to this phase.
  • Ransom demand: At this stage, the victim receives a ransom message with instructions for paying the ransom; the message is issued immediately once encryption has taken place. The ransomware shows a screen that instructs the victim to pay before the criminals delete the key needed to decrypt the files. The last function usually performed by ransomware is to terminate and uninstall itself from the victim’s machine. At this point, the attackers are ready to receive the ransom in their Bitcoin wallet.
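
The ordering of the six steps above can be captured in a small data structure, which may be convenient when labelling observed events by lifecycle stage. The following Python sketch is illustrative only: the names `RansomwareStage`, `next_stage`, and `files_still_unencrypted` are hypothetical and do not come from the cited works.

```python
from enum import IntEnum
from typing import Optional


class RansomwareStage(IntEnum):
    """The six lifecycle steps, numbered in the order listed above."""
    DISTRIBUTION = 1
    INFECTION = 2
    C2_COMMUNICATION = 3
    FILE_SEARCH = 4
    ENCRYPTION = 5
    RANSOM_DEMAND = 6


def next_stage(stage: RansomwareStage) -> Optional[RansomwareStage]:
    """Return the step that follows `stage`, or None after the last step."""
    if stage is RansomwareStage.RANSOM_DEMAND:
        return None
    return RansomwareStage(stage + 1)


def files_still_unencrypted(stage: RansomwareStage) -> bool:
    """True for every step that precedes the encryption step."""
    return stage < RansomwareStage.ENCRYPTION
```

Under this labelling, any event attributed to a step before `ENCRYPTION` falls in the window in which the victim’s files have not yet been encrypted.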

2.2. Ransomware Categories

Ransomware falls under three main categories of varying severity: scareware, locker ransomware, and crypto ransomware. Table 1 summarizes these categories. Scareware is a form of malicious software that overwhelms users’ screens with warnings and pop-ups claiming that issues have been detected on the user’s PC and demanding money to fix them. If victims fall for this trick and install the malware on their machines, the cybercriminals can use it to access their files, send out fake emails in their names, and/or track their online activity.
Locker ransomware is malicious software that infects the operating system and prevents users from accessing their files and data. It hijacks one or more of the victim’s system services, such as desktops, smartphones, and applications, depriving users of access to them [9][11]. This attack usually takes the form of a locked computer interface asking the user to pay a ransom to regain access. Often, infected computers are left with limited capabilities that only allow the user to communicate with the ransomware and carry out the activities needed to pay the requested ransom. For example, W32.Rasith is a worm that locks the victim’s desktop, making the system unusable [15][17]. This type is not limited to PCs or servers; it also affects mobile devices. Android.Lockdroid.H is an example of a trojan that locks the screen of mobile devices and displays a ransom message [15][17]. Since locker ransomware is designed to prevent access to the device’s interface, the underlying system and files are left untouched, and it is possible to restore the computer to a state close to its original condition. Thus, locker ransomware is less effective at eliciting ransom payments.
Although cryptography is regarded as a critical defense mechanism in computer and network applications [16][18], it can also be used to perform crypto crimes. The work in [17][19] is one of the earliest research studies on fraudulent cryptographic use. What distinguishes crypto ransomware from conventional malware is that it uses cryptographic techniques, including symmetric and asymmetric key-based encryption, against victims, as discussed in [18][20]. This is the most common type of ransomware. It is also the most harmful: it can cause a great deal of damage and thereby extort vast amounts of money, because once the attacker has encrypted the files, they cannot be restored until a ransom is paid. WannaCry [8] is one famous example. Crypto ransomware silently encrypts victims’ files, file contents, and file names using different cryptographic methods and then notifies the victims that their data have been encrypted, forcing them to pay a ransom to decrypt the files [10][12]. Since 2016, crypto ransomware attacks have increased dramatically. According to a report by [19][21], 58.43% of ransomware attacks are conducted by a crypto ransomware strain called TeslaCrypt. CTB-Locker was considered one of the primary ransomware attacks in 2016. It can attack multiple victims at the same time and can therefore extort several victims during the same attack. It infects web servers by encrypting the webroot, paralyzing the web servers, hosted applications, and websites [19][21].

3. Version-Control System (VCS)

Version-control systems (VCS) are used to manage all changes made to documents, including tracking and storing version data. In this work, VCS is tapped into through a novel approach to recovering XML documents affected when ransomware attacks victims’ machines and locks or encrypts their files. Version-aware XML-based documents form a distributed version-control system that does not rely on a central repository but uses the document file itself to record the changes between different versions of the same document. Version control tracks all changes to a file or set of files over time so that a specific version can be retrieved later. As VCS became popular, new techniques continued to evolve. A VCS uses one of two main techniques to store versions of data. The first is to keep a copy of each new version of the file, while the second keeps only the deltas, which are the data differences between two versions of the file. There are two major version-control types: centralized and distributed. A centralized version-control system is based on a client-server architecture in which a central repository stores the document versions. A centralized VCS must be used online, as it requires the end-user (client) to be connected to the system (central repository) at all times. This approach introduces a single point of failure [20][22]. The distributed alternative used in this approach, the version-aware XML document, was first introduced in [20][22]. In contrast to a centralized VCS, a version-aware VCS does not depend on a central repository to store version data. Rather than storing the whole document every time, it stores reverse deltas, the data differences between two versions of a file, inside the document file itself.
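
The difference between keeping full copies and keeping reverse deltas can be illustrated with a short sketch. The Python code below is a minimal, line-based illustration that assumes the newest version is kept in full and each delta rebuilds the previous version from the next one. It uses the standard `difflib` module; the names `reverse_delta`, `apply_delta`, and `VersionedText` are hypothetical, the deltas are held in memory rather than inside an XML document, and none of this is the implementation described in [20][22].

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# A delta is a list of operations that rebuild an OLDER version from a NEWER one:
#   ("copy", i, j)    copy lines newer[i:j]
#   ("insert", lines) insert lines that only the older version contained
Delta = List[Tuple]


def reverse_delta(newer: List[str], older: List[str]) -> Delta:
    """Compute the operations needed to turn `newer` back into `older`."""
    ops: Delta = []
    matcher = SequenceMatcher(a=newer, b=older, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", older[j1:j2]))
        # "delete": lines exist only in the newer version, nothing to store
    return ops


def apply_delta(newer: List[str], delta: Delta) -> List[str]:
    """Rebuild the older version from the newer one and its reverse delta."""
    older: List[str] = []
    for op in delta:
        if op[0] == "copy":
            older.extend(newer[op[1]:op[2]])
        else:
            older.extend(op[1])
    return older


class VersionedText:
    """Keeps the latest version in full plus one reverse delta per older version."""

    def __init__(self, lines: List[str]) -> None:
        self.latest = list(lines)
        self.deltas: List[Delta] = []  # deltas[-1] rebuilds the version before latest

    def commit(self, new_lines: List[str]) -> None:
        self.deltas.append(reverse_delta(new_lines, self.latest))
        self.latest = list(new_lines)

    def checkout(self, steps_back: int) -> List[str]:
        """Return the version `steps_back` commits before the latest one."""
        if not 0 <= steps_back <= len(self.deltas):
            raise ValueError("no such version")
        lines = self.latest
        for delta in reversed(self.deltas[len(self.deltas) - steps_back:]):
            lines = apply_delta(lines, delta)
        return lines
```

In this sketch, an earlier version is recovered by walking back from the latest version one delta at a time, so only the differences between versions, not whole copies, need to be stored.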
With version-aware XML document technology, users need not worry about a repository or a network connection to remote servers. LibreOffice documents (ODT) are XML-based packages that store content, styles, and settings. The authors of [21][23] created a custom Microsoft Word plugin to support version-aware XML document technology. Revisions of the document content are stored as separate copies (snapshots) in a sub-directory inside the document. Shatnawi et al. [22][23][24,25] proposed a secure framework for XML documents that improves the security of XML documents and their provenance, provides persistent integrity, detects tampering, and provides tools for performing forensics by utilizing version-aware XML document technology. Their approach provides an extensive document history with author signatures at each step, which also enhances performance when security policies are applied to documents.
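
As a rough illustration of storing a snapshot in a sub-directory inside the document, the following Python sketch treats the document as a ZIP package of XML parts, as ODT files are, and copies the current content into a versioned path within the same package. The `versions/v<N>/` layout and the function names `make_package` and `add_snapshot` are assumptions made for illustration; they are not the layout or API used by the plugin in [21][23].

```python
import io
import zipfile


def make_package(xml_text: str) -> bytes:
    """Build a tiny in-memory ZIP package holding a single content.xml part."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as pkg:
        pkg.writestr("content.xml", xml_text)
    return buf.getvalue()


def add_snapshot(package: bytes, version: int,
                 content_name: str = "content.xml") -> bytes:
    """Return a new package with the current content copied to versions/v<N>/.

    The versions/ sub-directory is a hypothetical layout for this sketch.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        # Copy every existing part unchanged, keeping its original metadata.
        for item in src.infolist():
            dst.writestr(item, src.read(item.filename))
        # Store the snapshot alongside the live content, inside the same file.
        snapshot_path = f"versions/v{version}/{content_name}"
        dst.writestr(snapshot_path, src.read(content_name))
    return out.getvalue()
```

Because the snapshot travels inside the document file itself, no external repository or network connection is involved in this sketch.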