Enterprise client-server" backup software describes a class of software applications that back up data from a variety of client computers centrally to one or more server computers, with the particular needs of enterprises in mind. They may employ a scripted client–server backup model with a backup server program running on one computer, and with small-footprint client programs (referred to as "agents" in some applications) running on the other computer(s) being backed up—or alternatively as another process on the same computer as the backup server program. Enterprise-specific requirements include the need to back up large amounts of data on a systematic basis, to adhere to legal requirements for the maintenance and archiving of files and data, and to satisfy short-recovery-time objectives. To satisfy these requirements (which World Backup Day (31 March) highlights), it is typical for an enterprise to appoint a backup administrator—who is a part of office administration rather than of the IT staff and whose role is "being the keeper of the data". In a client-server backup application, the server program initiates the backup activity by the client program. This is distinct from a "personal" backup application such as Apple's Time Machine, in which "Time Machine runs on each Mac, independently of any other Macs, whether they're backing up to the same destination or a different one." If the backup server and client programs are running on separate computers, they are connected in either a single platform or a mixed platform network. The client-server backup model was originated when magnetic tape was the only financially-feasible storage medium for doing backups of multiple computers onto a single archive file;[note 1][note 2] because magnetic tape is a sequential access medium, it was imperative (barring "multiplexed backup") that the client computers be backed up one at a time—as initiated by the backup server program. 
The foregoing describes the "two-tier" configuration (in one application's diagram, the second-tier backup server program is named "server" preceded by the name of the application, and first-tier "agents" back up interactive server applications). In that configuration, the backup server program is controlled via either an integrated GUI or a separate Administration Console. In some client-server backup applications, a "three-tier" configuration splits off the backup and restore functions of the server program to run on what are called media servers: computers to which devices containing archive files are attached either locally or as network-attached storage (NAS). In those applications, the decision on which media server a script is to run on is controlled by another program, called either a master server or an optional central admin server.
1. Performance
The steady improvement in hard disk drive price per byte has made feasible a disk-to-disk-to-tape strategy, combining the speed of disk backup and restore with the capacity and low cost of tape for offsite archival and disaster recovery purposes.[1] This, together with advances in file system technology, has led to optimization features such as:
- Improved disk-to-disk-to-tape capabilities
- Enables automated transfers to tape, for safe offsite storage, of disk archive files that were created for fast onsite restores.[2][3][4]
- Create synthetic full backups
- For example, onto tapes from existing disk archive files, by copying multiple backups of the same source(s) from one archive file to another. The second archive file is typically created in part to satisfy legal retention requirements, and may intentionally omit some backups, either because there is no need to retain them or because retaining them would violate regulations such as the European GDPR Right to erasure. Therefore, one application can exclude[5] files and folders from the synthetic full backup.[6] This is termed a "synthetic full backup" because, after the transfer, the destination archive file contains the same data it would contain after a full backup of the non-excluded data.[2][7][8]
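The merge-and-exclude logic behind a synthetic full backup can be sketched in a few lines. The in-memory "archive" structures, paths, and the `synthetic_full` helper below are invented for illustration, not any vendor's actual archive format:

```python
# Invented data structures: each backup is modeled as a dict mapping
# path -> file contents; a "chain" is a full backup plus incrementals.

def synthetic_full(chain, exclude=()):
    """Merge a backup chain (oldest first) into one full image, omitting
    excluded paths (e.g. for GDPR right-to-erasure compliance)."""
    full = {}
    for backup in chain:               # each backup maps path -> contents
        full.update(backup)            # later incremental backups win
    return {p: c for p, c in full.items()
            if not any(p.startswith(prefix) for prefix in exclude)}

chain = [
    {"/etc/hosts": "v1", "/home/a/cv.doc": "v1"},   # full backup
    {"/etc/hosts": "v2"},                           # incremental
    {"/home/a/cv.doc": "v3"},                       # incremental
]
image = synthetic_full(chain, exclude=("/home/a/",))
# image == {"/etc/hosts": "v2"}: latest versions only, excluded folder omitted
```

The destination ends up holding exactly what a fresh full backup of the non-excluded data would hold, which is what makes the copy "synthetic".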
- Automated data grooming
- Frees up space in disk archive files by removing out-of-date backup data, usually based on an administrator-defined retention period.[1][2][9][10][11][12][13] One method of removing data is to keep the last backup of each day/week/month for the last respective week/month/specified-number-of-months, permitting compliance with regulatory requirements.[14] One application has a "performance-optimized grooming" mode that removes only the outdated information in an archive file that it can quickly delete.[15] This is the only mode of grooming allowed for cloud archive files, and is also up to 5 times as fast when used on locally stored disk archive files. The "storage-optimized grooming" mode reclaims more space because it rewrites the archive file; in this application it also permits compliance with the GDPR "right to erasure"[16] via exclusion rules[5] that can instead be used for another type of filtering.[17]
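A rough sketch of the day/week/month retention method just described; the thresholds, bucketing, and the `groom` function are assumptions for illustration, not any application's documented policy:

```python
from datetime import date

# Thresholds below are invented; real applications expose the retention
# period as an administrator setting.

def groom(backup_dates, today, months_kept=12):
    """Keep the last backup of each day for the past week, of each ISO week
    for the past month, and of each month for months_kept months."""
    keep = {}
    for d in sorted(backup_dates):
        age = (today - d).days
        if age <= 7:
            bucket = ("day", d)
        elif age <= 31:
            bucket = ("week", d.isocalendar()[:2])
        elif age <= months_kept * 31:
            bucket = ("month", (d.year, d.month))
        else:
            continue                   # older than retention: groomed away
        keep[bucket] = d               # later date overwrites -> "last of"
    return sorted(keep.values())

kept = groom([date(2019, 6, 28), date(2019, 6, 29),
              date(2019, 6, 10), date(2019, 6, 11),
              date(2019, 3, 5), date(2019, 3, 20)],
             today=date(2019, 6, 30))
# kept == [date(2019, 3, 20), date(2019, 6, 11),
#          date(2019, 6, 28), date(2019, 6, 29)]
```

The "overwrite within a bucket" step is what implements "last backup of each period"; everything older than the retention horizon is simply never kept.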
- Multithreaded backup server
- Capable of simultaneously performing multiple backup, restore, and copy operations in separate "activity threads" (a capability once needed only by those who could afford multiple tape drives).[18][19][20] In one application, each "backup server" stores all its own categories of information; when an "Administration Console" process is started, it synchronizes information with all running LAN/WAN backup servers.[21]
- Block-level incremental backup
- The ability to back up only the blocks of a file that have changed, a refinement of incremental backup that saves space[22][23][24] and may save time.[18][25] Such partial file copying is especially applicable to a database.
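The core of block-level incremental backup, comparing per-block hashes against the previous run so that only changed blocks are transferred, might be sketched as follows (block size, hash choice, and data are illustrative):

```python
import hashlib

# Real implementations track per-block hashes (or use changed-block
# tracking) for each protected file; this toy keeps them in one dict.

BLOCK = 4096

def changed_blocks(data, prev_hashes):
    """Return [(block_index, block_bytes)] for blocks whose hash differs
    from the previous run, updating prev_hashes in place."""
    out = []
    for i in range(0, len(data), BLOCK):
        idx, block = i // BLOCK, data[i:i + BLOCK]
        digest = hashlib.sha256(block).hexdigest()
        if prev_hashes.get(idx) != digest:
            out.append((idx, block))
        prev_hashes[idx] = digest
    return out

hashes = {}
v1 = b"a" * BLOCK + b"b" * BLOCK
changed_blocks(v1, hashes)             # first run: every block is "changed"
v2 = b"a" * BLOCK + b"c" * BLOCK       # only the second block was modified
delta = changed_blocks(v2, hashes)
# delta holds just block 1, so 4 KiB is transferred instead of the whole file
```

The saving is largest for files, such as database files, where small regions change inside a very large file between backups.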
- "Instant" scanning of source volumes
- Uses the USN Journal on Windows NTFS and FSEvents on macOS (for non-APFS source volumes only) to reduce the time of the scanning phase[16] both on incremental backups (thus fitting more sources into the scheduled backup window)[18][26][27] and on restores.[28]
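The principle behind change-journal scanning, independent of USN Journal or FSEvents specifics, is that the journal turns a walk of the whole source volume into a lookup of only the paths touched since the last backup. A minimal sketch, with an invented `(sequence, path)` journal record format:

```python
# The USN Journal and FSEvents details are platform-specific; this sketch
# shows only the principle: an O(all files) tree walk becomes an
# O(changed files) lookup.

def files_to_back_up(journal_events, last_backup_seq):
    """Return the distinct paths touched since the last backup's
    journal sequence number, sorted for determinism."""
    return sorted({path for seq, path in journal_events
                   if seq > last_backup_seq})

events = [(1, "/docs/a.txt"), (2, "/docs/b.txt"), (3, "/docs/a.txt")]
# If the last backup consumed the journal through seq 1, only two paths
# need scanning, however large the rest of the volume is:
to_scan = files_to_back_up(events, last_backup_seq=1)
# to_scan == ["/docs/a.txt", "/docs/b.txt"]
```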
- Cramming or evading the scheduled backup window
- Some applications have the "multiplexed backup" capability of cramming the scheduled backup window by sending data from multiple clients to a single tape drive simultaneously;[29] "This is useful for low-end clients with slow throughput ... [that] cannot send data fast enough to keep the tape drive busy .... will reduce the performance of restores."[19] Another application allows an enterprise that has computers transiently connecting to the network over a long workday to evade the scheduled window by using Proactive scripts.
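Multiplexed backup can be pictured as round-robin interleaving of several clients' chunk streams onto one sequential tape image, with each chunk tagged by its owner. The tagging is also why restores of a multiplexed tape are slower: one client's restore must read past the other clients' chunks. A toy sketch with invented client names and chunk data:

```python
from itertools import zip_longest

# Toy multiplexed backup: interleave chunks from several slow clients onto
# one sequential "tape" list, tagging each chunk with its owner.

def multiplex(client_streams):
    tape = []
    for row in zip_longest(*client_streams.values()):
        for client, chunk in zip(client_streams, row):
            if chunk is not None:      # a slower client has run out of data
                tape.append((client, chunk))
    return tape

def demultiplex(tape, client):
    """Restore path: read the whole tape, keeping one client's chunks."""
    return [chunk for owner, chunk in tape if owner == client]

tape = multiplex({"web01": ["w1", "w2"], "db01": ["d1"]})
# tape == [("web01", "w1"), ("db01", "d1"), ("web01", "w2")]
```

The tape drive stays busy because it always has the next available chunk from some client, even when no single client can feed it at full speed.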
2. Source File Integrity
- Backing up interactive applications via pausing
- Interactive applications can be protected by having their services paused while their live data is being backed up, and then unpaused.[30] Alternatively, the backup application can back up a snapshot initiated at a natural pause.[6][16] Some enterprise backup applications accomplish pausing and unpausing of services via built-in provisions, for many specific databases and other interactive applications, that automatically become part of the backup software's script execution; these provisions may be purchased separately.[6][31][32] However, another application has also added "script hooks" that enable the optional automatic execution, at specific events during runs of a GUI-coded backup script, of portions of an external script containing commands pre-written in a standard scripting language.[16] For some databases, such as MongoDB and MySQL, that can be run on filesystems that do not support snapshots, the external script can pause writing during backup.[16] Since the external script is provided by an installation's backup administrator, the code activated by the "script hooks" may accomplish not only data protection via pausing/unpausing interactive services[16] but also integration with monitoring systems.[33]
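The script-hook idea can be illustrated abstractly: the backup engine fires named events, and the administrator's external script maps them to quiesce/resume actions. The hook names and the `FakeDB` stand-in below are invented for illustration, not any product's actual API:

```python
# A real external script would issue database commands at these events
# (e.g. MySQL's FLUSH TABLES WITH READ LOCK / UNLOCK TABLES) instead of
# calling this in-memory stand-in.

class FakeDB:
    def __init__(self):
        self.events = []
    def pause_writes(self):
        self.events.append("paused")
    def resume_writes(self):
        self.events.append("resumed")

db = FakeDB()
hooks = {
    "BeforeBackup": db.pause_writes,   # quiesce just before the copy
    "AfterBackup": db.resume_writes,   # unpause once the copy is done
}

def run_backup(source, hooks):
    hooks.get("BeforeBackup", lambda: None)()
    copied = dict(source)              # stand-in for the actual data copy
    hooks.get("AfterBackup", lambda: None)()
    return copied

run_backup({"table": "rows"}, hooks)
# db.events == ["paused", "resumed"]
```

Because the hook table is supplied by the administrator, the same mechanism can just as easily post to a monitoring system as pause a database.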
- Backing up interactive applications via coordinated snapshots
- Some interactive applications such as databases must have all portions of their component files coordinated while their live data is being backed up. One database system, PostgreSQL, can do this via its own "snapshotting" MVCC running on filesystems that do not support snapshots, and can therefore be backed up without pausing, using an external script containing commands that use "script hooks".[16] An equivalent approach is to use some filesystems' capability of taking a snapshot and to back up the snapshot without pausing the application itself. An enterprise backup application using filesystem snapshotting can be used either to back up all user applications running on a virtual machine[34][35] or to back up a particular interactive application that directly uses its filesystem's snapshot capability.[6] Conceptually this approach can still be considered client-server backup; the snapshotting capability by itself constitutes the client, and the backup server runs as a separate process that initiates and then reads the snapshot on the machine that generated it. The software installed on each machine to be backed up is referred to as an "agent"; if "agents" are being used to back up all user applications running on a virtual machine, one or more such "agents" are controlled by a console.[35][36]
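The copy-on-write behavior that lets a snapshot present a consistent point-in-time image while the application keeps writing can be sketched as follows. The data structures are invented, and this toy handles overwrites of existing blocks only:

```python
# Toy copy-on-write snapshot: preserve the pre-write contents of each
# overwritten block so the snapshot stays a point-in-time image.

class Volume:
    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.snapshots = []
    def snapshot(self):
        snap = {}                      # empty: unchanged blocks are read
        self.snapshots.append(snap)    # through to the live volume
        return snap
    def write(self, key, value):
        for snap in self.snapshots:    # save the old copy exactly once
            snap.setdefault(key, self.blocks[key])
        self.blocks[key] = value
    def read_snapshot(self, snap):
        return {k: snap.get(k, v) for k, v in self.blocks.items()}

vol = Volume({"b0": "x", "b1": "y"})
snap = vol.snapshot()
vol.write("b1", "z")                   # the app keeps writing during backup
view = vol.read_snapshot(snap)
# view == {"b0": "x", "b1": "y"} even though the live volume now holds "z"
```

The backup server reads `view` at its leisure; taking the snapshot itself is nearly instantaneous because nothing is copied until a block is overwritten.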
- Backing up interactive applications via true Continuous Data Protection
- Continuous Data Protection (CDP), also called continuous backup, was patented by British entrepreneur Pete Malcolm in 1989 as "a backup system in which a copy [editor's emphasis] of every change made to a storage medium is recorded as the change occurs [editor's emphasis]".[37] In an ideal case of continuous data protection, the recovery point objective ("the maximum targeted period in which data (transactions) might be lost from an IT service due to a major incident") is zero, even though the recovery time objective ("the targeted duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity") is not zero.[38] An example of a period in which data transactions might be lost is a major discount chain having the card readers at its checkout counters shut down at multiple locations for close to two hours in June 2019.
- Because it allows restoring data to any point in time, "[true] CDP is the gold standard—the most comprehensive and advanced data protection. But 'near CDP' technologies can deliver enough protection for many companies with less complexity and cost."[39] Because "near-CDP does this [copying] at pre-set time intervals",[40] it is essentially incremental backup initiated by a timer instead of a script.
- True CDP differs from RAID, replication, or mirroring by enabling rollback to any point in time. However, because true CDP "backup write operations are executed at the level of the basic input/output system (BIOS) of the microcomputer in such a manner that normal use of the computer is unaffected",[37] true CDP backup must in practice be run in conjunction with a virtual machine[41][42] or equivalent[43]—ruling it out for ordinary personal backup applications.
- True CDP uses journaling. It intent-logs every change on the host system,[40] often by saving byte or block-level differences rather than file-level differences.[44][45] This backup method differs from simple disk mirroring[45] in that it enables a roll-back of the log and thus a restoration of old images of data. Intent logging allows proper precautions for the consistency of live data, so that captured changes can provide fine granularities of restorable objects ranging from crash-consistent images to logical objects such as files, databases, and logs.[46]
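The intent-log idea behind true CDP, recording every change with a timestamp so the volume can be rolled back to any point in time, can be sketched as follows (timestamps, block names, and values are invented):

```python
# Toy intent log: every write is appended with a timestamp and never
# overwritten, so the volume can be rebuilt as of ANY past instant --
# the property that distinguishes true CDP from mirroring or replication.

log = []                               # append-only (timestamp, block, value)

def cdp_write(ts, block, value):
    log.append((ts, block, value))

def restore_as_of(ts):
    """Replay the log up to ts to rebuild the volume at that instant."""
    volume = {}
    for t, block, value in log:        # log is appended in time order
        if t > ts:
            break
        volume[block] = value
    return volume

cdp_write(10, "b0", "a")
cdp_write(20, "b0", "b")               # overwrite: old value stays in log
cdp_write(30, "b1", "c")
# restore_as_of(15) == {"b0": "a"}; restore_as_of(25) == {"b0": "b"}
```

A mirror would hold only the final state; the log's retention of superseded values is what makes fine-grained rollback possible.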
- When real-time edits—especially in multimedia and CAD design environments—are backed up offsite over the upstream channel of the installation's broadband network,[47] network bandwidth throttling[48] may be needed to reduce the impact of true CDP.[47]
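Bandwidth throttling for such offsite replication is commonly implemented with a token bucket; a minimal sketch, with invented rates and capacities:

```python
# Token-bucket throttle: the replicator may only ship as many bytes as the
# bucket currently holds, smoothing its impact on the shared upstream link.

class TokenBucket:
    def __init__(self, rate_bytes_per_s, capacity):
        self.rate, self.capacity = rate_bytes_per_s, capacity
        self.tokens, self.last = capacity, 0.0
    def allow(self, nbytes, now):
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True                # send the change now
        return False                   # defer: queue it for later

bucket = TokenBucket(rate_bytes_per_s=1000, capacity=1000)
sent = [bucket.allow(600, now=t) for t in (0.0, 0.1, 1.0)]
# sent == [True, False, True]: the second burst must wait for refill
```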
3. User Interface
To accommodate the requirements of a backup administrator who may not be part of the IT staff with access to the secure server area, enterprise client-server backup software may include features such as:
- Administration Console
- The backup administrator's GUI management and near-term reporting tool for the backup server.[49] Its window shows the selected backup server, with a standard toolbar on top. A sidebar or navigation bar on the left shows the clickable categories of information for that backup server; each category shows a panel, which may have a specialized toolbar below or in place of the standard toolbar. The built-in categories include activities (thus providing monitored backup), past backups of each source, scripts/policies/jobs (terminology depending on the application), sources (directly or indirectly), archive files, and storage devices.[33][50][51]
- User-initiated backups and restores
- These supplement the administrator-initiated backups and restores which backup applications have always had, and relieve the administrator of time-consuming tasks.[52] The user designates the date of the past (not necessarily the last) backup from which files or folders are to be restored—once IT staff has mounted the proper volume(s) of the relevant archive file on the backup server.[1][33][53][54]
- High-level/medium-term reports supplementing the Administration Console[49]
- Within one application's Console panel displayed by clicking the name of the backup server itself in the sidebar, an activities pane on the top left of the displayed Dashboard has a moving bar graph for each activity going on for the backup server together with a pause and stop button for the activity. Three more backup validation panes give the results of activities in the past week: backups each day, sources backed up, and sources not backed up; as of 2019 the last two panes—together with failed backups—are summarized in an additional color-coded bullseye pane.[55] Finally, a storage reporting pane has a line for each archive file, showing the last-modified date and depictions of the total bytes used and available;[22][33] as of 2019, this is supplemented by a pane that gives a linear-regression prediction for growth of each archive file.[55] For the application's Windows variant, the Dashboard acts as a display-only substitute for a non-existent Console[6]—but was upgraded in 2019 into an optional two-way Web-based Management Console.[16] Other applications have a separate reporting and monitored backup facility that can cover multiple backup servers.[56][57]
- E-mailing of notifications about operations to chosen recipients[49]
- Can alert the recipient to, e.g., errors or warnings, including extracts of logging to assist in pinpointing problems.[6][56][58]
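A notification e-mail of this kind might be assembled as below. The addresses, error codes, and the `build_notification` helper are invented; a real deployment would hand the finished message to an SMTP client such as Python's smtplib:

```python
from email.message import EmailMessage

# Illustrative only: no mail is sent here. Delivery would be
# smtplib.SMTP(host).send_message(msg) in a real deployment.

def build_notification(script, status, log_tail, recipients):
    msg = EmailMessage()
    msg["Subject"] = f"[backup] {script}: {status}"
    msg["From"] = "backup-server@example.com"
    msg["To"] = ", ".join(recipients)
    msg.set_content("Last log lines, to help pinpoint the problem:\n"
                    + "\n".join(log_tail))
    return msg

msg = build_notification("Nightly-Sales", "2 errors, 1 warning",
                         ["E-1101: tape drive offline",
                          "W-2007: media nearly full"],
                         ["admin@example.com"])
# msg["Subject"] == "[backup] Nightly-Sales: 2 errors, 1 warning"
```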
- Integration with monitoring systems[49]
- Such systems provide longer-term backup validation. One application's administrators can deploy custom scripts that—invoking webhook code via script hooks—populate such systems as the freeware Nagios and IFTTT and the freemium Slack with script successes and failures corresponding to the activities category of the Console, per-source backup information corresponding to the past backups category of the Console, and media requests.[33] Another application has integration with two of the developer's monitoring systems, one that is part of the client-server backup application and one that is more generalized.[56] Yet another application has integration with a monitoring system that is part of the client-server backup application[59] but can also be integrated with Nagios.[60]
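A hedged sketch of the webhook bridge: on each script success or failure, hook code builds the JSON payload a monitoring system's incoming webhook (Slack, Nagios, IFTTT, ...) would receive. The payload shape and function name are assumptions, and no actual HTTP request is made here:

```python
import json

# A real hook would POST this JSON with urllib.request to the monitoring
# system's webhook URL; here we only construct the payload.

def webhook_payload(script, ok, sources_backed_up):
    return json.dumps({
        "text": f"Backup script {script!r} "
                + ("succeeded" if ok else "FAILED"),
        "sources": sources_backed_up,
    })

payload = webhook_payload("Nightly-Sales", False, ["web01", "db01"])
# json.loads(payload)["text"] == "Backup script 'Nightly-Sales' FAILED"
```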
4. LAN/WAN/Cloud
- Advanced network client support
- All applications include support for multiple network interfaces.[18][61][62] However, one application cannot provide "resilient network connections" for machines on a WAN unless deduplication is done by a separate sub-application between the client and the backup server.[63] Another application can extend support to "remote" clients anywhere on the Internet for a Proactive script and for user-initiated backups/restores.[16]
- Cloud seeding and Large-Scale Recovery
- Because a large amount of data has typically already been backed up locally,[18] an enterprise adopting cloud backup will likely need to do "seeding". This service uses a synthetic full backup to copy a large locally stored archive file onto a large-capacity disk device, which is then physically shipped to the cloud storage site and uploaded.[64][65] After the large initial upload, the enterprise's backup software may facilitate reconfiguration for writing to and reading from the archive file incrementally in its cloud location.[66] The service may need to be employed in reverse to achieve faster large-scale data recovery than would be possible via an Internet connection.[64] Some applications offer seeding and large-scale recovery via third-party services, which may use a high-speed Internet channel to/from cloud storage rather than a shippable physical device.[67][68]