This article addresses the challenges of a digital cloud environment with respect to global scalability across a large number of remote servers in an insecure public cloud, such as Amazon. The goal of the study was to work out an approach for evaluating reasonable system capacity under heavy workload. For that purpose, the Zabbix monitoring solution is used, and business metrics are applied alongside the existing system metrics. A prediction data model is proposed to forecast user activity from the collected historical statistics and to verify whether a capacity adjustment is possible. The results of capacity planning were implemented at Genesys International Telecommunications Company. System analysis of the production environment indicated that the capacity of certain virtual servers could be downscaled, which saved about $3500 (50%) in annual operational costs for each affected server.
Fast-growing internet information technologies (IT) based on globally distributed cloud computing (GCC) have allowed radical improvements in the efficiency of internet services as well as in the capacity and performance of virtual machines (VMs). Large international IT companies deploy thousands of VMs with production software in the cloud to provision multiple services for their customers worldwide.
GCC implementation and operational support have both benefits and challenges. The challenges are described in detail, and appropriate solutions are defined, in the sections below.
A deep analytical overview of advanced telecommunication technologies and the top 10 cyber threats in networks is provided in[9][10]. It covers the attack mechanisms, vulnerability[8] and damage assessment, the objects of the attacks, countermeasures, modeling and prediction of cyber resistance, and ways to organize and coordinate preventive actions.
The article[11] is an overview of the relevant research on the key threats in cloud cyber security, such as data breaches or loss, account or service traffic hijacking, insecure interfaces and applications of IoT, distributed denial of service (DDoS), malicious insiders, the abuse of cloud services, insufficient due diligence, and other shared technology vulnerabilities. In addition, the authors show how cyber threats depend on the security controls and compliance models based on the corresponding standards. The theoretical research is implemented as a practical system that identifies threats and protective actions.
The article[12] is the analytical study of various factors, which influence the cyber security on particular cloud platforms. The following ways of possible violation are pointed out:
For example, unauthorized access to a host would result in not only V violation but also in data manipulation (D violation) and discrediting of sensitive data as well (C violation). In another example, SQL injection to a DB, especially during data replication or backup (D violation), may cause an unacceptable data modification on the DB server side (I violation). In case of a very dangerous DDoS attack against a frontend server (both V and D violation), a cloud computing service will be entirely unavailable for internet users (A violation).
For the monitoring of cloud computing resources, an analytical overview of the five most popular solutions worldwide is presented in [7]. International scientific conferences addressing the widespread problem of big data computing are covered in [2].
Paper [13] proposes a recognition system in the IoT area to monitor real-time data, particularly on smartphones. Data fusion from multiple mobile sensors, including audio, Internet localization, etc., allows analysis of a person’s life patterns. This is helpful for taking care of either children or older adults.
The use of predictive models in GCC monitoring systems is considered in [14][15], including the well-known problem of Java memory leaks and the auto-remediation of Java-based services running in a virtualized cloud environment based on the 4R options: restart, reboot, redirect, redeploy.
The novelty of this study is in the evaluation of VM capacity and cyber security when moving from a private corporate cloud (on premises) to a public one. The public cloud is much cheaper but less secure; therefore, additional actions should be taken to meet the cyber security policy.
Large international IT companies have similar globally distributed infrastructure and provide cloud services in multiple geographic regions worldwide[16][17] (Figure 1). A region is usually named by the three-letter code of the nearest international airport, for example SJC (San Jose, California, USA), YYZ (Toronto’s main airport, Canada), AMS (Amsterdam, the Netherlands), SIN (Singapore), SYD (Sydney, Australia), etc.
Figure 1. Typical globally distributed cloud computing (GCC) infrastructure and data flows.
Every region is built of at least two data centers (DCs)—a primary unit and a standby one (Figure 2). Such redundancy is designed for the purpose of workload switchover in case of a local incident or an entire DC outage. Each unit supports a certain part of data (POD), including client account database (ADB) and a set of cloud service applications interconnected with the shared message system storage (MSS) for storing call prompts, DB queries, application logs, and other system events. Regular replication of ADB data between the DCs is configured to perform a switchover faster when initiated. For example, RingCentral International Telecommunications Company uses Oracle as a DB and Golden Gate as a third-party solution for optimized ADB synchronization based on triggers and data transaction logs[18].
Figure 2. Example of regional GCC architecture with two data centers (DCs)[18].
To build a cloud IT environment, three basic architecture models are used in practice: private (on premises), public, and hybrid (Figure 3). In the on-premises model, the entire IT infrastructure is hosted in corporate DCs, giving full control over computing resources. Public cloud solutions allow IT companies to avoid the expense of maintaining their own DCs. A hybrid cloud is an integrated solution that attempts to inherit the advantages of both the private and public models.
Figure 3. Basic cloud architecture models.
The key trend in GCC adoption among individuals and IT companies worldwide is the move from private towards public clouds. Microsoft and Gartner analytics compiled in Table 1 show that the market for all public cloud services almost doubled over the last five years[19][20].
Table 1. Gartner analytics of growth trend for main cloud services.
| World Market, $ billion | 2016 | 2017 | 2018 | 2019 | 2020 |
| --- | --- | --- | --- | --- | --- |
| Software as a Service (SaaS) | 38.57 | 46.33 | 55.14 | 64.87 | 75.73 |
| Platform as a Service (PaaS) | 7.17 | 8.85 | 10.62 | 12.58 | 14.8 |
| Infrastructure as a Service (IaaS) | 25.29 | 34.6 | 45.56 | 57.9 | 71.55 |
| Load Balancer as a Service (LBaaS) | 90.26 | 104.52 | 118.52 | 133.57 | 151.1 |
| Database as a Service (DBaaS) | 40.81 | 43.77 | 47.56 | 51.65 | 56.18 |
| IT Security | 7.15 | 8.77 | 10.43 | 12.16 | 14 |
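The growth claim can be checked directly against the figures in Table 1. The following is a minimal sketch in Python that computes the 2016 to 2020 growth factor per service and for the sum of all rows; the dictionary layout and the function name `growth_factor` are illustrative choices, not part of the original study.

```python
# Market size in $ billion, copied from Table 1 (2016 and 2020 columns only).
market = {
    "SaaS": (38.57, 75.73),
    "PaaS": (7.17, 14.8),
    "IaaS": (25.29, 71.55),
    "LBaaS": (90.26, 151.1),
    "DBaaS": (40.81, 56.18),
    "IT Security": (7.15, 14.0),
}


def growth_factor(start, end):
    """Return how many times larger `end` is than `start`."""
    return end / start


# Growth factor of each service between 2016 and 2020.
per_service = {name: growth_factor(a, b) for name, (a, b) in market.items()}

# Growth factor of the whole market (sum of all rows in the table).
total_2016 = sum(a for a, _ in market.values())
total_2020 = sum(b for _, b in market.values())
total_growth = growth_factor(total_2016, total_2020)

if __name__ == "__main__":
    for name, factor in per_service.items():
        print(f"{name}: x{factor:.2f}")
    print(f"Total: x{total_growth:.2f}")
```

With the table values, the total comes out somewhat below a factor of two, while individual services such as IaaS grow faster than the average.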
The advantages of the public cloud are obvious:
Despite these benefits, public cloud technologies pose certain challenges, and cyber security is one of them. Sensitive personal information is stored on an external public network maintained by a vendor and is transferred via common internet channels. Tokenization of secret information, data encryption during transfer, and other approaches [1] are good ways to reduce the risks. Other threats associated with the public cloud are shown in Table 2.
Table 2. The risks associated with the public cloud.
| Risk | Risk Management |
| --- | --- |
| External fraud | Review the company’s security policy and protect the data |
| Access issues | Monitor health check to ensure the cloud service is stable |
| Internal fraud | Ask external cloud provider to track and log the user's activities |
| Vendor failure | Need alternative cloud vendor in case of a primary outage |
| Cloud outage | Need backup network channel in case of a primary failure |
This section describes the analysis of the specific GCC resources provided by Amazon Web Services (AWS) as an example, but the same approach can be applied to any other cloud provider, such as Microsoft Azure, IBM Cloud Services, Red Hat CloudForms, VMware Cloud Foundation, and other worldwide known leaders[21].
AWS is one of the leading public cloud vendors in the IT world, offering a wide variety of VMs to implement all the main service models, such as SaaS, PaaS, IaaS, LBaaS, DBaaS, etc. VMs are available in AWS as a selection of instance types and sizes, allowing flexible scaling of CPU, memory, storage, and network to the workload required by the applications (Table 3). Migration from one instance size to another is possible at any moment and is straightforward: select the required type and restart the VM. If an unbalanced mix of computing resources is needed, there are special VM types accelerated and optimized for computing, memory, or storage.
Table 3. Amazon Web Services (AWS) instances of the M4 type with optimized storage[21].
| VM Instance | Virtual CPUs | Memory, GB | Storage Performance | VM Bandwidth, Mbps | Network Performance |
| --- | --- | --- | --- | --- | --- |
| m4.large | 2 | 8 | Optimized | 450 | Moderate |
| m4.xlarge | 4 | 16 | Optimized | 750 | High |
| m4.2xlarge | 8 | 32 | Optimized | 1000 | High |
| m4.4xlarge | 16 | 64 | Optimized | 2000 | High |
| m4.10xlarge | 40 | 160 | Optimized | 4000 | 10 Gigabit |
| m4.16xlarge | 64 | 256 | Optimized | 10,000 | 25 Gigabit |
AWS pricing policy depends on the cloud region, the operating system (OS) installed, and the status of a VM—reserved, running, stopped, or terminated. Table 4 shows a pricing example for AWS instances of the M4 type listed in Table 3.
Table 4. Prices for AWS instances of the M4 type running operating system (OS) Linux/Unix in the US East region[22].
| VM Instance | Virtual CPUs | Memory, GB | Storage Performance | Storage Volume, GB | Price per Hour for OS Linux |
| --- | --- | --- | --- | --- | --- |
| m4.large | 2 | 8 | Optimized | 1 x 4 SSD | $0.1 |
| m4.xlarge | 4 | 16 | Optimized | 1 x 8 SSD | $0.2 |
| m4.2xlarge | 8 | 32 | Optimized | 1 x 16 SSD | $0.4 |
| m4.4xlarge | 16 | 64 | Optimized | 1 x 32 SSD | $0.8 |
| m4.10xlarge | 40 | 160 | Optimized | 2 x 40 SSD | $2.0 |
| m4.16xlarge | 64 | 256 | Optimized | 2 x 80 SSD | $3.2 |
Tables 3 and 4 show that the price scales in proportion to the computing resources. For each step from m4.large up to m4.4xlarge, both CPU and memory double and the price doubles as well; the two largest sizes keep the same price per virtual CPU.
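The relationship between instance size and price can be verified with a short calculation over the values listed in Tables 3 and 4. The sketch below is illustrative only: the data structure and the helper names `price_per_vcpu` and `step_ratios` are assumptions introduced here, and the prices are the example figures from Table 4, not current AWS prices.

```python
# (virtual CPUs, memory in GB, price in $ per hour), copied from Tables 3 and 4.
M4_INSTANCES = {
    "m4.large": (2, 8, 0.1),
    "m4.xlarge": (4, 16, 0.2),
    "m4.2xlarge": (8, 32, 0.4),
    "m4.4xlarge": (16, 64, 0.8),
    "m4.10xlarge": (40, 160, 2.0),
    "m4.16xlarge": (64, 256, 3.2),
}


def price_per_vcpu(name):
    """Hourly price divided by the number of virtual CPUs."""
    vcpus, _, price = M4_INSTANCES[name]
    return price / vcpus


def step_ratios():
    """For each pair of neighbouring sizes, return (from, to, CPU ratio, price ratio)."""
    names = list(M4_INSTANCES)  # dicts keep insertion order, smallest size first
    result = []
    for small, big in zip(names, names[1:]):
        cpu_small, _, price_small = M4_INSTANCES[small]
        cpu_big, _, price_big = M4_INSTANCES[big]
        result.append((small, big, cpu_big / cpu_small, price_big / price_small))
    return result


if __name__ == "__main__":
    for small, big, cpu_ratio, price_ratio in step_ratios():
        print(f"{small} -> {big}: CPU x{cpu_ratio:.2f}, price x{price_ratio:.2f}")
```

For the first three steps both ratios equal 2; for the two largest sizes the ratios are 2.5 and 1.6, but the price per virtual CPU stays the same across the whole family.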
Although VM types in the public cloud can be scaled up and down very quickly, resizing should be done during planned maintenance to avoid a service outage. Capacity planning and performance evaluation require a monitoring system. In this paper, the Zabbix enterprise-class system is considered, as one of the leading monitoring solutions in the world with many built-in capabilities, including[23][24][25]:
Zabbix supports a wide range of built-in system metrics and triggers to monitor CPU, memory, disk space, etc., but business-specific items must be implemented manually. Figures 4 and 5 show examples of such business items, measuring the incoming and outgoing user traffic at RingCentral and Genesys, respectively[16][17].
Figure 4. Cyclic user workload with a stable prediction trend.
Figure 5. Cyclic user workload with a downgrading prediction trend.
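One possible way to feed a business-specific value into Zabbix is the `zabbix_sender` command-line utility, which pushes a value to an item of the "Zabbix trapper" type. The sketch below only shows the general shape of such an integration and is not the implementation used in the study: the server name, host name, and item key in the usage example are hypothetical, and the trapper item is assumed to exist already on the monitored host.

```python
import subprocess


def build_sender_command(server, host, key, value):
    """Build the zabbix_sender argument list for one metric value.

    -z: Zabbix server address, -s: monitored host name as configured in Zabbix,
    -k: item key, -o: value to send.
    """
    return ["zabbix_sender", "-z", server, "-s", host, "-k", key, "-o", str(value)]


def send_metric(server, host, key, value):
    """Run zabbix_sender; requires the utility to be installed on this machine."""
    command = build_sender_command(server, host, key, value)
    return subprocess.run(command, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Hypothetical names: replace with a real server, host, and trapper item key.
    cmd = build_sender_command("zabbix.example.com", "vmp-400", "agents.logged_on", 20)
    print(" ".join(cmd))
```

A script like this would typically be run on a schedule, with the value taken from the application's own statistics.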
User workload is usually cyclic, but it cannot be predicted reliably with polynomial forecasting models because of Runge’s phenomenon of oscillation in polynomial models of higher degrees[26] (Figure 6). Instead, a linear model is applied for capacity evaluation, in which only the maximum values at peak time are taken into account. As a result, the prediction is more stable and precise.
Figure 6. Runge’s phenomenon of oscillation for polynomial models of degree 3 and higher. The abscissa shows the degree of the model and the ordinate shows the error of the results in percent; higher degrees give the same accuracy, so the extra calculations are not required. The red, blue, and green curves are polynomial models of the corresponding degrees.
The same approach can be applied to any monitoring tool other than Zabbix. The purpose of the prediction data model is to analyze the workload trend and verify whether a VM downscale is possible. If the workload is stable (Figure 4) or declining (Figure 5), the downscale is reasonable; otherwise it is risky, because upscaling may be needed soon.
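The idea of fitting a linear model to peak values only can be sketched as follows. This is a minimal illustration using ordinary least squares, not the model implemented in the study: the function names, the choice of one peak per day, and the `tolerance` threshold on the slope are all assumptions made for the example.

```python
def daily_peaks(samples, per_day):
    """Reduce a flat series of samples to one maximum value per day."""
    return [max(samples[i:i + per_day]) for i in range(0, len(samples), per_day)]


def linear_fit(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ...

    Returns (slope, intercept).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_peak(peaks, days_ahead):
    """Extrapolate the fitted line `days_ahead` days past the last observation."""
    slope, intercept = linear_fit(peaks)
    return intercept + slope * (len(peaks) - 1 + days_ahead)


def downscale_is_reasonable(peaks, tolerance=0.0):
    """Treat a flat or falling peak trend as safe for downscaling (assumed rule)."""
    slope, _ = linear_fit(peaks)
    return slope <= tolerance


if __name__ == "__main__":
    peaks = [20, 19, 20, 18, 19]  # illustrative daily maxima of logged-on agents
    print(downscale_is_reasonable(peaks), forecast_peak(peaks, 30))
```

In practice the tolerance would be chosen so that small positive slopes caused by noise do not block a downscale, and the forecast peak would be compared with the capacity of the smaller instance.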
The proposed monitoring and forecasting models are implemented at Genesys International Telecommunications Company[16]. Production VMs of the customers with relatively low user workload are analyzed using Zabbix monitoring and prediction system. Some experimental results are presented in Figures 7 and 8.
Figure 7. Monitoring statistics for user workload.
Figure 8. Monitoring statistics for system resources.
The statistics in Figure 7 show that the users (agents) are not as active as configured: at most 20 agents are logged on out of 75 configured, which is less than 30%. The system statistics in Figure 8 confirm less than 20% CPU utilization even at peak time. Memory consumption is about 5.55 GB and is stable. In Table 3, all instances of the M4 type have at least 8 GB of memory, which would be enough for running the cloud services. Therefore, the VM instance could be scaled down without a risk of system overload.
Similar calculations were performed for several more VMs, and the results are summarized in Table 5. CPU utilization is computed per core to account for multiple cores. Price savings are converted from “$ per hour” to “$ per year”, taking into account that primary and backup servers use the same VM type and that some customers run in multiple regions, using the following formulas:
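The reasoning above can be expressed as a simple check. The sketch below is one possible formulation under stated assumptions, not the rule used in the study: CPU load is assumed to scale inversely with the number of virtual CPUs, and the 70% CPU limit and 80% memory headroom are arbitrary illustrative thresholds.

```python
def projected_cpu_pct(peak_cpu_pct, current_vcpus, target_vcpus):
    """Estimate peak CPU utilization after resizing, assuming linear scaling."""
    return peak_cpu_pct * current_vcpus / target_vcpus


def can_downscale(peak_cpu_pct, mem_used_gb, current_vcpus, target_vcpus,
                  target_mem_gb, cpu_limit_pct=70.0, mem_headroom=0.8):
    """Return True if both projected CPU and memory fit the smaller instance.

    cpu_limit_pct and mem_headroom are assumed safety margins, not values
    taken from the article.
    """
    cpu_ok = projected_cpu_pct(peak_cpu_pct, current_vcpus, target_vcpus) <= cpu_limit_pct
    mem_ok = mem_used_gb <= target_mem_gb * mem_headroom
    return cpu_ok and mem_ok


if __name__ == "__main__":
    # Figures from the text: 20% peak CPU and 5.55 GB of memory in use.
    # The move from 4 to 2 virtual CPUs with 8 GB of memory is a hypothetical example.
    print(can_downscale(20.0, 5.55, current_vcpus=4, target_vcpus=2, target_mem_gb=8))
```

With these inputs the projected peak CPU is 40% and the memory in use stays below the headroom limit of 6.4 GB, so the check passes.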
Price per year = Price per hour * 24 * 365;

Price saving per year = Price saving per hour * 24 * 365 * 2 servers (primary and backup) * 2 regions.
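The two formulas translate directly into code. In the sketch below the function names are illustrative, and the hourly saving of $0.1 used in the example is an assumption back-calculated from the $3504 yearly figure reported in Table 5, not a value stated in the source.

```python
HOURS_PER_YEAR = 24 * 365


def price_per_year(price_per_hour):
    """Convert an hourly price to a yearly price."""
    return price_per_hour * HOURS_PER_YEAR


def saving_per_year(saving_per_hour, servers=2, regions=2):
    """Yearly saving for a primary and backup server pair in each region."""
    return saving_per_hour * HOURS_PER_YEAR * servers * regions


if __name__ == "__main__":
    # Assumed hourly price difference between the current and recommended instance.
    print(round(saving_per_year(0.1), 2))
```

An hourly saving of $0.1 corresponds to $876 per server per year, and to $3504 per year once the primary and backup servers in two regions are counted.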
|
Table 5. Evaluation of AWS instances downscale and price savings.
| AWS Region/VM | # of Configured Agents | Max # of Logged on Agents | Max/avg CPU Usage per Core, % | Max Memory Usage, % | AWS Instance Type | Recommended Instance Type | Price Savings, $ per Year |
| --- | --- | --- | --- | --- | --- | --- | --- |
| euw1/vmp-400 | 75 | 20 | 87/25 | 40 | m3.large | m3.medium | $3504 |
| euw2/vmp-400 | 75 | 0 | 0/0 | 0 | m3.large | m3.medium | $3504 |
| use1/vmp-220 | 414 | 121 | 95/25 | 50 | m3.large | m3.medium | $3504 |
| euw1/vmp-080 | 1854 | 610 | 95/50 | 50 | m3.xlarge | m4.large | $3504 |
The estimated price saving is $3504 per year for each affected VM. If the whole GCC infrastructure of a customer with low load is evaluated, the other VMs could most probably also be reduced in size, bringing the total price saving up to 50%, which is about $13K per year (Figure 9).
Figure 9. AWS price statistics by month for the whole GCC of a customer.
Some instances could probably be downscaled by two sizes, depending on the actual monitoring statistics. Reducing the size of VMs is based on objective historical statistics and is therefore safe. If the workload increases at some point, the triggers of the monitoring system will detect it, and the VM can be upsized again at any moment.