Cloud computing has emerged as a major disruptive technology in recent
years. With the worldwide outbreak of COVID-19, telecommuting has become
mainstream, further promoting the development of Cloud services. There
has also been a corresponding increase in the number of enterprises looking to
migrate some or all of their IT systems to the Cloud.
However, the service model of Cloud computing is unique in that hardware
resources are shared. Multi-tenancy and shared networking make the performance
of Cloud resources uncertain. This uncertainty makes it difficult to migrate
applications that require quality assurance, i.e., quality-critical applications,
to the Cloud. For example, most scientific workflow applications, such as
disaster early-warning systems, need to complete most of their tasks within a
specific time window; otherwise, the results would be meaningless. Hence, an
accurate model of Cloud performance uncertainty is crucial for optimizing
resource allocation to satisfy the quality requirements of quality-critical
applications.
Current efforts on modeling the performance uncertainty of Cloud resources
fall into two categories. One is to model the Cloud system from the perspective
of the data center, i.e., the Cloud service provider, based on queuing theory.
However, such a model is bound to a specific Cloud platform and cannot be
extended to multi-cloud environments. The other, from the perspective of
Cloud users, leverages non-parametric statistical methods to predict Cloud
performance based on the observed probability distribution. However, most
current uncertainty modeling methods rely on classical central statistics,
which focus on the mean and variance of the performance. Quality-critical
applications, on the other hand, need to ensure that each execution finishes
within a specific time window, so more attention should be paid to the worst
cases of performance rather than the mean.
To tackle this issue, we leverage the powerful statistical tool of Extreme
Value Theory (EVT) [1]. In statistics, EVT focuses on the tail shape of a random
variable's distribution and studies the statistical characteristics of extreme
values. In particular, for sample extrema, EVT provides an asymptotic theory
similar to the central limit theorem. The central limit theorem states that the
distribution of the sample mean approximates the normal distribution,
independent of the distribution of the data. Similarly, EVT states that, under
weak conditions, the distribution of sample extrema converges to the Generalized
Extreme Value (GEV) distribution, again independent of the distribution of the
data. Therefore, using EVT, the performance of Cloud resources in extreme cases
can be characterized more accurately, yielding a more effective model of
performance uncertainty.
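For reference, the limiting law can be written explicitly. In the standard three-parameter form with location μ, scale σ > 0, and shape ξ (a textbook formula, not specific to our model), the GEV distribution function is

G(x) = exp{ −[1 + ξ(x − μ)/σ]^(−1/ξ) },  for 1 + ξ(x − μ)/σ > 0,

with the Gumbel case G(x) = exp(−e^(−(x−μ)/σ)) recovered in the limit ξ → 0.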
Let us summarize the basic idea of our EVT-based uncertainty model. The tail
of the input performance data distribution is fitted with the GEV distribution,
so that the probability of extreme events can be calculated. Mathematically,
this amounts to finding the quantile associated with a probability p: given p,
we obtain xp such that P(X > xp) < p. The probability p can be determined from
the requirements (i.e., the deadline miss rate) of the quality-critical
application, and the performance threshold xp is then calculated. In this way,
we can recommend resources for quality-critical applications based on xp, or
describe the performance uncertainty of a Cloud data center's resources more
accurately.
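A minimal sketch of this threshold computation, assuming ξ ≠ 0 and hypothetical GEV parameters, as if already fitted to block maxima of task runtimes (all numbers below are illustrative, not measured data):

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function for shape xi != 0."""
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0:
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_quantile(p, mu, sigma, xi):
    """Return x_p with P(X > x_p) = p, i.e. the (1 - p)-quantile of the GEV."""
    return mu + (sigma / xi) * ((-math.log(1.0 - p)) ** (-xi) - 1.0)

# Hypothetical parameters, as if fitted to block maxima of runtimes in seconds.
mu, sigma, xi = 120.0, 8.0, 0.15
p = 0.05  # required deadline miss rate
x_p = gev_quantile(p, mu, sigma, xi)  # threshold exceeded with probability p
```

Inverting G(xp) = 1 − p gives xp = μ + (σ/ξ)((−ln(1 − p))^(−ξ) − 1); in practice the parameters would be estimated from monitored performance data, e.g. by maximum likelihood.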
The overall steps of our proposed performance uncertainty modeling of Cloud
resources are as follows. First, we develop an automatic multi-datacenter
performance monitoring application based on the Cloud infrastructure operation
code of CloudsStorm [2] and show the performance uncertainty according to
real performance data retrieved from four data centers. Secondly, we use
K-means clustering to pre-process the data, mainly to cluster the performance
data generated by different physical hosts, since the performance of hosts
within a data center may vary; the performance of Cloud resources can then be
characterized in a fine-grained manner. Finally, we use the EVT-based Cloud
performance model to analyze and model the extreme behavior of Cloud
performance. The obtained results can be used by quality-critical applications
as a reference for choosing a proper Cloud data center.
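The per-host pre-processing step can be sketched as a plain 1-D K-means (Lloyd's algorithm) over runtimes; the two host groups and their means below are synthetic illustrations, not the data centers' actual measurements:

```python
import numpy as np

def kmeans_1d(samples, k, iters=50, seed=0):
    """Lloyd's algorithm on 1-D performance samples (e.g. task runtimes)."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(samples, size=k, replace=False).astype(float)
    for _ in range(iters):
        # Assign each sample to the nearest center, then recompute the means.
        labels = np.argmin(np.abs(samples[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = samples[labels == j].mean()
    return labels, np.sort(centers)

# Synthetic runtimes from two hypothetical host groups: ~100 s and ~140 s.
rng = np.random.default_rng(1)
samples = np.concatenate([rng.normal(100, 3, 200), rng.normal(140, 3, 200)])
labels, centers = kmeans_1d(samples, k=2)
```

After clustering, the EVT fit is applied to each group separately, so that a slow host group does not inflate the tail estimate of a fast one.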
For experimental verification, we conduct experiments using K-fold
cross-validation. Based on our experimental data and studies, the threshold
calculated by our proposed model keeps the average miss rate below the required
5% deadline miss rate, a 77% reduction compared with the traditional modeling
method. The number of cases in which the deadline miss rate requirement is not
satisfied is also reduced by 84%.
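The evaluation protocol can be sketched as follows; for brevity, an empirical (1 − p)-quantile stands in for the EVT-fitted threshold, and the runtime data is synthetic rather than our measured traces:

```python
import numpy as np

def kfold_miss_rates(runtimes, k=5, p=0.05, seed=0):
    """Per fold: derive a threshold from the training part, then measure the
    fraction of held-out runs exceeding it (the observed deadline miss rate)."""
    folds = np.array_split(np.random.default_rng(seed).permutation(runtimes), k)
    rates = []
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        threshold = np.quantile(train, 1.0 - p)  # stand-in for the EVT threshold
        rates.append(float(np.mean(folds[i] > threshold)))
    return rates

# Synthetic heavy-tailed runtimes; in the paper, monitored Cloud data is used.
runtimes = np.random.default_rng(2).lognormal(mean=4.6, sigma=0.3, size=2000)
rates = kfold_miss_rates(runtimes)
avg_rate = sum(rates) / len(rates)
```

Each fold thus reports how often the threshold fitted on the remaining data would have been missed, and the fold-averaged rate is compared against the required miss rate.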
References
1. Beirlant, J., Goegebeur, Y., Teugels, J., Segers, J.: Statistics of Extremes:
Theory and Applications. Wiley Series in Probability and Statistics. Wiley (2004)
2. Zhou, H., Hu, Y., Ouyang, X., Su, J., Zhao, Z.: CloudsStorm: A framework for
seamlessly programming and controlling virtual infrastructure functions during the
DevOps lifecycle of cloud applications. Software: Practice and Experience (2019)
Authors
Mengjuan Li, Master candidate, School of Computer, National University of Defense Technology. Her main research interests include Cloud computing and performance modeling.
Jinshu Su received the B.S. degree in mathematics from Nankai University, Tianjin, China, in 1985, and the M.S. and Ph.D. degrees in computer science from the National University of Defense Technology, Changsha, China, in 1988 and 2000, respectively. He is a Professor with the School of Computer Science, National University of Defense Technology. He currently leads the Distributed Computing and High Performance Router Laboratory and the Computer Networks and Information Security Laboratory, which are both key laboratories of National 211 and 985 projects, China. He also leads the High Performance Computer Networks Laboratory, which is a key laboratory of Hunan Province, China. His research interests include Internet architecture, Internet routing, security, and wireless networks.
Hongyun Liu is currently a Ph.D. candidate in the Multiscale Networked Systems (MNS) research group and a junior lecturer at the Graduate School of Informatics, University of Amsterdam. He received the B.Sc. degree in Automation and the M.Sc. degree in Navigation, Guidance and Control, both from Northwestern Polytechnical University, Xi'an, China, in 2013 and 2017. His research interests include resource management, Cloud computing, and applied machine learning.
Zhiming Zhao received his Ph.D. in computer science in 2004 from the University of Amsterdam (UvA). He is currently a senior researcher in the Systems and Networking Laboratory at UvA. He coordinates research efforts on quality-critical systems on programmable infrastructures in the context of EU H2020 projects such as ARTICONF and SWITCH. His research interests include SDN, workflow management systems, multi-agent systems, and big data research infrastructures.
Xue Ouyang received her B.Sc. and M.Sc. degrees from the National University of Defense Technology (NUDT), China, in network engineering and software engineering, respectively. She received her Ph.D. degree in the Distributed Systems and Services group in the School of Computing, University of Leeds, UK, in 2018. She is currently a Lecturer with the School of Electronic Science, NUDT. Her research interests include Cloud and Edge computing, intelligent scheduling, distributed storage, big spatial data analytics and blockchain.
Huan Zhou received the Ph.D. degree in computer science from University of Amsterdam in 2019. He is currently an Associate Professor in School of Computer, National University of Defense Technology. His research mainly focuses on Cloud computing and blockchain. He is specifically working on Cloud infrastructures seamless programming and control for orchestrating Cloud applications, as well as blockchain enhanced Cloud/Fog/Edge service management and secure network communications.