IIIT Hyderabad Publications
Load Balancing on the cloud

Author: Pulkit Goel
Date: 2017-12-04
Report no: IIIT/TH/2017/84
Advisor: Vasudeva Varma

Abstract

A decade or so ago, setting up an online business called for upfront investments in hardware and software, along with recurring expenses for maintenance, upgrades, and the like. With the advent of cloud computing, however, this infrastructure can be "rented" from a shared pool of resources hosted remotely. This is much cheaper than investing in one's own hardware and software, and brings further advantages such as scalability, reliability, and managed maintenance. Today, any new online business can launch its application with minimal cost, simply by renting servers on a cloud.

Load balancing means optimizing the distribution of workloads across multiple computing resources to maximize the efficiency of each resource. Load balancing can be done at different layers of the stack (application, network, CPU, disk, and so on), but in this thesis we focus on improving it at the application layer and the network layer.

We first look at an example of load balancing at the application layer. Most applications on the cloud are built to be horizontally scalable: adding more servers enables the application to handle more load. However, this also increases the business's costs, since more servers must be rented. It is therefore important to distribute load among servers intelligently, so that fewer servers can handle a given load. This requires good data distribution strategies and load balancing algorithms that adapt to the requests and the kind of data being accessed. Today, most companies working with Big Data use NoSQL databases to store huge volumes of data. They prefer NoSQL over relational databases because NoSQL databases are horizontally scalable and offer much better read-write performance due to the lack of a schema.
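One common data-distribution strategy in such horizontally scalable stores is consistent hashing, which the first part of this thesis analyzes. A minimal Python sketch of a consistent-hash ring with virtual nodes (the server names, the MD5 hash, and the virtual-node count are illustrative choices, not taken from the thesis):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Map a string to a point on the ring (MD5 is an illustrative choice).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """A minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node is placed at `vnodes` points on the ring so
        # that keys spread more evenly across the nodes.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def node_for(self, key: str) -> str:
        # A key is owned by the first node clockwise from its hash point.
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._keys)
        return self._points[idx][1]

ring = ConsistentHashRing(["server-a", "server-b", "server-c"])
owner = ring.node_for("user:42")  # deterministic; adding a node moves only a fraction of keys
```

The appeal of this scheme is that adding or removing a server remaps only the keys adjacent to its ring positions, rather than rehashing the whole keyspace.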
Distributed key-value stores are a type of NoSQL database used heavily in most applications for storing billions of key-value pairs. Key-value stores must sustain very high throughput, as the applications using them perform millions of queries and updates per second. Good load balancing algorithms help achieve better throughput with the same number of servers. We analyze, compare, and examine the trade-offs of different load distribution schemes for consistent hashing in distributed key-value stores. Different traffic patterns, including an adversarial pattern, were constructed to test the load distribution. A simulator was built for each load distribution scheme, and the load on each node was recorded. We show that increasing the number of consistent hash rings does not change the load distribution characteristics under simple traffic, but under adversarial traffic, increasing the number of hash rings improves the load distribution statistics by around 15%.

Secondly, we look at an example of load balancing at the network layer. Large datacenters are moving from traditional networks to software-defined networks. Software-Defined Networking (SDN) lets the network administrator write software that defines the rules for creating flow routes. Since the network administrator knows the topology and other specifications of the network, packets can be made to take different routes based on varying requirements, simply by programming the OpenFlow controller. A datacenter hosts different kinds of applications with different requirements: some need low latency and are insensitive to bandwidth, others need low packet loss, and so on. Each application thus has its own requirements profile. For example, a gaming server hosted on the cloud requires extremely low latency but does not care about bandwidth.
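Such a requirements profile could be modeled as a set of weights over network metrics, with candidate routes scored against them. A hypothetical sketch, not the thesis's actual controller logic (all profile names, metrics, and weights are illustrative):

```python
# A requirements profile weights how sensitive an application is to each
# network metric (names and numbers are illustrative assumptions).
PROFILES = {
    "gaming":    {"latency_ms": 1.0, "loss_pct": 0.3, "bandwidth_mbps": 0.0},
    "streaming": {"latency_ms": 0.2, "loss_pct": 1.0, "bandwidth_mbps": 0.8},
}

def route_cost(profile_name, metrics):
    # Lower is better: penalize latency and loss, reward bandwidth.
    w = PROFILES[profile_name]
    return (w["latency_ms"] * metrics["latency_ms"]
            + w["loss_pct"] * metrics["loss_pct"]
            - w["bandwidth_mbps"] * metrics["bandwidth_mbps"])

def best_route(profile_name, candidate_routes):
    # candidate_routes: {route_id: metrics}, e.g. as measured by a controller.
    return min(candidate_routes,
               key=lambda r: route_cost(profile_name, candidate_routes[r]))

routes = {
    "r1": {"latency_ms": 5,  "loss_pct": 2.0, "bandwidth_mbps": 100},
    "r2": {"latency_ms": 40, "loss_pct": 0.1, "bandwidth_mbps": 900},
}
best_route("gaming", routes)     # picks the low-latency route "r1"
best_route("streaming", routes)  # picks the high-bandwidth, low-loss route "r2"
```

Under this model, a gaming profile weights latency heavily and ignores bandwidth, so it selects a short but narrow path.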
Video streaming services, on the other hand, are sensitive to bandwidth and packet loss. In existing networks it is very hard to control the routes that packets take. We create an OpenFlow controller that uses the requirements profile of each application on the network and routes traffic so as to optimize the network parameters that the application is most sensitive to. The controller adapts to changing network conditions and intelligently updates existing routes to accommodate network disruptions. We test it in scenarios with different application requirements and study the improvements the new controller gives over the existing one. In our simulations, a 10 MB file transfer on an unstable network took 168 seconds with a simple controller and 88 seconds with ours. We also simulate video streaming on an unstable network and show that our controller adapts to the fluctuations and pro-actively re-routes traffic to keep the stream non-lossy.

We also look at how the migration of virtual machines for load balancing can be optimized by making the network aware of the migration, an example of application-layer load balancing being optimized through the network. Virtual machines are used very commonly today; for example, Amazon EC2 provides remote virtual machines at low cost, so that clients get an isolated environment without spending much on hardware, and can scale variably without a significant increase in cost. In such clusters, where VMs are created and destroyed at a high rate, it is often necessary to migrate virtual machines from one node to another. Virtual machine live migration means migrating a virtual machine without the client noticing any downtime.
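One way a migration-aware controller can soften the disruption is to pre-install flow rules for the most recently used routes toward the VM's new host before the VM actually moves. A simplified toy model of that idea, not the thesis's actual controller (the class, helpers, and route format are hypothetical):

```python
from collections import OrderedDict

class MigrationAwareController:
    """Toy model of a controller that pre-installs routes before a VM moves."""

    def __init__(self, max_tracked=100):
        self._max_tracked = max_tracked
        self._recent = OrderedDict()  # LRU of (src, dst) -> route
        self.installed = {}           # stands in for the switches' flow tables

    def on_packet(self, src, dst, route):
        # Record the flow as most recently used and install its route.
        self._recent.pop((src, dst), None)
        self._recent[(src, dst)] = route
        while len(self._recent) > self._max_tracked:
            self._recent.popitem(last=False)  # evict the least recently used
        self.installed[(src, dst)] = route

    def migrate_vm(self, vm, new_host, reroute):
        # Before the VM moves, rewrite the recently used routes that involve
        # it, so switches already hold the new paths and do not flood the
        # network re-discovering them after migration.
        for (src, dst), route in self._recent.items():
            if vm in (src, dst):
                self.installed[(src, dst)] = reroute(route, new_host)

ctrl = MigrationAwareController()
ctrl.on_packet("h1", "vm1", ["s1", "s2", "old-host"])
ctrl.migrate_vm("vm1", "new-host", lambda route, host: route[:-1] + [host])
ctrl.installed[("h1", "vm1")]  # ['s1', 's2', 'new-host']
```

Restricting the rewrite to recently used flows keeps the pre-installation cheap while still covering the routes most likely to carry traffic immediately after the move.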
We try to optimize the latency spike that occurs during virtual machine migration on a software-defined network, and test different strategies for doing so. We simulate traffic between hosts and show that pro-actively migrating the most recently used routes before migrating the VM reduced the latency spike during migration from 300 ms to 2 ms, and reduced the flooding of the network with OpenFlow packets for route re-discovery by 10%.

Full thesis: pdf

Centre for Search and Information Extraction Lab