Summary of Paper
Content Delivery Policies in Replicated Web Services: Client side vs. Server side.
By: MARCO CONTI, ENRICO GREGORI and WILLY LAPENNA, Consiglio Nazionale delle Ricerche, Istituto IIT, Via G. Moruzzi 1, 56100 Pisa, Italy.
One of the major issues that distributed systems currently face is providing faster and more efficient access to the Internet: how can QoS be introduced into the web services provided to clients? The tremendous growth in the number of users accessing the Internet has led to this problem. The authors of this paper address it by discussing the replication of web services and its content delivery policies, and how replication helps to improve QoS.
Initially, methods such as caching, prefetching and pushing were used to improve users' access to the Internet, but none of them was good enough for the large number of users involved. The more advanced approach of data replication was therefore introduced to attack the problem and make access to the Internet easier.
So what is replication, and how does it improve the way users access the Internet? Replication simply means creating multiple copies of a resource and distributing them on servers throughout the network. Doing so reduces the load on any single server. Since the web is accessed from all over the globe and demand depends on the time zone, geographical replication is used because it makes better use of the available resources.
The circumstance that created this problem is the Centralized Server Model (CSM). In CSM there is one original web server that processes the data and sends it to the local ISP, where a group of clients accesses it. If we focus on the communication path between the client and the centralized server, we can identify several bottlenecks in the network infrastructure. The authors state that there are four bottlenecks in CSM: the first mile, the peering points, the backbone capacity and the last mile. The first-mile problem occurs when the available bandwidth is too low; the higher the number of users, the higher the bandwidth should be. The peering-point problem occurs when traffic from multiple users crosses multiple routers in a network that compete with one another. The backbone-capacity problem occurs when the infrastructure is unable to carry the load of client requests. The last-mile problem occurs when few clients access the network over broadband but many do so over dial-up connections.
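The effect of these four bottlenecks can be illustrated with a small sketch: the throughput a client sees over the whole path cannot exceed the capacity of the slowest segment. The capacity figures below are hypothetical values chosen for illustration and do not come from the paper.

```python
# Hypothetical capacities in Mbit/s for each segment of the
# client-to-server path; these numbers are NOT from the paper.
path_capacity_mbps = {
    "first mile": 100.0,
    "peering points": 40.0,
    "backbone": 1000.0,
    "last mile": 8.0,
}


def bottleneck(capacities):
    """Return (segment, capacity) for the slowest segment on the path."""
    segment = min(capacities, key=capacities.get)
    return segment, capacities[segment]


segment, capacity = bottleneck(path_capacity_mbps)
print(f"Throughput is capped at {capacity} Mbit/s by the {segment}")
```

With these assumed numbers, improving the backbone would not help the client at all, since the last mile remains the limiting segment.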
The previous solutions (caching, prefetching and pushing) came up short. Caching fell short because cache misses are very expensive. Prefetching consumed a lot of resources. Pushing was inefficient because pushing a file to locations across the globe consumes a lot of time and power. Currently there are two approaches to replicating a web service: clustering and geographical replication.
The main goal of geographical replication is to hide the communication latency, because a copy of the file will be near the client. It is a promising area from the industrial point of view. The authors assume that replication does not cause any data consistency problems. They go into depth on geographical replication, discussing its subtypes and comparing them. When there are multiple web servers, a policy is needed to bind each client request to a server. The binding decision can be made close to the server or close to the client, which gives the server-side approach and the client-side approach.
The authors analyse the server-side approach with a practical example, Cisco DistributedDirector. This evaluation is very practical and is based on two routing protocols, BGP and IGP.
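As a minimal sketch of server-side binding, assume the redirecting component already holds a distance figure from each replica to the requesting client, of the kind a routing protocol such as BGP or an IGP could supply. The function name, replica names and distance values are hypothetical, and this is not the actual DistributedDirector algorithm.

```python
from typing import Dict


def pick_replica(distance_to_client: Dict[str, int]) -> str:
    """Return the replica with the smallest routing distance to the client.

    Ties are broken by replica name so the result is deterministic.
    """
    return min(sorted(distance_to_client), key=distance_to_client.get)


# Hypothetical routing distances (for example, hop counts) per replica.
distances = {"replica-eu": 3, "replica-us": 7, "replica-asia": 5}
chosen = pick_replica(distances)
print(f"Bind the client request to {chosen}")
```

The client takes no part in this choice; it only receives the address of the replica that the server side selected.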
In the client-side replication approach, unlike the server-side approach where the client has no part in the decision, the client has to know the replicated web servers (RWS) in the network. The authors assume that when the user's browser requests a URL from a server, it automatically obtains the available IP addresses of the RWS. The client-side approach is further divided into two main strategies: selecting one server or using many servers.
In the one-server client-side strategy, it is very important to specify an algorithm by which each client finds the server that is easiest for it to reach. When multiple servers are used to download data, attention must be given to how the portions of the data are downloaded. The one-server strategy improves QoS by reducing the number of hops, using the available bandwidth and reducing latency. The many-server strategy improves QoS through quick download of variable-size blocks and of encoded blocks.
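The two client-side strategies can be sketched as follows. For the one-server case, the sketch assumes the client has measured a round-trip time to each replica and picks the lowest. For the many-server case, it simplifies to fixed-size blocks assigned round-robin, whereas the paper also discusses variable-size and encoded blocks. All names, addresses and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def pick_one_server(rtt_ms: Dict[str, float]) -> str:
    """One-server strategy: choose the replica with the lowest measured RTT."""
    return min(sorted(rtt_ms), key=rtt_ms.get)


def split_blocks(
    file_size: int, block_size: int, servers: List[str]
) -> List[Tuple[str, int, int]]:
    """Many-server strategy: assign byte ranges to servers round-robin.

    Returns (server, start, end) tuples where end is exclusive.
    """
    plan = []
    for i, start in enumerate(range(0, file_size, block_size)):
        end = min(start + block_size, file_size)
        plan.append((servers[i % len(servers)], start, end))
    return plan


# Hypothetical measurements for two replicas (documentation IP range).
rtts = {"192.0.2.10": 80.0, "192.0.2.20": 35.0}
print(pick_one_server(rtts))
print(split_blocks(10, 4, list(rtts)))
```

A real client would then fetch each range in parallel and reassemble the file; how the block sizes are chosen is exactly the point that needs attention in the many-server strategy.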
The evaluation falls short in the case where the client does not know the server load and can only keep guessing it. How a client coordinates with other clients when there are multiple client requests has also not been evaluated. The authors also evaluated the subtypes of the client-side approach, which is a little off the topic.
In my opinion, the authors devote far more attention to one method than to the other, which is not a fair comparison. They have assumed the system to be consistent, which is not practically feasible; what happens because of inconsistencies should have been mentioned because it is important to the problem. The authors have taken the right path in evaluating the problem. They used a practical example only for the server side, and the extent to which they practically tested the client side is not mentioned or is a little ambiguous. The server-side and client-side approaches do not share the same goal. The server-side approach takes the point of view of the web service providers, whereas the client-side approach focuses on the services. The server-side approach is the best when it comes to using resources prudently.
The authors conclude that, after comparing the server-side and client-side replication approaches, there is no supreme strategy, just a trade-off between the two. In the server-side approach, server resource allocation and usage can be managed. The authors suggest a hybrid model that combines the server-side and client-side approaches; this can also be future work.