Research # 2
Name: Hammad Qureshi
Course: Big Data Analytic
Teacher: Agha Saadat
College: Governor State University

Business problem:

Data volumes have kept growing ever since computer science revolutionized the information technology industry. The sheer amount of data that a corporation handles in a single business day creates many obstacles.
The first problem is where to store the data, and the second is how to process it efficiently. To address both, Hadoop was introduced to the market as a Big Data platform. It has two core components: the Hadoop Distributed File System (HDFS) and the MapReduce framework. HDFS eases the storage burden by distributing data across a cluster, and MapReduce helps retrieve and process that data quickly. MapReduce is a major component of Hadoop and consists of two main functions, Map and Reduce, which operate on key and value pairs; this design helps optimize the performance of retrieving data from the system. The data stored by information technology companies is increasing rapidly, so storage has become a serious issue.
On the other hand, it is very expensive for a business to keep purchasing additional storage devices and servers. The solution companies arrived at was to use Hadoop and its MapReduce framework to store and process data in a cost-efficient way. This saves a large share of the money that companies would otherwise spend each year. Large web companies such as Google, and later Facebook, ran into this data storage problem in the 2000s, when their data consumption reportedly reached 21 to 30 petabytes per year, and it became their biggest business problem.
Therefore, they decided to use a MapReduce-style framework to resolve this problem. They developed an abstraction layer that splits the data flow into two main phases, the Map phase and the Reduce phase, and the same technique is applied in the MapReduce framework.

Technical Solution:

Google introduced the MapReduce framework in 2004, and Yahoo began developing the open-source Hadoop project shortly afterward, around 2006 to 2007. Many information technology companies have since benefited from the MapReduce framework and are saving millions of dollars by using it. It can also speed up the back-end processing behind web interfaces that send requests to a server and receive responses. The MapReduce framework can therefore address business problems as well as technical ones. The Map function performs filtering, sorting, and grouping operations, while the Reduce function summarizes and aggregates the results produced by the Map function. The output of each function is a set of key and value pairs, where the keys are mapped to their values to reduce the amount of processing.
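The division of labor between the Map and Reduce functions can be illustrated with a small word count. The following is only a minimal in-memory sketch, not Hadoop's actual API: the function names (map_words, group_by_key, reduce_counts, word_count) are hypothetical and chosen for illustration, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def group_by_key(pairs):
    """Group the intermediate pairs so each key maps to a list of its values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: aggregate all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, grouping, and reduce in sequence (hypothetical driver)."""
    intermediate = [pair for line in lines for pair in map_words(line)]
    grouped = group_by_key(intermediate)
    return dict(reduce_counts(key, values) for key, values in grouped.items())
```

For example, word_count(["big data", "Big Hadoop data"]) returns {"big": 2, "data": 2, "hadoop": 1}. In a real cluster the map calls would run on many nodes at once and the framework would perform the grouping between them.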
The MapReduce framework of Hadoop runs on the YARN architecture, which supports parallel processing of large data sets. The basic concept behind MapReduce is that Map sends a query to various data nodes for processing, and Reduce collects the results of these queries and outputs a single value.

[Figure: architectural diagram]

MapReduce is a major aspect of Hadoop and consists of two main functions: one is responsible for delegating work to the different nodes in the cluster, and the other collects all the results of the query into one cohesive answer. Large files are split into blocks of equal size, which are distributed across the cluster for storage. Because computer failures must always be expected in a large cluster, each block is stored multiple times, usually three times, on different computers. In the implementation of MapReduce, the user applies an alternating succession of map and reduce functions to the data. Parallel execution of these functions, and the difficulties that occur in the process, are handled automatically by the framework. Each iteration comprises three phases: map, shuffle, and reduce.
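The block splitting and threefold replication described above can be sketched in a few lines. This is an illustration under stated assumptions, not how HDFS itself is implemented: the helper names split_into_blocks and place_replicas are hypothetical, and the round-robin placement is a simplification chosen only to guarantee that the copies of a block land on different nodes.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes.

    Round-robin placement is an assumption made for this sketch.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)]
            for offset in range(replication)
        ]
    return placement
```

With four nodes named n1 to n4, block 0 is placed on n1, n2, and n3, and block 2 on n3, n4, and n1, so the loss of any single node still leaves two copies of every block.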
Furthermore, the main components of MapReduce are the Job-Tracker, known as the master node; the Task-Trackers, known as the agents on the nodes of the cluster, each with functions of their own; and, last but not least, the Job-History-Server, which is deployed as a separate function and tracks jobs.

[Figure: technical diagram of the MapReduce framework]

[Figure: how MapReduce splits a task]

Similarly, many corporations are saving millions in hardware costs, including huge multinational information technology companies such as Yahoo, Amazon, and Google. Saving millions of dollars in hardware costs is both a necessity and a challenge for their upcoming targets. It is said that more than 150 terabytes of machine data pass through such a data warehouse every day, and using the MapReduce framework has saved these companies millions of dollars. In addition, it provides a technical solution for the large volume of data, which is growing rapidly due to market demand, through its use of keys and values.
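The master and agent roles of the Job-Tracker and Task-Trackers mentioned earlier in this section can be mimicked on one machine. The sketch below is an assumption-laden illustration rather than Hadoop code: threads from Python's standard concurrent.futures module stand in for cluster nodes, and the names job_tracker and run_map_task are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_map_task(split):
    """Work done by one stand-in task tracker: count the words in its split."""
    counts = {}
    for word in split.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def job_tracker(splits, max_workers=3):
    """Stand-in master: hand each split to a worker, then merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_results = list(pool.map(run_map_task, splits))

    totals = {}
    for partial in partial_results:
        for word, count in partial.items():
            totals[word] = totals.get(word, 0) + count
    return totals
```

Calling job_tracker(["a b a", "b c", "a"]) returns {"a": 3, "b": 2, "c": 1}. The caller only supplies the splits; distributing the work and merging the partial results is handled by the master function, which mirrors how the framework hides parallel execution from the user.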
In practice, the key and value approach works very effectively: a task is split into small pieces that are handled by separate nodes, and values can be looked up by their keys. This approach makes an application run faster and better when it handles a large volume of data.

[Figure: statistics of the last 16 years of Google revenue]

Furthermore, after Google started using the MapReduce framework and its key and value approach in its applications, the load on the system for retrieving data from the servers was reduced, because each task was separated across nodes. As a result, a process does not take much memory, and it works faster and more efficiently than manual approaches that do not use the MapReduce framework.

[Figure: Google MapReduce diagram]

However, Google switched to cloud computing and moved the MapReduce framework into a cloud-based cluster environment in order to get rid of the physical storage system. It still keeps a backup on physical storage in case of any urgent alert. Therefore, it continues to rely on the MapReduce framework on the side for emergency situations.
In addition, Amazon Elastic MapReduce provides clusters that adapt dynamically to customer requirements. As a result, companies are now turning to cloud-based approaches such as Amazon Elastic MapReduce rather than an on-premises Hadoop MapReduce framework, because the cloud-based approach is the one to follow in the future and removes the need to maintain a physical storage system at all times.

Conclusion:

Google resolved its business problem by inventing the MapReduce framework as data storage kept growing day by day; moreover, it has moved to the more advanced cloud-based approach, of which Amazon Elastic MapReduce is an example, in response to market demand. Similarly, other companies are adopting the same techniques and using the MapReduce framework in order to be cost-efficient and faster in their application services. To remain in the information technology business, it is very important to use smart approaches rather than always following outdated approaches or technology; otherwise, competitors in the industry could take over.
Thus, Google is a leading company in the information technology industry because of its smart decision to keep its systems updated according to market demand. That is how it came up with the effective technical solution of the MapReduce framework.

Resources Link Below:
http://www.admin-magazine.com/HPC/Articles/MapReduce-and-Hadoop
http://map-reduce.wikispaces.asu.edu/
https://hackernoon.com