DATA ANALYSIS ON THE

TOURISM SECTOR

Table of Contents.

1. Introduction.
2. Descriptive Statistics.
2.1 Pie Chart for Total Number of Guest in the Various Accommodation Sectors.
2.2 Data Clustering.
2.3 Central Tendency and Measure of Dispersion.
3. Inferential Statistics.
3.1 Probability Tree.
3.2 Hypothesis Test.
3.3 Regression.
3.3.1 Relationship between Revenue on Sample Night and Average Time Spent by Guest in Accommodation.
3.3.2 Relationship between Number of Guest taking Breakfast and Number of Guest taking Dinner.
3.4 Interpolation and Extrapolation.
4. Conclusion.
5. Bibliography.
6. Appendices.

1. Introduction.

Singapore is an international trading centre and the business hub of many multinational companies. With the official launch of Singapore Changi Airport Terminal 4 on 31 October 2017 and the upcoming Terminal 5 project, Singapore is expected to handle more passengers annually. This will inevitably affect Singapore’s gross domestic product (GDP) and bring tremendous growth to the hospitality sector, mainly in the Bed and Breakfast (B&B) and Hotel industries, thereby increasing the demand for lodging.

The data analysis and statistics in this report are based on sample data from 50 accommodation providers. The analysis gives insights into guests’ behavior and revenue, which are then applied to help Max, the national sales manager of Unilever Food Solutions (UFS), strategize his action plan for the food and beverages (F&B) channel. The objective of this project is to capture both the demand and the trend in the breakfast and dinner sector across all accommodation providers, to help UFS focus its growth in the food and tea category.

2. Descriptive Statistics.

The descriptive statistics present an overview of the sample data from the fifty accommodation providers, summarizing the data in a meaningful way so that patterns or relationships can be seen.

2.1 Pie Chart for Total Number of Guest in the Various Accommodation Sectors.

The pie chart shows the share of sampled guests for each accommodation provider category in the B&B and Hotel sectors, which may be used to estimate the population market share of all accommodation providers. This gives UFS clear visibility of the number of guests patronizing each accommodation sector.

2.2 Data Clustering.

From the survey data of the fifty sampled accommodation providers, we have clustered the different accommodation categories as below. Based on raw observations, we can generally describe the following:

There is a pyramid trend in the accommodation sector where the highest number of beds and guests are at the lowest star rating, while the lowest number of beds and guests are at the highest star rating.

The occupancy rate is the highest at the 4-star rating.

B&B guests do not have dinner in their accommodation.

2.3 Central Tendency and Measure of Dispersion.

From the measures of central tendency and spread, we can generally say that the kurtosis for Hotels is higher than that for B&B. None of the data distributions is normal; each is either right-skewed or left-skewed. A full list of the descriptive statistics and the QQ plot without clustering is shown in the appendix for reference.
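As an illustration of how the skewness and kurtosis coefficients can be computed outside Excel, the sketch below uses the sample-adjusted formulas that, to our understanding, correspond to Excel's SKEW and KURT functions. The data values and function names are hypothetical and are not taken from the survey.

```python
import math

def sample_skewness(values):
    """Sample-adjusted skewness (positive = right-skewed, negative = left-skewed)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    cubed = sum(((v - mean) / sd) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed

def sample_excess_kurtosis(values):
    """Sample-adjusted excess kurtosis (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    fourth = sum(((v - mean) / sd) ** 4 for v in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction

# Hypothetical average hours spent in accommodation (NOT the survey data)
hours = [5.0, 6.0, 6.5, 7.0, 7.5, 8.0, 12.0]
print("skewness:", sample_skewness(hours))
print("excess kurtosis:", sample_excess_kurtosis(hours))
```

A positive skewness indicates a longer right tail, and a higher kurtosis indicates heavier tails relative to a normal distribution.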

3. Inferential Statistics.

Inferential statistics are techniques that allow us to use samples to make generalizations about the population. The methods of inferential statistics are:

Estimation of parameters.

Testing of statistical hypothesis.

In the following sub-chapters, we estimate the probability of guests taking breakfast and dinner in all segments of the population. We also perform statistical tests to find out whether there is any relationship between some segments and to draw conclusions on some claims arising from our observations.

3.1 Probability Tree.

From the sample data, we have created a probability tree as below. If we know the total population of accommodation guests, we can estimate the number of guests having breakfast or dinner in each accommodation sector. This estimate will help Unilever prepare stock and market its products to the respective accommodation sectors in line with their estimated demand.
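The estimation described above can be sketched as follows. The sector names are simplified and all counts are hypothetical placeholders, not the survey figures; the function names are likewise our own. The sketch multiplies the population by the probability of a guest being in a sector and by the conditional probability of taking a meal in that sector.

```python
# Hypothetical sample counts per sector (NOT the survey data)
sample = {
    "B&B": {"guests": 40, "breakfast": 30, "dinner": 0},
    "Hotel": {"guests": 160, "breakfast": 120, "dinner": 80},
}

def tree_probabilities(sample):
    """Build the two-level tree: P(sector), then P(meal | sector)."""
    total_guests = sum(counts["guests"] for counts in sample.values())
    tree = {}
    for sector, counts in sample.items():
        tree[sector] = {
            "p_sector": counts["guests"] / total_guests,
            "p_breakfast": counts["breakfast"] / counts["guests"],
            "p_dinner": counts["dinner"] / counts["guests"],
        }
    return tree

def estimate_meals(tree, population):
    """Scale the tree probabilities up to a known guest population."""
    estimates = {}
    for sector, p in tree.items():
        guests_in_sector = population * p["p_sector"]
        estimates[sector] = {
            "breakfast": guests_in_sector * p["p_breakfast"],
            "dinner": guests_in_sector * p["p_dinner"],
        }
    return estimates

tree = tree_probabilities(sample)
estimates = estimate_meals(tree, population=1000)  # assumed population size
for sector, meals in estimates.items():
    print(sector, meals)
```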

3.2 Hypothesis Test.

Null Hypothesis: H0:

Average Time spent by Guests in B&B = Average Time spent by Guests in Hotel

Alternative Hypothesis: H1:

Average Time spent by Guests in B&B ≠ Average Time spent by Guests in Hotel

To check this claim, a two-sample t-test is used. The formula for the two-sample t-test is as below:
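The manual calculation in this section divides the difference in sample means by the square root of the sum of each sample variance over its sample size, which corresponds to the unpooled (unequal variances) form of the test. Under that reading, and with the symbols below being our own notation, the test statistic can be written as:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```

Here \bar{x}_1 and \bar{x}_2 are the two sample means, s_1 and s_2 the sample standard deviations, and n_1 and n_2 the sample sizes.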

The descriptive statistics of the Average Time spent by guests in B&B and in Hotel are summarized in the following table:

The value of the test statistic t is calculated as below:

t = (6.8578 - 7.5391) / sqrt((2.90*2.90/18) + (1.32*1.32/32))

t = -0.68128 / sqrt((8.41/18) + (1.7424/32))

t = -0.68128 / sqrt(0.4672 + 0.05445)

t = -0.68128 / sqrt(0.521672)

t = -0.68128 / 0.722269

t = -0.943
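The manual calculation above can be checked with a short script. This is a minimal sketch using only the summary figures that appear in the calculation (means, standard deviations and sample sizes, in the order given there); the function name and the Welch degrees-of-freedom step are our additions and are not part of the report's Excel output.

```python
import math

def unpooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with unpooled variances, plus Welch degrees of freedom."""
    v1 = sd1 ** 2 / n1  # variance of the first sample mean
    v2 = sd2 ** 2 / n2  # variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary figures as used in the manual calculation
t_stat, df = unpooled_t(6.8578, 2.90, 18, 7.5391, 1.32, 32)
print("t =", round(t_stat, 3))
print("Welch df =", round(df, 1))
```

The t value agrees with the manual result of -0.943 to three decimal places. Obtaining the p-value additionally requires the t distribution, which a statistics package or Excel's data analysis tools can supply.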

The two-sample t-test for checking this claim is as below:

For this test, the p-value is 0.357, which is greater than the chosen significance level (alpha) of 0.05, so we do not reject the null hypothesis that the average time spent by guests is the same for B&B and Hotel. From this, we conclude that the sample gives no evidence that the average time guests spend in their accommodation depends on what B&B and Hotel respectively have to offer.

3.3 Regression.

Regression analysis is a statistical process for analyzing the relationships among variables. In the following sub-chapters, we examine the relationships between the key variables that we have identified and that are of interest to us.

3.3.1 Relationship between Revenue on Sample Night and Average Time Spent by Guest in Accommodation.

To check the relationship between revenue on the sample night and the average time spent by guests in their accommodation, a regression analysis is used. The regression output for this model is given as below:

The correlation coefficient between the two stated variables is 0.038, which means there is very little correlation or linear association between them. From this, we conclude that the average time guests spend in their accommodation is not linearly related to revenue. One possible explanation is that guests have had a long day before returning to their accommodation and, after a tiring day, spend a longer time there resting, contrary to our initial thought that the longer guests spend in their accommodation, the more they spend on its facilities.

3.3.2 Relationship between Number of Guest taking Breakfast and Number of Guest taking Dinner.

Next, we examine the relationship between the number of guests taking breakfast and the number of guests taking dinner.

The correlation coefficient is given as below:

The correlation coefficient between the two stated variables is 0.727, which means there is a strong positive correlation, or strong linear association, between them. One possible explanation is that Hotels offer dining promotions, such as discounts for in-house guests, to encourage them to have both breakfast and dinner in the Hotel.
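For reference, a correlation coefficient of this kind can be computed as sketched below. The function implements the standard Pearson formula; the breakfast and dinner counts are hypothetical and are not the survey data, so the result will not reproduce the 0.727 reported above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical guest counts per provider (NOT the survey data)
breakfast_guests = [10, 25, 40, 18, 60]
dinner_guests = [5, 20, 30, 22, 45]
r = pearson_r(breakfast_guests, dinner_guests)
print("r =", round(r, 3))
```

Values of r close to +1 or -1 indicate a strong linear association, while values close to 0 indicate a weak one.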

3.4 Interpolation and Extrapolation.

We have also created an interpolation graph, which can be extended for extrapolation. The straight-line formula is also given for calculation. Interpolation and extrapolation offer an alternative estimate to our probability tree. If we know the number of guests checking in to an accommodation sector, we can estimate the number of guests who will have breakfast or dinner in that sector. This will help Unilever understand the food and beverage demand of its target market.

Straight Line Formula for Breakfast: y = 1.2759x + 9.6393, R² = 0.7757

Straight Line Formula for Dinner: y = 1.1722x + 14.575, R² = 0.7552
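The two straight-line formulas can be applied as in the sketch below. The slopes and intercepts are those given above; taking x as the number of guests checking in follows the description in this section, and the input value of 20 guests is an arbitrary example, not a figure from the survey.

```python
# (slope, intercept) pairs taken from the straight-line formulas above
BREAKFAST_LINE = (1.2759, 9.6393)
DINNER_LINE = (1.1722, 14.575)

def estimate_from_line(line, x):
    """Evaluate y = slope * x + intercept for a given x."""
    slope, intercept = line
    return slope * x + intercept

guests_checking_in = 20  # arbitrary example input
breakfast_estimate = estimate_from_line(BREAKFAST_LINE, guests_checking_in)
dinner_estimate = estimate_from_line(DINNER_LINE, guests_checking_in)
print("breakfast:", round(breakfast_estimate, 2))
print("dinner:", round(dinner_estimate, 2))
```

Estimates for x values inside the sampled range are interpolations; estimates outside that range are extrapolations and should be treated with more caution, since the linear trend may not hold there.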

4. Conclusion.

Our initial assumption was that guests would be more comfortable, and thus spend more time in their accommodation, as the star rating increases. However, as shown in the hypothesis test, the time spent by guests in all accommodation may be the same. We also presumed that guests would spend more when their average time spent in the accommodation is longer. However, the correlation analysis shows no strong relationship between those two variables. Therefore, we have decided to ignore the data on average time spent in accommodation and on guests’ length of stay.

As B&Bs currently do not serve dinner, there is an opportunity for UFS to explore ideas such as placing food vending machines in these accommodations to give guests round-the-clock convenience, instead of the conventional method of providing meals in restaurants.

From the results of the data analysis and statistical tests, it is recommended to shift the market focus of our food and beverage products to Hotel 4-Star, as it is among the accommodation sectors most likely to have the highest number of guests taking both breakfast and dinner.

From an economic perspective, in a recession, guests from Hotel 5-Star may want to cut costs and thus choose Hotel 4-Star. In times of economic growth, guests from Hotel 3-Star may wish to upgrade to Hotel 4-Star for a better stay experience. Therefore, we conclude that Hotel 4-Star will be the most stable and profitable group for UFS to work with to achieve its strategic goals.

5. Bibliography.

Levine, Stephan, Szabat (2017). Statistics for Managers Using Microsoft Excel, Eighth Edition. United States: Pearson Education Limited.

The Straits Times (2018). Changi Airport Group awards architectural and engineering contracts for Terminal 5.

Available at: https://www.straitstimes.com/singapore/transport/changi-airport-group-awards-architectural-and-engineering-contracts-for-terminal. Retrieved 11th May 2018.

Unilever (2018). Unilever Food Solutions.

Available at: http://www.unileverfoodsolutions.com.sg/. Retrieved 11th May 2018.

Excel Easy (2010). ANOVA.

Available at: https://www.excel-easy.com/examples/anova.html. Retrieved 11th May 2018.

Excel Easy (2010). Correlation.

Available at: https://www.excel-easy.com/examples/correlation.html. Retrieved 11th May 2018.

Gaurav Jha (2013). Calculating Mean-Variance Skewness Kurtosis on Excel.

Available at: https://www.youtube.com/watch?v=37nTkqIMKow. Retrieved 11th May 2018.

Steven Trumble (2014). Excel: How to create a Q-Q Plot to test for normality.

Available at: https://www.youtube.com/watch?v=U_0NY6P_xAY. Retrieved 11th May 2018.

6. Appendices.

Tourism Raw Data.

Sector: Hospitality, Accommodation Provision

Background: A sample of 50 accommodation providers’ responses to a survey on guests staying on their premises on an October Mid-Week night is as follows:

Descriptive Statistics.

Descriptive statistics such as the mean, mode, median, minimum, maximum, and coefficients of skewness and kurtosis are calculated using Microsoft Excel. In Microsoft Excel, we enable the Analysis ToolPak add-in, select the data, and click on Data Analysis followed by Descriptive Statistics. This produces the descriptive statistics for the variables we selected.

QQ Plot.

To get an overview of our data distribution, we also produced a QQ plot as below. The distribution of the data is irregular, with some outliers. However, most of the data still follows a trend.

Inferential Statistics.

For other statistical tests, options are available in the Data Analysis tools. We use these tools for tests such as the t-test or z-test, regression, ANOVA, etc. Besides using the tools, we also did a manual calculation.

Probability Tree.

Regression Breakfast.

Regression Dinner.