CHAPTER 2

RANDOM SAMPLING

The core of statistical inference (inferential statistics) lies in two steps: taking a small portion of information from a large body of data, and drawing a conclusion about the entire data. The complete data set about which a conclusion is drawn is called the population, and the small segment of data from which that conclusion is drawn is called a sample [1]. Sampling is therefore the method of examining sample data closely and then applying the results to the population.

2.1 Random Experiment and Random Variables

Experiments with unpredictable outcomes are called random experiments. Until such an experiment is completed, its outcome is uncertain [2]. In the simplest random experiments, such as tossing a fair coin or rolling a fair die, no outcome is preferred over another, so all outcomes are equally likely [3]. In such cases all possible results can be listed; this set of outcomes is the sample space, denoted by S [2].

A variable is a quantity whose value can change. If the numerical value of a variable is determined by the outcome of a random experiment, the variable is called a random variable: before the experiment, its value can only be predicted, not known [2]. In other words, a random variable is a real-valued quantity whose value depends on the outcome of a random experiment, and it can take any of the values corresponding to the possible outcomes [5]. Random variables are of two types: discrete and continuous. A discrete random variable takes its values from a discrete (finite or countably infinite) set. A random variable that can take any real value within an interval is a continuous random variable. When all values in that interval are equally likely (constant probability density), the variable is uniformly distributed [3].
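To make the distinction concrete, here is a minimal Python sketch using only the standard `random` module. The fair die (a discrete random variable with sample space S = {1, ..., 6}) and the interval [0, 1] (a continuous random variable) are illustrative choices of ours, not examples taken from the cited texts.

```python
import random

# Sample space of a single roll of a fair six-sided die.
SAMPLE_SPACE = [1, 2, 3, 4, 5, 6]

def roll_die(rng: random.Random) -> int:
    """Discrete random variable: the face shown by one roll."""
    return rng.choice(SAMPLE_SPACE)

def uniform_point(rng: random.Random, a: float = 0.0, b: float = 1.0) -> float:
    """Continuous random variable: a real value in the interval [a, b]."""
    return rng.uniform(a, b)

rng = random.Random(42)  # fixed seed so the run is reproducible
rolls = [roll_die(rng) for _ in range(10)]
points = [uniform_point(rng) for _ in range(3)]
print("die rolls:", rolls)
print("uniform points:", points)
```

Before each call, the value returned can only be predicted, not known, which is exactly what makes `rolls` and `points` realizations of random variables.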

2.1.1 Sampling theory

Sampling theory studies the relationship between a population and the samples drawn from it [3]. The success of sampling therefore depends on selecting a suitable sample from the population before drawing a conclusion. A population is made up of small units that can be selected during sampling, and each unit is treated as an individual during selection; these units are called sampling units. The criteria for selecting them depend on the needs of the investigation. Sampling is thus a tool for evaluating a population by analysing its sampling units [4].

Some of the reasons sampling is important are listed below.

It is very difficult to study the whole population in a limited amount of time.

If the population is unlimited, that is, if it contains an infinite number of sampling units, it cannot be examined unit by unit.

Sample studies are therefore usually preferred to population studies. The reliability of the results depends on the quality of the selected sample: if the sample truly represents the whole population, the conclusions drawn from it will be reliable [3].

A parameter is a statistical quantity obtained from measurements of the whole population, such as the population mean. Similarly, a statistic is a statistical quantity obtained from measurements of a sample, such as the sample mean.
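A small Python sketch of the difference between a parameter and a statistic, assuming a hypothetical population of 10 000 normally distributed measurements (the values are synthetic and generated only for illustration):

```python
import random
import statistics

# Hypothetical population of 10 000 measurements (illustrative values only).
rng = random.Random(1)
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

# Parameter: computed from every unit of the population.
population_mean = statistics.fmean(population)

# Statistic: computed from a random sample of n = 100 units drawn without replacement.
sample = rng.sample(population, k=100)
sample_mean = statistics.fmean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The statistic is only an estimate of the parameter; a different random sample would give a slightly different sample mean.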

2.1.2 Random Sampling

If a sampling process selects every sampling unit of a population with equal probability, the procedure is called random sampling. There are two types: random sampling with replacement and random sampling without replacement. If each selected unit is returned to the population before the next selection, the process is random sampling with replacement. If, on the other hand, a selected unit is not returned to the population, the process is random sampling without replacement [5].

Suppose a sample of size n is chosen without replacement from a population of size N. There are then $\binom{N}{n}$ possible samples, each equally likely to be selected. The probability of selecting any one particular sample is

$$P = \frac{1}{\binom{N}{n}}$$

(The probability that a particular unit is included in the sample is n/N.)
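The two sampling schemes, and the count of possible samples, can be illustrated with Python's standard library: `random.choices` samples with replacement, `random.sample` samples without replacement, and `math.comb` gives the binomial coefficient. The population of N = 10 labelled units and the sample size n = 3 are assumptions made for the example.

```python
import math
import random

population = list(range(1, 11))  # N = 10 sampling units labelled 1..10
n = 3
rng = random.Random(7)

# With replacement: the same unit may be drawn more than once.
with_repl = rng.choices(population, k=n)

# Without replacement: every drawn unit is distinct.
without_repl = rng.sample(population, k=n)

# Number of distinct (unordered) samples without replacement, and the
# probability of obtaining any one particular sample.
num_samples = math.comb(len(population), n)
p_one_sample = 1 / num_samples

print(with_repl, without_repl, num_samples, p_one_sample)
```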

2.2 Probability Distribution of a Random Variable

A discrete random variable X can assume the values $x_1, x_2, x_3, \ldots, x_n$, where n is the total number of outcomes. Then

$$p_i = P(X = x_i), \qquad i = 1, 2, 3, \ldots, n$$

where $p_i$ is the probability that X takes the value $x_i$.

The function that assigns $p_i$ to each value $x_i$ is called the probability function, or probability mass function (pmf), of X. It satisfies $p_i \ge 0$ and $\sum_{i=1}^{n} p_i = 1$.

The probability distribution of the discrete random variable X is defined as the set of all ordered pairs $\{(x_i, p_i)\}$. In other words, a probability distribution describes how the total probability of 1 is distributed among all the possible values of the random variable.
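As a sketch, the pmf of a simple discrete random variable, here the sum of two fair dice (our own example), can be built by counting equally likely outcomes; exact `Fraction` arithmetic makes the total-probability check exact.

```python
from fractions import Fraction
from itertools import product

# pmf of X = sum of two fair dice, built from the 36 equally likely outcomes.
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)

# The probability distribution is the set of ordered pairs (x_i, p_i).
distribution = sorted(pmf.items())
print(distribution)

# The total probability of 1 is distributed among the possible values.
assert sum(pmf.values()) == 1
```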

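For a continuous random variable, probability is assigned to intervals rather than to individual values; the probability that X equals any single exact value is zero.

Let p(x) dx represent the probability that the random variable X takes a value in the infinitesimal interval between x and x + dx. Then p(x) is the probability density function (pdf) of X [5], and the probability that X lies between a and b is $P(a < X < b) = \int_a^b p(x)\,dx$.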
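As an illustration, the probability of an interval can be approximated by numerically integrating the pdf. The density p(x) = 2x on [0, 1] is a hypothetical example chosen because its integral is easy to check by hand (P(0 < X < 0.5) = 0.25); the midpoint rule used here is one simple choice of quadrature, not the only one.

```python
def pdf(x: float) -> float:
    """Example density (our own choice): p(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def prob_interval(p, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate P(a < X < b), the integral of p(x) dx, with the midpoint rule."""
    h = (b - a) / steps
    return sum(p(a + (k + 0.5) * h) for k in range(steps)) * h

print(prob_interval(pdf, 0.0, 0.5))  # exact value: 0.5**2 = 0.25
print(prob_interval(pdf, 0.0, 1.0))  # total probability: 1
```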

2.2.1 Cumulative Distribution Function

The distribution function F(x) of a discrete random variable X with probability function p(x) is given by

$$F(x) = P(X \le x)$$

If X can take the values 1, 2, 3, …, then

$$F(x) = P(X = 1) + P(X = 2) + P(X = 3) + \cdots + P(X = x)$$
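For a discrete variable, F(x) is simply a running sum of the pmf. A minimal sketch for a fair die (an assumed example), using `itertools.accumulate`:

```python
from fractions import Fraction
from itertools import accumulate

values = [1, 2, 3, 4, 5, 6]              # possible values of X (fair die)
pmf = [Fraction(1, 6)] * len(values)     # P(X = x) for each value

# F(x) = P(X = 1) + P(X = 2) + ... + P(X = x): a running sum of the pmf.
cdf = dict(zip(values, accumulate(pmf)))

print(cdf[3])  # P(X <= 3) -> 1/2
print(cdf[6])  # P(X <= 6) -> 1
```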

For a continuous random variable X having probability density function p(x), the distribution function F(x) is [5]

$$F(x) = P(X \le x) = \int_{-\infty}^{x} p(t)\,dt$$

2.3 Uniform Distribution

If the probability density function of a random variable X is constant throughout an interval, X is said to have a uniform distribution on that interval. If the interval under consideration is [a, b], the probability density function f(x) is

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & x \in [a, b] \\ 0, & \text{otherwise} \end{cases}$$

We then write X ~ U(a, b).

Fig: 2.1 Probability density function (credit: howlingpixel.com)

The cumulative distribution function is,

$$F(x) = \begin{cases} 0, & x < a \\ \dfrac{x-a}{b-a}, & a \le x < b \\ 1, & x \ge b \end{cases}$$

Fig: 2.2 Cumulative distribution function (credit: howlingpixel.com)
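The pdf and cdf of the uniform distribution translate directly into code. The sketch below also compares F(x) with the fraction of random draws from U(a, b) that fall at or below x; the values a = 2, b = 5 and x = 3.5 are arbitrary illustrative choices.

```python
import random

def uniform_pdf(x: float, a: float, b: float) -> float:
    """f(x) = 1/(b - a) on [a, b], 0 otherwise."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x: float, a: float, b: float) -> float:
    """F(x) = 0 for x < a, (x - a)/(b - a) for a <= x < b, 1 for x >= b."""
    if x < a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

# Empirical check: the fraction of draws at or below x should approach F(x).
a, b = 2.0, 5.0
rng = random.Random(0)
draws = [rng.uniform(a, b) for _ in range(100_000)]
x = 3.5
empirical = sum(d <= x for d in draws) / len(draws)
print(empirical, uniform_cdf(x, a, b))  # both close to 0.5
```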

2.4 Gaussian or Normal distribution

The Gaussian, or normal, distribution is of great importance in science and in daily life, because many natural quantities, such as random measurement errors, are approximately normally distributed. It is used, for example, for error analysis in astronomy, and it has many applications in other fields. The mean and the standard deviation are the two parameters that describe a normal distribution [2]. A normal distribution with mean μ and standard deviation σ has the probability density function

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Fig: 2.3 Normal distribution (credit: keywordsuggest.org)

Characteristics:

It is symmetric about the mean.

For a normal distribution, mean = median = mode.

The total area under the normal curve is 1.

The probability that a < X < b is given by the area under the curve between x = a and x = b.

A normal distribution with μ = 0 and σ = 1 is called the standard normal distribution [2].
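A sketch of these properties in Python: the pdf is coded directly from the formula for f(x), and interval probabilities use the closed-form normal cdf written with `math.erf`, a standard identity rather than anything specific to the cited texts. For the standard normal distribution, about 68.3% of the area lies between x = -1 and x = 1.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Closed form of the normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(a: float, b: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(a < X < b): the area under the normal curve between x = a and x = b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# Standard normal distribution (mu = 0, sigma = 1).
print(round(prob_between(-1.0, 1.0), 4))  # 0.6827
```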

2.5 Random sampling from Salpeter function

The stellar initial mass function (IMF) is the distribution of stellar masses at birth. Two models exist, depending on the type of sampling required. The first is random sampling, in which the IMF is treated as a probability density function and the masses of the N stars in a cluster are drawn randomly from it. The second is optimal sampling, which ensures that the $m_{max}$–$M_{ecl}$ relation holds, where $m_{max}$ is the mass of the most massive star in the cluster and $M_{ecl}$ is the birth mass of the star cluster [6]. In our project, we adopt the first model.

Random sampling is needed to create a numerical star cluster by choosing N stars. This number may be fixed, or it may itself be drawn randomly from a distribution. Generally, N is drawn randomly from a distribution, and the masses of these N stars are then drawn from the IMF to obtain the distribution of stellar masses [7].

Since we adopt the first model, the IMF is normalized as a probability density function. Let p(m) denote the IMF normalized as a probability density function (pdf), and let c(m) denote the corresponding cumulative distribution function (cdf) [8].

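For a power-law IMF,

$$p(m) = N m^{-\alpha} \qquad (2.1)$$

where α is the power-law index and N is a normalization constant (not to be confused with the number of stars).

The standard normalization condition is

$$\int_{m_{low}}^{m_{up}} p(m)\,dm = 1$$

where $m_{low}$ and $m_{up}$ are the lower and upper mass limits, respectively. Substituting Equation (2.1) and integrating (for α ≠ 1),

$$1 = \int_{m_{low}}^{m_{up}} N m^{-\alpha}\,dm = N \left[\frac{m^{1-\alpha}}{1-\alpha}\right]_{m_{low}}^{m_{up}} = \frac{N}{1-\alpha}\left(m_{up}^{1-\alpha} - m_{low}^{1-\alpha}\right)$$

$$\therefore\; N = \frac{1-\alpha}{m_{up}^{1-\alpha} - m_{low}^{1-\alpha}}$$

and Equation (2.1) becomes

$$p(m) = \frac{1-\alpha}{m_{up}^{1-\alpha} - m_{low}^{1-\alpha}}\, m^{-\alpha} \qquad (2.2)$$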
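As a sanity check on the normalized power law of Equation (2.2), the pdf can be integrated numerically and should give 1. The sketch below integrates in u = ln m, where the steep power law becomes a smooth exponential that the midpoint rule handles accurately. α = 2.35 is the Salpeter value adopted in this chapter, while the mass limits of 0.1 and 100 solar masses are assumed here purely for illustration.

```python
import math

def powerlaw_pdf(m: float, alpha: float, m_low: float, m_up: float) -> float:
    """Normalized power-law pdf of Eq. (2.2); valid for alpha != 1."""
    if not (m_low <= m <= m_up):
        return 0.0
    k = 1.0 - alpha
    norm = k / (m_up**k - m_low**k)  # the constant N from the normalization
    return norm * m ** (-alpha)

def integrate_in_log_mass(f, m_low: float, m_up: float, steps: int = 100_000) -> float:
    """Midpoint rule in u = ln m: integral of f(m) dm = integral of f(e^u) e^u du."""
    u0, u1 = math.log(m_low), math.log(m_up)
    h = (u1 - u0) / steps
    total = 0.0
    for k in range(steps):
        m = math.exp(u0 + (k + 0.5) * h)
        total += f(m) * m
    return total * h

# Assumed mass limits (solar masses), for illustration only.
ALPHA, M_LOW, M_UP = 2.35, 0.1, 100.0
area = integrate_in_log_mass(lambda m: powerlaw_pdf(m, ALPHA, M_LOW, M_UP), M_LOW, M_UP)
print(f"{area:.6f}")  # should be very close to 1
```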

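Since the pdf is the derivative of the cdf, the cdf is obtained by integrating the pdf from $m_{low}$ up to m:

$$\begin{aligned} c(m) &= \int_{m_{low}}^{m} p(x)\,dx = \frac{1-\alpha}{m_{up}^{1-\alpha} - m_{low}^{1-\alpha}} \int_{m_{low}}^{m} x^{-\alpha}\,dx \\ &= \frac{1-\alpha}{m_{up}^{1-\alpha} - m_{low}^{1-\alpha}} \cdot \frac{m^{1-\alpha} - m_{low}^{1-\alpha}}{1-\alpha} \\ &= \frac{m^{1-\alpha} - m_{low}^{1-\alpha}}{m_{up}^{1-\alpha} - m_{low}^{1-\alpha}} = r \qquad (2.3) \end{aligned}$$

In Equation (2.3), c(m) is the cumulative distribution function evaluated at the unknown mass m, and it is set equal to a random number r drawn uniformly from the interval [0, 1]. This is the inverse transform method: because c(m) increases monotonically from 0 at $m_{low}$ to 1 at $m_{up}$, each value of r corresponds to exactly one mass.

Solving Equation (2.3) for the unknown mass gives

$$m^{1-\alpha} - m_{low}^{1-\alpha} = r\left(m_{up}^{1-\alpha} - m_{low}^{1-\alpha}\right)$$

$$m^{1-\alpha} = r\left(m_{up}^{1-\alpha} - m_{low}^{1-\alpha}\right) + m_{low}^{1-\alpha}$$

$$m = \left[r\left(m_{up}^{1-\alpha} - m_{low}^{1-\alpha}\right) + m_{low}^{1-\alpha}\right]^{\frac{1}{1-\alpha}} \qquad (2.4)$$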

For the Salpeter IMF, α = 2.35; this value is taken as the standard reference for our project.
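Putting Equations (2.3) and (2.4) together gives a simple random-sampling generator for a numerical star cluster. In this sketch, a uniform random number r in [0, 1) is mapped to a mass through Equation (2.4) with the Salpeter slope α = 2.35. The mass limits (0.1 and 100 solar masses), the cluster size N = 10 000, and the helper names are hypothetical choices for illustration, not values from the references.

```python
import random

def mass_from_uniform(r: float, alpha: float, m_low: float, m_up: float) -> float:
    """Eq. (2.4): solve c(m) = r for a power law p(m) ∝ m^-alpha (alpha != 1)."""
    k = 1.0 - alpha
    lo, hi = m_low**k, m_up**k
    return (r * (hi - lo) + lo) ** (1.0 / k)

def powerlaw_cdf(m: float, alpha: float, m_low: float, m_up: float) -> float:
    """Eq. (2.3): c(m) = (m^(1-alpha) - m_low^(1-alpha)) / (m_up^(1-alpha) - m_low^(1-alpha))."""
    k = 1.0 - alpha
    return (m**k - m_low**k) / (m_up**k - m_low**k)

def sample_cluster(n_stars: int, alpha: float, m_low: float, m_up: float,
                   rng: random.Random):
    """Random sampling: draw each of the n_stars masses independently via Eq. (2.4)."""
    return [mass_from_uniform(rng.random(), alpha, m_low, m_up) for _ in range(n_stars)]

# Salpeter slope; mass limits (solar masses) and N are illustrative assumptions.
ALPHA, M_LOW, M_UP, N_STARS = 2.35, 0.1, 100.0, 10_000
rng = random.Random(2024)
cluster = sample_cluster(N_STARS, ALPHA, M_LOW, M_UP, rng)

# Compare the fraction of stars below 1 solar mass with the analytic cdf.
frac_below_1 = sum(m < 1.0 for m in cluster) / N_STARS
print(f"{frac_below_1:.3f} vs c(1) = {powerlaw_cdf(1.0, ALPHA, M_LOW, M_UP):.3f}")
```

Because c(m) has a closed form, the fraction of sampled stars below any mass can be checked directly against Equation (2.3), as done here for 1 solar mass.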

References

[1] Richard C. Sprinthall, Basic Statistical Analysis, Allyn and Bacon, 2002.

[2] Robert V. Hogg and Elliot A. Tanis, Probability and Statistical Inference, Pearson Higher Ed, 1988.

[3] Suranjan Saha, Mathematics and Statistics, New Central Book Agency, 1993.

[4] B. L. Agarwall, Basic Statistics, New Age International Publishers, 1988.

[5] S. C. Gupta, Fundamentals of Statistics, Himalaya Publishing House, 1981.

[6] C. Weidner, The mmax–Mecl relation, the IMF and IGIMF: probabilistically sampled functions, Royal Astronomical Society, MNRAS 434, pp. 84–101, 2013.

[7] Carsten Weidner, Sampling methods for stellar masses and the mmax–Mecl relation in the starburst dwarf galaxy NGC 4214, Royal Astronomical Society, MNRAS 441, pp. 3348–3358, 2013.

[8] Th. Maschberger, On the function describing the stellar initial mass function, Astronomical Society of India, 2012, pp. 1–8.