Decision Science Assignment
In continuation with the data of performance scores of employees in previous example, perform the following: a. Calculate the range and interquartile range. b. Calculate the z- scores. c. Calculate the skewness and Kurtosis (using excel). d. Comment.
In continuation with the data of performance scores of employees in previous example, perform the following: a. Make the histogram b. Plot the box-plot diagram c. Plot the frequency polygon d. Plot the Ogive diagram.

Decision Science

INTERNAL ASSIGNMENT- JUNE 2020

UNDER THE GUIDANCE OF

Prof. Dr. N. Palaniappan

SUBMITTED BY:

DIKSHA GUPTA

M.B.A. 2ND YEAR

DATE OF SUBMISSION:

June 23, 2020.

1. Identify the types of the variable.

In Decision Science or Statistics, a variable can be defined as an attribute of an object that is to be studied. Choosing which attributes to measure is an important part of designing an experiment.

Types of data: Quantitative and Categorical variables

Data contains specific measurements of a set of variables. These variables are generally divided into 2 types:

Quantitative variables:

Quantitative data represents quantities or amounts. The recorded numbers are real quantities on which arithmetic operations such as addition, subtraction and division are meaningful.

Categorical variables:

Categorical data represents groups.

https://www.scribbr.com/methodology/experimental-design/

*Sometimes a categorical variable can also be treated as a quantitative variable, so a variable may fall under more than one type. If the scale is numeric, as with material quality or survey ratings, and can take fractional values, it can be treated as continuous quantitative data.

Sr. No. | Variable | Data Type | Values
------- | -------- | --------- | ------
a | Gender | Nominal Categorical variable | Male, Female and Others.
b | Educational background | Ordinal Categorical variable | None(0), Metric pass(12), Graduate(15), Post-graduate(17), Doctorate(21) etc.
c | Satisfaction | Ordinal Categorical variable | Low, Medium, High, or measured on a 1 to 5 level scale.
d | Motivation | Ordinal Categorical variable | Low, Medium, High, or measured on a 1 to 5 level scale.
e | Exchange rate | Continuous Quantitative variable | Any real number value such as 0.5, 78 etc.
f | Gold price | Continuous Quantitative variable | Any real number value such as 0.5, 78 etc.
g | Preference of cars | Nominal Categorical variable | SUV, Sedan or Race car.
h | Teachers' feedback | Ordinal Categorical variable | Bad(1), Good(2), Extraordinary(3) etc.
i | Grades in Post-Graduation | Ordinal Categorical variable | Grade O, Grade A, Grade B, Grade F etc.
j | Marital status | Nominal Categorical variable | Married, Unmarried, Divorced, Widow/Widower.
k | Quality of services | Ordinal Categorical variable | Bad(1), Good(2), Extraordinary(3) etc.
l | Age group | Ordinal Categorical variable | Young (0-12), Teenage (13-19), Adult (20-50), Senior (beyond 50).
m | GDP | Continuous Quantitative variable | Any real numbers such as $3000 billion.
n | Interest rate | Continuous Quantitative variable | Any real numbers such as 10%, 5.8%, -1.3% etc.
o | Twitter comments | Nominal Categorical variable | Exclamatory, Liked, Positive, Disliked, Joyful etc.
p | Facebook pictures | Nominal Categorical variable or Discrete Quantitative variable | No ordered ranking among the images, OR images with pixel values from 0 to 255 (integers).

a. Gender: (Nominal)

When gender has just two categories, Male and Female, the gender data can be treated as a binary variable. It can also be coded as a discrete variable by writing Male as 1 and Female as 0, or vice versa. The word binary means "relating to two" (bi).

When gender has more than two categories, such as Male, Female and Others, it is treated as a nominal variable. There is no ordered ranking among the three categories.

b. Educational background: (Ordinal)

Educational background can be treated as the highest level of education a person has completed. These categories can be ranked by level in an ordered manner.

c. Satisfaction: (Ordinal)

The level of satisfaction cannot be counted in numbers, as it is not a physical object. However, one can grade it between 1 and 5 and compare it with others. Hence, it can be treated as an ordinal variable.

d. Motivation: (Ordinal)

As with the level of satisfaction, the impact that a source of motivation has on one person can be compared with its impact on others. For example, the impact of motivational speeches on one's life can be scaled as Low, Medium or High.

e. Exchange rate: (Continuous)

$1 = ₹75 or ₹1 = $0.013.

The rate of exchange can take any positive real value. The example above shows exchange rates of 75 and 0.013 (real numbers). An exchange rate itself is always positive.

f. Gold price: (Continuous)

Similar to the exchange rate, gold prices can take any real value, such as 80/gm. A few days ago, crude oil prices went negative in the USA. In rare cases, the same might happen for gold as well.

g. Preference of cars: (Nominal)

A person with a large family might prefer a Sedan or an SUV, and a rich person might prefer a race car such as a Ferrari. These preferences cannot be placed in any ordered ranking.

h. Teachers' feedback: (Ordinal)

If feedback is represented as an overall rating, it can be ordered. Good feedback can be ranked higher than bad feedback.

i. Grades in Post-Graduation: (Ordinal)

Grade O, Grade A and so on can be ranked in order. Grade A is better than Grade B and hence can be given a higher rank. However, grades differ from percentages (0.0 to 100.0) and CGPA (1.0 to 8.0), which would be continuous variables.

j. Marital status: (Nominal)

Marital statuses cannot be ranked in any order. For example, there is no meaningful ordering between married and widowed persons.

k. Quality of services: (Ordinal)

Similar to teachers' feedback, quality can be measured in an ordered manner. Good quality can be ranked higher than bad quality.

l. Age group: (Ordinal)

Youngsters, adults and senior citizens can be arranged in order while still taking categorical values.

m. GDP: (Continuous)

The GDP of India was 2718.73 and 2800 billion USD in 2018 and 2019, respectively. The values can be any real numbers, depending upon economic progress and several other factors.

n. Interest rate: (Continuous)

Japanese banks have negative interest rates for current accounts, whereas Indian banks have rates around 5.0%. The values can be any real number.

o. Twitter comments: (Nominal)

Comments can range from support to protest. They might be joyful or sad. There is no ordered ranking.

p. Facebook pictures:

(Nominal)- If there is no quantitative or ordered ranking among the Facebook pictures/images, they can be classified as unordered categorical data.

(Discrete)- A 1-bit monochrome image has pixel values of either 0 or 1 only, i.e. binary values. An 8-bit grayscale/coloured image has pixel values from 0 to 255 = $2^8 - 1$, i.e. discrete non-negative integer values.

Example- This is a 1-bit monochrome image (binary). Among the four pixels of the 2×2 image shown, white represents 1 and black represents 0.

https://www.britannica.com/science/E-mc2-equation

2. The following data of performance scores is available for employees working with a company. You are required to perform the following:

a. Make the frequency distribution. Calculate the frequency and the cumulative frequency.

Procedure to find Frequency:
The frequency (f) of a data value is defined as the number of times that particular data value occurs in the data set.

For example, the performance score 31 occurs three times in the given data set. Hence, the frequency of score 31 is 3. It can be represented as: $f_{31} = 3$.

Similarly, the frequency of each data value (performance scores ranging from 0 to 100) is to be counted.

Note: The frequency of a score that is not observed in the dataset is written as 0. For example, $f_{18} = 0$.
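The counting procedure above can be sketched in Python. This is a minimal illustration only: the `scores` list is a hypothetical stand-in, since the full 208-score dataset is not reproduced in this text, and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical scores for illustration only; NOT the 208-employee dataset.
scores = [52, 57, 50, 68, 74, 52, 66, 57, 52]

# Counter maps each observed score to the number of times it occurs.
freq = Counter(scores)

print(freq[52])  # frequency of the score 52 in this hypothetical list
print(freq[18])  # a score that was never observed has frequency 0

# Frequency table over the whole 0..100 scale, including unobserved scores.
table = {score: freq[score] for score in range(0, 101)}
```

`Counter` returns 0 for a key it has never seen, which matches the convention in the note above for unobserved scores.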

Procedure to find Cumulative Frequency:
Cumulative frequency (CF) is used to determine the number of observations that lie at or below a particular value in a data set.

$$CF_n = f_{\le n}, \qquad CF_{48} = f_{\le 48} = f_0 + f_1 + f_2 + \dots + f_{48}$$

(for discrete values of observations).

The cumulative frequency is calculated by adding each frequency in the frequency distribution table to the sum of its predecessors. Equivalently, it is the total of the frequencies of all observations up to and including a particular value. The last cumulative frequency always equals the total number of observations.

For example, $CF_{30} = f_0 + f_1 + \dots + f_{29} + f_{30} = 0 + 0 + \dots + 0 + 1 = 1$. Since $f_0 = \dots = f_{29} = 0$, $CF_{32} = f_{30} + f_{31} + f_{32} = 1 + 3 + 2 = 6$.

For the given data, in sets of 10 scores at a time, the frequency and cumulative frequency are observed as:

[Tables: frequency and cumulative frequency of the performance scores]

Observations:

The performance scores from 0 to 29, 99 and 100 are not obtained by any of the 208 employees: $f_0 = \dots = f_{29} = 0$. According to the given data, 30 and 98 are the lowest and highest scores obtained by any employee, respectively. Additionally, the performance scores 65 and 81 are not obtained by anybody. Hence, $f_{99} = f_{100} = f_{65} = f_{81} = 0$. Performance score 53 is the most frequent score: $f_{53} = 8$.

b. Calculate the mean, median, mode and quartiles.

Arithmetic Mean (Average):

An arithmetic mean ($\mu$) is a single number that summarises a list of numbers by its central value. For example, the set {3,4,5,6,7} has its average at 5. It is the sum of the observed values divided by the total number of observations.

$$\text{A.M.} = \mu = \frac{\sum x}{n}$$

In the given dataset, the number of employees (observations) is 208. Their performance scores (observation values) range from 30 to 98. According to the given data set:

$$\mu = \frac{52 + 57 + 50 + 68 + 74 + \dots + 66}{208} = 63.35096$$

Median:

The median is the middle number in a list sorted in ascending or descending order. It can be more descriptive of a data set than the average.
The median is generally used when there are outliers in the dataset that cause skewness, because it is much less affected by outliers and skewness than the mean. For example, the set {5,2,6,3,7} can be sorted in ascending order as {2,3,5,6,7}, and hence the median is 5. The median separates the higher half of a data sample from the lower half (the 50th percentile).

$$\text{Median} = \left(\frac{n + 1}{2}\right)\text{th value in a sorted dataset}$$

When n is even, as here (n = 208), this position falls between two observations and the median is the average of the 104th and 105th sorted values. In the given dataset, the sorted data would be {30,31,31,31,32,32,33,...,98}. Median = 62.

Mode:

The observation with the highest frequency is known as the mode of the data. It indicates the value around which most of the data is crowded. In the given dataset, the performance score 53 has been obtained by the largest number of people: $f_{max} = f_{53} = 8$. Mode = 53.

Note: For the given data, Mode < Mean and Median < Mean. Hence, the data may be approximately positively skewed, with several data points lying to the right of the mode. To confirm this, skewness is also to be measured.

Geometric Mean (G.M.):

It is the central number in a G.P. (geometric progression); for example, 3, 9, 27 has G.M. 9.

$$\text{G.M.} = \left(\prod x_i\right)^{1/n} = (x_1 \cdot x_2 \cdots x_n)^{1/n}$$

$$\text{G.M.} = (52 \times 57 \times 50 \times 68 \times 74 \times \dots \times 66)^{1/208} = 59.8147$$

Harmonic Mean (H.M.):

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals. It is one of the three Pythagorean means.

$$\text{H.M.} = \frac{208}{\frac{1}{52} + \frac{1}{57} + \frac{1}{50} + \dots + \frac{1}{66}} = 56.25897$$

Note: For positive values, it always holds that H.M. ≤ G.M. ≤ A.M.

Quartiles:

A quartile is a type of quantile that divides the ordered data points into 4 parts of equal size, known as quarters.

The first quartile (Q1) is the middle number between the lowest number and the median of the data set. Hence, in an ordered dataset, Q1 stands at the boundary between the first 25% and the remaining 75% of the data. In the given dataset, Q1 = 45, i.e. 25% of the data (208/4 = 52 observations) lie at or to the left of this number and the rest lie to its right.

The second quartile (Q2) is defined as the point at or below which 50% of the ordered observations lie. It is also known as the median of the data. In the given dataset, Q2 = 62, i.e. 50% of the data (208/2 = 104 observations) lie at or to the left of this number.

The third quartile (Q3) is defined as the middle number between the median and the largest number of the data set. Hence, in an ordered dataset, Q3 stands at the boundary between the first 75% and the remaining 25% of the data. In the given dataset, Q3 = 83, i.e. 75% of the data (208 × 75% = 156 observations) lie at or to the left of this number and the rest lie to its right.

The fourth quartile (Q4) is the largest observed data value, beyond which no more ordered observations exist. In the given dataset, Q4 = 98, i.e. 100% of the data (208 × 100% = 208 observations) lie at or to the left of this number.

c. Calculate the variance and the standard deviation.

Variance ($\sigma^2$):

Variance measures how far the dataset values are spread out from their average value. In everyday usage, variance is the fact or quality of being inconsistent, divergent or different. A lower variance means a narrower spread, with the data more tightly clumped around the mean. Similarly, a high variance indicates a loose, broad spread of the data around the mean.

Variance is always a non-negative number: $\sigma^2 \ge 0$.

Population variance is the variance calculated from the complete population data (all the observations). Sample variance is the variance calculated from sample data (re-scaled to n-1 in the denominator).

Complete steps to calculate variance have been shown in the Excel sheet attached.
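The difference between the population and sample variance described above can be sketched with Python's standard `statistics` module. This is a minimal illustration, not the assignment's Excel workflow: `scores` holds only the first five scores quoted in the text (the full dataset is not reproduced here), and all variable names are assumptions.

```python
import statistics

# Only the first five scores quoted in the text; NOT the full 208-score dataset.
scores = [52, 57, 50, 68, 74]
n = len(scores)

mu = statistics.fmean(scores)

# Population variance divides the sum of squared deviations by n.
pop_var = statistics.pvariance(scores)

# Sample variance divides the same sum by n - 1, so it is slightly larger.
sample_var = statistics.variance(scores)

# Shortcut form of the population variance: E[X^2] - mu^2.
shortcut = statistics.fmean([x * x for x in scores]) - mu ** 2

# The two variances differ only by the factor (n - 1) / n. Applying that
# factor to the sample variance reported in this section (428.8859, n = 208)
# should recover the reported population variance (426.824).
rescaled = 428.8859 * (208 - 1) / 208

print(pop_var, sample_var, shortcut, round(rescaled, 3))
```

The last line is a consistency check on the two variance values reported in this section rather than a recomputation from the raw data.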
Formulae for the sample variance ($\sigma_s^2$) and the population variance ($\sigma_p^2$) are given by:

$$\sigma_s^2 = \frac{\sum (x - \mu)^2}{n - 1} \quad \text{and} \quad \sigma_p^2 = \frac{\sum (x - \mu)^2}{n} = E[X^2] - \mu^2$$

where,
$\mu$ = average value or arithmetic mean
$n$ = number of observations
$x$ = the value of one observation at a time
$E$ = expected value.

In the given dataset, $\sigma_s^2 = 428.8859$ and $\sigma_p^2 = 426.824$.

Note: Since 1/n < 1/(n-1), $\sigma_p^2 < \sigma_s^2$. If the observed data values are in kg, the variance will be in kg², i.e. in the squared unit of the original data.

Standard Deviation ($\sigma$):

The standard deviation is also a measure of how spread out the numbers are, similar to the variance. It is the positive square root of the variance. $\sigma$ is always a non-negative number: $\sigma \ge 0$. A low value indicates a narrow spread of the data around the mean, and vice versa.

Formulae for the sample standard deviation ($\sigma_s$) and the population standard deviation ($\sigma_p$) are given by:

$$\sigma_s = \sqrt{\frac{\sum (x - \mu)^2}{n - 1}} \quad \text{and} \quad \sigma_p = \sqrt{\frac{\sum (x - \mu)^2}{n}}$$

$$\text{Standard Deviation} = \sqrt{\text{Variance}}$$

where,
$\mu$ = average value or arithmetic mean
$n$ = number of observations
$x$ = the value of one observation at a time
$\sigma_s$ = sample standard deviation
$\sigma_p$ = population or universal standard deviation.

Complete steps to calculate the SD have been shown in the Excel sheet attached.

In the given dataset, $\sigma_s = 20.70956$ and $\sigma_p = 20.65972$.

Note: Since 1/n < 1/(n-1), $\sigma_p < \sigma_s$. If the observed data values are in kg, the standard deviation will also be in kg, i.e. in the same unit as the original data.

Another important aspect of the SD is that it indicates the most likely range of the data. Most of the given data lies in the range $\mu \pm \sigma_s$, i.e. between 43 and 84 approximately.

Results in R:

[Figure: R output for the mean, median, quartiles, variance and standard deviation]

3. A. In continuation with the data of performance scores of employees in previous example, perform the following:
a. Calculate the range and interquartile range.
b. Calculate the z-scores.
c. Calculate the skewness and Kurtosis (using excel).
d. Comment on the distribution of the data.
Range:

The range of a data set is the difference between its highest and lowest values.

Range = Highest value - Lowest value
Range = 98 - 30 = 68.

Disadvantage: The range can be misleading as a guide to the values that could appear in the data. In the given data, the possible values that have not been seen yet are 0-29, 99 and 100. Here, the lowest possible value = 0 and the highest possible value = 100. Hence, maximum possible range = 100 - 0 = 100.

For example, for the set {8,11,5,9,1,6,3616}, the range = 3616 - 1 = 3615. But all the data values except one are around 10. Hence, the IQR and $\sigma$ are also to be calculated.

Inter-Quartile Range (IQR):

The inter-quartile range (IQR or midspread) is a measure of variability based on dividing the dataset into quartiles. IQR is the difference between the largest and smallest values in the middle 50% of the dataset.

Inter-Quartile Range = Third Quartile - First Quartile.
IQR = Q3 - Q1.
But, Q1 = 45 and Q3 = 83. Therefore, IQR = 83 - 45 = 38.

Z-scores (Standardised values):

A z-score describes the standardised position of a data value as its distance from the mean. A data value has a positive z-score if it lies above the mean, and a negative z-score if it lies below the mean. The z-score (standard score) is the number of standard deviations by which a data value is above or below the mean ($\mu$).

Note: The average of all the z-scores must be equal to 0. The sample SD of all the z-scores must be 1.

The formula for the z-score of a data value (x) is given by:

$$z = \frac{x - \mu}{\sigma}$$

where,
$\sigma$ = sample standard deviation = 20.70956
$\mu$ = arithmetic mean = 63.35096.

For example, the dataset {1,2,3,4,5} has $\mu = 3$ and $\sigma_s = 1.58$. Hence, the z-scores will be: {-1.265, -0.632, 0, 0.632, 1.265}.

The z-scores have been displayed in the corresponding Excel sheet.
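The {1,2,3,4,5} example above can be reproduced with a few lines of Python. This is a minimal sketch using the standard `statistics` module; the variable names are assumptions, and the sample standard deviation (n-1 in the denominator) is used, as in the text.

```python
import statistics

data = [1, 2, 3, 4, 5]

mu = statistics.mean(data)   # arithmetic mean
s = statistics.stdev(data)   # sample standard deviation (divides by n - 1)

# z-score: number of standard deviations each value lies from the mean.
z = [(x - mu) / s for x in data]
rounded = [round(v, 3) for v in z]

print(rounded)
print(statistics.mean(z), statistics.stdev(z))
```

The second `print` checks the note above: the z-scores average to 0 and their sample standard deviation is 1, up to floating-point error.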
For the provided dataset of employees' performances, some of the z-scores are:

[Tables: z-scores of the performance scores]

Skewness:

Skewness refers to the amount of distortion from a normal bell curve. It measures the extent to which a distribution is asymmetric about its mean. A normal (bell-shaped) distribution has a skewness of zero. A negative skewness indicates that the tail of the distribution is on the left side. A positive skewness indicates that the tail is on the right.

Application- Investors note skewness while judging a return distribution because it considers the extremes (outliers) of the dataset instead of focusing only on the mean. Kurtosis also reflects the extremes and is hence used for the same purpose.

Some formulae for measuring skewness are:

$$\text{Pearson's Mode Skewness} = S_1 = \frac{\mu - \text{Mode}}{\sigma} = \frac{63.35096 - 53}{20.70956}, \qquad S_1 = 0.5$$

$$\text{Pearson's Median Skewness} = S_2 = \frac{3(\mu - \text{Median})}{\sigma} = \frac{3(63.35096 - 62)}{20.70956}, \qquad S_2 = 0.2$$

$$\text{Co-efficient of Skewness} = \frac{\sum (x - \mu)^3}{(n - 1)\,\sigma^3} = \frac{\sum (x - 63.35096)^3}{207 \times 20.70956^3}$$

Co-efficient of Skewness = 0.09767 (using MS Excel).

Kurtosis:

Kurtosis is commonly described as indicating the height and sharpness of the central peak compared to that of a standard bell curve. It describes how heavily or mildly the tails of the data distribution differ from those of a bell distribution, i.e. whether the tails contain extreme values and how many. Kurtosis is a measure of the tailedness (outliers/extremes) of a probability distribution.

The formula for measuring kurtosis is:

$$\text{Kurtosis} = \frac{\sum (x - \mu)^4}{(n - 1)\,\sigma^4} - 3 = \frac{\sum (x - 63.35096)^4}{207 \times 20.70956^4} - 3$$

Kurtosis = -1.2854 (using MS Excel).

Data distribution:

Mean = 63.35096, Median = 62, Mode = 53. Median < Mean and Mode < Mean. Hence, it appears that the data may be positively skewed (towards the right).

If Skewness ∉ (-1, 1): highly skewed distribution.
If Skewness ∈ (-1, -0.5) or Skewness ∈ (0.5, 1): moderately skewed distribution.
If Skewness ∈ [-0.5, 0.5]: approximately symmetric distribution.

However, Co-efficient of Skewness = 0.097. Hence, the distribution is approximately symmetric.

If Kurtosis ≈ 0: Mesokurtic distribution. Close to the normal distribution.
If Kurtosis > 0: Leptokurtic distribution. More outliers, heavy tails, risky financial investments.
If Kurtosis < 0: Platykurtic distribution. Fewer outliers, light tails, desirable financial investments.

Since Kurtosis = -1.2854, the data follows a platykurtic distribution.

First Quartile = 45 and Third Quartile = 83, so IQR = 38. Hence, on the performance score scale between 0 and 100, the middle 50% of the data lies between just 45 and 83, i.e. within just 38% (nearly 3/8th) of all possible performance scores.

Range = 68. Out of all the possible scores between 0 and 100, every employee has a score within a 68% range (nearly 2/3rd). About 1/3rd of the scores are not obtained by anybody.

Results in R:

[Figure: R output for the range, IQR, skewness and kurtosis]

3. B. In continuation with the data of performance scores of employees in previous example, perform the following:
a. Make the histogram
b. Plot the box-plot diagram
c. Plot the frequency polygon
d. Plot the Ogive diagram.

Histogram:

A histogram is a graphical display of the given data using bars whose heights represent frequencies. Taller bars indicate more data falling into the particular range. A histogram also displays the spread and shape of continuous sampled data.

Box-Plot:

A box-plot graphically depicts groups of data with the help of their quartiles. Box-plots also have extended lines (whiskers) indicating variability outside the first and third quartiles. Hence, it is also known as a box-and-whisker plot. It shows the minimum, Q1, median, Q3 and maximum, in that order.

Frequency Polygon:

A frequency polygon is constructed from the histogram bars (bins/intervals). The height of each bin is converted into a point that represents its frequency, and lines join the midpoints of the bins. A frequency polygon is generally created from the histogram. It can also be created from the frequency distribution table by calculating the midpoints of the intervals.

Ogive Diagram:

An ogive diagram (cumulative frequency polygon) shows cumulative frequencies.
Here, the cumulative percentages are plotted on the graph from left to right (lower range to higher range). An ogive graph has cumulative frequencies on the y-axis and class ranges (boundaries) on the x-axis.

Conclusion:

The histogram does not resemble a normal distribution. Instead, the ogive diagram suggests that the bin frequencies are more likely to be flat than curved, i.e. a nearly equal number of employees falls under each bin. However, the middle 50% of the distribution has scores between 45 and 83 (Q1 and Q3). On the scale of 0 to 100, about 1/3rd of the scores have not been obtained by anybody. The data is approximately symmetric.

[Figure: Histogram. x-axis: Employees' Performance Scores; y-axis: Frequency]
[Figure: Box-Plot of the performance scores on a scale of 0 to 100]
[Figure: Frequency Polygon. x-axis: Employees' Performance Scores; y-axis: Frequency]
[Figure: Ogive Diagram. x-axis: Employees' Performance Scores; y-axis: Cumulative Frequency %]

Data shown in the figures:

Score range | Frequency | Cumulative range | Cumulative Frequency %
----------- | --------- | ---------------- | ----------------------
30-39 | 36 | 30-39 | 17.31
40-49 | 27 | 30-49 | 30.29
50-59 | 32 | 30-59 | 45.67
60-69 | 33 | 30-69 | 61.54
70-79 | 21 | 30-79 | 71.63
80-89 | 26 | 30-89 | 84.13
90-98 | 33 | 30-98 | 100
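The cumulative percentages plotted in the ogive can be derived from the histogram's bin frequencies (36, 27, 32, 33, 21, 26, 33). The following is a minimal Python sketch of that calculation, not the Excel procedure used in the assignment; the variable names are assumptions.

```python
from itertools import accumulate

# Bin labels and frequencies as shown in the histogram.
bins = ["30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90-98"]
freq = [36, 27, 32, 33, 21, 26, 33]

# Cumulative frequency: running total of the bin frequencies.
cum_freq = list(accumulate(freq))

# The last cumulative frequency equals the total number of observations.
total = cum_freq[-1]

# Cumulative percentage plotted on the ogive's y-axis.
cum_pct = [round(100 * c / total, 2) for c in cum_freq]

for label, f, cf, pct in zip(bins, freq, cum_freq, cum_pct):
    print(f"{label}: f={f}, CF={cf}, CF%={pct}")
```

The running total ends at 208, the number of employees, and the percentages agree with the values plotted in the ogive diagram.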