STATISTICS EXCEL: WEEK 3 URGENT
Statistics Excel Week 3 Assignment
Attached
I need only Week 3 done now
I need in 6 hours from now maximum
Data
ID Salary Compa Midpoint Age Performance Rating Service Gender Raise Degree Gender1 Grade Do not manipuilate Data set on this page, copy to another page to make changes
1 60.2 1.056 57 34 85 8 0 5.7 0 M E The ongoing question that the weekly assignments will focus on is: Are males and females paid the same for equal work (under the Equal Pay Act)?
2 27.7 0.893 31 52 80 7 0 3.9 0 M B Note: to simplfy the analysis, we will assume that jobs within each grade comprise equal work.
3 35.5 1.145 31 30 75 5 1 3.6 1 F B
4 56.1 0.985 57 42 100 16 0 5.5 1 M E The column labels in the table mean:
5 48.9 1.018 48 36 90 16 0 5.7 1 M D ID Employee sample number Salary Salary in thousands
6 74.1 1.106 67 36 70 12 0 4.5 1 M F Age Age in years Performance Rating – Appraisal rating (employee evaluation score)
7 42.2 1.055 40 32 100 8 1 5.7 1 F C Service Years of service (rounded) Gender 0 = male, 1 = female
8 21.4 0.929 23 32 90 9 1 5.8 1 F A Midpoint salary grade midpoint Raise percent of last raise
9 77.1 1.151 67 49 100 10 0 4 1 M F Grade job/pay grade Degree (0= BSBA 1 = MS)
10 22.6 0.983 23 30 80 7 1 4.7 1 F A Gender1 (Male or Female) Compa-ratio – salary divided by midpoint
11 23.8 1.036 23 41 100 19 1 4.8 1 F A
12 67.4 1.183 57 52 95 22 0 4.5 0 M E
13 40.2 1.004 40 30 100 2 1 4.7 0 F C
14 23.7 1.032 23 32 90 12 1 6 1 F A
15 23 1.000 23 32 80 8 1 4.9 1 F A
16 47.1 1.177 40 44 90 4 0 5.7 0 M C
17 65.9 1.156 57 27 55 3 1 3 1 F E
18 37.1 1.197 31 31 80 11 1 5.6 0 F B
19 24.4 1.062 23 32 85 1 0 4.6 1 M A
20 34.7 1.120 31 44 70 16 1 4.8 0 F B
21 74.4 1.110 67 43 95 13 0 6.3 1 M F
22 55.7 1.161 48 48 65 6 1 3.8 1 F D
23 24.7 1.074 23 36 65 6 1 3.3 0 F A
24 53 1.104 48 30 75 9 1 3.8 0 F D
25 24.3 1.056 23 41 70 4 0 4 0 M A
26 23.5 1.021 23 22 95 2 1 6.2 0 F A
27 41.7 1.043 40 35 80 7 0 3.9 1 M C
28 77.2 1.152 67 44 95 9 1 4.4 0 F F
29 77.7 1.160 67 52 95 5 0 5.4 0 M F
30 47.7 0.994 48 45 90 18 0 4.3 0 M D
31 23.9 1.038 23 29 60 4 1 3.9 1 F A
32 27.1 0.875 31 25 95 4 0 5.6 0 M B
33 62 1.088 57 35 90 9 0 5.5 1 M E
34 27.6 0.890 31 26 80 2 0 4.9 1 M B
35 22.9 0.994 23 23 90 4 1 5.3 0 F A
36 22.4 0.975 23 27 75 3 1 4.3 0 F A
37 23.4 1.017 23 22 95 2 1 6.2 0 F A
38 65.8 1.155 57 45 95 11 0 4.5 0 M E
39 35.1 1.131 31 27 90 6 1 5.5 0 F B
40 24.8 1.078 23 24 90 2 0 6.3 0 M A
41 50.3 1.257 40 25 80 5 0 4.3 0 M C
42 23.1 1.006 23 32 100 8 1 5.7 1 F A
43 74.4 1.110 67 42 95 20 1 5.5 0 F F
44 61.4 1.078 57 45 90 16 0 5.2 1 M E
45 48.5 1.010 48 36 95 8 1 5.2 1 F D
46 59.4 1.042 57 39 75 20 0 3.9 1 M E
47 63.5 1.114 57 37 95 5 0 5.5 1 M E
48 66.4 1.165 57 34 90 11 1 5.3 1 F E
49 60.9 1.068 57 41 95 21 0 6.6 0 M E
50 63.5 1.114 57 38 80 12 0 4.6 0 M E
Week 1
Week 1: Descriptive Statistics, including Probability
While the lectures will examine our equal pay question from the compa-ratio viewpoint, our weekly assignments will focus on
examining the issue using the salary measure.
The purpose of this assignmnent is two fold:
1. Demonstrate mastery with Excel tools.
2. Develop descriptive statistics to help examine the question.
3. Interpret descriptive outcomes
The first issue in examining salary data to determine if we – as a company – are paying males and females equally for doing equal work is to develop some
descriptive statistics to give us something to make a preliminary decision on whether we have an issue or not.
1 Descriptive Statistics: Develop basic descriptive statistics for Salary
The first step in analyzing data sets is to find some summary descriptive statistics for key variables.
Suggestion: Copy the gender1 and salary columns from the Data tab to columns T and U at the right.
Then use Data Sort (by gender1) to get all the male and female salary values grouped together.
a. Use the Descriptive Statistics function in the Data Analysis tab Place Excel outcome in Cell K19
to develop the descriptive statistics summary for the overall
group’s overall salary. (Place K19 in output range.)
Highlight the mean, sample standard deviation, and range.
b. Using Fx (or formula) functions find the following (be sure to show the formula
and not just the value in each cell) asked for salary statistics for each gender:
Male Female
Mean:
Sample Standard Deviation:
Range:
2 Develop a 5-number summary for the overall, male, and female SALARY variable.
For full credit, use the excel formulas in each cell rather than simply the numerical answer.
Overall Males Females
Max
3rd Q
Midpoint
1st Q
Min
3 Location Measures: comparing Male and Female midpoints to the overall Salary data range.
For full credit, show the excel formulas in each cell rather than simply the numerical answer.
Using the entire Salary range and the M and F midpoints found in Q2 Male Female
a. What would each midpoint’s percentile rank be in the overall range? Use Excel’s =PERCENTRANK.EXC function
b. What is the normal curve z value for each midpoint within overall range? Use Excel’s =STANDARDIZE function
4 Probability Measures: comparing Male and Female midpoints to the overall Salary data range
For full credit, show the excel formulas in each cell rather than simply the numerical answer.
Using the entire Salary range and the M and F midpoints found in Q2, find Male Female
a. The Empirical Probability of equaling or exceeding (=>) that value for Show the calculation formula = value/50 or =countif(range,”>=”&cell)/50
b. The Normal curve Prob of => that value for each group Use “=1-NORM.S.DIST” function
Note: be sure to use the ENTIRE salary range for part a when finding the probability.
5 Conclusions: What do you make of these results? Be sure to include findings from this week’s lectures as well.
In comparing the overall, male, and female outcomes, what relationship(s) see, to exist between the data sets?
Your findings:
The lecture’s related findings:
Overall conclusion:
What does this suggest about our equal pay for equal work question?
Week 2
Week 2: Identifying Significant Differences – part 1
To Ensure full credit for each question, you need to show how you got your results. This involves either showing where the data you used is located
or showing the excel formula in each cell. Be sure to copy the appropriate data columns from the data tab to the right for your use this week.
As with our examination of compa-ratio in the lecture, the first question we have about salary between the genders involves equality – are they the same or different?
What we do, depends upon our findings.
1 As with the compa-ratio lecture example, we want to examine salary variation within the groups – are they equal? Use Cell K10 for the Excel test outcome location.
a What is the data input ranged used for this question:
b Which is needed for this question: a one- or two-tail hypothesis statement and test ?
Answer:
Why:
c. Step 1: Ho:
Ha:
Step 2: Significance (Alpha):
Step 3: Test Statistic and test:
Why this test?
Step 4: Decision rule:
Step 5: Conduct the test – place test function in cell k10
Step 6: Conclusion and Interpretation
What is the p-value:
What is your decision: REJ or NOT reject the null?
Why?
What is your conclusion about the variance in the population for male and female salaries?
2 Once we know about variance quality, we can move on to means: Are male and female average salaries equal? Use Cell K35 for the Excel test outcome location.
(Regardless of the outcome of the above F-test, assume equal variances for this test.)
a What is the data input ranged used for this question:
b Does this question need a one or two-tail hypothesis statement and test?
Why:
c. Step 1: Ho:
Ha:
Step 2: Significance (Alpha):
Step 3: Test Statistic and test:
Why this test?
Step 4: Decision rule:
Step 5: Conduct the test – place test function in cell K35
Step 6: Conclusion and Interpretation
What is the p-value:
What is your decision: REJ or NOT reject the null?
Why?
What is your conclusion about the means in the population for male and female salaries?
3 Education is often a factor in pay differences.
Do employees with an advanced degree (degree = 1) have higher average salaries? Use Cell K60 for the Excel test outcome location.
Note: assume equal variance for the salaries in each degree for this question.
a What is the data input ranged used for this question:
b Does this question need a one or two-tail hypothesis statement and test?
Why:
c. Step 1: Ho:
Ha:
Step 2: Significance (Alpha):
Step 3: Test Statistic and test:
Why this test?
Step 4: Decision rule:
Step 5: Conduct the test – place test function in cell K60
Step 6: Conclusion and Interpretation
What is the p-value:
Is the t value in the t-distribution tail indicated by the arrow in the Ha claim?
What is your decision: REJ or NOT reject the null?
Why?
What is your conclusion about the impact of education on average salaries?
4 Considering both the compa-ratio information from the lectures and your salary information, what conclusions can you reach about equal pay for equal work?
Your findings:
The lecture’s related findings:
Overall conclusion:
Why – what statistical results support this conclusion?
Week 3
Week 3: Identifying Significant Differences – part 2 Data Input Table: Salary Range Groups
Group name: A B C D E F
To Ensure full credit for each question, you need to show how you got your results. This involves either showing where the data you used is located List salaries within each grade
or showing the excel formula in each cell. Be sure to copy the appropriate data columns from the data tab to the right for your use this week.
1 A good pay program will have different average salaries by grade. Is this the case for our company?
a What is the data input ranged used for this question: Use Cell K08 for the Excel test outcome location.
Note: assume equal variances for each grade, even though this may not be accurate, for purposes of this question.
b. Step 1: Ho:
Ha:
Step 2: Significance (Alpha):
Step 3: Test Statistic and test:
Why this test?
Step 4: Decision rule:
Step 5: Conduct the test – place test function in cell K08
Step 6: Conclusion and Interpretation
What is the p-value:
What is your decision: REJ or NOT reject the null?
Why?
What is your conclusion about the means in the population for grade salaries?
2 If the null hypothesis in question 1 was rejected, which pairs of means differ?
(Use the values from the ANOVA table to complete the follow table.)
Groups Compared Mean Diff. T value used +/- Term Low to High Difference Significant? Why?
A-B
A-C
A-D
A-E
A-F
B-C
B-D
B-E
B-E
C-D
C-E
C-F
D-E
D-F
E-F
3 One issue in salary is the grade an employee is in – higher grades have higher salaries.
This suggests that one question to ask is if males and females are distributed in a similar pattern across the salary grades?
a What is the data input ranged used for this question: Use Cell K54 for the Excel test outcome location.
b. Step 1: Ho:
Ha:
Step 2: Significance (Alpha):
Step 3: Test Statistic and test: Place the actual distribution in the table below.
Why this test? A B C D E F Sum
Step 4: Decision rule: Male 0
Step 5: Conduct the test – place test function in cell K54 Female 0
Sum: 0 0 0 0 0 0 0
Step 6: Conclusion and Interpretation Place the expected distribution in the table below.
What is the p-value: A B C D E F
What is your decision: REJ or NOT reject the null? Male 0
Why? Female 0
What is your conclusion about the means in the population for male and female salaries? Sum: 0 0 0 0 0 0 0
4 What implications do this week’s analysis have for our equal pay question?
Your findings:
The lecture’s related findings:
Overall conclusion:
Why – what statistical results support this conclusion?
Week 4
Week 4: Identifying relationships – correlations and regression
To Ensure full credit for each question, you need to show how you got your results. This involves either showing where the data you used is located
or showing the excel formula in each cell. Be sure to copy the appropriate data columns from the data tab to the right for your use this week.
1 What is the correlation between and among the interval/ratio level variables with salary? (Do not include compa-ratio in this question.)
a. Create the correlation table. Use Cell K08 for the Excel test outcome location.
i. What is the data input ranged used for this question:
ii. Create a correlation table in cell K08.
b. Technically, we should perform a hypothesis testing on each correlation to determine
if it is significant or not. However, we can be faithful to the process and save some
time by finding the minimum correlation that would result in a two tail rejection of the null.
We can then compare each correlation to this value, and those exceeding it (in either a
positive or negative direction) can be considered statistically significant.
i. What is the t-value we would use to cut off the two tails? T =
ii. What is the associated correlation value related to this t-value? r =
c. What variable(s) is(are) significantly correlated to salary?
d. Are there any surprises – correlations you though would be significant and are not, or non significant correlations you thought would be?
e. Why does or does not this information help answer our equal pay question?
2 Perform a regression analysis using salary as the dependent variable and all of the variables used in Q1. Add the
two dummy variables – gender and education – to your list of independent variables. Show the result, and interpret your findings by answering the following questions.
Suggestion: Add the dummy variables values to the right of the last data columns used for Q1.
What is the multiple regression equation predicting/explaining salary using all of our possible variables except compa-ratio?
a. What is the data input ranged used for this question:
b. Step 1: State the appropriate hypothesis statements: Use Cell M34 for the Excel test outcome location.
Ho:
Ha:
Step 2: Significance (Alpha):
Step 3: Test Statistic and test:
Why this test?
Step 4: Decision rule:
Step 5: Conduct the test – place test function in cell M34
Step 6: Conclusion and Interpretation
What is the p-value:
What is your decision: REJ or NOT reject the null?
Why?
What is your conclusion about the factors influencing the population salary values?
c. If we rejected the null hypothesis, we need to test the significance of each of the variable coefficients.
Step 1: State the appropriate coefficient hypothesis statements: (Write a single pair, we will use it for each variable separately.)
Ho:
Ha:
Step 2: Significance (Alpha):
Step 3: Test Statistic and test:
Why this test?
Step 4: Decision rule:
Step 5: Conduct the test
Note, in this case the test has been performed and is part of the Regression output above.
Step 6: Conclusion and Interpretation
Place the t and p-values in the following table
Identify your decision on rejecting the null for each variable. If you reject the null, place the coefficient in the table.
Midpoint Age Perf. Rat. Seniority Raise Gender Degree
t-value:
P-value:
Rejection Decision:
If Null is rejected, what is the variable’s coefficient value?
Using the intercept coefficient and only the significant variables, what is the equation?
Salary =
d. Is gender a significant factor in salary?
e. Regardless of statistical significance, who gets paid more with all other things being equal?
f. How do we know?
3 After considering the compa-ratio based results in the lectures and your salary based results, what else would you like to know
before answering our question on equal pay? Why?
4 Between the lecture results and your results, what is your answer to the question
of equal pay for equal work for males and females? Why?
Your findings:
The lecture’s related findings:
Overall conclusion:
5 What does regression analysis show us about analyzing complex measures? 2018c Canvas Lecture Week 3 – 1a.pdf
BUS 308 Week 3 Lecture 1
Examining Differences – Continued
Expected Outcomes
After reading this lecture, the student should be familiar with:
1. Issues around multiple testing
2. The basics of the Analysis of Variance test
3. Determining significant differences between group means
4. The basics of the Chi Square Distribution.
Overview
Last week, we found out ways to examine differences between a measure taken on two
groups (two-sample test situation) as well as comparing that measure to a standard (a one-sample
test situation). We looked at the F test which let us test for variance equality. We also looked at
the t-test which focused on testing for mean equality. We noted that the t-test had three distinct
versions, one for groups that had equal variances, one for groups that had unequal variances, and
one for data that was paired (two measures on the same subject, such as salary and midpoint for
each employee). We also looked at how the 2-sample unequal t-test could be used to use Excel
to perform a one-sample mean test against a standard or constant value. This week we expand
our tool kit to let us compare multiple groups for similar mean values.
A second tool will let us look at how data values are distributed if graphed, would they
look the same? Different shapes or patterns often means the data sets differ in significant ways
that can help explain results.
Multiple Groups
As interesting as comparing two groups is, often it is a bit limiting as to what it tells us.
One obvious issue that we are missing in the comparisons made last week was equal work. This
idea is still somewhat hard to get a clear handle on. Typically, as we look at this issue, questions
arise about things such as performance appraisal ratings, education distribution, seniority impact,
etc.
Some of these can be tested with the tools introduced last week. We can see, for
example, if the performance rating average is the same for each gender. What we couldnt do, at
this point however, is see if performance ratings differ by grade, do the more senior workers
perform relatively better? Is there a difference between ratings for each gender by grade level?
The same questions can be asked about seniority impact. This week will give us tools to expand
how we look at the clues hidden within the data set about equal pay for equal work.
ANOVA
So, lets start taking a look at these questions. The first tool for this week is the Analysis
of Variance ANOVA for short. ANOVA is often confusing for students; it says it analyzes
variance (which it does) but the purpose of an ANOVA test is to determine if the means of
different groups are the same! Now, so far, we have considered means and variance to be two
distinct characteristics of data sets; characteristics that are not related, yet here we are saying that
looking at one will give us insight into the other.
The reason is due to the way the variance is analyzed. Just as our detectives succeed by
looking at the clues and data in different ways, so does ANOVA. There are two key variances
that are examined with this test. The first, called Within Group variance, is the average variance
of the groups. ANOVA assumes the population(s) the samples are taken from have the same
variation, so this average is an estimate of the population variance.
The second is the variance of the entire group, Between Group Variation, as if all the
samples were from the same group. Here are exhibits showing two situations. In Exhibit A, the
groups are close together, in fact they are overlapping, and the means are obviously close to each
other. The Between Group variation (which would be from the data set that starts with the
orange group on the right and ends with the gray group on the left) is very close to the Within
Group (the average) variation for the three groups.
So, if we divide our estimate of the Between Group (overall) variation by the estimate of
our Within Group (average) variation, we would get a value close to 1, and certainly less than
about 1.5. Recalling the F statistic from last week, we could guess that there is not a significant
difference in the variation estimates. (Of course, with the statistical test we do not guess but
know if the result is significant or not.)
Look at three sample distributions in Exhibit A. Each has the same within group
variance, and the overall variance of the entire data set is not all that much larger than the
average of the three separate groups. This would give us an F relatively close to 1.00.
Exhibit A: No Significant Difference with Overall Variation
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
-5 -4 -3 -2 -1 0 1 2 3 4 5
Exhibit B: Significant Difference with Overall Variation
Now, if we look at exhibit B, we see a different situation. Here the group distributions do
not overlap, and the means are quite different. If we were to divide the Between Group (overall)
variance by the Within Group (average) variance we would get a value quite a bit larger than the
value we calculated with the pervious samples, probably large enough to indicate a difference
between the within and between group variation estimates. And, again, we would examine this F
value for statistical significance.
This is essentially what ANOVA does; we will look at how and the output in the next
lecture. If the F statistic is statistically significant (the null hypothesis of no difference is
rejected), then we can say that the means are different. Neat!
So, why bother learning a new tool to test means? Why dont we merely use multiple t-
tests to test each pair separately. Granted, it would take more time that doing a single test, but
with Excel that is not much of an issue. The best reason to use ANOVA is to ensure we do not
reduce our confidence in our results. If we use an alpha of 0.05, it is essentially saying we are
95% sure we made the right decision in rejecting the null. However, if we do even 3 t-tests on
related data, our confidence drops to the P(Decision 1 correct + Decision 2 correct + Decision 3
correct). As we recall from week 1, the probability of three events occurring is the product of
each event separately, or .95*.95*.95 = 0.857! And in comparing means for 6 groups (such as
means for the different grade levels), we have 16 comparisons which would reduce our overall
confidence that all decisions were correct to 44%. Not very good. Therefore, a single ANOVA
test is much better for our confidence in making the right decision than multiple T-tests.
The hypothesis testing procedure steps are set up in a similar fashion to what we did in
with the t-tests. There is a single approach to wording the null and alternate hypothesis
statements with ANOVA:
Ho: All means are equal
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
-10 -5 0 5 10
Ha: At least one mean differs.
The reason for this is simple. No matter how many groups we are testing, if a single mean
differs, we will reject the null hypothesis. And, it can get cumbersome listing all possible
outcomes of one or more means differing for the alternate.
One issue remains for us if we reject the null of no differences among the mean, which
means are different? This is done by constructing what we can call, for now, difference
intervals. A difference interval will give us a range of values that the real difference between
two means could really be. Remember, since the means are from samples, they are close
approximations to the actual population mean, which might be a bit larger or smaller than any
given mean. These difference intervals will take into account the possible sampling error we
have. (How we do this will be discussed in lecture 2 for this week.).
A difference interval might be -2 to +1.8. This says that the actual difference when we
subtract one mean from another could be any value between -2 to +1.8. Since this interval says
the difference could be 0 (meaning the means could be the same), we would find this pair of
means to be not significantly different. If, however, our difference range was, for example, from
+1.8 to + 3.8 (the same range but all positive values), we would say the difference between the
means is significant as 0 is not within the range.
ANOVA is a very useful tool when we need to compare multiple groups. For example,
this can be used to see if average shipping costs are the same across multiple shippers. The
average time to fill open positions using different advertising approaches, or the associated costs
of each, can also be tested with this technique. With our equal pay issues, we can look at mean
equality across grades of variables such as compa-ratio, salary, performance rating, seniority, and
even raise.
Chi Square Tests
The ANOVA test somewhat relies upon the shape of the samples, both with our
assumption that each sample is normally distributed with an equal variance and with their
relative relationship (how close or distant they are). In many cases, we are concerned more with
the distribution of our variables than with other measures. In some cases, particularly with
nominal labels, distribution is all we can measure.
In our salary question, one issue that might impact our analysis is knowing if males and
females are distributed across the grades in a similar pattern. If not, then whichever gender holds
more higher-level jobs would obviously have higher salaries. While this might be an affirmative
action or possible discrimination issue, it is not an equal pay for equal work situation.
So, again, we have some data that we are looking at, but are not sure how to make the
decision if things are the same or not. And, just by examining means we cannot just look at the
data we have and tell anything about how the variables are distributed.
But, have no fear, statistics comes to our rescue! Examining distributions, or shapes, or
counts per group (all ways of describing the same data) is done using a version of the Chi Square
test; and, after setting up the data Excel does the work for us.
In comparing distributions, and we can do this with discrete (such as the number of
employees in each grade) variables or continuous variables (such as age or years of service
which can take any value within a range if measured precisely enough) that we divide into
ranges, we simply count how many are in each group or range. For something like the
distribution of gender by grades; simply count how many males and females are in each grade,
simple even if a bit tedious. For something like compa-ratio, we first set up the range values we
are interested in (such as .80 up to but not including .90, etc.), and then count how many values
fall within each group range.
These counts are displayed in tables, such as the following on gender distribution by
grade. The first is the distribution of employees by grade level for the entire sample, and the
second is the distribution by gender. The question we ask is for both kinds of tables is basically
the same, is the difference enough to be statistically significant or meaningfully different from
our comparison standard?
A B C D E F
Overall 15 7 5 5 12 6
A B C D E F
Male 3 3 3 2 10 4
Female 12 4 2 3 2 2
The answer to the question of whether the distributions are different enough, when using
the Chi Square test, depends with the group we are comparing the distribution with. When we
are dealing with a single row table, we need to decide what our comparison group or distribution
is. For example, we could decide to compare the existing distribution or shape against a claim
that the employees are spread out equally across the 6 grades with 50/6 = 8.33 employees in each
grade. Or we could decide to compare the existing distribution against a pyramid shape – a more
typical organization hierarchy, with the most employees at the lower grades (A and B) and fewer
at the top; for example, 17, 10, 8, 7, 5, 3. The expected frequency per cell does not need to be a
whole number. What is important is having some justification for the comparison distribution
we use.
When we have multi-row tables, such as the second example with 2 rows, the comparison
group is known or considered to be basically the average of the existing counts. We will get into
exactly how to set this up in the next lecture. In either case the comparison (or expected)
distribution needs to have the row and column total sums to be the same as the original or actual
counts.
The hypothesis claims for either chi square test are basically the same:
Ho: Variable counts are distributed as expected (a claim of no difference)
Ha: Variable counts are not distributed as expected (a claim that a difference exists)
Comparing distributions/shapes has a lot of uses in business. Manufacturing generally
produces parts that have some variation in key measures; we can use the Chi Square to see if the
distribution of these differences from the specification value is normally distributed, or if the
distribution is changing overtime (indicating something is changing such as machine
tolerances). The author used this approach to compare the distribution/pattern of responses to
questions on an employee opinion survey between departments and the overall division.
Different response patterns suggested the issue was a departmental one while similar patterns
suggested that the division owned the results, indicating which group should develop ways to
improve the results.
Summary
This week we looked at two different tests, one that looks for mean differences among
two or more groups and one that looks for differences in patterns, distributions, or shapes in the
data set.
The Analysis of Variance (ANOVA) test uses the difference in variance between the
entire data set and the average variance of the groups to see if at least one mean differs. If so, the
construction of difference intervals will tell us which of the pairs of means actually differ.
The Chi Square tests look at patterns within data sets and lets us compare them to a
standard or to each other.
Both tests are found in the Data Analysis link in Excel and follow the same basic set-up
process as we saw with the F and t-tests last week.
If you have any questions on this material, please ask your instructor.
After finishing with this lecture, please go to the first discussion for the week, and engage
in a discussion with others in the class over the first couple of days before reading the second
lecture.
BUS308 W3 Lecture – 3A.pdf
BUS 308 Week 3 Lecture 3
Setting up ANOVA and Chi Square
Expected Outcomes
After reading this lecture, the student should know how to:
1. Set-up the data for an ANOVA analysis.
2. Set-up and perform an ANOVA test.
3. Set-up a table of mean differences.
4. Set-up and perform a Chi Square test.
Overview
Setting up the ANOVA test is quite similar to how the t and F tests were set up. The Chi
Square set-up is a bit more complex, as it is not found in the Data Analysis list of tools.
ANOVA
The set-up of ANOVA within Excel is very similar to how we set up the F and T tests
last week; place the data set in appropriate groups and then use the ANOVA input box. One
difference this week is that the Fx (or Formulas) list does not include an option for ANOVA, so
we need to use the Data | Analysis tools.
Data Set-up
Single Factor. As with the t-test, ANOVA has a couple of versions to select between.
Each is used to answer slightly different questions, and these will be examined below. The most
significant difference lies in the data table us