Multiple Regression: Survey Data Set
Directions: Constructing & Analyzing an MR Model
A. Selecting Your Variables*
• Y-variable: Use the red colored variable as the Y-variable (See Excel Survey Questions\Survey Dataset)
• X1: must be nominal data (See video, “Generating Dummy Variables” in Additional Materials)
• X2: must be either nominal, interval, or ratio data
• X3: must be interval or ratio data
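Nominal data (for example, a student's major) has to be converted into 0/1 dummy variables before Excel can use it as an X-variable. The Python sketch below shows one common coding scheme, in which one category is held out as the baseline and each remaining category gets its own 0/1 column. The function name `dummy_code` and the sample values are illustrative assumptions, not the method shown in the video.

```python
def dummy_code(values, baseline):
    """Return (column_names, rows) of 0/1 indicators, leaving out the baseline category."""
    categories = sorted(set(values) - {baseline})
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical responses; "Management" is chosen here as the baseline (all zeros).
majors = ["Marketing", "Accounting", "Management", "Marketing"]
names, rows = dummy_code(majors, baseline="Management")

for major, row in zip(majors, rows):
    print(major, dict(zip(names, row)))
```

A variable with k categories needs k - 1 dummy columns; a two-category variable such as True/False needs only one.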
Upfront Planning on Selection of Variables
• Have an overarching idea\theme\theory\hypothesis for your model.
• Support your idea\theme\theory\hypothesis with the X-variables you select.
• Not doing the above will leave you with a statement that is a sterile, grocery-style list of statistical terms,
because you will have no personal and specific context through which to interpret your results.
• See the Survey Questions sheet in the Class Survey Data File for an explanation of the data variables.
B. Running Multiple Regression in Excel
• Copy & Paste your variables (with column headings) from the Survey Data Set onto a new sheet. Place
the X-variables in adjacent columns with the Y-variable in a column before or after.
• Click the Data tab on the ribbon, click Data Analysis, select Regression, and complete the
popup box:
○ Input Y Range: the range of cells containing the data of your Y-variable
○ Input X Range: the range of cells containing the data of all of your X-variables
○ Click on Labels (so Excel will recognize your column headings as labels)
○ Output Range: input any cell that is not already occupied or click on New Worksheet Ply
○ Note: Excel auto-sets alpha at .05 (95% confidence level). You can use this setting or change it.
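For readers who want to see the arithmetic behind the coefficients a regression tool reports, the following Python sketch fits a least-squares model with an intercept by solving the normal equations. It is a minimal illustration, not a description of how Excel computes its output. The function names and the data are hypothetical; the Y values were built as 1 + 2*X1 + 3*X2 so the expected coefficients are known.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(x_rows, y):
    """Return [intercept, b1, b2, ...] minimizing the sum of squared residuals."""
    design = [[1.0] + list(r) for r in x_rows]  # leading 1.0 gives the intercept
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: each row holds (X1, X2); Y = 1 + 2*X1 + 3*X2 exactly.
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
coefficients = fit_ols(x_rows, y)
print([round(c, 6) for c in coefficients])
```

With real survey data the fit will not be exact, and the residuals are what drive R Square and the standard errors in the output tables.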
C. Creating a Collinearity Matrix
1. Return to your X-variables. Click on the Data Tab from the ribbon bar, Select Data Analysis, Select
Correlation, complete the Popup Box:
○ Input Range: Highlight (Select) the cell range for all the X-variables: data and column headings
○ Grouped By: Columns (Default Setting)
○ Click on Labels in First Row
○ Click on Output Range and input a cell outside the range of your data. Click OK
2. Copy and Paste the correlation matrix below your multiple regression output tables.
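The correlation matrix holds the pairwise Pearson correlation between every pair of X-variables. The Python sketch below computes it by hand for three made-up columns. The names `pearson` and `correlation_matrix` and all data values are assumptions for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def correlation_matrix(columns):
    """Map each pair of column names to its correlation."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}


# Hypothetical X-variables; X2 is an exact multiple of X1.
columns = {"X1": [1, 2, 3, 4], "X2": [2, 4, 6, 8], "X3": [1, 3, 2, 4]}
matrix = correlation_matrix(columns)

for name, row in matrix.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

Values near +1 or -1 off the diagonal indicate X-variables that carry nearly the same information, which is the collinearity concern this matrix is meant to surface.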
D. Formatting Output Tables
• Set cells to 3 – 4 decimals (except whole numbers) and clean up tables, including but not limited to
specifically identifying your output, widening/shortening column widths and/or merging cells so all
words are fully legible, etc.
• Delete the second set of upper and lower 95% boundaries (the last 2 columns) in the coefficients table
printed below the ANOVA table.
E. Complete Assignment Submission Form:
Interpreting Your Results: I am not looking for one “correct” model, significant model, or useful model.
If you get a “good” model, great. If not, great. The data is unique to your class, and as such, the output is
unknown until the data is run. So, there is no reason to run multiple models.
What is important is your demonstration of creating a model, the articulation of your reasoning for your
model, your statement of how the results support and/or do not support your thinking, and any specific
explanation and conclusion regarding your particular outcome. Your writing should display critical
thinking, creativity, insight, and a synthesis of your understanding of multiple regression within the context
you created by the selection of your X-variables.
One Possible Structure:
○ Intro: What is the idea\theme\theory\hypothesis underlying your model, and specifically how
do your X-variables support it?
○ Body: Analyze the data within the context you created.
○ Conclusion: Summary\Possible Changes\Possible Next Step\Issues to Explore\Etc.
Tips:
○ Let the reader know your dummy variable code, either in the statement or tables.
○ See FCB Written Rubrics by clicking on Core Course Documents on the homepage
○ See Written Statements for Assignments in the Chapter 9 module
○ See MR Assignment Guidance Video in this week’s module
○ Paragraphs show organization of thought. Use them (or write as well as Jose Saramago).
• You are not getting a degree in data entry or Excel, so whether a score moves upward or downward
depends on the degree to which the statement is a well-executed, blended expression of subject
matter context and statistics.
• For the written portion (a) use 11pt/12pt font, (b) do not exceed 1 page, and (c) do not widen margins.
Last Name: Click or tap here to enter text.
First Name: Click or tap here to enter text.
Multiple Regression Statement:
WARNING: Do Not Extend this Text Box Beyond the First Page!
PASTE YOUR TABLES ON THE PAGE IN THE ORDER AS STATED IN THE DIRECTIONS.
U.S. Supreme Court Again Strikes Down a Regression Model Offered for Class
Certification: Is More Rigorous Scrutiny on the Way?
Bloomberg Law, Online*, Aug. 5, 2013
……
1. Omitted Variable Bias
Regression analyses may be attacked for failure to account for all of the variables thought to be
germane to the issue at hand. Indeed…“some regressions models are so incomplete as to be
inadmissible as irrelevant.” … When important variables are omitted from a model, the model
may be biased or its statistical significance may be uncertain.
The key…is to add the omitted variable(s) and see if the variable(s) changes the results [for the
better]. If it does, omitted variable bias is present…
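The effect of leaving out a relevant variable can be illustrated numerically. The Python sketch below uses made-up data constructed so that y = 1 + 2*x1 + 3*x2 exactly. When x2 is omitted, the fitted slope on x1 moves away from 2 by an amount equal to the x2 coefficient times the slope of x2 on x1. All names and values are hypothetical.

```python
def slope(x, y):
    """Least-squares slope of y on a single x-variable."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


x1 = [0, 1, 0, 1, 2]
x2 = [0, 0, 1, 1, 1]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]  # true x1 coefficient is 2

short_slope = slope(x1, y)  # slope on x1 when x2 is left out of the model
delta = slope(x1, x2)       # how strongly x2 moves together with x1
bias = short_slope - 2      # distance from the true x1 coefficient

print(round(short_slope, 4), round(bias, 4), round(3 * delta, 4))
```

The bias vanishes only when the omitted variable is uncorrelated with the included one (delta = 0) or has no effect on Y.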
2. Explanatory Power of ‘R-Squared’ Value
Related to the “missing variable” argument is the argument that a regression analysis should be
excluded because its low “R-squared” value…renders it unreliable. “R-squared” is a commonly
accepted statistical measure of how well a model explains the data that is being measured by
the model….
A model’s R-squared value represents the explanatory power of the model. For example, if the
alleged injury is a reduction in real property value, an R-squared value of 100 percent means
that the model explains all of the observed variations in housing prices. By contrast, an R-squared
value of 0 percent means that the model explains none of the variations in housing
prices…
Put slightly differently, if one home sold for $100,000 more than another home, an R-squared
value of 25 percent means the model can only explain $25,000 of the $100,000 price variation
and cannot account for the remaining $75,000. This is a strong indication that there may be
significant variables omitted from the model.
The R-squared value of a given model is a common method for determining whether it is a good
fit for the data being analyzed… Ideally, “courts should not rely on regression analyses whose
R-squared value has been called into serious question.”…
For example…a regression model [with the Y-variable being Salary]…has an R-square value of
.284. This R-square meant that 71.6 percent of the variation in salary among these employees
was not explained by the regression model…and the Court deemed the regression to have
no…value…
On the other hand, a high R-squared value…only means that the dependent variable marches
in step with the independent one—for any number of possible reasons … .”). “[T]here is always
the possibility that [a] high R-squared value is a product of happenstance [chance].”
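R-squared can be computed from actual and predicted Y values as one minus the ratio of unexplained variation to total variation around the mean. The Python sketch below is illustrative only: the function name and the four data points are made up, and the last assignment simply repeats the article's arithmetic (an R-squared of .284 leaves 71.6 percent unexplained).

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` (around its mean) explained by `predicted`."""
    mean = sum(actual) / len(actual)
    total = sum((y - mean) ** 2 for y in actual)
    unexplained = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - unexplained / total


# Hypothetical actual and model-predicted values.
r2 = r_squared([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5])

# The salary example from the article: R-squared of .284.
unexplained_share = 1 - 0.284

print(round(r2, 3), round(unexplained_share, 3))
```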
3. Lack of Statistical Significance
…Other courts, however, … instead rely on what is known as the “t-statistic” value (sometimes
referred to as “t-test” or “t-ratio”)…. The t-statistic demonstrates the statistical significance of a
model based on its standard error/deviation…
Regression models should also be rejected where they fail to demonstrate the statistical
significance of the data relied on. Statistical significance is the determination of whether
changes in data reflect a relationship pattern or simply mere chance…
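In regression output, the t-statistic for a coefficient is the coefficient divided by its standard error. The Python sketch below works this out for a one-X model, where the standard error of the slope has a short closed form. The function name and data are hypothetical; a multiple regression follows the same idea, with standard errors taken from the full model.

```python
from math import sqrt


def slope_t_stat(x, y):
    """Return (slope, standard_error, t_stat) for a one-X least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    se = sqrt(sse / (n - 2) / sxx)  # residual variance uses n - 2 degrees of freedom
    return b1, se, b1 / se


b1, se, t = slope_t_stat([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b1, 3), round(se, 3), round(t, 3))
```

The t-statistic is then compared with a critical value (or converted to a p-value) at the chosen alpha. A small absolute t means the coefficient cannot be distinguished from zero with the data at hand.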
4. Multicollinearity
Finally, multicollinearity is an additional, albeit related, basis for attacking a regression model.
“Multicollinearity” exists when there are two or more [X] variables that are highly…correlated…
The greater the multicollinearity between two variables, the less likely that a regression analysis
will be able to distinguish between competing explanations for movement in the outcome
variable….Where there is substantial multicollinearity, the size of the sample (large or small)
will not matter—the expert will not be able to determine whether there is a relationship
between the dependent and independent variables…
For example, where [persons] seek to demonstrate that their real property values have been
diminished by an oil spill, if there are two or more variables that are highly…correlated, a
regression model will not be able to determine which of those variables contributed to the
change in property value. Thus, if the oil spill occurred in the same area as another eye sore,
such as an industrial park or waste treatment facility, it is likely that the oil spill and the eye
sore are highly correlated. In that case, it may not be provable that the property values in the
area were reduced as a result of the oil spill as opposed to the other eye sore…
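One widely used way to quantify multicollinearity is the variance inflation factor (VIF). It is not mentioned in the article and is offered here only as an illustration. For a model with exactly two X-variables, both share the same VIF, which depends only on the correlation r between them. The function name below is hypothetical.

```python
def vif_two_predictors(r):
    """VIF for either X-variable in a two-X model, given their correlation r."""
    return 1.0 / (1.0 - r ** 2)


# The variance of each coefficient estimate is multiplied by the VIF.
for r in (0.0, 0.3, 0.9):
    print(r, round(vif_two_predictors(r), 2))
```

As r approaches 1 the factor grows without bound, which matches the article's point that highly correlated X-variables leave the model unable to separate their effects.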
Conclusion
…For practitioners offering or attacking regression analyses, it is important to understand the
methodology underlying each model, and to carefully present the merits and/or deficiencies of
that model…
*Online Source: https://news.bloomberglaw.com/us-law-week/us-supreme-court-again-strikes-down-a-regression-model-offered-for-class-certification-is-more-rigorous-scrutiny-on-the-way
Only Use Dataset that Falls Under the First Letter of Your Last Name
THE BELOW CONTEXT APPLIES TO ALL DATASETS
Response Variable: See Survey Question Below*
Categorical Variable: Student’s Major
Might students of different majors hold different views
regarding whether the primary purpose of college is to
acquire work skills, and if yes or if no, what might be a
plausible explanation or outcome for the test result?
*Survey Question Prompt: The primary importance of
college is to gain work skills.
1 = Strongly Disagree; 7 = Strongly Agree
Alpha = .10

First Letter, Last Name: A through G
Business Major
Management\Gen Bus   Marketing   Accounting\MIS
6                    4           7
6                    7           4
7                    5           7
4                    4           6
5                    6           6
3                    5           6
3                    2           6
4                    3           7
1                    3           7
1                    4           3
5                    6           6

First Letter, Last Name: H through O
Business Major
Management\Gen Bus   Marketing   Accounting\MIS
1                    5           5
1                    5           6
6                    5           7
6                    6           5
4                    3           7
5                    4           5
3                    4           3
2                    5           7
4                    4           6
5                    5           4
4                    3           5

First Letter, Last Name: P through Z
Business Major
Management\Gen Bus   Marketing   Accounting\MIS
5                    5           6
3                    7           7
5                    5           6
3                    6           5
4                    4           7
2                    5           7
5                    4           4
3                    6           5
6                    6           5
4                    6           4
2                    7           7
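The question of whether majors differ in their average response is the kind of comparison a one-way ANOVA addresses. The handout does not name the test, so treating it as ANOVA is an assumption. The Python sketch below computes the F statistic for three small made-up groups; the function name and values are hypothetical and are not taken from the datasets.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_b = len(groups) - 1
    df_w = len(values) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 3), df_b, df_w)
```

The F statistic is compared with the critical F value for the stated alpha and the two degrees-of-freedom values. A larger F suggests the group means differ by more than chance alone would explain.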
Use Dataset with Column Letter that is the Same as the First Letter of Your Last Name
Midterm 1 Midterm 1 Midterm 1 Midterm 1 Midterm 1 Midterm 1 Midterm 1 Midterm 1 Midterm 1
25
20
23
22.5
25
15
13
20
16.5
15.5
22.5
17.5
17
20
22
19.5
15
22.5
18.5
22
12.5
17
17.5
4.5
17.5
17
21
19
23
17.5
18.5
18.5
16.5
19
19
23.5
17
13
14.5
22
21
19
11
16
17
21
19.5
23.5
21
17.5
19.5
19
16
14.5
23.5
17.5
16
21.5
22
19.5
14
15
20.5
14.5
15
16
19
14
15.5
22
17
13.5
17.5
22
18.5
15
20.5
24.5
17
18.5
26
21
21.5
21
17
19.5
23
3.5
17.5
14.5
15
19
22.5
16
19.5
18
17.5
20
21.5
21.5
23
17
16
18
20
21.5
20.5
22.5
22.5
17.5
17
21.5
22
18.5
20.5
20.5
23.5
21
20.5
18.5
19
23
18
22.5
19
21
17
17.5
22
18
18
17.5
23
20.5
19
13.5
18.5
21
15
18.5
17.5
22.5
21.5
21
23
18.5
21.5
22
18.5
18.5
20.5
20.5
22.5
13
23.5
19
19
16.5
13
25.5
19
18.5
18
21.5
15
20.5
21
20
17
20.5
18.5
22.5
18
17
18.5
20
18.5
19
20.5
15.5
Note: Some scores may be greater than 25 because of bonus points.
Copy Data Heading along with Data
Midterm 1 Midterm 1 Midterm 1 Midterm 1 Midterm 1 Midterm 1 Midterm 1 Midterm 1 Midterm 1
22.5
14.5
21
21
21.5
18.5
25
16.5
19.5
14
16
22.5
24.5
18
21.5
14.5
14.5
25
14.5
23
19
20.5
16
14.5
21
21
19
17.5
18.5
16
17.5
20
16
21
20.5
20
19.5
19.5
18
20.5
20.5
19.5
24
20.5
21
17.5
16.5
16.5
12
20.5
18.5
16.5
14
19.5
17
15
17
18
25.5
15
20
14.5
19
14.5
18.5
17.5
20.5
18
14.5
21.5
12.5
21
23.5
17.5
19.5
20
18.5
15
18
18
19.5
19
17
22.5
20.5
21.5
20
19
13
18
17
19.5
17.5
21.5
14.5
22
20
17.5
15.5
21.5
17
20
21
16
7
17
22
16.5
20.5
19.5
23.5
18.5
19.5
15
21.5
18
24
15
19.5
11.5
18.5
18.5
14
18
21
22.5
18
11
16
21.5
15
21.5
24
22
17
19
21.5
16.5
18
14.5
21.5
22
18.5
20
20.5
11
18
12.5
15
15.5
21.5
16
17.5
20
22
22
16
20
16.5
15.5
19
14.5
16.5
22.5
13
24
22
21
18
21
17
17.5
18
17
21
8
25.5
18.5
21
20
Midterm 1 Midterm 1 Midterm 1 Midterm 1 Midterm 1 Midterm 1 Midterm 1 Midterm 1
16.5
20.5
18.5
19
21
20.5
22.5
22.5
24
19.5
16
20
17
17.5
20
20
22.5
19.5
19
21
13.5
18.5
22.5
22.5
17
18
21
19.5
23
24
22
22
20
22
21
19
13
23.5
23
23
17.5
23
19.5
21
18
21.5
23.5
22.5
14.5
18
25
19.5
22.5
18
21.5
18
17
18.5
19
18
20
23
18
21
20
18.5
20
15.5
22.5
22.5
23
22.5
19
16.5
21
16.5
21
18.5
25
19
17.5
21
16.5
19
23.5
18.5
23.5
23.5
18.5
21
19
16
17
15.5
23.5
23.5
23.5
22
19.5
16
14.5
22.5
20.5
21
23.5
18.5
19.5
15
20.5
14
20.5
19.5
14
16
15.5
17
13.5
14.5
25.5
25
18
19
24.5
18.5
24
17.5
18
22.5
20
21
23
17.5
22.5
19.5
23
22
20.5
21
18
20
17
25
19.5
23
23
19.5
20
20.5
20
19
19.5
24.5
18.5
19
25
20.5
17.5
20
17.5
20.5
Use Dataset with Column Letter that is the Same as the First Letter of Your Last Name
Written Report Written Report Written Report Written Report Written Report Written Report
25
19
24.5
23
25
20
11.5
26.5
17.5
19.5
21.5
21
17
24.5
15.5
19.5
18.5
7
20
19.5
17
17.5
18.5
17
20
13.5
15.5
22
19
16.5
24
20
20
20
18
19.5
24
19
18
20.5
22
16.5
16.5
17.5
14.5
17
15
12.5
17.5
23.5
20.5
15
24.5
22
22.5
19
20.5
19.5
13.5
25.5
18.5
21
23
17
18
18
21.5
19.5
19.5
18
19.5
24
21.5
22
19.5
20
23
19.5
21.5
22.5
17.5
19.5
20.5
18.5
19
18.5
22
17.5
17.5
15
15.5
21.5
20
13.5
20.5
21.5
24
17
20.5
20
19
17.5
12.5
21
17
21.5
17.5
12
17
20.5
15
22.5
23.5
20.5
19.5
19
19.5
17.5
21
19.5
Note: Some scores may be greater than 25 because of bonus points.
Written Report Written Report Written Report Written Report Written Report Written Report
12.5
22.5
17.5
24.5
14
22
16
14.5
22.5
16
19.5
21.5
17.5
19
25.5
16.5
23.5
20
15.5
17.5
25
17
18
16
19
18.5
21.5
20
20.5
18
16
20.5
12.5
16
16.5
13
9
18.5
20.5
18.5
18
18
23.5
17.5
15
15
17
19.5
6.5
17.5
24.5
24
21
18.5
6.5
19
16
20.5
17
19
16
21
22
16
23
19
23
22
21
22
20.5
21
21
20
20.5
18.5
19
22
20
21
18
12
20
12.5
25.5
21.5
19.5
18
13
20
24
22
23.5
19
22.5
19
22
18.5
21.5
20
14.5
18.5
24.5
20
17
21.5
22
19.5
20.5
21.5
19
16
22
12.5
22
21
14.5
16
17
17.5
Written Report Written Report Written Report Written Report Written Report Written Report
23.5
22.5
18.5
25
16.5
22
23.5
15.5
21
19
12.5
25.5
23.5
16.5
16
22.5
22.5
21.5
18.5
19
18
22.5
20
23
19.5
17
23.5
22.5
21
19
13
22
20
14.5
15
16.5
20
24.5
16
19
11
21
24
20
12
17.5
13.5
22.5
20
18.5
14
16.5
19.5
20
19.5
21
18.5
17
15
18.5
19.5
16
22.5
18.5
20
15.5
20.5
18
5
17
20.5
16.5
18
18.5
15.5
24
20.5
23
20.5
20
13.5
14.5
24
20
19
16
20.5
22.5
22
16
17.5
12
20.5
22.5
17.5
19
11.5
14
15.5
21
15.5
21
17.5
18.5
19.5
19.5
18.5
14.5
24.5
22.5
22
16.5
20.5
16
20
8
25.5
19
22
21.5
Written Report Written Report Written Report Written Report Written Report Written Report
16.5
24.5
17.5
21.5
21.5
22.5
23
13.5
15.5
23
19
18.5
20
18
18.5
19
15.5
21.5
16
19.5
20.5
16.5
24
17
19
23
22
21
12.5
21
21
20.5
22
22.5
17
20.5
14.5
17.5
25.5
20
19.5
19
16
20.5
21.5
18.5
19
24.5
21.5
19
23
15.5
22.5
21.5
19.5
17.5
19
16.5
25.5
17
20.5
23.5
17
17.5
25
19
18
24
16.5
18.5
21.5
14.5
22.5
22
19.5
20.5
12.5
24.5
22
17.5
16.5
18.5
20.5
16
17
15.5
12.5
17.5
15
16.5
18
18.5
22
17.5
23
24
19
20.5
25.5
19
20
20
22
22
18
21
16
25.5
18
22
24
22
19
24.5
13
21.5
25
20
21
23
Written Report Written Report
19.5
19.5
19
19
26.5
26.5
24.5
24.5
19.5
19.5
21
22
20.5
17
25
22
24.5
21.5
18
20
22.5
22.5
22
22
25
22
22
22
24.5
25.5
20
26.5
22.5
24.5
23.5
19.5
22
23.5
18.5
23.5
Data Column   Survey Question
A   To the nearest value, what is your GPA?
B   How many units are you taking this semester?
C   To the nearest amount, how much school debt do you expect to have upon graduation?
D   My major & career choice is done in pursuit of what I love. Response Scale: 1 (strongly disagree) to 7 (strongly agree)
E   I am optimistic about the world’s future. Response Scale: 1 (strongly disagree) to 7 (strongly agree)
F   I am motivated by tangible rewards (money, bonuses, gifts, etc.). Response Scale: 1 (strongly disagree) to 7 (strongly agree)
G   I really enjoy networking. Response Scale: 1 (strongly disagree) to 7 (strongly agree)
H   I have extensively researched the field\career I plan to enter upon graduating. Response Scale: 1 (strongly disagree) to 7 (strongly agree)
I   I regularly write down and review my goals. Response Scale: 1 (strongly disagree) to 7 (strongly agree)
J   How much do you expect to get paid upon graduating and landing your first job?
K   What is your gender?
L   What is your major?
M   What do you value more? Time or Money
N   At least 1 of my parents attended college? True or False
GPA   Units  School Debt  Pursue What Love  Optimistic: Future  Rewards  Networking  Career Research  Write Goals  Exp Starting Salary  Gender  Student Major  Value: T\M  Parent –> College
3.75  12     $5,000       6                 5                   4        5           4                4            $60,000              Male    Marketing      Time        No
3.25  16     $20,000      6                 6                   7        6           5                6            $75,000              Male    Marketing      Money       Yes
3.50  12     $0           6                 5                   3        4           6                4            $60,000              Female  Management     Time        Yes
4.00  15     $10,000      7                 5                   5        5           7                3            $50,000              Female  Accounting     Time        No
4.00  12     $15,000      7                 4                   4        4           4                6            $45,000              Female  Accounting     Time        No
2.75  14     $20,000      4                 7                   6        7           5                7            $80,000              Male    Marketing      Money       Yes
3.75  12     $0           7                 5                   3        3           6                2            $45,000              Female  Accounting     Money       No
3.75  12     $5,000       5                 5                   5        6           6                3            $60,000              Female  Management     Money       Yes
2.75  16     $15,000      5                 6                   5        7           6                6            $75,000              Male    Management     Time        Yes
3.75  16     $10,000      5                 6                   4        5           7                5            $65,000              Female  Marketing      Time        Yes
3.25  18     $30,000      3                 7                   7        7           4                7            $85,000              Male    Marketing      Money       Yes
3.75  14     $0           6                 5                   5        4           7                3            $60,000              Male    Management     Time        No
3.50  14     $15,000      5                 6                   6        5           2                4            $70,000              Female  Management     Time        Yes
4.00  12     $0           7                 7                   7        5           7                3            $50,000              Female  Accounting     Time        No
4.00  15     $0           6                 6                   4        7           6                5            $60,000              Female  Marketing      Time        Yes
2.50  19     $40,000      4                 6                   7        5           3                6            $100,000             Male    Marketing      Money       Yes
4.00  12     $5,000       7                 5                   5        6           7                3            $55,000              Female  Accounting     Money       Yes
2.75  16     $20,000      4                 6                   6        7           4                2            $80,000              Female  Management     Money       Yes
2.75  18     $15,000      5                 6                   7        6           4                7            $100,000             Male    Marketing      Money       Yes
3.50  14     $5,000       6                 5                   7        4           6                4            $65,000              Female  Accounting     Time        Yes
3.75  12     $10,000      6                 3                   6        6           6                1            $65,000              Female  Marketing      Time        Yes
4.00  15     $5,000       7                 6                   7        6           6                1            $50,000              Female  Management     Time        No
3.00  17     $10,000      5                 7                   6        7           3                7            $75,000              Male    Marketing      Money       Yes
2.75  18     $25,000      6                 7                   7        7           3                6            $100,000             Male    Management     Money       Yes
3.50  16     $10,000      6                 6                   5        6           5                5            $70,000              Female  Marketing      Time        Yes