Lesson 6.3: Multiple Linear Regression
Software Lab 6.3: Multiple Linear Regression
Parts of this software lab are adapted from Multiple Linear Regression (OpenIntro, n.d.-b), licensed under CC BY-SA 4.0.
As you work through the lab, answer the ungraded exercises in the shaded boxes. Check your answers by consulting the Software Lab 6.3 Solutions.
Remember to complete the graded Software Lab Questions for this section in Moodle.
Grading the Professor: The Data
We’ll use the same data as in Software Lab 6.1, evals_prof [CSV file] (OpenIntro, n.d.-a). These data were gathered from end-of-semester student evaluations for 463 courses taught by a sample of 94 professors at the University of Texas at Austin. The variables we’ll use in this lab are:
score: Average professor evaluation score across all courses taught by the professor: (1) very unsatisfactory – (5) excellent.
bty_avg: Average beauty rating of the professor, based on six students’ ratings of the professor’s physical appearance: (1) least attractive – (10) most attractive.
age: Age of the professor.
gender: Gender of the professor.
Here score is the response variable (y), and the other three variables, bty_avg, age, and gender, are potential predictor variables. Once the data are loaded into jamovi, go to the Data tab, double-click the column header for age, and change the Measure type from Nominal to Continuous.
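Outside jamovi, the switch from Nominal to Continuous has a loose analogue: a CSV reader returns every field as text, and numeric columns must be converted before they can be used as continuous variables. The sketch below assumes only the four column names listed above; the helper name load_evals and the sample rows are hypothetical and are not the real evals_prof data.

```python
import csv
import io

# Hypothetical helper: only the column names come from the lab.
NUMERIC_COLUMNS = ("score", "bty_avg", "age")

def load_evals(file_obj):
    """Read evals-style rows and convert the numeric columns from text to float.

    csv.DictReader returns every field as a string (loosely like a Nominal
    column in jamovi); float() turns a column into a usable continuous variable.
    """
    rows = []
    for raw in csv.DictReader(file_obj):
        row = dict(raw)
        for col in NUMERIC_COLUMNS:
            row[col] = float(row[col])
        rows.append(row)
    return rows

# Made-up rows for illustration only:
sample = io.StringIO(
    "score,bty_avg,age,gender\n"
    "4.7,5.0,36,female\n"
    "4.1,3.2,59,male\n"
)
data = load_evals(sample)
print(data[0]["age"] + 1)  # 37.0, so age is now numeric
```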
Data Exploration
Load the evals_prof [CSV file] (OpenIntro, n.d.-a) data into jamovi. Recall from Software Lab 6.1 that we found:
- A slight positive linear association between score and bty_avg (correlation = 0.156)
- Not much of a trend between score and age (correlation = –0.080)
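Assuming these are Pearson correlation coefficients (the usual default), the following sketch shows what the number measures: the sum of cross-products of deviations from the means, scaled so the result lies between –1 and 1. The function name pearson_r and the input values are illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products of deviations, and sums of squared deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up values for illustration only:
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 3))  # 1.0
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 3))  # -1.0
```

Values near 0, such as the –0.080 above, mean the points show little linear trend.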
Now, we’ll investigate whether there is any association between score and gender.
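A box plot split by group summarizes each group with roughly five numbers: minimum, first quartile, median, third quartile, and maximum. As a sketch of that comparison outside jamovi, the hypothetical helper below computes those numbers per group using Python's statistics.quantiles. The scores are made up, and jamovi's quartile method and whisker rule (standard box plots usually stop whiskers at 1.5 × IQR) may differ slightly.

```python
import statistics

def summarize_by_group(values, groups):
    """Return (min, Q1, median, Q3, max) of `values` for each level of `groups`."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    summary = {}
    for g, vs in by_group.items():
        # "inclusive" quartiles; the middle cut point equals the median
        q1, med, q3 = statistics.quantiles(vs, n=4, method="inclusive")
        summary[g] = (min(vs), q1, med, q3, max(vs))
    return summary

# Made-up scores for illustration only:
scores = [4.0, 4.5, 3.5, 5.0, 4.2, 3.9, 4.8, 4.1]
genders = ["female", "male", "female", "male", "female", "male", "female", "male"]
for gender, five in summarize_by_group(scores, genders).items():
    print(gender, five)
```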
Select Analyses > Exploration > Descriptives, move score to the Variables box, and move gender to the Split by box. In the Plots sub-menu, select Box plot. Briefly summarize the distributions of score for males and females. Which gender tends to receive higher evaluation scores, if any? Check your answer by consulting the Software Lab 6.3 Solutions.
Multiple Linear Regression Model
In the data exploration step, we considered the association between score and each potential predictor individually, without accounting for how the predictors might be associated with one another. Next, we’ll consider all the variables together in a multiple linear regression model.
Select Analyses > Regression > Linear Regression, move score to the Dependent Variable box, move bty_avg and age to the Covariates box, and move gender to the Factors box. Write down the equation of the least squares regression line.
Let’s look at a model fit measure for the model used in question 2. Review Section 9.1.3 in OpenIntro Statistics (Diez et al., 2019) CC BY-SA 3.0 for the definition and formula of “adjusted R².”
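To make the two steps above concrete, here is a small, dependency-free sketch: it fits a least squares model with an intercept, two numerical predictors, and a 0/1 gender indicator by solving the normal equations, then computes adjusted R² with the usual formula 1 − (1 − R²)(n − 1)/(n − k − 1), where k is the number of predictors. All function names are hypothetical, the indicator coding (male = 1, female = 0) is an assumption, and the data are made up and noise-free, so the fit should recover the generating coefficients.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(a)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least squares coefficients from the normal equations (X^T X) b = X^T y.
    Each row must already start with 1.0 for the intercept."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def adjusted_r2(y, fitted, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), with k predictors."""
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - sse / sst
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Made-up data generated from score = 3.0 + 0.1*bty_avg - 0.01*age + 0.2*male
bty = [3.0, 4.5, 5.0, 6.2, 2.8, 7.1, 4.0, 5.5]
age = [36, 59, 51, 40, 64, 33, 47, 55]
male = [0, 1, 0, 1, 1, 0, 0, 1]
y = [3.0 + 0.1 * b - 0.01 * a + 0.2 * m for b, a, m in zip(bty, age, male)]

rows = [[1.0, b, a, m] for b, a, m in zip(bty, age, male)]
coef = fit_ols(rows, y)
print([round(c, 3) for c in coef])  # [3.0, 0.1, -0.01, 0.2]
print(round(adjusted_r2([1, 2, 3, 4, 5], [1.5, 1.5, 3, 4.5, 4.5], k=1), 4))  # 0.8667
```

Solving the normal equations directly is fine for a tiny, well-conditioned example like this; statistical software typically uses more numerically stable methods. Note that adjusted R² penalizes each extra predictor through the n − k − 1 term, which is why it can drop when a weak predictor is added.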
Recall from Lesson 6.2 that each p-value in the last column of the “model coefficients” table tests the null hypothesis that the corresponding model coefficient is 0, given the other predictors in the model. Roughly speaking, if a p-value is smaller than 0.05, there is a statistically significant linear association between y and the corresponding predictor, after accounting for the other predictors.
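In ordinary least squares, each p-value in that table comes from a t-statistic, t = estimate / standard error, compared with a t distribution with n − k − 1 degrees of freedom (n observations, k predictors). The sketch below computes a two-sided p-value using only the standard library, integrating the t density numerically with Simpson's rule; a statistics library would normally do this for you. The estimate, standard error, and n = 94 (one row per professor) are illustrative assumptions, not results from the lab.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    # lgamma avoids overflow of gamma() for large df
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=10_000):
    """P(|T| >= |t_stat|) via symmetry and Simpson's rule on [0, |t_stat|]."""
    a = abs(t_stat)
    if a == 0:
        return 1.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_a = total * h / 3
    return 1 - 2 * area_0_to_a

# Hypothetical coefficient-table row, not output from the real data:
estimate, std_error = 0.15, 0.07
n, k = 94, 3  # assumed: 94 professors, 3 predictors
t_stat = estimate / std_error
print(round(t_stat, 3), round(two_sided_p_value(t_stat, n - k - 1), 4))
```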
Any predictor with a relatively high p-value may not be needed in the model, and including it may even make the model worse (for example, by lowering adjusted R²). So, let’s remove the predictor with the largest p-value (age) and refit the model without it.
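The step above is one round of what is often called backward elimination. A minimal sketch of that single step, assuming the p-values have been copied from the coefficients table (intercept excluded), might look like this; the function name and the p-values are hypothetical. A strict cutoff like this is only a starting point: it would drop a predictor with p = 0.052, whereas the lab uses judgment and keeps one.

```python
def predictor_to_drop(p_values, alpha=0.05):
    """Return the predictor with the largest p-value if it exceeds alpha, else None.

    p_values maps predictor names to p-values (intercept excluded).
    """
    name, p = max(p_values.items(), key=lambda item: item[1])
    return name if p > alpha else None

# Hypothetical p-values, for illustration only:
print(predictor_to_drop({"bty_avg": 0.03, "age": 0.41, "gender": 0.06}))  # age
print(predictor_to_drop({"bty_avg": 0.01, "gender": 0.02}))               # None
```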
Select Analyses > Regression > Linear Regression, move score to the Dependent Variable box, move bty_avg to the Covariates box, and move gender to the Factors box. Write down the equation of the least squares regression line.
[…] age.
Which predictors in the model without age have significant coefficients at the 0.05 significance level? Which (if any) do not?
Although one of the predictors in the model without age has a p-value above 0.05, it is only barely above (0.052), so we’ll retain it in the model.
Model Interpretation
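Next, we substitute 0 or 1 for the gender indicator variable in the estimated regression equation and simplify to obtain a separate estimated regression equation for each gender. Write the estimated model as $\widehat{\text{score}} = b_0 + b_1 \cdot \text{bty\_avg} + b_2 \cdot \text{male}$, where $\text{male} = 1$ for males and $\text{male} = 0$ for females (female is the reference level). For example, for females, $\text{male} = 0$, so the estimated regression equation is
$\widehat{\text{score}} = b_0 + b_1 \cdot \text{bty\_avg}$.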
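The substitution above can be sketched in a few lines: with a 0/1 indicator, the gender coefficient only shifts the intercept, while the bty_avg slope is shared. The helper name group_equations and the coefficient values are hypothetical, and the coding male = 1, female = 0 is the same assumption as in the equation above.

```python
def group_equations(b0, b_bty, b_male):
    """Split score_hat = b0 + b_bty*bty_avg + b_male*male into one line per gender.

    Assumes male = 1 for males and 0 for females. Returns {gender: (intercept, slope)}.
    """
    return {
        "female": (b0, b_bty),          # male = 0: the b_male term drops out
        "male": (b0 + b_male, b_bty),   # male = 1: b_male shifts the intercept
    }

# Hypothetical coefficients, not the lab's estimates:
for gender, (intercept, slope) in group_equations(3.8, 0.07, 0.18).items():
    print(f"{gender}: score_hat = {intercept:.2f} + {slope:.2f} * bty_avg")
```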
An estimated coefficient for a numerical predictor in a multiple linear regression model represents the expected change in the response variable for a one-unit increase in the predictor, holding all other predictors fixed.
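The phrase "holding all other predictors fixed" can be checked numerically. The sketch below uses a hypothetical model with both bty_avg and age (like the model in question 2) and made-up coefficients: raising bty_avg by one unit with age and gender unchanged changes the prediction by exactly the bty_avg coefficient, but if age also changes, the difference no longer equals that coefficient.

```python
def predict(b, bty_avg, age, male):
    """Hypothetical fitted model: score_hat = b0 + b1*bty_avg + b2*age + b3*male."""
    b0, b1, b2, b3 = b
    return b0 + b1 * bty_avg + b2 * age + b3 * male

b = (4.0, 0.06, -0.004, 0.2)  # made-up coefficients

# One-unit increase in bty_avg, age and gender held fixed:
change_fixed = predict(b, 5.0, 45, 1) - predict(b, 4.0, 45, 1)
# One-unit increase in bty_avg, but age also rises by 10 years:
change_not_fixed = predict(b, 5.0, 55, 1) - predict(b, 4.0, 45, 1)
print(round(change_fixed, 4), round(change_not_fixed, 4))  # 0.06 0.02
```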
Interpret the estimated coefficient of bty_avg in the multiple linear regression model fit in question 5. Is the interpretation the same for males and females?
References
Diez, D. M., Çetinkaya-Rundel, M., & Barr, C. D. (2019). OpenIntro Statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/
OpenIntro. (n.d.-a). Data sets [Data sets]. https://openintro.org/data/
OpenIntro. (n.d.-b). Multiple linear regression [CC BY-SA 4.0]. OpenIntro Labs for jamovi. https://openintrostat.github.io/oilabs-jamovi/09_multiple_regression/multiple_regression.html