Lesson 6.3: Multiple Linear Regression

Software Lab 6.3

Multiple Linear Regression

Part of this software lab is adapted from Multiple Linear Regression (OpenIntro, n.d.-b) CC BY-SA 4.0.

As you work through the lab, answer the ungraded exercises in the shaded boxes. Check your answers by consulting the Software Lab 6.3 Solutions.

Remember to complete the graded Software Lab Questions for this section in Moodle.

Grading the Professor: The Data

We’ll use the same data as in Software Lab 6.1, evals_prof [CSV file] (OpenIntro, n.d.-a), gathered from end-of-semester student evaluations for 463 courses taught by a sample of 94 professors at the University of Texas at Austin. The variables we’ll use in this lab are:

  • score: Average professor evaluation score across all courses taught by the professor, from (1) very unsatisfactory to (5) excellent.
  • bty_avg: Average beauty rating of the professor, based on six students’ ratings of the professor’s physical appearance, from (1) least attractive to (10) most attractive.
  • age: Age of the professor.
  • gender: Gender of the professor.

Here score is the response variable (y) and the other three variables, bty_avg, age, and gender, are potential predictor variables. Once the data are loaded into jamovi, go to the Data tab, double-click the column header for age, and change the Measure type from Nominal to Continuous.

Data Exploration

Load the evals_prof [CSV file] (OpenIntro, n.d.-a) data into jamovi. Recall from Software Lab 6.1 that we found:

  • A slight positive linear association between score and bty_avg (correlation = 0.156)
  • Not much of a trend between score and age (correlation = –0.080)

Now, we’ll investigate whether there is any association between score and gender.

1. Select Analyses > Exploration > Descriptives, move score to the Variables box, and move gender to the Split by box. In the Plots sub-menu, select Box plot. Briefly summarize the distributions of score for males and females. Which gender tends to receive higher evaluation scores, if any? Check your answer by consulting the Software Lab 6.3 Solutions.
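
For readers who want to see what the split box plot summarizes, here is a minimal Python sketch that computes a five-number summary of score for each gender. The rows in TOY_CSV are hypothetical and say nothing about the real answer; only the column names come from the lab. The quartile method used here (Python's "inclusive" interpolation) is an assumption and may differ slightly from the one jamovi uses.

```python
import csv
import io
import statistics

# Hypothetical toy rows, NOT the real evals_prof data. Only the column
# names (score, gender) are taken from the lab.
TOY_CSV = """score,gender
4.7,female
4.1,female
3.9,female
4.3,female
4.6,male
4.0,male
4.2,male
4.4,male
"""

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

def summarize_by_group(csv_text, value_col="score", group_col="gender"):
    """Group the value column by the group column and summarize each group."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups.setdefault(row[group_col], []).append(float(row[value_col]))
    return {name: five_number_summary(vals) for name, vals in groups.items()}

summary = summarize_by_group(TOY_CSV)
for gender, stats in summary.items():
    print(gender, stats)
```

To try this on the real data, the CSV text could be read from the downloaded file instead of TOY_CSV.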

Multiple Linear Regression Model

In the “data exploration” step, we considered the association between score and each potential predictor variable individually, without accounting for how the predictors might be associated with one another. Next, we’ll consider all the variables together in a multiple linear regression model.

2. Select Analyses > Regression > Linear Regression, move score to the Dependent Variable box, move bty_avg and age to the Covariates box, and move gender to the Factors box. Write down the equation of the least squares regression line.
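
The sketch below illustrates, in plain Python, what a least squares fit with two covariates and one two-level factor involves: the factor becomes a 0/1 indicator (here gender_male, with female as the reference level) and the coefficients solve the normal equations. It is not jamovi's implementation. The rows are hypothetical and were generated exactly from made-up coefficients (3.0, 0.1, -0.01, 0.2) so that the result is easy to check; the function names are illustrative.

```python
# Hypothetical toy rows: (score, bty_avg, age, gender). These are NOT the
# evals_prof values. Scores are generated exactly from the made-up equation
#   score = 3.0 + 0.1*bty_avg - 0.01*age + 0.2*gender_male
ROWS = [
    (3.0 + 0.1 * bty - 0.01 * age + 0.2 * (g == "male"), bty, age, g)
    for bty, age, g in [
        (2.5, 60, "male"), (4.0, 45, "female"), (6.5, 35, "female"),
        (3.0, 52, "male"), (7.0, 40, "male"), (5.5, 58, "female"),
    ]
]

def design_row(bty_avg, age, gender):
    """Intercept, two covariates, and a 0/1 indicator for gender == 'male'."""
    return [1.0, bty_avg, age, 1.0 if gender == "male" else 0.0]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def least_squares(rows):
    """Return [b0, b_bty_avg, b_age, b_gender_male] from the normal equations."""
    X = [design_row(bty, age, g) for _, bty, age, g in rows]
    y = [score for score, *_ in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

coefficients = least_squares(ROWS)
print([round(c, 4) for c in coefficients])
```

Because the toy scores contain no noise, the fitted coefficients should match the made-up ones up to rounding error. With real data the fit would not be exact, and jamovi also reports standard errors and p-values, which this sketch does not compute.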

Let’s look at a model fit measure for the model used in question 2. Review Section 9.1.3 in OpenIntro Statistics (Diez et al., 2019) CC BY-SA 3.0 to read the definition and formula for “Adjusted R².”
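
For quick reference, one common way of writing adjusted R² is shown below. The symbols are assumptions for this note: n is the number of observations and k is the number of predictor terms in the model (counting one indicator variable for each non-reference level of a factor). The textbook may express the same quantity in terms of the residual and outcome variances.

```latex
R^2_{adj} = 1 - \left(1 - R^2\right) \frac{n - 1}{n - k - 1}
```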

3. In the Model Fit sub-menu, select “Adjusted R²” and report the resulting value.

Recall from Lesson 6.2 that each p-value in the last column of the “model coefficients” table tests the null hypothesis that the corresponding model coefficient is 0. Roughly speaking, if a p-value is smaller than 0.05, then there is a statistically significant linear association between y and the corresponding predictor, after accounting for the other predictors in the model.
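
In symbols, the test behind each row of the table can be sketched as follows. The notation is an assumption for this note: b_i is the estimate, SE(b_i) its standard error, n the number of observations, and k the number of predictor terms.

```latex
H_0: \beta_i = 0 \quad \text{versus} \quad H_A: \beta_i \neq 0,
\qquad
T = \frac{b_i - 0}{SE(b_i)},
\qquad
df = n - k - 1
```

The t column in the coefficients table is the estimate divided by its standard error, and the p-value is the two-sided tail area for that t-value.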

4. Which predictors (if any) in the multiple linear regression model have significant coefficients at the 0.05 significance level? Which (if any) do not?

Any predictor with a relatively high p-value may not be needed in the model, and keeping it may even worsen the fit as measured by adjusted R². So, let’s remove the predictor with the largest p-value (age) and refit the model without it.

5. Select Analyses > Regression > Linear Regression, move score to the Dependent Variable box, move bty_avg to the Covariates box, and move gender to the Factors box. Write down the equation of the least squares regression line.

When a model is refit after removing a predictor, the numbers in the coefficients table and the model fit measures all change. If the value of adjusted R² increases after a predictor is removed, this suggests that the model without the predictor provides a better fit than the model with it.

6. Confirm whether the value of adjusted R² has increased or decreased after removing age.
7. Which predictors (if any) in the model without age have significant coefficients at the 0.05 significance level? Which (if any) do not?
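
To see why adjusted R² can rise when a predictor is dropped, here is a small Python sketch using the common formula 1 - (1 - R²)(n - 1)/(n - k - 1). All numbers in it (n = 50 and the two R² values) are hypothetical and are not the lab's output; the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for a model with k predictor terms fit to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than coefficients")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical values for illustration only, not results from evals_prof.
n = 50
full = adjusted_r_squared(0.300, n, k=3)     # model with three predictor terms
reduced = adjusted_r_squared(0.295, n, k=2)  # same model with one term removed

print(f"full model:    {full:.4f}")
print(f"reduced model: {reduced:.4f}")
print("reduced model preferred" if reduced > full else "full model preferred")
```

In this made-up case the ordinary R² falls slightly when the predictor is removed, yet adjusted R² rises, because the penalty for the extra term outweighed the small amount of variation it explained.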

Although one of the predictors in the model without age has a p-value above 0.05, it is only barely above (0.052), so we’ll retain it in our model.

Model Interpretation

Next, we plug 0s and 1s into the estimated regression equation from question 5 and simplify to derive an estimated regression equation for each category of gender. For example, for females, $\text{gender}_\text{male}=0$, so the estimated regression equation is $\widehat{\text{score}}=3.6656+0.0600\,\text{bty\_avg}+0.2407(0)=3.6656+0.0600\,\text{bty\_avg}$.
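
The substitution can also be sketched in Python. The three coefficients are copied from the equation quoted above; the function name and the bty_avg value of 5 are hypothetical choices for illustration.

```python
# Coefficients copied from the estimated regression equation in the text.
INTERCEPT = 3.6656
B_BTY_AVG = 0.0600
B_GENDER_MALE = 0.2407

def predicted_score(bty_avg, gender_male):
    """Predicted score; gender_male is 1 for male professors, 0 for female."""
    if gender_male not in (0, 1):
        raise ValueError("gender_male must be 0 or 1")
    return INTERCEPT + B_BTY_AVG * bty_avg + B_GENDER_MALE * gender_male

# Female professor (gender_male = 0) with a hypothetical bty_avg of 5:
# the indicator term drops out, leaving 3.6656 + 0.0600 * 5.
female_prediction = predicted_score(5, gender_male=0)
print(round(female_prediction, 4))
```

Calling the same function with gender_male=1 keeps the indicator term, which is the substitution needed for the other category.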

8. Derive the estimated regression equation for males.

An estimated coefficient for a numerical predictor in a multiple linear regression model represents the expected change in the response variable for a one-unit increase in the predictor, holding all other predictors fixed.
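
This statement can be checked algebraically for a generic model with two predictors. The symbols here are assumptions for this note: b_0, b_1, b_2 are the estimated coefficients, x_1 is the numerical predictor, and x_2 is any other predictor held fixed.

```latex
\hat{y}(x_1 + 1, x_2) - \hat{y}(x_1, x_2)
  = \bigl[b_0 + b_1 (x_1 + 1) + b_2 x_2\bigr] - \bigl[b_0 + b_1 x_1 + b_2 x_2\bigr]
  = b_1
```

The terms involving x_2 cancel, which is what “holding all other predictors fixed” means in practice.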

9. Interpret the estimated coefficient for bty_avg in the multiple linear regression model fit in question 5. Is the interpretation the same for males and females?
10. Do you think the association between the average professor evaluation score and the average professor beauty rating in this model is of practical significance?

References

Diez, D. M., Çetinkaya-Rundel, M., & Barr, C. D. (2019). OpenIntro Statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/

OpenIntro. (n.d.-a). Data sets [Data sets]. https://openintro.org/data/

OpenIntro. (n.d.-b). Multiple linear regression. OpenIntro Labs for jamovi. https://openintrostat.github.io/oilabs-jamovi/09_multiple_regression/multiple_regression.html CC BY-SA 4.0.

License

Software Lab 6.3 Copyright © 2023 by Thompson Rivers University is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.
