Lesson 6.2: Simple Linear Regression

“Straight Lines” by Duda Arraes is licensed under CC BY-NC-ND 2.0

Lesson Learning Objectives

  • Understand what a linear model means.
  • Know the least squares criterion for fitting a straight line.
  • Use statistical software to find the least squares regression line.
  • Know how to find the least squares regression line from the summary statistics.
  • Interpret and apply the least squares regression line.
  • Interpret the residual scatterplot.
  • Understand the meaning of R².
  • Identify the common pitfalls in linear regression.
  • Identify pattern changes in scatterplots.
  • Know the risks of extrapolation.
  • Understand the issue of regression on summary data.
  • Appreciate the risks of interpreting linear models as causal.
  • Understand the possible effects of outliers.
  • Identify high leverage points.
  • Identify influential points.

Lesson 6.2 Checklist

Learning activity | Graded? | Estimated time
Read OpenIntro Statistics sections 8.2 and 8.3 and supplementary notes | No | 30 mins
Watch instructional videos | No | 10 mins
Answer two section check-in questions | Yes | 15 mins
Work through virtual statistical software lab | No | 45 mins
Answer two virtual statistical software lab questions | Yes | 15 mins
Work on practice exercises | No | 1.5 hours
Explore suggested websites | No | 15 mins

Learning Activities

Readings 📖 and Instructional Video 🎦

Least Squares Regression

Read Section 8.2: Least Squares Regression in OpenIntro Statistics (Diez et al., 2019) CC BY-SA 3.0. In Lesson 6.1 we considered the linear association between two numerical variables in terms of a scatterplot and correlation. This lesson builds on that by developing the least squares regression model, also known as simple linear regression. Here, one of the numerical variables, the predictor or explanatory variable (x), is used to predict the other numerical variable, the response variable (y). The least squares regression model can also handle a categorical predictor variable with two categories if we re-code it as an indicator variable (a variable that is 0 for one category and 1 for the other category). As you read, look up new terminology in the Glossary and self-assess your understanding by attempting the guided practice exercises.
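As a rough illustration of finding the least squares line from summary statistics, the sketch below uses the standard formulas: the slope is the correlation times the ratio of standard deviations, b1 = r × (s_y / s_x), and the line passes through the point of means, so b0 = ȳ − b1 × x̄. The data values and the names `correlation` and `predict` are made up for this example and do not come from the textbook.

```python
from statistics import mean, stdev

# Hypothetical illustrative data (not from the textbook).
x = [1, 2, 3, 4, 5]   # predictor (explanatory variable)
y = [2, 4, 5, 4, 5]   # response variable


def correlation(xs, ys):
    """Sample correlation r: average product of standardized deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    products = ((a - x_bar) / s_x * (b - y_bar) / s_y for a, b in zip(xs, ys))
    return sum(products) / (n - 1)


# Summary statistics are all we need for the line.
r = correlation(x, y)
b1 = r * stdev(y) / stdev(x)     # slope
b0 = mean(y) - b1 * mean(x)      # intercept: line passes through (x_bar, y_bar)
r_squared = r ** 2               # proportion of variation in y explained by x


def predict(x_new):
    """Predicted response from the fitted line."""
    return b0 + b1 * x_new


print(f"y-hat = {b0:.2f} + {b1:.2f} x, R^2 = {r_squared:.2f}")
```

For these made-up values the fitted line is ŷ = 2.20 + 0.60x with R² = 0.60, so the line accounts for about 60% of the variation in y. Predictions are only trustworthy for x values inside the observed range (here 1 to 5); using `predict` far outside it would be extrapolation.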

Watch the following video, Fitting a Line with Least Squares Regression (Barr et al., 2014-a), on how to fit a least squares regression line to a predictor variable (x) and a response variable (y) (duration 00:06:48).

You can also watch the 44-minute video, OS4 Section 8.2 Least Squares Regression—Overview by an Author (OpenIntroOrg, 2021), for a walk-through of the content in Section 8.2. This video provides a less formal discussion of the textbook material.
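The reading for this section notes that a predictor with two categories can be recoded as an indicator variable. The sketch below, using made-up data and hypothetical category labels "A" and "B", shows what the fitted line means in that case: the intercept equals the mean response in the category coded 0, and the slope equals the difference between the two category means.

```python
from statistics import mean

# Hypothetical data: a two-category predictor and a numerical response.
group = ["A", "A", "A", "B", "B", "B"]
y = [10.0, 12.0, 14.0, 15.0, 17.0, 19.0]

# Recode the categories as an indicator variable: 0 for A, 1 for B.
x = [0 if g == "A" else 1 for g in group]

# Least squares slope and intercept from sums of squares and cross-products.
x_bar, y_bar = mean(x), mean(y)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
s_xx = sum((a - x_bar) ** 2 for a in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Compare with the group means computed directly.
mean_a = mean([v for g, v in zip(group, y) if g == "A"])
mean_b = mean([v for g, v in zip(group, y) if g == "B"])

print(f"intercept = {b0:.1f}, mean of A = {mean_a:.1f}")
print(f"slope = {b1:.1f}, mean of B - mean of A = {mean_b - mean_a:.1f}")
```

Which category is coded 0 is an arbitrary choice; swapping the coding changes the sign of the slope and makes the intercept the mean of the other category.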

Influential Points

Read Section 8.3: Types of Outliers in Linear Regression in OpenIntro Statistics (Diez et al., 2019) CC BY-SA 3.0. When fitting a least squares regression model, we should always graph the data in a scatterplot with the least squares regression line. This allows us to see whether any points stand out from the overall pattern in the point cloud. Such points, called outliers, may be influential in the sense of having an outsized effect on the fit of the model. Again, as you read, look up new terminology in the Glossary and self-assess your understanding by attempting the guided practice exercises.
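One simple way to check whether a point is influential is to fit the line with and without it and compare the results. The sketch below does this for made-up data that lie close to a straight line, plus one added point that sits far from the others in the x direction (high leverage) and well off the trend. The data and the helper name `least_squares` are assumptions for illustration only.

```python
from statistics import mean


def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical data lying close to a line with slope about 2.
x = [1, 2, 3, 4, 5, 6]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]

# Add one point far to the right (high leverage) that does not follow the trend.
x_with = x + [20]
y_with = y + [10.0]

b0, b1 = least_squares(x, y)
b0_with, b1_with = least_squares(x_with, y_with)

print(f"without the point: y-hat = {b0:.2f} + {b1:.2f} x")
print(f"with the point:    y-hat = {b0_with:.2f} + {b1_with:.2f} x")
```

Here the single added point pulls the slope from roughly 2 down to well below 1, which is what makes it influential. A point with the same x value that happened to fall near the original line would still have high leverage but would change the fit very little.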

Watch the following video, Types of Outliers in Linear Regression (Barr et al., 2014-b), on the different types of outliers in linear regression (duration 00:02:52).

Inference for Linear Regression

Inference for linear regression lies beyond the scope of this course, so don’t worry about reading Section 8.4 in OpenIntro Statistics. For the purposes of this course, all we need to know about inference for regression is contained in Example 8.11 in Section 8.2: “The second column is a standard error for this point estimate [of β₁]: SEb₁ = 0.0108. The third column is a t-test statistic for the null hypothesis that β₁ = 0: T = −3.98. The last column is the p-value for the t-test statistic for the null hypothesis β₁ = 0 and a two-sided alternative hypothesis: 0.0002” (Diez et al., 2019) CC BY-SA 3.0. Roughly speaking, if this p-value is small (say, smaller than 0.05), then there is a statistically significant linear association between y and x.
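For readers curious about where the standard error, t statistic, and p-value in regression output come from, here is an optional sketch using made-up data. It assumes the usual formulas: SE(b1) = s / sqrt(Sxx), where s is the residual standard deviation with n − 2 degrees of freedom, and T = b1 / SE(b1). Statistical software does all of this for you; the p-value here is approximated by numerically integrating the t density, and the function names `t_density` and `two_sided_p_value` are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import mean

# Hypothetical illustrative data (not from the textbook).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Least squares fit.
x_bar, y_bar = mean(x), mean(y)
s_xx = sum((a - x_bar) ** 2 for a in x)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residual standard deviation, standard error of the slope, and t statistic.
residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
df = n - 2
s = sqrt(sum(e ** 2 for e in residuals) / df)
se_b1 = s / sqrt(s_xx)
t_stat = b1 / se_b1


def t_density(t, df):
    """Density of the t distribution with df degrees of freedom."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return const * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """Two-sided p-value: area outside (-|t|, |t|), via Simpson's rule."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = 2 * (total * h / 3)  # density is symmetric about 0
    return 1 - central_area


p_value = two_sided_p_value(t_stat, df)
print(f"b1 = {b1:.2f}, SE = {se_b1:.4f}, T = {t_stat:.2f}, p-value = {p_value:.3f}")
print("significant at the 0.05 level:", p_value < 0.05)
```

With only five made-up observations the p-value comes out above 0.05, so this toy example would not show a statistically significant linear association, even though the fitted slope is positive.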

Least Squares Regression and Influential Points

Read Supplementary Notes 6.2, which give more detail and further examples of least squares regression and influential points.

Lesson Check-in Questions ✍

Answer the two check-in questions for Lesson 6.2 in your Moodle course. The questions are based on the material covered in the readings and instructional videos. The questions are multiple-choice, fill-in-the-blank, matching, or calculation questions, and they are auto-graded in Moodle. Once you access the questions, you have 15 minutes to submit your answers. Overall, the Lesson Check-in Questions count for 6% of your total grade.

Virtual Statistical Software Lab 💻

Work through the virtual statistical software lab: Software Lab 6.2. In this lab you’ll analyze some simple linear regression models fit to “personal freedom” data for different countries around the world. As you work through the lab, answer the exercises in the shaded boxes. These exercises are not graded but the solutions are available: Software Lab 6.2 Solutions. The lab should take you no more than 45 minutes to complete.

Virtual Statistical Software Lab Questions ✍

Answer the two virtual statistical software lab questions for Software Lab 6.2 in your Moodle course. The questions are based on the lab you just completed. The questions are multiple-choice, fill-in-the-blank, matching, or calculation questions, and they are auto-graded in Moodle. Once you access the questions, you have 15 minutes to submit your answers. Overall, the Software Lab Questions count for 6% of your total grade.

Practice Exercises 🖊

Work on the following exercises in OpenIntro Statistics: Exercises 8.17, 8.19, 8.21, 8.23, 8.25, 8.27, 8.29, 8.31, and 8.33, and Chapter Exercises 8.39 (parts b–f only) and 8.41 (Diez et al., 2019) CC BY-SA 3.0. Check your answers using these solutions (Diez et al., 2019) CC BY-SA 3.0. You’ll deepen your understanding much more effectively if you genuinely attempt the questions by yourself before checking the solutions.

Work on the WeBWorK exercises, which are linked from your Moodle course. Check your answers using the solutions provided.

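Suggested Websites 🌎

  • Simple Linear Regression | An Easy Introduction & Examples (Bevans, 2020)
  • Diagnostics for Simple Linear Regression (Çetinkaya-Rundel, 2019)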

Media Attributions

Straight Lines, by Eduardo Fonseca Arraes (2013), on Flickr, CC BY-NC-ND 2.0

References

Arraes, E. F. [Duda Arraes]. (2012). Straight lines [Photograph]. Flickr. https://flic.kr/p/dJtw8X

Barr, C., Rico, J., & Diez, D. [OpenIntroOrg]. (2014-a, Jan. 26). Fitting a line with least squares regression [Video]. YouTube. https://www.youtube.com/watch?v=mPvtZhdPBhQ

Barr, C., Rico, J., & Diez, D. [OpenIntroOrg]. (2014-b, Feb. 10). Types of outliers in linear regression [Video]. YouTube. https://www.youtube.com/watch?v=jZEKAlo1E54

Bevans, R. (2020, Feb. 19). Simple linear regression | An easy introduction & examples. Scribbr. https://www.scribbr.com/statistics/simple-linear-regression/

Çetinkaya-Rundel, M. (2019, Jun. 25). Diagnostics for simple linear regression [Application]. OpenIntro. https://gallery.shinyapps.io/CLT_mean/

Diez, D. M., Çetinkaya-Rundel, M., & Barr, C. D. (2019). OpenIntro Statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/

OpenIntroOrg. (2021, Dec. 10). OS4 Section 8.2 least squares regression — Overview by an author [Video]. YouTube. https://www.youtube.com/watch?v=yqf0L_WEaW0

License

Introduction to Probability and Statistics Copyright © 2023 by Thompson Rivers University is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
