Lesson 6.2: Simple Linear Regression
![Straight lines](http://introprobabilityandstatistics.pressbooks.tru.ca/wp-content/uploads/sites/113/2022/05/8356943453_95c8ee7b6c_c.jpg)
Lesson Learning Objectives
- Understand what a linear model means.
- Know the least squares criterion for fitting a straight line.
- Use statistical software to find the least squares regression line.
- Know how to find the least squares regression line from the summary statistics.
- Interpret and apply the least squares regression line.
- Interpret the residual scatterplot.
- Understand the meaning of R².
- Identify the common pitfalls in linear regression.
- Identify pattern changes in scatterplots.
- Know the risks of extrapolation.
- Understand the issue of regression on summary data.
- Appreciate the risks of interpreting linear models as causal.
- Understand the possible effects of outliers.
- Identify high leverage points.
- Identify influential points.
Lesson 6.2 Checklist
Learning activity | Graded? | Estimated time |
---|---|---|
Read OpenIntro Statistics sections 8.2 and 8.3 and supplementary notes | No | 30 mins |
Watch instructional videos | No | 10 mins |
Answer two section check-in questions | Yes | 15 mins |
Work through virtual statistical software lab | No | 45 mins |
Answer two virtual statistical software lab questions | Yes | 15 mins |
Work on practice exercises | No | 1.5 hours |
Explore suggested websites | No | 15 mins |
Learning Activities
Readings 📖 and Instructional Video 🎦
Least Squares Regression
Read Section 8.2: Least Squares Regression in OpenIntro Statistics (Diez et al., 2019) CC BY-SA 3.0. In Lesson 6.1 we considered the linear association between two numerical variables in terms of a scatterplot and correlation. This lesson builds on that by developing the least squares regression model, also known as simple linear regression. Here, one of the numerical variables, the predictor or explanatory variable (x), is used to predict the other numerical variable, the response variable (y). The least squares regression model can also handle a categorical predictor variable with two categories if we re-code it as an indicator variable (a variable that is 0 for one category and 1 for the other category). As you read, look up new terminology in the Glossary and self-assess your understanding by attempting the guided practice exercises.
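As a minimal sketch of how the least squares line can be found from summary statistics alone, the slope is b1 = r × (s_y / s_x) and the intercept is b0 = ȳ − b1 × x̄, where r is the correlation and s_x, s_y are the sample standard deviations. The data values and function names below are hypothetical and chosen only for illustration. The second example re-codes a two-category predictor as a 0/1 indicator variable.

```python
from statistics import mean, stdev


def correlation(x, y):
    """Sample correlation coefficient r between x and y."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * stdev(x) * stdev(y))


def least_squares_line(x, y):
    """Return (b0, b1, r_squared) computed from summary statistics."""
    r = correlation(x, y)
    b1 = r * stdev(y) / stdev(x)   # slope
    b0 = mean(y) - b1 * mean(x)    # the line passes through (x_bar, y_bar)
    return b0, b1, r ** 2


# Hypothetical numerical data (not taken from the textbook).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, r2 = least_squares_line(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f} x, R^2 = {r2:.2f}")

# Hypothetical two-category predictor re-coded as an indicator (0 or 1).
group = [0, 0, 0, 1, 1, 1]
score = [10, 12, 14, 20, 22, 24]
a0, a1, _ = least_squares_line(group, score)
print(f"intercept = {a0:.1f} (mean of group 0), slope = {a1:.1f} (difference in group means)")
```

With an indicator predictor, the intercept is the mean response in the category coded 0, and the slope is the difference between the two category means.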
Watch the following video, Fitting a Line with Least Squares Regression (Barr et al., 2014-a), on how to fit a least squares regression line to a predictor variable (x) and a response variable (y) (duration 00:06:48).
You can also watch the 44-minute video, OS4 Section 8.2 Least Squares Regression—Overview by an Author (OpenIntroOrg, 2021), for a walk-through of the content in Section 8.2. This video provides a less formal discussion of the textbook material.
Influential Points
Read Section 8.3: Types of Outliers in Linear Regression in OpenIntro Statistics (Diez et al., 2019) CC BY-SA 3.0. When fitting a least squares regression model, we should always graph the data in a scatterplot with the least squares regression line. This allows us to see whether any points stand out from the overall pattern in the point cloud. Such points, called outliers, may be influential in the sense of having an outsized effect on the fit of the model. Again, as you read, look up new terminology in the Glossary and self-assess your understanding by attempting the guided practice exercises.
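To make the idea of influence concrete, here is a small sketch that refits the line after adding a single point. The data are hypothetical (five points lying exactly on y = 1 + 2x), and the helper name `fit_line` is an assumption for illustration only.

```python
def fit_line(x, y):
    """Least squares intercept and slope for paired lists x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical baseline: points exactly on y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
base = fit_line(x, y)

# Outlier at the mean of x (low leverage), far above the line.
low_leverage = fit_line(x + [3], y + [17])

# High leverage point (x far from the other x values) that follows the line.
on_line = fit_line(x + [15], y + [31])

# High leverage point that does not follow the line: influential.
influential = fit_line(x + [15], y + [11])

for label, (b0, b1) in [
    ("baseline", base),
    ("low-leverage outlier", low_leverage),
    ("high leverage, on the line", on_line),
    ("high leverage, off the line", influential),
]:
    print(f"{label:28s} intercept = {b0:6.3f}, slope = {b1:6.3f}")
```

In this sketch the slope stays at 2 for the first three fits and drops to about 0.46 for the last one. Only the point that is both far from the other x values and far from the line pulls the slope noticeably.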
Watch the following video, Types of Outliers in Linear Regression (Barr et al., 2014-b), on the different types of outliers in linear regression (duration 00:02:52).
Inference for Linear Regression
Inference for linear regression lies beyond the scope of this course, so don’t worry about reading Section 8.4 in OpenIntro Statistics. For the purposes of this course, all we need to know about inference for regression is contained in Example 8.11 in Section 8.2.4: “The second column is a standard error for this point estimate [of β₁]: SE_b₁ = 0.0108. The third column is a t-test statistic for the null hypothesis that β₁ = 0: T = −3.98. The last column is the p-value for the t-test statistic for the null hypothesis β₁ = 0 and a two-sided alternative hypothesis: 0.0002” (Diez et al., 2019) CC BY-SA 3.0. Roughly speaking, if this p-value is small (say smaller than 0.05), then there is a statistically significant linear association between y and x.
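For readers curious where the standard error and t-test statistic in such a table come from, the following sketch computes them by hand. It uses hypothetical data rather than the textbook data set, and the function name `slope_inference` is an assumption. The p-value is not computed here. Statistical software obtains it by comparing T to a t-distribution with n − 2 degrees of freedom.

```python
from math import sqrt


def slope_inference(x, y):
    """Return (b1, se_b1, t_stat) for the slope of a least squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Residuals: observed response minus the value predicted by the line.
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)

    s = sqrt(sse / (n - 2))      # residual standard deviation
    se_b1 = s / sqrt(sxx)        # standard error of the slope
    t_stat = (b1 - 0) / se_b1    # test statistic for the null hypothesis slope = 0
    return b1, se_b1, t_stat


# Hypothetical data (not taken from the textbook).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, se_b1, t_stat = slope_inference(x, y)
print(f"b1 = {b1:.3f}, SE = {se_b1:.3f}, T = {t_stat:.2f}")
```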
Least Squares Regression and Influential Points
Read Supplementary Notes 6.2, which gives much more detail and further examples of least squares regression and influential points.
Lesson Check-in Questions ✍
Virtual Statistical Software Lab 💻
Work through the virtual statistical software lab: Software Lab 6.2. In this lab you’ll analyze some simple linear regression models fit to “personal freedom” data for different countries around the world. As you work through the lab, answer the exercises in the shaded boxes. These exercises are not graded but the solutions are available: Software Lab 6.2 Solutions. The lab should take you no more than 45 minutes to complete.
Virtual Statistical Software Lab Questions ✍
Practice Exercises 🖊
Work on the following exercises in OpenIntro Statistics: Exercises 8.17, 8.19, 8.21, 8.23, 8.25, 8.27, 8.29, 8.31, and 8.33, and Chapter Exercises 8.39 (parts b–f only) and 8.41 (Diez et al., 2019) CC BY-SA 3.0. Check your answers using these solutions (Diez et al., 2019) CC BY-SA 3.0. You’ll deepen your understanding much more effectively if you genuinely attempt the questions by yourself before checking the solutions.
Work on the WeBWorK exercises, which are linked from your Moodle course. Check your answers using the solutions provided.
Suggested Websites 🌎
- For another take on simple linear regression, have a look at this well-written online resource: Simple Linear Regression | An Easy Introduction & Examples (Bevans, 2020).
- Try Diagnostics for Simple Linear Regression [Application] (Çetinkaya-Rundel, 2019), an app that fits a simple linear regression line to data with a selected trend (linear, curved, or fan-shaped). The app is designed to help you practice evaluating whether the linear model is an appropriate fit to the data.
Media Attributions
Straight Lines, by Eduardo Fonseca Arraes (2013), on Flickr, CC BY-NC-ND 2.0
References
Arraes, E. F. [Duda Arraes]. (2012). Straight lines [Photograph]. Flickr. https://flic.kr/p/dJtw8X
Barr, C., Rico, J., & Diez, D. [OpenIntroOrg]. (2014-a, Jan. 26). Fitting a line with least squares regression [Video]. YouTube. https://www.youtube.com/watch?v=mPvtZhdPBhQ
Barr, C., Rico, J., & Diez, D. [OpenIntroOrg]. (2014-b, Feb. 10). Types of outliers in linear regression [Video]. YouTube. https://www.youtube.com/watch?v=jZEKAlo1E54
Bevans, R. (2020, Feb. 19). Simple linear regression | An easy introduction & examples. Scribbr. https://www.scribbr.com/statistics/simple-linear-regression/
Çetinkaya-Rundel, M. (2019, Jun. 25). Diagnostics for simple linear regression [Application]. OpenIntro. https://gallery.shinyapps.io/CLT_mean/
Diez, D. M., Çetinkaya-Rundel, M., & Barr, C. D. (2019). OpenIntro Statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/
OpenIntroOrg. (2021, Dec. 10). OS4 Section 8.2 least squares regression — Overview by an author [Video]. YouTube. https://www.youtube.com/watch?v=yqf0L_WEaW0