Using SPSS to Conduct Regression Analyses
 

In the SPSS lesson on correlations, you examined two variables to determine whether and how they were related to one another. Now you will apply correlation information for a related but slightly different purpose: to determine whether you can predict the value of one variable based on the value of another variable using regression. Upon completing this lesson, you should be able to:

  • Identify the slope of the regression line and the intercept on a scatterplot generated by SPSS.
  • Construct the regression equation from SPSS linear regression output.
  • Calculate the predicted value of the dependent variable for a specified value of the independent variable.
  • Calculate the residual value for a specified value of the independent variable.
  • Determine whether a specified data point is below or above the regression line.
  • Determine whether a specified independent variable is a significant predictor of a dependent variable.
  • Add a regression line to a scatterplot.
  • Conduct a linear regression analysis.

To practice generating regression output using SPSS, download one of the following attitudes data files:

(Note: If the linked file does not begin downloading when you click the link, right-click on the link and select save target as or save link as from the menu.)

Interpreting Regression Output

Suppose you wanted to use students' attitudes towards statistics to predict the type of attitudes they might have toward research. You could:

  1. Select a sample of students.
  2. Administer an instrument that measures attitudes towards statistics.
  3. Administer an instrument that measures attitudes towards research.
  4. Use statistical regression to determine whether students' attitudes towards statistics are a significant predictor of students' attitudes towards research.
  5. Make an inference about the population of students based on the results of the regression analysis.

Shown on the right is a scatterplot of data that you might obtain for conducting the study. From what you learned in the SPSS lesson on correlation, are attitude towards statistics and attitude towards research related? Based on the scatterplot, there appears to be a positive linear relationship between attitude towards statistics and attitude towards research.

Based on a correlation matrix for the variables (shown below), there is a moderately strong positive relationship between the attitudes (r = .648).

 

Regression Line

Recall from the lesson on regression that if two variables are correlated, it is possible to predict (with relative accuracy) the value of one of the variables given the value of the other variable. The closer the relationship is between the two variables, the higher the correlation coefficient, and the better the prediction. When two variables are linearly related, there is a line that describes the relationship between them. The equation of that line, known as the regression line, is used to make such predictions about one variable based on another variable.

 

The regression line that represents the relationship between attitudes towards statistics and attitudes towards research has been added to the scatterplot on the right. The 95% confidence interval for the mean of the data has also been added.

What is the slope of the corresponding regression line and what is the y-intercept? The regression equation (also displayed above the scatterplot) is

Research = 2.05 + 0.71*Stats

In terms of X and Y, the equation for the regression line is

Y = 2.05 + 0.71*X

Hence, the slope of the regression line is 0.71 and the intercept is 2.05. How can you reassure yourself that these values make sense? To double-check the slope, visually examine the scatterplot. Slope is equivalent to the rise over run (or steepness) of a line. A slope of 0.71 means that for every one-unit increase in X, the predicted value of Y increases by 0.71 units. The value of the intercept tells you where the regression line crosses the y-axis (that is, the predicted value of Y when X = 0).
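One way to check these values numerically is sketched below in Python. This is not part of SPSS, and the function name predict_research is only illustrative. Evaluating the line at X = 0 should return the intercept, and moving one unit along the x-axis should change Y by the slope.

```python
# Coefficients read from the equation on the scatterplot:
# Research = 2.05 + 0.71*Stats
INTERCEPT = 2.05
SLOPE = 0.71


def predict_research(stats_score):
    """Return the value of Y on the regression line for a given X."""
    return INTERCEPT + SLOPE * stats_score


# The line crosses the y-axis where X = 0, so this value is the intercept.
value_at_zero = predict_research(0)

# Rise over run: change in Y divided by change in X between two points on the line.
rise = predict_research(5) - predict_research(4)
run = 5 - 4
slope_check = round(rise / run, 2)

print(value_at_zero, slope_check)
```

Any two X values could be used for the rise-over-run check; 4 and 5 are arbitrary choices.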

The slope and intercept of the regression line are also presented in the textual portion of the SPSS output in the section labeled coefficients.

To read the intercept from this portion of the output, you would look for "(constant)" in the column labeled model. (Think of the regression line as a model of the data.) The value of the intercept is listed in the column labeled B (which is the parameter estimate). The value of the slope is also listed in the B column but is on the row labeled with the variable name (which is stats in this example). The equation of the regression line based on this portion of the SPSS output would be Y = 0.716X + 2.046. The difference between this equation and the one obtained from the scatterplot is the number of decimal places reported for the coefficient values.
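The lookup described above can be sketched in Python as follows. The dictionary is only a stand-in for the B column of the Coefficients table (it uses the two values quoted in this lesson), and the function name regression_equation is an assumption made for illustration.

```python
# Stand-in for the B column of the Coefficients table,
# keyed by the row label shown in the Model column.
b_column = {"(Constant)": 2.046, "stats": 0.716}


def regression_equation(b_values, predictor):
    """Build the regression equation as text from the B values."""
    intercept = b_values["(Constant)"]  # row labeled (Constant)
    slope = b_values[predictor]         # row labeled with the variable name
    return f"Y = {intercept:.3f} + {slope:.3f}*X"


equation = regression_equation(b_column, "stats")
print(equation)
```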

Now that you have the regression equation, where do you go from there? Remember that the regression equation can be used to predict values of Y for a specified value of X. For example, the predicted value for X = 7 is Y = 7.04. How well does the regression line predict Y for various values of X? To answer this question, you would examine the discrepancies (differences) between the predicted values and the actual values. The error in the prediction, known as the residual, is given by the formula

Residual = Actual Y - Predicted Y

that is, the actual (or observed) value minus the predicted value. For our example of X = 7, the actual value of Y is 7. The corresponding residual, then, would be 7 - 7.04 = -0.04. Here is another example: Observation 4 had an actual value of Y = 5 with a predicted value of Y = 4.19; the residual is 5 - 4.19 = 0.81.
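The two calculations can be sketched in Python as shown below. The function names are illustrative, and the score of X = 5 is a hypothetical value chosen for the example. Because the coefficients shown in this lesson are rounded, a hand calculation may differ by a few hundredths from the predicted values that SPSS reports.

```python
def predict(intercept, slope, x):
    """Predicted Y for a given X: intercept + slope * X."""
    return intercept + slope * x


def residual(actual, predicted):
    """Residual = actual (observed) value minus predicted value."""
    return actual - predicted


# Residuals for the two cases quoted in the lesson.
residual_x7 = round(residual(7, 7.04), 2)
residual_obs4 = round(residual(5, 4.19), 2)

# Prediction for a hypothetical score of X = 5,
# using the rounded equation Y = 2.05 + 0.71*X.
predicted_x5 = round(predict(2.05, 0.71, 5), 2)

print(residual_x7, residual_obs4, predicted_x5)
```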

Based on the residuals, you can tell which points lie closer to the regression line and whether they are above or below the line. By the definition of a residual value, every actual value that differs from the predicted value has a residual value that is not zero. In order for a data point to lie directly on the regression line, it would have to have a residual value of 0 (so that the predicted value equals the observed value). (Although it is theoretically possible to obtain a residual value that is equal to 0, in real-life applications, it is rare to attain such an accurate prediction.) What does it mean if a residual has a positive value? Since we defined the residual as the actual value minus the predicted value, a positive residual indicates that the actual value is greater than the predicted value, and the data point is therefore above the regression line. A negative residual indicates that the data point lies below the regression line.
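The sign rule above can be written as a short Python sketch. The function name and the small tolerance used to treat a residual as zero are assumptions made for illustration.

```python
def position_relative_to_line(actual, predicted, tolerance=1e-9):
    """Classify a data point by the sign of its residual."""
    diff = actual - predicted  # the residual
    if abs(diff) <= tolerance:
        return "on"            # residual of 0: the point lies on the line
    return "above" if diff > 0 else "below"


# The two cases from the lesson, plus a point that falls exactly on the line.
first = position_relative_to_line(7, 7.04)
second = position_relative_to_line(5, 4.19)
third = position_relative_to_line(3.0, 3.0)
print(first, second, third)
```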

Significance of Prediction

Although we have answered the question of whether there is a relationship between attitudes towards statistics and attitudes towards research (based on our data), we have not answered the question of whether attitudes towards statistics is a significant predictor of attitudes towards research. To answer that question, we would examine the significance column in the Coefficients section of the output.

The value on the row with the variable name (stats in our example) is the p-value for the slope: the probability of obtaining a slope at least as far from zero as the one observed if the null hypothesis were true (i.e., if attitude towards statistics were not a predictor of attitudes towards research). Based on these data, what decision would you make regarding the null hypothesis? What conclusion would you draw? Since the significance value of 0.003 is less than 0.05, you would reject the null hypothesis and conclude that attitudes towards statistics was a significant predictor of attitudes towards research.
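The decision rule can be sketched in Python as follows. The function name is illustrative, the 0.05 cutoff is the level used in this lesson, and the value 0.20 is a made-up significance value included only for contrast.

```python
def is_significant_predictor(sig_value, alpha=0.05):
    """Reject the null hypothesis when the significance value is below alpha."""
    return sig_value < alpha


# Significance value for the stats row in this lesson.
decision = is_significant_predictor(0.003)

# A hypothetical larger value, for contrast.
contrast = is_significant_predictor(0.20)

print(decision, contrast)
```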


Generating Regression Output Using SPSS

Scatterplot with Overlay of the Regression Line

In order to use linear regression, the variables you are examining must be linearly related. Recall from the SPSS lesson on correlation that you can use a scatterplot to visually assess whether the variables are linearly related. Now we are going to add the regression line to a scatterplot in SPSS. To create a scatterplot with an overlay of the regression line:

  1. Open the Graphs menu and select scatterplot from the interactive submenu.
  2. Click the Assign Variables tab in the Create Scatterplot dialog box:
    • Drag the name of the variable you want to plot on the y-axis to the vertical axis. In this example, the research variable will be plotted on the y-axis.
    • Drag the name of the variable you want to plot on the x-axis to the horizontal axis. In this example, the statistics variable will be plotted on the x-axis.
    • Note: A scatterplot is a bivariate plot; you can only plot two variables in a single scatterplot.
  3. Click the Fit tab in the Create Scatterplot dialog box to specify the settings to use in "fitting" a line to the data:
    • Select regression as the method for the fit line. Check the option labeled "include constant in equation" to include a term for the y-intercept in the equation for the regression line.
    • Select mean and/or individual as the basis for the prediction line.
      If desired, change the confidence interval. If mean is selected, SPSS will draw a band corresponding to the confidence interval of the mean of the data. If individual is selected, SPSS will draw the confidence interval for the predicted values.
    • Check the box labeled total to treat all the pairs of values as one group.
  4. Click the OK button to create the scatterplot.
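To see what the fit settings above control, the following Python sketch computes a least-squares line with and without a constant. It is an illustration of the arithmetic, not of how SPSS is implemented. The scores are hypothetical (they are not the lesson's data file), and the function name fit_line is an assumption.

```python
# Hypothetical paired scores (X = stats, Y = research).
stats = [1, 2, 3, 4, 5]
research = [2, 4, 5, 4, 5]


def fit_line(x, y, include_constant=True):
    """Least-squares fit line; returns (intercept, slope)."""
    n = len(x)
    if include_constant:
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
    else:
        # Without a constant, the line is forced through the origin.
        slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
        intercept = 0.0
    return intercept, slope


intercept, slope = fit_line(stats, research)

# Two endpoints are enough to draw the line over the plotted range of X.
line_points = [(x, round(intercept + slope * x, 2)) for x in (min(stats), max(stats))]

origin_intercept, origin_slope = fit_line(stats, research, include_constant=False)
print(round(intercept, 2), round(slope, 2), round(origin_slope, 2))
```

With these made-up scores, leaving out the constant changes the slope considerably, which is why the option is normally left checked.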

 

Linear Regression Analysis

To conduct a linear regression analysis in SPSS:

  1. Open the analyze menu, and select linear from the regression submenu.
  2. In the Linear Regression dialog box:
    • Add the dependent variable to the dependent box. In this example, research is the dependent variable.
    • Add the independent variable to the independent(s) box. In this example, stats is the independent variable. (Note: Although you may add multiple independent variables to the box, selecting more than one independent variable uses a different type of regression.)
    • Click the statistics button to access additional options for the regression analysis.
  3. In the Linear Regression: Statistics dialog box:
    • Check the box labeled estimates to include the regression coefficients (used for the regression line) in the output.
    • Check the box labeled descriptives to include information about the mean and standard deviation of the data in the output.
    • Click the continue button to return to the Linear Regression dialog box.

  4. Click the options button in the Linear Regression dialog box.
  5. In the Linear Regression: Options dialog box:
    • Check the box labeled include constant in equation to include a term for the y-intercept in the equation for the regression line.
    • Click the continue button to return to the Linear Regression dialog box.
  6. Click the OK button in the Linear Regression dialog box to generate the regression analysis.
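As a rough illustration of the quantities requested in the steps above (descriptives and coefficient estimates), the following Python sketch computes them by hand for a small set of hypothetical scores. It is not SPSS's implementation, the function name simple_regression is an assumption, and the sketch stops at the t statistic for the slope: the value in the significance column is the p-value for that t with n - 2 degrees of freedom, which is not computed here.

```python
import math
import statistics

# Hypothetical paired scores (X = stats, Y = research); not the lesson's data file.
stats = [1, 2, 3, 4, 5]
research = [2, 4, 5, 4, 5]


def simple_regression(x, y):
    """Descriptives, coefficients, and the slope's t statistic for one predictor."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals (actual minus predicted).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope, based on n - 2 degrees of freedom.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return {
        "mean_x": mean_x, "sd_x": statistics.stdev(x),
        "mean_y": mean_y, "sd_y": statistics.stdev(y),
        "constant": intercept, "slope": slope,
        "se_slope": se_slope, "t": slope / se_slope,
    }


results = simple_regression(stats, research)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```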

 

 

 

 


Review

In this lesson, you have interpreted regression output generated by SPSS to examine whether a particular variable is a significant predictor of another variable.

  • When given a scatterplot generated by SPSS, you can identify the slope of the regression line and the intercept.
  • When given linear regression output, you can:
    • Construct the regression equation.
    • Calculate the predicted value of the dependent variable for a specified value of the independent variable.
    • Calculate the residual value for a specified value of the independent variable.
    • Determine whether a specified data point is below or above the regression line.
    • Determine whether a specified independent variable is a significant predictor of a dependent variable.
  • You can add a regression line to a scatterplot.
  • You can conduct a linear regression analysis.


© 2007 by Melissa Kelly and L. K. Curda. All rights reserved. Updated on November 21, 2007