Using SPSS to Conduct Regression Analyses 
In the SPSS lesson on correlations, you examined two variables to determine whether and how they were related to one another. Now you will apply correlation information for a related but slightly different purpose: to determine whether you can predict the value of one variable based on the value of another variable using regression. Upon completing this lesson, you should be able to:
To practice generating regression output using SPSS, download one of the following attitudes data files:
Interpreting Regression Output
Suppose you wanted to use students' attitudes towards statistics to predict the type of attitudes they might have toward research. You could:
Shown on the right is a scatterplot of data that you might obtain for conducting the study. From what you learned in the SPSS lesson on correlation, are attitude towards statistics and attitude towards research related? Based on the scatterplot, there appears to be a positive linear relationship between attitude towards statistics and attitude towards research. Based on a correlation matrix for the variables (shown below), there is a moderately strong positive relationship between the attitudes (r = .648).
Regression Line
Recall from the lesson on regression that if two variables are correlated, it is possible to predict (with relative accuracy) the value of one of the variables given the value of the other variable. The stronger the relationship between the two variables, the larger the absolute value of the correlation coefficient, and the better the prediction. When two variables are linearly related, there is a line that describes the relationship between them. That line is known as the regression line, and its equation is used to make predictions about one variable based on another variable.
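To make the idea of a regression line concrete, here is a minimal Python sketch of how the slope, intercept, and correlation coefficient of a least-squares line can be computed by hand. SPSS does this calculation for you; the function name `fit_line` and the small data set are hypothetical and are not the attitudes data used in this lesson.

```python
from math import sqrt


def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line Y = slope * X + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)

    slope = sxy / sxx                      # change in predicted Y per unit of X
    intercept = mean_y - slope * mean_x    # predicted Y when X = 0
    r = sxy / sqrt(sxx * syy)              # Pearson correlation coefficient
    return slope, intercept, r


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    x_scores = [1, 2, 3, 4, 5]
    y_scores = [2, 4, 5, 4, 5]
    slope, intercept, r = fit_line(x_scores, y_scores)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.3f}")
```

The line always passes through the point (mean of X, mean of Y), which is why the intercept can be computed from the slope and the two means.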
The regression line that represents the relationship between attitudes towards statistics and attitudes towards research has been added to the scatterplot on the right. The 95% confidence interval for the mean of the data has also been added. What is the slope of the corresponding regression line and what is the y-intercept? The regression equation (also displayed above the scatterplot) is
attitude towards research = 0.71 × (attitude towards statistics) + 2.05
In terms of X and Y, the equation for the regression line is
Y = 0.71X + 2.05
Hence, the slope of the regression line is 0.71 and the intercept is 2.05. How can you reassure yourself that these values make sense? To double-check the slope, visually examine the scatterplot. Slope is equivalent to the rise over run (or steepness) of a line. A slope of 0.71 means that for every one-unit increase in X, the predicted value of Y increases by 0.71 units. The value of the intercept tells you where the regression line crosses the y-axis, that is, the predicted value of Y when X = 0.
The slope and intercept of the regression line are also presented in the textual portion of the SPSS output, in the section labeled Coefficients. To read the intercept from this portion of the output, look for "(Constant)" in the column labeled Model. (Think of the regression line as a model of the data.) The value of the intercept is listed in the column labeled B (which is the parameter estimate). The value of the slope is also listed in the B column, but on the row labeled with the variable name (which is stats in this example). The equation of the regression line based on this portion of the SPSS output would be Y = 0.716X + 2.046. The only difference between this equation and the one obtained from the scatterplot is the number of decimal places to which the slope and intercept are reported.
Now that you have the regression equation, where do you go from there? Remember that the regression equation can be used to predict values of Y for a specified value of X. For example, the predicted value for X = 7 is Y = 7.04. How well does the regression line predict Y for various values of X? To answer this question, you would examine the discrepancies (differences) between the predicted values and the actual values. The error in the prediction is given by the formula
error = actual value - predicted value
which is the actual (or observed) value minus the predicted value. For our example of X = 7, the actual value of Y is 7. The corresponding residual, then, would be 7 - 7.04 = -0.04.
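The prediction and error calculations described above can be sketched in a few lines of Python. This is only an illustration, not something SPSS requires: the function names `predict` and `residual` are hypothetical, and the coefficients are the rounded values from the Coefficients table (0.716 and 2.046).

```python
# Rounded coefficients as reported in the SPSS Coefficients table for this lesson.
SLOPE = 0.716
INTERCEPT = 2.046


def predict(x, slope=SLOPE, intercept=INTERCEPT):
    """Return the predicted Y for a given X using Y = slope * X + intercept."""
    return slope * x + intercept


def residual(actual, predicted):
    """Return the prediction error: actual (observed) value minus predicted value."""
    return actual - predicted


if __name__ == "__main__":
    # Predicted value for X = 7 computed from the rounded coefficients.
    print(round(predict(7), 2))

    # Error for the X = 7 example, using the lesson's actual and predicted values.
    print(round(residual(7, 7.04), 2))
```

Because the coefficients here are rounded to three decimal places, `predict(7)` comes out at about 7.06, slightly different from the 7.04 quoted above. When you need exact predicted values and residuals, take them from the SPSS output rather than recomputing them from rounded coefficients.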
The error in the prediction (which is the difference between the actual value and the predicted value) is known as the residual. Here is another example: Observation 4 had an actual value of Y = 5 with a predicted value of Y = 4.19; the residual is 5 - 4.19 = 0.81. Based on the residuals, you can tell which points lie closer to the regression line and whether they are above or below the line. By the definition of a residual value, every actual value that differs from the predicted value has a residual value that is not zero. In order for a data point to lie directly on the regression line, it would have to have a residual value of 0 (so that the predicted value equals the observed value). (Although it is theoretically possible to obtain a residual value that is equal to 0, in real-life applications it is rare to attain such an accurate prediction.) What does it mean if a residual has a positive value? Since we defined the residual as the actual value minus the predicted value, a positive residual indicates that the actual value is greater than the predicted value, and the data point is therefore above the regression line. A negative residual indicates that the data point lies below the regression line.
Significance of Prediction
Although we have answered the question of whether there is a relationship between attitudes towards statistics and attitudes towards research (based on our data), we have not answered the question of whether attitudes towards statistics is a significant predictor of attitudes towards research. To answer that question, we would examine the significance column in the Coefficients section of the output. The value on the row with the variable name (stats in our example) is the p-value: the probability of obtaining a slope at least as far from zero as the one observed if the null hypothesis were true (i.e., if attitudes towards statistics were not a predictor of attitudes towards research). Based on these data, what decision would you make regarding the null hypothesis? What conclusion would you draw?
Since 0.003 is less than 0.05, you would reject the null hypothesis and conclude that attitudes towards statistics was a significant predictor of attitudes towards research.
Generating Regression Output Using SPSS
Scatterplot with Overlay of the Regression Line
In order to use linear regression, the variables you are examining must be linearly related. Recall from the SPSS lesson on correlation that you can use a scatterplot to visually assess whether the variables are linearly related. Now we are going to add the regression line to a scatterplot in SPSS. To create a scatterplot with an overlay of the regression line:
Linear Regression Analysis
To conduct a linear regression analysis in SPSS:
Review
In this lesson, you have interpreted regression output generated by SPSS to examine whether a particular variable is a significant predictor of another variable.

© 2007 by Melissa Kelly and L. K. Curda. All rights reserved.  Updated on November 21, 2007 