
 

Using SPSS to Measure Association
 

In the prior SPSS lessons (on t-tests and ANOVA), you compared the means for two or more groups to determine whether the groups were significantly different in terms of a particular factor. In this SPSS lesson, you will be comparing a set of scores for one group to determine whether the scores are related to one another. The test that you will use is correlation, which measures the relationship or association between variables. You will apply information from the lesson on correlation and regression to use SPSS to measure the strength and direction of the relationship between two variables. Upon completing this lesson, you should be able to:

  • Describe the relationship between two variables based on SPSS output
  • Determine the coefficient of determination based on SPSS output
  • Conduct a correlation test in SPSS
  • Generate a scatterplot using SPSS

To practice generating correlation output using SPSS, download one of the following data files:

IQ and SAT data:
  • SPSS (SAV) format
  • SPSS (SAV) format (in alternative download location)
  • Excel (XLS) format

Score data:
  • SPSS (SAV) format
  • SPSS (SAV) format (in alternative download location)
  • Excel (XLS) format

(Note: If the linked file does not begin downloading when you click the link, right-click on the link and select save target as or save link as from the menu.)

Interpreting Correlation Results

Shown below is sample correlation output for the correlation between a set of IQ scores and SAT scores.

The top portion of the output provides us with an opportunity to review a few things you learned in prior weeks. What measure(s) of central tendency is(are) displayed for each set of data? What measure(s) of variability is(are) displayed? Recall that measures of central tendency are mean, median, and mode. The mean of the IQ data set is 112.4 and the mean of the SAT data set is 1042. Recall that measures of variability include range, interquartile range, variance, and standard deviation. The standard deviation of the IQ data is 16.319, and the standard deviation of the SAT data is 207.654. You could calculate the variance by squaring the standard deviation.
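The variance calculation mentioned above can be checked with a few lines of Python. This is only an illustrative sketch that uses the two standard deviations quoted from the sample output; the variable names are made up for the example.

```python
# Standard deviations reported in the sample output above.
sd_iq = 16.319
sd_sat = 207.654

# Variance is the square of the standard deviation.
var_iq = sd_iq ** 2
var_sat = sd_sat ** 2

print(f"IQ variance:  {var_iq:.3f}")
print(f"SAT variance: {var_sat:.3f}")
```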

What else does the descriptives output tell us about the data used for the correlation test? It tells us which variables are being tested in the correlation. In this example, the variables being correlated are IQ and SAT; the name of each variable is listed in the left column of the output. The output also tells us the number of scores being compared. In this case, we have 5 sets of scores (i.e., an IQ score and an SAT score for each person).

Correlation Matrixes

Whereas the descriptives output tells us about each set of data (i.e., the mean, standard deviation, and number of values for each variable), the correlation matrix in the output tells us how the IQ and SAT data are related. Based on what you learned in the lesson on correlation, are IQ and SAT score related according to the output? If so, describe the relationship.

In this example, the statistic used to determine if and how IQ and SAT are related was the Pearson correlation coefficient. The way that you would determine the Pearson r in the above output is to look at one of the variables listed in a row, and then read the value in the column that corresponds to the other variable, as indicated by the table below.

                   Variable 1 Name    Variable 2 Name
Variable 1 Name                       Pearson r
Variable 2 Name    Pearson r

 

For the output that was given, the Pearson correlation coefficient between IQ and SAT is r = 0.978. Recall that the value of r allows you to determine the strength and direction of a relationship between two variables. Hence, there is a strong, positive relationship between IQ and SAT score. What does a positive relationship mean? Recall that if two variables have a positive relationship, then as the value of one variable increases, the value of the other variable also increases. Remember, though, that even such a strong correlation as 0.98 does not indicate that a certain IQ causes a certain SAT score. The correlation merely tells us that there is an association between the two variables.
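To make the meaning of the sign of r concrete, here is a minimal sketch of the Pearson formula in Python. The function name pearson_r and the three lists of scores are hypothetical (they are not the lesson's data files); the sketch only illustrates that r is positive when two variables rise together and negative when one falls as the other rises.

```python
import math

def pearson_r(x, y):
    """Return the Pearson correlation coefficient for two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of the deviations from each mean.
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores (not the lesson's data file).
hours = [1, 2, 3, 4, 5]
score_up = [52, 60, 61, 70, 78]    # tends to rise as hours rise
score_down = [90, 85, 70, 66, 50]  # tends to fall as hours rise

print(round(pearson_r(hours, score_up), 3))    # positive relationship
print(round(pearson_r(hours, score_down), 3))  # negative relationship
```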

Is the correlation between IQ and SAT statistically significant, or is it likely to be due to chance? To answer this question, you would refer to the row labeled sig. The value in this row is the probability of obtaining a correlation at least as strong as the one observed if the null hypothesis (i.e., that there is no relationship between the variables) were true. For a two-tailed correlation test, this probability is 0.004. Since this probability is less than our preset level of significance (typically 0.01 or 0.05), we can reject the null hypothesis and conclude that the relationship between IQ and SAT is statistically significant.
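For readers who want to see where a sig. value of this size can come from, the sketch below converts r into a t statistic with n - 2 degrees of freedom and then finds the two-tailed area under the t distribution. This is an assumption about the underlying calculation rather than a description of SPSS internals, the function names are hypothetical, and the integration is a simple approximation (it does not handle r = 1 or r = -1).

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(r, n, steps=2000):
    """Approximate two-tailed p-value for a Pearson r based on n pairs of scores."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps must be even) for the area between 0 and |t|.
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # Whatever area is left lies beyond |t| in the two tails.
    return 1 - 2 * area

# r and N taken from the sample output discussed in this lesson.
p = two_tailed_p(r=0.978, n=5)
print(f"p = {p:.3f}")
```

With r = 0.978 and 5 pairs of scores, the result is close to the 0.004 reported in the output.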

In addition to using the sig. value to determine whether to reject or retain the null hypothesis, there is also another visual indication of statistical significance on the output. By default, SPSS "flags" (marks) significant relationships with asterisks.

If the sig. value is below the preset criterion of significance (typically 0.05 or 0.01), SPSS will put asterisks next to the correlation value. The note below the correlation matrix tells you the level at which the relationship was significant. In this example, the relationship was significant at the 0.01 level. Notice that the output also indicates whether the correlation was examined as a two-tailed or one-tailed test.
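The flagging rule can be sketched as a small function. The convention assumed here (two asterisks for the 0.01 level, one asterisk for the 0.05 level) and the function name flag are assumptions made for illustration; check the note under your own correlation matrix for the levels actually used.

```python
def flag(r, p):
    """Format a correlation the way a flagged matrix cell might read."""
    if p < 0.01:
        stars = "**"  # assumed marker for the 0.01 level
    elif p < 0.05:
        stars = "*"   # assumed marker for the 0.05 level
    else:
        stars = ""    # not flagged
    return f"{r:.3f}{stars}"

print(flag(0.978, 0.004))  # values from the IQ/SAT example
print(flag(0.300, 0.200))  # hypothetical nonsignificant result
```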

Now suppose that you have multiple variables and you want to determine whether any of them are related to one another. By definition, you can only correlate two variables at a time using Pearson's r. However, you can include as many variables as you are interested in when you run a correlation test using SPSS. The program will automatically compare them two at a time and show you all of the results in a single correlation matrix.

The data used to generate the sample output shown here contained several variables corresponding to various test scores and other measures: Vocab, Comp, Algebra, Analytic, and Reason.

The correlation matrix displays the correlations for each pair of variables:

  • Vocab and Comp (r = .698)
  • Vocab and Algebra (r = .382)
  • Vocab and Analytic (r = .272)
  • Vocab and Reason (r = .300)
  • Comp and Algebra (r = .483)
  • Comp and Analytic (r = .510)
  • Comp and Reason (r = .594)
  • Algebra and Analytic (r = .554)
  • Algebra and Reason (r = .516)

Based on the output, which of these relationships is(are) statistically significant?
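The idea of correlating several variables two at a time can be sketched in Python as follows. The variable names (test_a, test_b, test_c), the scores, and the helper functions are all hypothetical; the sketch simply pairs up every combination of variables and stores each r in both the row and the column for that pair, which is why a correlation matrix is symmetric.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

def correlation_matrix(data):
    """Correlate every pair of variables; data maps a name to a list of scores."""
    names = list(data)
    matrix = {name: {name: 1.0} for name in names}  # a variable with itself
    for a, b in combinations(names, 2):
        r = pearson_r(data[a], data[b])
        matrix[a][b] = r  # row a, column b
        matrix[b][a] = r  # row b, column a (same value)
    return matrix

# Hypothetical scores for three made-up variables.
scores = {
    "test_a": [10, 12, 15, 18, 20, 25],
    "test_b": [30, 33, 31, 40, 42, 47],
    "test_c": [8, 5, 9, 4, 7, 6],
}

matrix = correlation_matrix(scores)
for row in scores:
    print(row, [round(matrix[row][col], 3) for col in scores])
```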

Scatterplots

As you learned in the lesson on correlation and regression, the use of Pearson's r for correlations requires that the variables have a linear relationship (e.g., the value of Variable A increases as the value of Variable B increases or the value of Variable A decreases as the value of Variable B increases). One way to determine whether the values change linearly is to visually examine a scatterplot.

Shown below are two scatterplots: one for the IQ/SAT data and one for the analytical skill and reasoning ability variables in the scores data.

 

The data in the plots appear to be linearly related. As IQ values increase, SAT values generally increase. As analytic ability (analytic) increases, reasoning ability (reason) generally increases.

It is always a good idea to examine a scatterplot of the variables you want to correlate before conducting the correlation test. If the variables are not linearly related (either positively or negatively), you should use a statistic other than Pearson's r. You will learn about other correlation tests in other statistics courses.
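The following sketch shows why the scatterplot check matters. The data are hypothetical: y is exactly the square of x, so the two variables are perfectly related, yet the relationship is curved rather than linear and Pearson's r comes out as zero.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: a perfect U-shaped (nonlinear) relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r = pearson_r(x, y)
print(f"r = {r:.3f}")  # no linear association, despite a perfect curve
```

A scatterplot of these values would show the U shape immediately, whereas the correlation coefficient alone would suggest that the variables are unrelated.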

 

Coefficient of Determination

In addition to being able to describe the direction of the relationship between IQ and SAT from the scatterplot and being able to describe the strength and direction of the relationship from the correlation output, you could also determine the proportion of variability in SAT scores that can be explained by IQ. In other words, you could determine relatively how much of the variability (spread) in SAT scores can be attributed to IQ. To make this determination, you would calculate the coefficient of determination, which is r² (the square of the Pearson correlation coefficient). In the example given, the coefficient of determination is .9564. Since the coefficient of determination is close to 1, you would conclude that IQ explains most (about 96%) of the variation in SAT scores.
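As a quick check of the arithmetic, the coefficient of determination can be computed by squaring r. This sketch uses the rounded r = 0.978 from the output, so the result may differ in the last decimal place from a value computed from the unrounded correlation.

```python
# Pearson r for IQ and SAT, as rounded in the sample output.
r = 0.978

# The coefficient of determination is r squared.
r_squared = r ** 2

print(f"r squared = {r_squared:.4f}")
print(f"Variability explained: {r_squared:.0%}")
```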

 


Generating Correlation Output

Generating a Scatterplot

It is a good idea to use a scatterplot to visually assess the relationship between the two variables that you want to correlate prior to conducting the correlation test. To create a scatterplot in SPSS:

  1. Open the Graphs menu and select Scatterplot from the Interactive submenu.
  2. In the Create Scatterplot dialog box:
    • Drag the name of the variable you want to plot on the y-axis to the vertical axis. In this example, the IQ variable will be plotted on the y-axis.
    • Drag the name of the variable you want to plot on the x-axis to the horizontal axis. In this example, the SAT variable will be plotted on the x-axis.
    • Note: A scatterplot is a bivariate plot; you can only plot two variables in a single scatterplot.
    • Click the OK button to create the scatterplot.

 

 

Conducting a Correlation Test

SPSS will generate a correlation matrix as part of the output for a correlation test. To conduct a correlation test using Pearson's r in SPSS, you would use a bivariate correlation. It is a bivariate test because you are correlating two variables. To conduct the correlation test:

  1. Open the Analyze menu and select Bivariate from the Correlate submenu.
  2. In the Bivariate Correlations dialog box:
    • Add the variables that you want to correlate to the variables box. In this example, IQ and SAT are selected. (Note: You must select at least two variables for the correlation. When a data set contains more than two variables, you may add as many as desired to the variables box to be correlated in pairs.)
    • Select Pearson as the correlation coefficient.
    • Specify whether the test is one- or two-tailed by selecting the corresponding option for the test of significance.
    • Check the flag significant correlations box to have SPSS mark the significant relations with asterisks.
    • Click the options button.
  3. In the Bivariate Correlations: Options dialog box:
    • Check the box to include means and standard deviations in the output.
    • Click the continue button to return to the Bivariate Correlations dialog box.
  4. Click the OK button to generate the correlation output.
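The descriptive statistics requested in the options step (mean, standard deviation, and number of scores for each variable) can be approximated outside SPSS with Python's statistics module. This is a rough sketch, not a reproduction of SPSS output: the variable names and scores are hypothetical, and it assumes the standard deviation is the sample standard deviation (dividing by N - 1).

```python
import statistics

def descriptives(data):
    """Return (mean, sample standard deviation, N) for each variable."""
    return {
        name: (statistics.mean(values), statistics.stdev(values), len(values))
        for name, values in data.items()
    }

# Hypothetical scores, not the lesson's data file.
scores = {"quiz": [1, 2, 3, 4, 5], "exam": [50, 60, 70, 80, 90]}

summary = descriptives(scores)
for name, (mean, sd, n) in summary.items():
    print(f"{name:<6} mean={mean:.3f}  sd={sd:.3f}  N={n}")
```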

 

 


Review

Now that you know how to interpret and generate correlation output, let's review what you have learned.

  • Given a sample SPSS correlation matrix:
    • You can describe the magnitude and direction of the relationship between two variables.
    • You can calculate the coefficient of determination for two variables.
    • You can determine whether the relationship between two variables is statistically significant.
  • Given a scatterplot generated by SPSS, you can determine:
    • Whether the variables are linearly related
    • The direction of a linear relationship between variables
  • You can create a scatterplot for specified variables using SPSS.
  • You can conduct a correlation test to examine the relationship between two variables.


 

 

© 2007 by Melissa Kelly and L. K. Curda. All rights reserved. Updated on November 21, 2007