{"id":196,"date":"2021-02-26T11:47:00","date_gmt":"2021-02-26T11:47:00","guid":{"rendered":"http:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/ziyang-yang\/?p=196"},"modified":"2021-04-30T12:17:57","modified_gmt":"2021-04-30T12:17:57","slug":"statistics-in-social-science-1-how-to-choose-an-appropriate-statistical-test","status":"publish","type":"post","link":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/ziyang-yang\/2021\/02\/26\/statistics-in-social-science-1-how-to-choose-an-appropriate-statistical-test\/","title":{"rendered":"Statistics in Social science (1): How to choose an appropriate correlation test?"},"content":{"rendered":"\n

This post explains how to choose an appropriate statistical correlation test in the social sciences.<\/p>\n\n\n\n

Recently I have been talking with friends who study social science, and they are often unsure how to use statistical tests appropriately. So I decided to write a series of posts about the common statistical methods in social science and how to interpret their results.<\/p>\n\n\n\n

In social science, it is common to measure the association between two variables. For example, you may want to test the relationship between smoking and lung cancer, or between consumption and income. The table below summarizes which method to use for different variable types and distributions. In this post, we only cover the case of two continuous variables.<\/p>\n\n\n\n

Two continuous variables<\/strong><\/td><\/td><\/tr>
Normally distributed<\/td>Pearson Correlation coefficient<\/a><\/td><\/tr>
Not normally distributed<\/td>Spearman Correlation coefficient<\/a><\/td><\/tr>
Two categorical variables<\/strong><\/td>Fisher exact test<\/a>; Chi-square test<\/a><\/td><\/tr>
One continuous variable and one categorical variable<\/strong><\/td>Boxplot<\/a>; Biserial rank correlation<\/a><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n

Analysis of Correlation<\/h1>\n\n\n\n

Drawing the plot – a direct way<\/h2>\n\n\n\n

The first step in measuring correlation is to plot the data:<\/p>\n\n\n\n

\"\"<\/figure><\/div>\n\n\n\n

Assume we have continuous variables y1, x1, x2, x3 and x4. From the plot above, we can see that y1 and x1 have a positive linear correlation; y1 and x2 show no apparent correlation, linear or non-linear; y1 and x3 have a negative linear correlation; and y1 and x4 have a non-linear correlation.<\/p>\n\n\n\n
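A minimal sketch of how such scatter plots could be drawn in R. The post does not show how y1 and the x variables were generated, so the data here are simulated purely for illustration; the sample size, the seed and the coefficients are all assumptions.

```r
# Simulated data (an assumption: not the data used in this post)
set.seed(1)
n  <- 100
x1 <- rnorm(n)
y1 <- 2 * x1 + rnorm(n)   # roughly linear, positive relationship with x1
x2 <- rnorm(n)            # generated independently of y1

# Draw the two scatter plots side by side, then restore the plot settings
old_par <- par(mfrow = c(1, 2))
plot(x1, y1, main = 'y1 vs x1')
plot(x2, y1, main = 'y1 vs x2')
par(old_par)
```

With real data, the same plot() calls apply to the columns of your own data frame.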

Calculating the correlation coefficient – a mathematical way<\/h2>\n\n\n\n

A correlation coefficient is a statistic that measures the strength of the association between two variables. Two coefficients are commonly used: the Pearson correlation coefficient and the Spearman correlation coefficient. If you report the p-value of a correlation test, you should always report the coefficient alongside it, because the p-value alone says nothing about the strength or direction of the relationship.<\/p>\n\n\n\n
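As a hedged illustration of reporting both numbers together, the base R function cor.test() returns the coefficient and the p-value in one object. The variables x and y below are simulated for this sketch and are not the data used in this post.

```r
# Simulated data (an assumption, for illustration only)
set.seed(42)
x <- rnorm(50)
y <- 0.8 * x + rnorm(50, sd = 0.5)

# Pearson correlation test; use method = 'spearman' for the rank-based version
res <- cor.test(x, y, method = 'pearson')

res$estimate   # the correlation coefficient
res$p.value    # p-value for the null hypothesis of zero correlation
res$conf.int   # 95% confidence interval (Pearson only)
```

Reporting res$estimate next to res$p.value lets the reader judge both the strength and the statistical significance of the relationship.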

\"Correlation<\/figure><\/div>\n\n\n\n

Pearson correlation coefficient<\/h3>\n\n\n\n

The Pearson correlation coefficient can be calculated in R with the cor() function. It is the most commonly used correlation statistic; however, it assumes that the continuous variables follow a normal (bell-shaped) distribution. We did not check this assumption here, but it has to be checked in a real data analysis.<\/p>\n\n\n\n
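One possible way to check the normality assumption is the Shapiro-Wilk test together with a normal Q-Q plot, both available in base R. This is only a sketch of one common approach, not necessarily the check the author had in mind, and the two variables below are simulated for illustration.

```r
# Simulated data (an assumption, for illustration only)
set.seed(7)
x <- rnorm(200)   # drawn from a normal distribution
z <- rexp(200)    # drawn from a skewed (exponential) distribution

# Shapiro-Wilk test: a small p-value suggests the data are not normal
shapiro.test(x)
shapiro.test(z)

# Q-Q plot: points far from the line indicate departure from normality
qqnorm(z)
qqline(z)
```

If a variable clearly fails such a check, the table at the top of this post points to the Spearman coefficient instead.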

The correlation coefficient ranges from -1 to 1. The sign gives the direction of the correlation: a positive value indicates a positive relationship, while a negative value indicates a negative relationship. The absolute value measures the strength of the correlation. As a rule of thumb, an absolute value above 0.7 is usually considered a strong correlation.<\/p>\n\n\n\n
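To make the coefficient less of a black box, the sketch below computes it by hand as the covariance divided by the product of the two standard deviations, and compares the result with cor(). The five data points are made up for illustration.

```r
# Made-up data (an assumption, for illustration only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

# Pearson coefficient from its definition: cov(x, y) / (sd(x) * sd(y))
r_manual  <- cov(x, y) / (sd(x) * sd(y))

# The same quantity from the built-in function
r_builtin <- cor(x, y)

all.equal(r_manual, r_builtin)
```

Because the covariance is divided by both standard deviations, the result does not depend on the units of x or y and always lies between -1 and 1.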

From the output below, we can see that y1 and x1 have a strong positive correlation; the correlation coefficient between y1 and x2 is very small, only 0.016; y1 and x3 have a strong negative correlation; and y1 and x4 have a moderate correlation. Note: here we can only talk about linear correlation, since the Pearson coefficient does not capture non-linear relationships.<\/p>\n\n\n\n

> cor(y1,x1)\n[1] 0.8708785\n> cor(y1,x2)\n[1] 0.01631352\n> cor(y1,x3)\n[1] -0.9145617\n> cor(y1,x4)\n[1] 0.405236<\/code><\/pre>\n\n\n\n

Spearman Rank correlation coefficient<\/h3>\n\n\n\n

Unlike Pearson’s method, Spearman’s method makes no assumption about the distribution of the variables. It usually gives a result similar to Pearson’s (as in the output below). The difference between the two is that the Pearson correlation only measures linear relationships, while the Spearman rank correlation is computed on the ranks of the data and so measures any monotonic relationship, whether linear or not.<\/p>\n\n\n\n

> cor(y1,x1,method = 'spearman')\n[1] 0.8520012\n> cor(y1,x2,method = 'spearman')\n[1] 0.01749775\n> cor(y1,x3,method = 'spearman')\n[1] -0.9017702\n> cor(y1,x4,method = 'spearman')\n[1] 0.4252865<\/code><\/pre>\n\n\n\n
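The difference between the two coefficients is easiest to see on data that are related but not linearly. The sketch below uses made-up values (an assumption, not the data in this post): an exponential curve, which always increases, and a parabola, which goes down and then up.

```r
# Monotonic but non-linear: y always increases with x
x <- seq(1, 10, by = 0.5)
y <- exp(x)
pearson  <- cor(x, y)                       # below 1: the relation is not a line
spearman <- cor(x, y, method = 'spearman')  # equals 1: the ranks agree perfectly

# Non-monotonic: y2 falls and then rises, symmetric around zero
x2 <- seq(-3, 3, by = 0.5)
y2 <- x2^2
pearson2  <- cor(x2, y2)                       # essentially zero
spearman2 <- cor(x2, y2, method = 'spearman')  # essentially zero as well
```

One caveat worth noting: Spearman picks up relationships that consistently go up or consistently go down, but a U-shaped relationship like the second one is missed by both coefficients, which is another reason to draw the plot first.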

How do we report?<\/h2>\n\n\n\n
  1. First, draw a plot to see the relationship.<\/li>
  2. If you want a statistical test:<\/li><\/ol>\n\n\n\n