This tutorial explains how to conduct an independent t-test in Python. Two functions are covered: scipy.stats.ttest_ind() (Method 1) and researchpy.ttest() (Method 2).
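As a minimal sketch of the core syntax, the snippet below calls scipy.stats.ttest_ind() on two small made-up samples. The names group_a and group_b and their values are hypothetical, not the tutorial's data. The function returns the t statistic and the two-sided p-value, and equal_var=False requests Welch's version of the test, which does not assume equal variances. The researchpy call appears only as a comment that mirrors the keyword arguments used in Example 2.

```python
import scipy.stats

# Hypothetical samples, for illustration only
group_a = [12, 15, 14, 10, 13]
group_b = [18, 17, 16, 19, 20]

# Method 1: scipy (equal_var=False requests Welch's t-test)
result = scipy.stats.ttest_ind(group_a, group_b, equal_var=False)
print(result.statistic, result.pvalue)

# Method 2: researchpy (takes pandas Series; see Example 2)
# summary, results = rp.ttest(group1=series_a, group1_name="A",
#                             group2=series_b, group2_name="B")
```

A negative statistic here means the first sample has the lower mean, since the statistic is computed as the first mean minus the second, divided by the standard error.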
Example 1: Use Method 1 for an Independent t-test in Python
In Example 1, we test whether gender affects writing scores. That is, we test whether males and females differ in their writing scores.
The following is the complete Python code. Because equal_var=False is passed, scipy runs Welch's t-test, which does not assume equal variances in the two groups. The p-value in the output is 0.0003, so males and females differ significantly in writing scores. However, the p-value alone does not tell us whether males or females have higher scores.
import pandas as pd
import scipy.stats

# read data from GitHub
df = pd.read_csv("https://raw.githubusercontent.com/TidyPython/SPSS/main/hsbdemo.csv")

# create series for males and females separately
s1 = df['write'][df['female'] == 'male']
s2 = df['write'][df['female'] == 'female']

# use scipy to do the test
scipy.stats.ttest_ind(s1, s2, equal_var=False)
To know whether males or females have higher writing scores, we need to know the means for males and females respectively.
The following is the Python code to calculate the means. From the output, we can see that the mean writing score is 54.99 for females and 50.12 for males, so females score higher on average.
# calculate means for males and females separately
writing = df[['write', 'female']]
means = writing.groupby('female').describe()
print(means)
        write
        count       mean        std   min   25%   50%   75%   max
female
female  109.0  54.990826   8.133715  35.0  50.0  57.0  62.0  67.0
male     91.0  50.120879  10.305161  31.0  41.0  52.0  59.0  67.0
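If only the group means are needed, a shorter route is to select the column before aggregating. The sketch below assumes a tiny made-up DataFrame that reuses the column names write and female; the scores are hypothetical and are not taken from hsbdemo.csv.

```python
import pandas as pd

# Hypothetical miniature of the data set: same column names, made-up scores
df = pd.DataFrame({
    "female": ["male", "female", "male", "female", "male", "female"],
    "write": [50, 58, 46, 60, 54, 56],
})

# Mean writing score per group
group_means = df.groupby("female")["write"].mean()
print(group_means)

# Difference in means (male minus female);
# a negative value means females score higher on average
diff = group_means["male"] - group_means["female"]
print(diff)
```

Compared with describe(), this returns a single Series indexed by group, which is easier to use in later calculations.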
Example 2: Use Method 2 for an independent t-test in Python
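Example 2 uses ttest() from researchpy to conduct an independent t-test in Python. The results are similar to those in Example 1, where scipy is used. The p-value differs slightly (0.0002 vs. 0.0003) because the researchpy output, with 198 degrees of freedom, reflects a pooled-variance t-test, whereas Example 1 passed equal_var=False to request Welch's test.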
import researchpy as rp
import pandas as pd

# read data from GitHub
df = pd.read_csv("https://raw.githubusercontent.com/TidyPython/SPSS/main/hsbdemo.csv")

# use researchpy for independent t-test in Python
summary, results = rp.ttest(group1=df['write'][df['female'] == 'male'], group1_name="Male",
                            group2=df['write'][df['female'] == 'female'], group2_name="Female")

# print out the results
print(summary)
print(results)
   Variable      N       Mean         SD        SE  95% Conf.   Interval
0      Male   91.0  50.120879  10.305161  1.080274  47.974726  52.267033
1    Female  109.0  54.990826   8.133715  0.779069  53.446577  56.535075
2  combined  200.0  52.775000   9.478586  0.670237  51.453321  54.096679

            Independent t-test   results
0  Difference (Male - Female) =   -4.8699
1          Degrees of freedom =  198.0000
2                           t =   -3.7341
3       Two side test p value =    0.0002
4      Difference < 0 p value =    0.0001
5      Difference > 0 p value =    0.9999
6                   Cohen's d =   -0.5302
7                   Hedge's g =   -0.5282
8               Glass's delta =   -0.4726
9                 Pearson's r =    0.2565
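The two methods can be compared using only the summary statistics printed above. The sketch below feeds the reported means, standard deviations, and group sizes to scipy.stats.ttest_ind_from_stats(), once with equal_var=True (pooled variance, 198 degrees of freedom) and once with equal_var=False (Welch's test, as in Example 1). The helper name t_from_summary is introduced here for illustration only.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics copied from the output above
male = dict(mean=50.120879, std=10.305161, nobs=91)
female = dict(mean=54.990826, std=8.133715, nobs=109)

def t_from_summary(equal_var):
    """Run the t-test from summary statistics (male minus female)."""
    return ttest_ind_from_stats(
        mean1=male["mean"], std1=male["std"], nobs1=male["nobs"],
        mean2=female["mean"], std2=female["std"], nobs2=female["nobs"],
        equal_var=equal_var,
    )

student = t_from_summary(equal_var=True)   # pooled-variance t-test
welch = t_from_summary(equal_var=False)    # Welch's t-test

print("Student:", student.statistic, student.pvalue)
print("Welch:  ", welch.statistic, welch.pvalue)
```

The pooled-variance statistic should agree with the t value of -3.7341 in the researchpy output, while Welch's version gives a slightly smaller absolute t and a slightly larger p-value.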
Example 3: Use Method 1 for an Independent t-test in Python
Unlike the previous two examples, Example 3 creates the data by hand instead of reading it from a file.
Step 1: Data
Suppose you want to test whether women and men differ in their attitudes toward a brand, where attitude is measured on a 7-point scale (1 = Do not like it at all, 7 = Like it a lot).
The following is the hypothetical data, with one column for men’s attitudes and another for women’s attitudes toward the brand.
|Men’s Attitudes|Women’s Attitudes|
|---|---|
|4|4|
|6|3|
|7|4|
|7|5|
|6|2|
|7|1|
Step 2: Python code for independent sample t-test
The following is the complete Python code to conduct the independent sample t-test. The Python code is followed by its output.
import scipy.stats

# create data
men_attitudes = [4, 6, 7, 7, 6, 7]
women_attitudes = [4, 3, 4, 5, 2, 1]

# function to calculate means
def Average(lst):
    return sum(lst) / len(lst)

# print out means
print("men's attitudes:")
print(Average(men_attitudes))
print("Women's attitude:")
print(Average(women_attitudes))

# use scipy to conduct independent t-test in Python
ttest_results = scipy.stats.ttest_ind(men_attitudes, women_attitudes, equal_var=False)
print(ttest_results)
men's attitudes:
6.166666666666667
Women's attitude:
3.1666666666666665
Ttest_indResult(statistic=3.9093501848676255, pvalue=0.003208100523708222)
Step 3: Interpretation of independent sample t-test output
Based on the output shown above, men have more favorable attitudes than women (means of 6.17 vs. 3.17). The difference is statistically significant (p = 0.003), which means that women and men differ significantly in their attitudes toward the brand.
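To see where the statistic comes from, it can be reproduced by hand. The sketch below uses only the standard library and the same two lists. It computes Welch's t as the difference in means divided by the square root of the summed per-group variance-over-n terms, plus the Welch-Satterthwaite approximation for the degrees of freedom. The function name welch_t is introduced here for illustration only.

```python
import math
import statistics

men_attitudes = [4, 6, 7, 7, 6, 7]
women_attitudes = [4, 3, 4, 5, 2, 1]

def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Sample variance of each group divided by its size
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, df

t_stat, df = welch_t(men_attitudes, women_attitudes)
print(t_stat, df)
```

The resulting t value should match the statistic of about 3.909 reported by scipy. The degrees of freedom come out below the 10 that a pooled-variance test would use, which is typical of Welch's correction.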