Independent t-test in Python (3 Examples)

This tutorial explains how to conduct an independent t-test in Python using either scipy.stats.ttest_ind() or researchpy.ttest(). The core syntax is as follows.

Method 1:

scipy.stats.ttest_ind(list_1, list_2)

Method 2:

rp.ttest(pandas_series_1, pandas_series_2)

Example 1: Use Method 1 for an Independent t-test in Python

In Example 1, we are going to test how gender impacts writing scores. That is, we would like to test whether males and females differ in terms of writing scores.

The following is the complete Python code, followed by its output. Because equal_var=False is set, scipy runs Welch's t-test, which does not assume equal variances in the two groups. The p-value is 0.0003, which is below the conventional 0.05 level, so males and females differ significantly in writing scores. However, the p-value alone does not tell us whether males or females have higher scores.

import pandas as pd
import scipy.stats

# read data from GitHub
df = pd.read_csv("https://raw.githubusercontent.com/TidyPython/SPSS/main/hsbdemo.csv")

# create series of writing scores for males and females separately
# (the column named 'female' holds the values 'male' and 'female')
s1 = df.loc[df['female'] == 'male', 'write']
s2 = df.loc[df['female'] == 'female', 'write']

# use scipy to do the test
scipy.stats.ttest_ind(s1, s2, equal_var=False)
Ttest_indResult(statistic=-3.6564080478875276, pvalue=0.000340884935942662)
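
The call above passes equal_var=False, which asks scipy for Welch's t-test rather than the pooled-variance Student's t-test. The following is a minimal sketch of what that statistic is, assuming two small hypothetical groups (group_a and group_b are made up for illustration and are not taken from hsbdemo.csv). The helper welch_t() is also hypothetical; it divides the mean difference by the unpooled standard error and should agree with scipy's result.

```python
import math
import statistics

import scipy.stats

# hypothetical scores, not taken from hsbdemo.csv
group_a = [52, 59, 41, 62, 47, 55]
group_b = [60, 57, 65, 54, 62, 67, 59]


def welch_t(x, y):
    """Welch's t statistic: mean difference over the unpooled standard error."""
    var_x = statistics.variance(x)  # sample variance (n - 1 denominator)
    var_y = statistics.variance(y)
    se = math.sqrt(var_x / len(x) + var_y / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


manual_t = welch_t(group_a, group_b)
welch = scipy.stats.ttest_ind(group_a, group_b, equal_var=False)
student = scipy.stats.ttest_ind(group_a, group_b, equal_var=True)

print("manual Welch t:", manual_t)
print("scipy Welch t: ", welch.statistic)
print("scipy pooled t:", student.statistic)
```

The first two printed values should coincide, while the pooled statistic generally differs slightly when the group sizes or variances are unequal.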

To know whether males or females have higher writing scores, we need to know the means for males and females respectively.

The following is the Python code to calculate the means. From the output, we can see that the mean writing score is 54.99 for females and 50.12 for males, so females score higher on average.

# calculate means for males and females separately 
writing=df[['write','female']]
means=writing.groupby('female').describe()
print(means)
        write                                                    
        count       mean        std   min   25%   50%   75%   max
female                                                           
female  109.0  54.990826   8.133715  35.0  50.0  57.0  62.0  67.0
male     91.0  50.120879  10.305161  31.0  41.0  52.0  59.0  67.0
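
If only the group means are needed, describe() can be swapped for mean(). The following is a small sketch on a hypothetical mini data set (the toy DataFrame is invented for illustration; it only reuses the column names 'female' and 'write' from hsbdemo.csv).

```python
import pandas as pd

# hypothetical mini data set with the same column names as hsbdemo.csv
toy = pd.DataFrame({
    "female": ["male", "female", "male", "female", "male", "female"],
    "write": [50, 60, 40, 58, 45, 62],
})

# one mean per group, returned as a Series indexed by the group label
group_means = toy.groupby("female")["write"].mean()
print(group_means)
```

The same two lines should work on the real df, since it has the same column names.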

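Example 2: Use Method 2 for an Independent t-test in Python

Example 2 uses ttest() from researchpy to conduct an independent t-test in Python. The results are close to, but not identical to, those in Example 1 (t = -3.7341 here vs. -3.6564 there). The output below reports 198 degrees of freedom, which corresponds to the pooled-variance t-test (equal variances assumed), whereas Example 1 set equal_var=False.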

import researchpy as rp
import pandas as pd
# read data from GitHub
df = pd.read_csv("https://raw.githubusercontent.com/TidyPython/SPSS/main/hsbdemo.csv")

# use researchpy for independent t-test in Python
summary, results=rp.ttest(group1= df['write'][df['female'] == 'male'], group1_name= "Male",
 group2= df['write'][df['female'] == 'female'], group2_name= "Female")

# print out the results 
print(summary)
print(results)
   Variable      N       Mean         SD        SE  95% Conf.   Interval
0      Male   91.0  50.120879  10.305161  1.080274  47.974726  52.267033
1    Female  109.0  54.990826   8.133715  0.779069  53.446577  56.535075
2  combined  200.0  52.775000   9.478586  0.670237  51.453321  54.096679

              Independent t-test   results
0  Difference (Male - Female) =    -4.8699
1          Degrees of freedom =   198.0000
2                           t =    -3.7341
3       Two side test p value =     0.0002
4      Difference < 0 p value =     0.0001
5      Difference > 0 p value =     0.9999
6                   Cohen's d =    -0.5302
7                   Hedge's g =    -0.5282
8               Glass's delta =    -0.4726
9                 Pearson's r =     0.2565
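
To see where these numbers come from, the t statistics and Cohen's d can be recomputed from the summary table alone. The following is a sketch that copies the means, standard deviations, and sample sizes from the output above and passes them to scipy.stats.ttest_ind_from_stats(). The variable names are chosen for illustration, and Cohen's d is computed here with the pooled standard deviation, which is an assumption about the formula behind the reported value.

```python
import math

import scipy.stats

# summary statistics copied from the researchpy output above
mean_m, sd_m, n_m = 50.120879, 10.305161, 91
mean_f, sd_f, n_f = 54.990826, 8.133715, 109

# pooled-variance t-test (equal variances assumed)
pooled = scipy.stats.ttest_ind_from_stats(
    mean_m, sd_m, n_m, mean_f, sd_f, n_f, equal_var=True
)
# Welch's t-test (equal variances not assumed), as in Example 1
welch = scipy.stats.ttest_ind_from_stats(
    mean_m, sd_m, n_m, mean_f, sd_f, n_f, equal_var=False
)

# Cohen's d: mean difference divided by the pooled standard deviation
pooled_sd = math.sqrt(
    ((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / (n_m + n_f - 2)
)
cohens_d = (mean_m - mean_f) / pooled_sd

print("pooled t: ", round(pooled.statistic, 4))
print("Welch t:  ", round(welch.statistic, 4))
print("Cohen's d:", round(cohens_d, 4))
```

Up to rounding, the pooled t should agree with the researchpy value of -3.7341, the Welch t with the scipy value of -3.6564 from Example 1, and Cohen's d with -0.5302.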

Example 3: Use Method 1 with Data Entered Manually

Unlike the previous two examples, Example 3 enters the data directly in the code instead of reading it from a file.

Step 1: Data

Suppose you want to test whether women and men differ in their attitudes toward a brand, and the attitude is measured on a 7-point scale (1 = Do not like it at all, 7 = Like it a lot).

The following is the hypothetical data, one column for men’s attitudes and another one for women’s attitudes toward the brand.

Men’s Attitudes    Women’s Attitudes
4                  4
6                  3
7                  4
7                  5
6                  2
7                  1
Data for independent sample t-test in Python

Step 2: Python code for independent sample t-test

The following is the complete Python code to conduct the independent sample t-test. The Python code is followed by its output.

import scipy.stats
# create data
men_attitudes=[4,6,7,7,6,7]
women_attitudes=[4,3,4,5,2,1]

# function to calculate the mean of a list
def average(lst):
    return sum(lst) / len(lst)

# print out means
print("Men's attitudes:")
print(average(men_attitudes))

print("Women's attitudes:")
print(average(women_attitudes))

# use scipy to conduct independent t-test in Python (Welch's version)
ttest_results = scipy.stats.ttest_ind(men_attitudes, women_attitudes, equal_var=False)
print(ttest_results)
Men's attitudes:
6.166666666666667
Women's attitudes:
3.1666666666666665
Ttest_indResult(statistic=3.9093501848676255, pvalue=0.003208100523708222)

Step 3: Interpretation of independent sample t-test output

Based on the output shown above, men have more favorable attitudes than women (means of 6.17 vs. 3.17). The p-value is 0.003, which is below the conventional 0.05 level, so the difference between women's and men's attitudes toward the brand is statistically significant.
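
The interpretation step can also be written in code. The following is a minimal sketch that reads the statistic and p-value from the scipy result and compares the p-value with a significance level. The value alpha = 0.05 and the variable names are assumptions made for illustration.

```python
import scipy.stats

men_attitudes = [4, 6, 7, 7, 6, 7]
women_attitudes = [4, 3, 4, 5, 2, 1]

alpha = 0.05  # assumed significance level

result = scipy.stats.ttest_ind(men_attitudes, women_attitudes, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue

if p_value < alpha:
    # the sign of t shows which group has the higher mean (first minus second)
    direction = "higher" if t_stat > 0 else "lower"
    conclusion = f"Men's mean attitude is significantly {direction} than women's."
else:
    conclusion = "No significant difference between men and women."

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
print(conclusion)
```

Because ttest_ind() subtracts the second group's mean from the first, a positive t here means the first list (men) has the higher mean.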

