Major Python Packages for Hypothesis Testing

Two popular Python packages for hypothesis testing are scipy.stats and statsmodels. This section briefly introduces each of them with simple examples.

Introduction to scipy.stats

According to the official scipy.stats documentation, the module contains a large number of probability distributions, summary and frequency statistics, correlation functions, statistical tests, and more.

Example 1

This example uses scipy.stats to run an independent-samples t-test.

Suppose you have measured men's and women's attitudes toward a certain brand and want to test whether the two groups differ significantly. Setting equal_var=False requests Welch's t-test, which does not assume that the two groups have equal variances. The Python code and its output are shown below.

import scipy.stats

# create data
men_attitudes = [4, 6, 7, 7, 6, 7]
women_attitudes = [4, 3, 4, 5, 2, 1]

# use scipy.stats to conduct an independent t-test in Python
scipy.stats.ttest_ind(men_attitudes, women_attitudes, equal_var=False)
Ttest_indResult(statistic=3.9093501848676255, pvalue=0.003208100523708222)
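
To make the Welch statistic less of a black box, the sketch below recomputes it by hand with only the Python standard library. The helper name welch_t is hypothetical (it is not part of scipy), and the code covers only the t statistic and the Welch-Satterthwaite degrees of freedom, not the p-value.

```python
import math
import statistics

men_attitudes = [4, 6, 7, 7, 6, 7]
women_attitudes = [4, 3, 4, 5, 2, 1]


def welch_t(a, b):
    """Hypothetical helper: return (t statistic, Welch-Satterthwaite df)."""
    # Squared standard error of each group mean: sample variance / n
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)

    # t = difference in means / standard error of that difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


t_stat, df = welch_t(men_attitudes, women_attitudes)
print(t_stat, df)
```

The t statistic should agree with the value reported by ttest_ind above. The degrees of freedom are generally not a whole number, because Welch's test adjusts them for unequal variances.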

Example 2

This example uses scipy.stats for simple linear regression, testing the impact of price on purchase intention. Based on the output below, the fitted regression equation is:

Purchase Intention = 10.14 – 0.69 Price

import numpy as np
from scipy.stats import linregress

# generate the X and Y sample data
Price = np.array([5, 6, 7, 8, 9, 10])
Purchase_intention = np.array([7, 6, 5, 5, 3, 4])

# conduct linear regression using scipy.stats
res = linregress(Price, Purchase_intention)

# print out slope and intercept
print(res.slope)
print(res.intercept)
-0.6857142857142857
10.142857142857142
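
The slope and intercept alone do not say whether the effect of price is statistically significant. As a minimal sketch, the result object returned by linregress also exposes rvalue, pvalue, and stderr, which can be read as shown here. The 0.05 significance level is an assumption chosen for illustration, not something prescribed by scipy.

```python
import numpy as np
from scipy.stats import linregress

price = np.array([5, 6, 7, 8, 9, 10])
purchase_intention = np.array([7, 6, 5, 5, 3, 4])

res = linregress(price, purchase_intention)

# rvalue is the correlation coefficient; squaring it gives R-squared
r_squared = res.rvalue ** 2
print(f"R-squared: {r_squared:.3f}")

# pvalue tests the null hypothesis that the slope is zero
print(f"p-value for the slope: {res.pvalue:.4f}")

# stderr is the standard error of the estimated slope
print(f"standard error of the slope: {res.stderr:.3f}")

alpha = 0.05  # assumed significance level
slope_is_significant = res.pvalue < alpha
print("slope significant at alpha = 0.05:", slope_is_significant)
```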

Introduction to statsmodels

Like scipy.stats, statsmodels supports a wide range of hypothesis tests. The examples below repeat the two-sample t-test and the simple linear regression using statsmodels.

Example 1

This example uses the same attitude data as the scipy.stats t-test above. The following Python code runs a two-sample t-test with statsmodels. The output is a tuple containing the t statistic, the p-value, and the degrees of freedom.

# import the submodule explicitly; a bare "import statsmodels" does not load it
import statsmodels.stats.weightstats

# create data
men_attitudes = [4, 6, 7, 7, 6, 7]
women_attitudes = [4, 3, 4, 5, 2, 1]

# use statsmodels to do an independent t-test in Python
statsmodels.stats.weightstats.ttest_ind(men_attitudes, women_attitudes, alternative="two-sided", usevar="unequal")
(3.9093501848676255, 0.0032081005237082198, 9.51236031154758)
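
Since both packages are given the same data and the same unequal-variance setting, their results should match. The sketch below is one way to check this: it unpacks the statsmodels tuple into named variables and compares it with the scipy.stats result. The variable names are illustrative choices.

```python
import math

from scipy.stats import ttest_ind as scipy_ttest_ind
from statsmodels.stats.weightstats import ttest_ind as sm_ttest_ind

men_attitudes = [4, 6, 7, 7, 6, 7]
women_attitudes = [4, 3, 4, 5, 2, 1]

# scipy.stats returns a result object with named fields
scipy_res = scipy_ttest_ind(men_attitudes, women_attitudes, equal_var=False)

# statsmodels returns a plain tuple: (t statistic, p-value, degrees of freedom)
t_sm, p_sm, df_sm = sm_ttest_ind(
    men_attitudes, women_attitudes, alternative="two-sided", usevar="unequal"
)

# compare with a tolerance rather than exact float equality
same_t = math.isclose(scipy_res.statistic, t_sm, rel_tol=1e-9)
same_p = math.isclose(scipy_res.pvalue, p_sm, rel_tol=1e-9)
print("t statistics agree:", same_t)
print("p-values agree:", same_p)
```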

Example 2

This example uses the same price and purchase-intention data as the scipy.stats regression above. The following Python code runs a linear regression with statsmodels and prints its summary.

Unlike scipy.stats.linregress, sm.OLS does not add an intercept automatically, so the design matrix needs a column of 1s. We can use np.ones() to add this column to the X matrix.

Purchase Intention = 10.14 – 0.69 Price

import numpy as np
import statsmodels.api as sm

# generate the X and Y sample data; the first column of 1s is the intercept term
Price = np.array([np.ones(6), [5, 6, 7, 8, 9, 10]]).T
Purchase_intention = np.array([7, 6, 5, 5, 3, 4])

# conduct linear regression using statsmodels
model = sm.OLS(Purchase_intention, Price)
results = model.fit()
print(results.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.823
Model:                            OLS   Adj. R-squared:                  0.779
Method:                 Least Squares   F-statistic:                     18.58
Date:                Tue, 07 Feb 2023   Prob (F-statistic):             0.0125
Time:                        09:47:20   Log-Likelihood:                -4.8537
No. Observations:                   6   AIC:                             13.71
Df Residuals:                       4   BIC:                             13.29
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         10.1429      1.224      8.289      0.001       6.746      13.540
x1            -0.6857      0.159     -4.311      0.013      -1.127      -0.244
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   2.956
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.437
Skew:                          -0.550   Prob(JB):                        0.804
Kurtosis:                       2.266   Cond. No.                         35.2
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.