Major Python Packages for Hypothesis Testing

Two popular Python packages for hypothesis testing are scipy.stats and statsmodels. Below is a brief introduction to each of them, along with simple examples.

Introduction to scipy.stats

According to the official scipy.stats documentation, the module contains a large number of probability distributions, summary and frequency statistics, correlation functions, statistical tests, and more.

Example 1

The following example uses scipy.stats to run an independent t-test.

Suppose you have measured men's and women's attitudes toward a certain brand and want to test whether the two groups differ significantly. The following is the Python code and output.

```
import scipy.stats

# create data
men_attitudes = [4, 6, 7, 7, 6, 7]
women_attitudes = [4, 3, 4, 5, 2, 1]

# use scipy.stats to conduct an independent (Welch's) t-test in Python
scipy.stats.ttest_ind(men_attitudes, women_attitudes, equal_var=False)
```
`Ttest_indResult(statistic=3.9093501848676255, pvalue=0.003208100523708222)`
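The returned result object exposes the statistic and p-value as attributes, which makes it easy to turn the test into a reject/fail-to-reject decision in code. A minimal sketch (the 0.05 significance level is a conventional choice, assumed here for illustration):

```python
import scipy.stats

men_attitudes = [4, 6, 7, 7, 6, 7]
women_attitudes = [4, 3, 4, 5, 2, 1]

# the result object exposes .statistic and .pvalue as attributes
res = scipy.stats.ttest_ind(men_attitudes, women_attitudes, equal_var=False)

alpha = 0.05  # significance level (an assumption for illustration)
if res.pvalue < alpha:
    print("Reject the null: the two group means differ significantly.")
else:
    print("Fail to reject the null.")
```

With the data above, the p-value (about 0.003) falls below 0.05, so the first branch is taken.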

Example 2

The following example uses scipy.stats to run a simple linear regression, testing the impact of price on purchase intention. Based on the output, we can write the fitted regression equation as follows.

Purchase Intention = 10.14 – 0.69 Price

```
import numpy as np
from scipy.stats import linregress

# generate the X and Y sample data
Price = np.array([5, 6, 7, 8, 9, 10])
Purchase_intention = np.array([7, 6, 5, 5, 3, 4])

# conduct linear regression using scipy.stats
res = linregress(Price, Purchase_intention)

# print out slope and intercept
print(res.slope)
print(res.intercept)
```
```
-0.6857142857142857
10.142857142857142
```
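Beyond the slope and intercept, the `linregress` result also carries the correlation coefficient (`rvalue`) and the p-value for the slope, so a basic goodness-of-fit check needs no extra package. A short sketch:

```python
import numpy as np
from scipy.stats import linregress

Price = np.array([5, 6, 7, 8, 9, 10])
Purchase_intention = np.array([7, 6, 5, 5, 3, 4])

res = linregress(Price, Purchase_intention)

r_squared = res.rvalue ** 2   # squaring the correlation gives R-squared
print(round(r_squared, 3))    # → 0.823
print(res.pvalue)             # two-sided p-value for a zero-slope null
```

The R-squared of 0.823 matches the value reported by the statsmodels OLS summary for the same data.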

Introduction to statsmodels

Similar to scipy.stats, statsmodels supports a wide range of hypothesis tests. The following shows how to use statsmodels to do a two-sample t-test.

Example 1

We are going to use the same attitude data shown above. The following is the Python code for a two-sample t-test in statsmodels, along with its output.

```
# a bare "import statsmodels" does not load the stats.weightstats submodule,
# so import it explicitly
import statsmodels.stats.weightstats

# create data
men_attitudes = [4, 6, 7, 7, 6, 7]
women_attitudes = [4, 3, 4, 5, 2, 1]

# use statsmodels to do an independent t-test in Python
statsmodels.stats.weightstats.ttest_ind(men_attitudes, women_attitudes,
                                        alternative="two-sided", usevar="unequal")
```
`(3.9093501848676255, 0.0032081005237082198, 9.51236031154758)`
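Unlike scipy.stats, this function returns a plain tuple of (t-statistic, p-value, degrees of freedom), so it is often unpacked directly. A short sketch:

```python
from statsmodels.stats.weightstats import ttest_ind

men_attitudes = [4, 6, 7, 7, 6, 7]
women_attitudes = [4, 3, 4, 5, 2, 1]

# with usevar="unequal", the third element is the
# Welch-Satterthwaite degrees of freedom
tstat, pvalue, dof = ttest_ind(men_attitudes, women_attitudes,
                               alternative="two-sided", usevar="unequal")

print(round(tstat, 3), round(pvalue, 4), round(dof, 2))  # → 3.909 0.0032 9.51
```

The statistic and p-value agree with the scipy.stats result above; the extra degrees-of-freedom value is what statsmodels adds.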

Example 2

We are going to use the same price and purchase-intention data shown above. The following is the Python code for linear regression in statsmodels and its output.

Unlike scipy.stats, statsmodels requires an explicit intercept column (i.e., a column of 1s) in the X matrix. We can use `np.ones()` to add such a column.

Purchase Intention = 10.14 – 0.69 Price

```
import numpy as np
import statsmodels.api as sm

# generate the X and Y sample data;
# the first column of 1s in X is the intercept
Price = np.array([np.ones(6), [5, 6, 7, 8, 9, 10]])
Price = Price.T
Purchase_intention = np.array([7, 6, 5, 5, 3, 4])

# conduct linear regression using statsmodels
results = sm.OLS(Purchase_intention, Price)
print(results.fit().summary())
```
```                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.823
Method:                 Least Squares   F-statistic:                     18.58
Date:                Tue, 07 Feb 2023   Prob (F-statistic):             0.0125
Time:                        09:47:20   Log-Likelihood:                -4.8537
No. Observations:                   6   AIC:                             13.71
Df Residuals:                       4   BIC:                             13.29
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         10.1429      1.224      8.289      0.001       6.746      13.540
x1            -0.6857      0.159     -4.311      0.013      -1.127      -0.244
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   2.956
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.437
Skew:                          -0.550   Prob(JB):                        0.804
Kurtosis:                       2.266   Cond. No.                         35.2
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.```