# Use t-test to Analyze Financial Well-being Data

## Introduction

Since we have covered the theoretical basics of the t-test (see the tutorial here), it is worth showing how the t-test works in a real-world application. Specifically, we use the Financial Well-Being Survey data to run an independent-samples t-test in Python and test whether men and women differ in subjective well-being. You can download the CSV file using this link.

## Data Explanation

We combine the following three items to form an index of subjective well-being (SWB):

- SWB_1: I am satisfied with my life.
- SWB_2: I am optimistic about my future.
- SWB_3: If I work hard today, I will be more successful in the future.
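As a minimal sketch of what the index looks like, the example below uses three made-up respondents (not the survey data). Summing three 1-7 items gives a score between 3 and 21; averaging them instead (the `Mean_SWB` column is a hypothetical alternative, not part of the tutorial) keeps the original 1-7 scale.

```python
import pandas as pd

# Three made-up respondents (not survey data); each item uses the 1-7 scale
items = pd.DataFrame({
    "SWB_1": [6, 7, 1],
    "SWB_2": [5, 7, 1],
    "SWB_3": [7, 7, 1],
})
swb_cols = ["SWB_1", "SWB_2", "SWB_3"]

# Sum index: ranges from 3 (all 1s) to 21 (all 7s)
items["Combined_SWB"] = items[swb_cols].sum(axis=1)
# Mean index: keeps the original 1-7 scale
items["Mean_SWB"] = items[swb_cols].mean(axis=1)

print(items["Combined_SWB"].tolist())  # [18, 21, 3]
print(items["Mean_SWB"].tolist())      # [6.0, 7.0, 1.0]
```

Either version yields the same t statistic, because dividing every score by 3 rescales both group means and their standard errors by the same factor.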

Before proceeding, we need to clean the data because some responses are missing. Specifically, some responses are coded as “Response not written to database” (-4) or “Refused” (-1), as the codebook entry for SWB_1 shows:

```
"SWB_1":{
-4: "Response not written to database",
-1: "Refused",
1: "1 Strongly disagree",
2: "2",
3: "3",
4: "4",
5: "5",
6: "6",
7: "7 Strongly agree"},
```
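One way to handle these codes is to recode them as missing values (`NaN`) and then drop incomplete rows; the tutorial itself later filters on `>= 1`, which has the same effect. The sketch below assumes a small made-up DataFrame named `raw` laid out like the survey columns, and uses pandas `DataFrame.where` to turn every code below 1 into `NaN`.

```python
import pandas as pd

cols = ["SWB_1", "SWB_2", "SWB_3"]
# Made-up rows using the codebook above: -4 and -1 are non-responses
raw = pd.DataFrame({
    "SWB_1": [6, -1, 5, 7],
    "SWB_2": [5, 4, -4, 7],
    "SWB_3": [7, 3, 6, 6],
})

# Keep valid answers (1-7) and turn every other code into NaN
clean = raw[cols].where(raw[cols] >= 1)
print(clean.isna().sum().tolist())  # [1, 1, 0]

# Listwise deletion: drop respondents with any missing item
complete = clean.dropna()
print(len(complete))  # 2
```

Note that pandas `sum` skips `NaN` by default, so incomplete rows should be dropped (as here) before summing the items; otherwise a respondent who answered only two items would receive an artificially low index.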

## Data Cleaning

The following code checks whether such missing values exist. If they do, we need to remove them before conducting the t-test.

```python
import pandas as pd

# Placeholder path: replace with the location of the downloaded CSV file
df = pd.read_csv("financial_wellbeing.csv")

# Count each response code (including -1 and -4) for the three SWB items
for col in ["SWB_1", "SWB_2", "SWB_3"]:
    print(df[col].value_counts())
```

Below is the output. As we can see, there are indeed some missing values.

```
6    1926
7    1535
5    1458
4     803
3     335
1     154
2     152
-1      30
-4       1
Name: SWB_1, dtype: int64
6    1846
7    1642
5    1399
4     839
3     335
2     144
1     132
-1      56
-4       1
Name: SWB_2, dtype: int64
7    1991
6    1653
5    1251
4     862
3     267
1     167
2     138
-1      64
-4       1
Name: SWB_3, dtype: int64
```
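Rather than reading three separate `value_counts` tables, the invalid codes can be tallied in one step. This sketch uses a made-up stand-in (`demo`) for `df`; it counts invalid answers per item and, separately, how many respondents have at least one invalid answer.

```python
import pandas as pd

cols = ["SWB_1", "SWB_2", "SWB_3"]
# Made-up stand-in for df, using the survey's codes
demo = pd.DataFrame({
    "SWB_1": [6, -1, 5, 7, 3],
    "SWB_2": [5, -1, -4, 7, 2],
    "SWB_3": [7, 3, 6, -1, 1],
})

invalid = demo[cols] < 1                     # True where the code is -1 or -4
print(invalid.sum().tolist())                # invalid answers per item: [1, 2, 1]
print(int(invalid.any(axis=1).sum()))        # respondents with any invalid item: 3
```

The two numbers can differ because one respondent may refuse several items. In the survey output above, for example, there are 31, 57, and 65 invalid answers for SWB_1, SWB_2, and SWB_3, yet the cleaned data below keep 6,314 of 6,394 rows, so only 80 respondents are dropped.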

The following code removes those rows and prints the counts again to check whether the removal succeeded. As the output shows, it did.

```python
print("after del")
cols = ["SWB_1", "SWB_2", "SWB_3"]
# Keep only respondents whose answers to all three items are valid (1-7).
# The mask is built from df itself, and .copy() avoids a
# SettingWithCopyWarning when a new column is added later.
rslt_df = df.loc[(df[cols] >= 1).all(axis=1)].copy()

for col in cols:
    print(rslt_df[col].value_counts())
```
```
after del
6    1912
7    1515
5    1450
4     799
3     333
1     153
2     152
Name: SWB_1, dtype: int64
6    1841
7    1632
5    1394
4     838
3     333
2     144
1     132
Name: SWB_2, dtype: int64
7    1984
6    1649
5    1250
4     860
3     266
1     167
2     138
Name: SWB_3, dtype: int64
```
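The check above relies on reading the printed counts. A minimal sketch of an automated alternative, again on made-up stand-in data (`demo` and `demo_clean` are hypothetical names): filter with a single mask requiring all three items to be valid, then assert that only codes 1 to 7 remain.

```python
import pandas as pd

cols = ["SWB_1", "SWB_2", "SWB_3"]
# Made-up stand-in for df, using the survey's codes
demo = pd.DataFrame({
    "SWB_1": [6, -1, 5, 7, 3],
    "SWB_2": [5, -1, -4, 7, 2],
    "SWB_3": [7, 3, 6, -1, 1],
})

# Keep rows where every item is a valid 1-7 answer
demo_clean = demo[(demo[cols] >= 1).all(axis=1)].copy()

# Fail loudly if any invalid code survived the filter
assert demo_clean[cols].isin(list(range(1, 8))).all().all()
print(len(demo) - len(demo_clean), "rows removed")  # 3 rows removed
```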

The following code creates a new column called “Combined_SWB” by summing the three items, followed by the output of `print(rslt_df["Combined_SWB"])`.

```python
column_names = ['SWB_1', 'SWB_2', 'SWB_3']
# Sum the three 1-7 items into one SWB index (possible range: 3-21)
rslt_df = rslt_df.assign(Combined_SWB=rslt_df[column_names].sum(axis=1))
print(rslt_df["Combined_SWB"])
```
```
0       16
1       18
2       11
3       18
4       12
..
6389    20
6390    21
6391    17
6392    15
6393    14
Name: Combined_SWB, Length: 6314, dtype: int64
```

We also need to check whether the independent variable (X), gender, has missing values. Gender is coded in the survey as follows:

```
"PPGENDER":{
1: "Male",
2: "Female"}
```

```python
gender_count = rslt_df["PPGENDER"].value_counts()
print(gender_count)
```

The output below contains only the codes 1 and 2, so there are no missing values for gender.

```
1    3328
2    2986
Name: PPGENDER, dtype: int64
```
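Because the gender codes are just 1 and 2, it can help to attach readable labels before comparing groups. A sketch on made-up rows (`demo`), assuming a new column named `Gender` that is not part of the survey file:

```python
import pandas as pd

# Made-up rows using the PPGENDER codes above (1 = Male, 2 = Female)
demo = pd.DataFrame({"PPGENDER": [1, 2, 2, 1], "Combined_SWB": [16, 18, 11, 18]})

gender_labels = {1: "Male", 2: "Female"}
demo["Gender"] = demo["PPGENDER"].map(gender_labels)

# Any code outside the codebook would show up as NaN here
print(demo["Gender"].isna().sum())  # 0
print(demo.groupby("Gender")["Combined_SWB"].mean())
```

Passing `x='Gender'` instead of `x='PPGENDER'` to a seaborn plot would then show Male and Female on the axis instead of 1 and 2.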

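The following is the key code for the t-test. Setting `equal_var=False` runs Welch's t-test, which does not assume that the two groups have equal variances.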

```python
import scipy.stats

# Split the cleaned data by gender code (1 = Male, 2 = Female)
data_men = rslt_df[rslt_df['PPGENDER'] == 1]
data_women = rslt_df[rslt_df['PPGENDER'] == 2]

print("Men's SWB:")
print(data_men["Combined_SWB"].mean())
print("\n")
print("Women's SWB:")
print(data_women["Combined_SWB"].mean())
print("\n")

# Welch's independent-samples t-test (no equal-variance assumption)
print("t-test results:")
ttest_results = scipy.stats.ttest_ind(data_men["Combined_SWB"], data_women["Combined_SWB"], equal_var=False)
print(ttest_results)
```

The following is the output.

```
Men's SWB:
16.341346153846153

Women's SWB:
16.248827863362358

t-test results:
Ttest_indResult(statistic=1.0023437227400076, pvalue=0.3162168765314645)
```
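To make the `ttest_ind(..., equal_var=False)` call less of a black box, the sketch below computes Welch's t statistic, t = (mean_a - mean_b) / sqrt(s_a²/n_a + s_b²/n_b), and its Welch-Satterthwaite degrees of freedom using only the standard library. It uses made-up scores rather than the survey data, and `welch_t` is a hypothetical helper name. The statistic should match what `scipy.stats.ttest_ind(a, b, equal_var=False)` reports for the same inputs; the p-value then comes from the t distribution with `dof` degrees of freedom, which SciPy computes for us.

```python
import math
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Squared standard errors, using sample variances (denominator n - 1)
    se2_a = statistics.variance(a) / na
    se2_b = statistics.variance(b) / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, dof


men = [16, 18, 20, 14]    # made-up scores, not survey data
women = [15, 15, 17, 13]
t, dof = welch_t(men, women)
print(round(t, 4), round(dof, 2))  # 1.3093 5.07
```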

Based on the p-value (0.32, well above the conventional 0.05 threshold), the difference is not statistically significant. The means are also very close to each other (SWB for men = 16.34 versus SWB for women = 16.25), suggesting that men and women do not really differ in subjective well-being. We can also plot the means using a bar chart.

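```python
import matplotlib.pyplot as plt
import seaborn as sns

# Bar heights are the mean Combined_SWB per gender code (1 = Male, 2 = Female);
# seaborn draws confidence-interval error bars by default
sns.barplot(x='PPGENDER', y="Combined_SWB", data=rslt_df)
plt.xlabel('Gender', fontsize=18)
plt.ylabel('SWB', fontsize=18)
plt.show()
```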
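A p-value tells us whether a difference is statistically distinguishable from zero, not how large it is. As a hedged sketch, Cohen's d (the mean difference divided by the pooled standard deviation) can quantify the size of the gender gap. `cohens_d` is a hypothetical helper and the scores below are made up; on the real data it could be called as `cohens_d(data_men["Combined_SWB"].tolist(), data_women["Combined_SWB"].tolist())`.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


men = [16, 18, 20, 14]    # made-up scores, not survey data
women = [15, 15, 17, 13]
d = cohens_d(men, women)
print(round(d, 3))  # 0.926
```

By Cohen's common rule of thumb, |d| around 0.2 is a small effect, 0.5 medium, and 0.8 large.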