Difference between Scatter Plots and Line Charts in Python

This tutorial explains the difference between scatter plots and line charts in data visualization. I will use real data and Python code to illustrate the difference between them.

The data is pulled from GitHub. It contains weekly Google Trends values for the search keywords Peloton and Covid from early 2020 to early 2022. For more information, please refer to my other post.

import pandas as pd
Peloton_Covid_data=pd.read_csv("https://raw.githubusercontent.com/TidyPython/data_visualization/main/Covid_and_Peloton.csv")
print(Peloton_Covid_data)
          Week  Peloton  Covid
0    2/23/2020       26      2
1     3/1/2020       24      5
2     3/8/2020       24     20
3    3/15/2020       67     53
4    3/22/2020       69     62
..         ...      ...    ...
106   3/6/2022       32     16
107  3/13/2022       29     16
108  3/20/2022       27     14
109  3/27/2022       25     14
110   4/3/2022       25     14
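The Week column is read in as plain text (for example, 2/23/2020). As a minimal sketch, the rows below are copied from the printed output above so the snippet runs offline, and the month/day/year format is an assumption based on how the values look. Converting the column to real dates lets later plots treat the x-axis as time.

```python
import pandas as pd

# First five rows copied from the printed output above (not the full dataset).
sample = pd.DataFrame({
    "Week": ["2/23/2020", "3/1/2020", "3/8/2020", "3/15/2020", "3/22/2020"],
    "Peloton": [26, 24, 24, 67, 69],
    "Covid": [2, 5, 20, 53, 62],
})

# Assumption: the dates are written as month/day/year.
sample["Week"] = pd.to_datetime(sample["Week"], format="%m/%d/%Y")

# Use the parsed dates as the index so the rows are ordered by time.
sample = sample.set_index("Week").sort_index()
print(sample)
```

The same two steps should apply to the full DataFrame after pd.read_csv(), provided every value in Week follows the same format.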

Use Seaborn sns.lmplot() to Plot a Scatter Plot

The following is the complete Python code to plot the scatter plot. As we can see, there is a positive relationship between searches for Covid and searches for Peloton. This is plausible, since Covid made many people wary of going to the gym and more inclined to work out at home.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the weekly Google Trends data
Peloton_Covid_data = pd.read_csv("https://raw.githubusercontent.com/TidyPython/data_visualization/main/Covid_and_Peloton.csv")
Peloton_Covid_data = Peloton_Covid_data.set_index('Week')

# Scatter plot of Peloton against Covid; fit_reg=True overlays a fitted regression line
sns.lmplot(x="Covid", y="Peloton", data=Peloton_Covid_data, fit_reg=True)
plt.xlabel('Keyword of Covid', fontsize=18)
plt.ylabel('Keyword of Peloton', fontsize=18)
plt.title("Relationship of Covid and Peloton based on Google Trends", fontdict={'fontsize': 20})
plt.show()
Figure: Relationship of Covid and Peloton based on Google Trends
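One way to put a number on the positive relationship seen in the scatter plot is the Pearson correlation coefficient. The sketch below is illustrative only: it uses just the ten rows visible in the printed output earlier, so the value it prints describes that excerpt and not the full dataset.

```python
import pandas as pd

# Ten rows copied from the printed output earlier (rows 0-4 and 106-110).
# This is only an excerpt, so the result is not the full-data correlation.
excerpt = pd.DataFrame({
    "Peloton": [26, 24, 24, 67, 69, 32, 29, 27, 25, 25],
    "Covid": [2, 5, 20, 53, 62, 16, 16, 14, 14, 14],
})

# Series.corr() computes the Pearson correlation by default.
r = excerpt["Covid"].corr(excerpt["Peloton"])
print(f"Pearson r on the excerpt: {r:.2f}")
```

A value close to 1 means the points lie near an upward-sloping line, which is what the regression line in the scatter plot summarizes visually.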

Use Pandas DataFrame.plot() to Plot a Line Chart

We can also use a line chart to show the relationship between Peloton and Covid. Comparing the scatter plot with the line chart gives a better idea of the differences and connections between the two.

Note that the line chart uses the DataFrame.plot() method (i.e., Peloton_Covid_data.plot() in the example), which comes from Pandas. Pandas builds on Matplotlib for plotting.

import pandas as pd
import matplotlib.pyplot as plt

# Make the figure large enough to read the weekly labels
plt.rcParams['figure.figsize'] = [15, 10]

# Load the data and use the week as the index, so it becomes the x-axis
Peloton_Covid_data = pd.read_csv("https://raw.githubusercontent.com/TidyPython/data_visualization/main/Covid_and_Peloton.csv")
Peloton_Covid_data = Peloton_Covid_data.set_index('Week')

# DataFrame.plot() draws one line per numeric column (Peloton and Covid)
Peloton_Covid_data.plot()
plt.legend(fontsize=20)
plt.xlabel('Week')
plt.title("Relationship of Covid and Peloton based on Google Trends", fontdict={'fontsize': 20})
plt.show()
Figure: Line chart in Python

From the line chart above, we can see that the Covid and Peloton lines rose and fell almost simultaneously, suggesting that the two are related. In short, the scatter plot shows how the two variables move together regardless of time, while the line chart shows how each one changes over time. Of course, correlation is not causation, so we should treat such a finding with caution.
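To see the two views next to each other, the following sketch draws a scatter plot and a line chart from the same rows. It is an illustration under stated assumptions, not the code behind the figures above: it uses only the first five rows from the printed output, plain Matplotlib instead of Seaborn for the scatter plot, a non-interactive backend, and a hypothetical output filename.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove to use a window
import matplotlib.pyplot as plt
import pandas as pd

# First five rows copied from the printed output earlier (not the full dataset).
sample = pd.DataFrame({
    "Week": ["2/23/2020", "3/1/2020", "3/8/2020", "3/15/2020", "3/22/2020"],
    "Peloton": [26, 24, 24, 67, 69],
    "Covid": [2, 5, 20, 53, 62],
})
# Assumption: the dates are written as month/day/year.
sample["Week"] = pd.to_datetime(sample["Week"], format="%m/%d/%Y")

fig, (ax_scatter, ax_line) = plt.subplots(1, 2, figsize=(12, 4))

# Scatter plot: each point pairs one week's Covid and Peloton values.
# Time does not appear on either axis.
ax_scatter.scatter(sample["Covid"], sample["Peloton"])
ax_scatter.set_xlabel("Keyword of Covid")
ax_scatter.set_ylabel("Keyword of Peloton")
ax_scatter.set_title("Scatter plot")

# Line chart: time runs along the x-axis, with one line per keyword.
sample.plot(x="Week", y=["Peloton", "Covid"], ax=ax_line)
ax_line.set_title("Line chart")

fig.tight_layout()
fig.savefig("scatter_vs_line.png")  # hypothetical output filename
```

Reordering the rows would leave the scatter plot unchanged, because it only uses the paired values, whereas the line chart depends on the order of the weeks.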