This tutorial explains the difference between scatter plots and line charts in data visualization. I will use real data and Python code to illustrate the subtle distinctions between the two.
The data is pulled from GitHub. It contains weekly Google Trends search interest for the keywords "Peloton" and "Covid" from early 2020 to early 2022. For more information, please refer to my other post.
import pandas as pd
Peloton_Covid_data=pd.read_csv("https://raw.githubusercontent.com/TidyPython/data_visualization/main/Covid_and_Peloton.csv")
print(Peloton_Covid_data)
          Week  Peloton  Covid
0    2/23/2020       26      2
1     3/1/2020       24      5
2     3/8/2020       24     20
3    3/15/2020       67     53
4    3/22/2020       69     62
..         ...      ...    ...
106   3/6/2022       32     16
107  3/13/2022       29     16
108  3/20/2022       27     14
109  3/27/2022       25     14
110   4/3/2022       25     14
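The Week column is read in as plain text (for example, "2/23/2020"), so pandas treats it as strings rather than dates. A minimal sketch, assuming the month/day/year format seen in the printed output, converts it with pd.to_datetime(). To keep the example self-contained, the small DataFrame below reuses a few rows from that output instead of downloading the CSV.

```python
import pandas as pd

# A few rows copied from the printed output above (not the full dataset).
sample = pd.DataFrame({
    "Week": ["2/23/2020", "3/1/2020", "3/8/2020", "3/15/2020", "3/22/2020"],
    "Peloton": [26, 24, 24, 67, 69],
    "Covid": [2, 5, 20, 53, 62],
})

# Parse the month/day/year strings into real datetime values.
sample["Week"] = pd.to_datetime(sample["Week"], format="%m/%d/%Y")

# Use the parsed dates as the index so time-based plots get a date axis.
sample = sample.set_index("Week")
print(sample.dtypes)
print(sample.index.min(), sample.index.max())
```

With the full data, the same pd.to_datetime() call could be applied to Peloton_Covid_data['Week'] before set_index('Week'), so that Peloton_Covid_data.plot() labels the x-axis with dates instead of text.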
Use Seaborn sns.lmplot() to Plot a Scatter Plot
The following is the complete Python code for the scatter plot. As we can see, there is a positive relationship between Covid and Peloton: weeks with higher search interest in Covid also tend to have higher search interest in Peloton. This makes sense, since Covid made many people afraid of going to the gym and more inclined to work out at home.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the weekly Google Trends data and index it by week
Peloton_Covid_data = pd.read_csv("https://raw.githubusercontent.com/TidyPython/data_visualization/main/Covid_and_Peloton.csv")
Peloton_Covid_data = Peloton_Covid_data.set_index('Week')

# Scatter plot of Peloton against Covid; fit_reg=True overlays a linear regression line
sns.lmplot(x="Covid", y="Peloton", data=Peloton_Covid_data, fit_reg=True)
plt.xlabel('Keyword of Covid', fontsize=18)
plt.ylabel('Keyword of Peloton', fontsize=18)
plt.title("Relationship of Covid and Peloton based on Google Trends", fontdict={'fontsize': 20})
plt.show()
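With fit_reg=True, lmplot() overlays a straight regression line (an ordinary least-squares fit) on the scatter plot. As a rough cross-check of what that line represents, the sketch below fits the same kind of line with NumPy's np.polyfit() on the handful of rows shown in the printed output earlier. These rows are only a sample, so the slope and intercept for the full dataset will differ.

```python
import numpy as np

# Rows copied from the printed output above: (Covid, Peloton) pairs.
covid = np.array([2, 5, 20, 53, 62, 16, 16, 14, 14, 14], dtype=float)
peloton = np.array([26, 24, 24, 67, 69, 32, 29, 27, 25, 25], dtype=float)

# A degree-1 polyfit is an ordinary least-squares straight line.
# Coefficients come back highest power first: [slope, intercept].
slope, intercept = np.polyfit(covid, peloton, deg=1)
print(f"Peloton = {slope:.2f} * Covid + {intercept:.2f}")
```

A positive slope is the numerical counterpart of the upward-sloping line in the scatter plot.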
Use Pandas DataFrame.plot() to Plot a Line Chart
We can also use a line chart to show the relationship between Peloton and Covid. Comparing the scatter plot with the line chart gives you a better sense of how the two chart types differ and how they connect.
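Note that the line chart uses the pandas DataFrame.plot() method (i.e., Peloton_Covid_data.plot() in the example) rather than a Seaborn function. Pandas plotting is built on top of Matplotlib, which is why the chart can be customized with Matplotlib calls such as plt.legend() and plt.title().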
import pandas as pd
import matplotlib.pyplot as plt

# Make the figure large enough for the long run of weekly labels
plt.rcParams['figure.figsize'] = [15, 10]

# Load the data and use the week as the x-axis
Peloton_Covid_data = pd.read_csv("https://raw.githubusercontent.com/TidyPython/data_visualization/main/Covid_and_Peloton.csv")
Peloton_Covid_data = Peloton_Covid_data.set_index('Week')

# DataFrame.plot() draws one line per column (Peloton and Covid)
Peloton_Covid_data.plot()
plt.legend(fontsize=20)
plt.xlabel('Week')
plt.title("Relationship of Covid and Peloton based on Google Trends", fontdict={'fontsize': 20})
plt.show()
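To compare the two chart types directly, the sketch below draws both on one figure with Matplotlib's object-oriented API. It swaps Seaborn's lmplot() for a plain ax.scatter() call, because lmplot() is a figure-level function that creates its own figure and cannot be placed on an existing Axes. The helper name plot_side_by_side is hypothetical, and the data are the first few rows from the printed output rather than the full CSV. The scatter panel discards the order of the weeks and shows only how the two values move together, while the line panel keeps the weeks in order along the x-axis.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_side_by_side(df):
    """Draw a scatter plot and a line chart of the same data next to each other.

    Hypothetical helper: expects a DataFrame indexed by week with
    numeric 'Peloton' and 'Covid' columns.
    """
    fig, (ax_scatter, ax_line) = plt.subplots(1, 2, figsize=(15, 6))

    # Scatter plot: each point is one week, but the order of weeks is lost.
    ax_scatter.scatter(df["Covid"], df["Peloton"])
    ax_scatter.set_xlabel("Keyword of Covid")
    ax_scatter.set_ylabel("Keyword of Peloton")
    ax_scatter.set_title("Scatter plot")

    # Line chart: pandas draws one line per column, with weeks on the x-axis.
    df.plot(ax=ax_line)
    ax_line.set_xlabel("Week")
    ax_line.set_title("Line chart")

    fig.tight_layout()
    return fig


# The first five rows of the printed output above (not the full dataset).
sample = pd.DataFrame(
    {"Peloton": [26, 24, 24, 67, 69], "Covid": [2, 5, 20, 53, 62]},
    index=pd.Index(
        ["2/23/2020", "3/1/2020", "3/8/2020", "3/15/2020", "3/22/2020"],
        name="Week",
    ),
)

fig = plot_side_by_side(sample)
```

Calling plt.show() afterwards displays the figure; fig.savefig("comparison.png") (a hypothetical file name) would write it to disk instead.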
From the line chart above, we can see that the Covid and Peloton lines rose and fell almost simultaneously, suggesting that the two may be related. Of course, correlation is not causation, so we should treat such a finding with caution.
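The co-movement described above can be summarized with a single number, the Pearson correlation coefficient, which pandas computes with Series.corr(). The sketch below applies it to the few rows from the printed output; on the full data the equivalent call would be Peloton_Covid_data['Covid'].corr(Peloton_Covid_data['Peloton']), whose value is not reported here. A coefficient close to +1 only says the two series tend to rise and fall together. Like the charts, it says nothing about causation, and two series that share a common trend over time can look strongly correlated for that reason alone.

```python
import pandas as pd

# Rows copied from the printed output above (not the full dataset).
sample = pd.DataFrame({
    "Peloton": [26, 24, 24, 67, 69, 32, 29, 27, 25, 25],
    "Covid": [2, 5, 20, 53, 62, 16, 16, 14, 14, 14],
})

# Pearson correlation (the default method) ranges from -1 to +1.
r = sample["Covid"].corr(sample["Peloton"])
print(f"Pearson r on the sample rows: {r:.2f}")
```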