This tutorial explains when to use bar charts and when to use line charts in data visualization. I will use examples, including data, Python code, and the resulting charts, to illustrate the difference.
When Bar Charts are Better than Line Charts
Bar charts work well when the X-axis has only a few levels, so that there are visible gaps between neighboring bars. I will illustrate this principle with one Y variable and then with three Y variables.
One Y Variable
import numpy as np
import pandas as pd
x_simple=np.linspace(0, 20, 10)
y_simple=x_simple*x_simple
import matplotlib.pyplot as plt
plt.bar(x_simple, y_simple)
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
If you change x_simple=np.linspace(0, 20, 10) to x_simple=np.linspace(0, 20, 100), the bar chart no longer looks clean, because the bars become very crowded.
import numpy as np
import pandas as pd
# The following code has been changed.
x_simple=np.linspace(0, 20, 100)
y_simple=x_simple*x_simple
import matplotlib.pyplot as plt
plt.bar(x_simple, y_simple)
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
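The crowding can also be checked numerically. The following is a minimal sketch that compares the distance between neighboring X positions with the default bar width of 0.8 that plt.bar uses when no width is passed. The helper name bar_spacing is my own and is only for illustration.

```python
import numpy as np

# Default bar width used by plt.bar when no width argument is given.
DEFAULT_BAR_WIDTH = 0.8


def bar_spacing(n_points, start=0, stop=20):
    """Distance between neighboring X positions (hypothetical helper)."""
    x = np.linspace(start, stop, n_points)
    return float(x[1] - x[0])


for n_points in (10, 100):
    spacing = bar_spacing(n_points)
    # Bars are centered on X, so they overlap once spacing < bar width.
    overlaps = spacing < DEFAULT_BAR_WIDTH
    print(f"{n_points} points: spacing = {spacing:.2f}, bars overlap: {overlaps}")
```

With 10 points the spacing is about 2.22, so the bars have clear gaps between them. With 100 points the spacing is about 0.20, which is smaller than the bar width, so neighboring bars are drawn on top of each other.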
Three Y Variables
Bar charts can also be used when there are 2 or 3 Y variables, as long as the number of X levels is limited. The following example shows this.
import pandas as pd
import matplotlib.pyplot as plt
MSFT_data=pd.read_csv("https://raw.githubusercontent.com/TidyPython/data_visualization/main/data_MSFT_T.csv")
# Keep only the first three rows of data (.loc slicing includes the end label 2)
MSFT_data_partial=MSFT_data.loc[:2,]
print(MSFT_data_partial)
MSFT_data_partial.plot(x='Quarter', kind='bar', stacked=False,)
plt.ylim([0, 5500])
plt.show()
   Quarter  RD Expenses  Sales and Marketing  General Admin Expenses
0   2017Q1         3355                 3879                    1202
1   2017Q2         3514                 4356                    1355
2   2017Q3         3574                 3812                    1166
However, when the X-axis has too many levels, bar charts are not a good choice, because they become too crowded.
import pandas as pd
import matplotlib.pyplot as plt
MSFT_data=pd.read_csv("https://raw.githubusercontent.com/TidyPython/data_visualization/main/data_MSFT_T.csv")
print(MSFT_data)
MSFT_data.plot(x='Quarter', kind='bar', stacked=False,)
plt.show()
    Quarter  RD Expenses  Sales and Marketing  General Admin Expenses
 0   2017Q1         3355                 3879                    1202
 1   2017Q2         3514                 4356                    1355
 2   2017Q3         3574                 3812                    1166
 3   2017Q4         3504                 4562                    1109
 4   2018Q1         3715                 4335                    1208
 5   2018Q2         3933                 4760                    1271
 6   2018Q3         3977                 4098                    1149
 7   2018Q4         4070                 4588                    1132
 8   2019Q1         4316                 4565                    1179
 9   2019Q2         4513                 4962                    1425
10   2019Q3         4565                 4337                    1061
11   2019Q4         4603                 4933                    1121
12   2020Q1         4887                 4911                    1273
13   2020Q2         5214                 5417                    1656
14   2020Q3         4926                 4231                    1119
15   2020Q4         4899                 4947                    1139
16   2021Q1         5204                 5082                    1327
17   2021Q2         5687                 5857                    1522
18   2021Q3         5599                 4547                    1287
19   2021Q4         5758                 5379                    1384
When Line Charts are Better than Bar Charts
In contrast to bar charts, line charts are the better choice when X has many levels, regardless of the number of Y variables.
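As a rough illustration, the rule of thumb can be written as a small helper. This is only a sketch: the function name suggest_chart_type and the cutoff of 12 levels are my own assumptions, not a fixed standard, and you should adjust the cutoff to your figure size.

```python
def suggest_chart_type(n_x_levels, max_bar_levels=12):
    """Suggest 'bar' for few X levels and 'line' for many.

    max_bar_levels is a hypothetical cutoff chosen for illustration only.
    """
    return "bar" if n_x_levels <= max_bar_levels else "line"


# X level counts taken from the examples in this tutorial.
for n_levels in (3, 10, 20, 100):
    print(n_levels, "levels ->", suggest_chart_type(n_levels))
```

With this cutoff, the 3-quarter and 10-point examples are suggested as bar charts, while the 20-quarter and 100-point examples are suggested as line charts.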
One Y Variable
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
x_simple=np.linspace(0, 20, 100)
y_simple=x_simple*x_simple
plt.plot(x_simple, y_simple)
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
The following is the comparison between line charts and bar charts for the same dataset.
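One way to draw such a comparison yourself is to put both charts side by side in a single figure. The following is a minimal sketch using the same x_simple and y_simple data as above. The figure size and panel titles are my own choices.

```python
import numpy as np
import matplotlib.pyplot as plt

x_simple = np.linspace(0, 20, 100)
y_simple = x_simple * x_simple

# One row with two panels that share the Y-axis, so the charts are comparable.
fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Left panel: 100 bars, which look crowded.
ax_bar.bar(x_simple, y_simple)
ax_bar.set_title("Bar chart")
ax_bar.set_xlabel("X")
ax_bar.set_ylabel("Y")

# Right panel: the same data as a single line.
ax_line.plot(x_simple, y_simple)
ax_line.set_title("Line chart")
ax_line.set_xlabel("X")

fig.tight_layout()
plt.show()
```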
Three Y Variables
The following code plots the full dataset as a line chart, which stays readable even though the same data looked crowded as a grouped bar chart.
import pandas as pd
import matplotlib.pyplot as plt
MSFT_data=pd.read_csv("https://raw.githubusercontent.com/TidyPython/data_visualization/main/data_MSFT_T.csv")
MSFT_data=MSFT_data.set_index('Quarter')
MSFT_data.plot()
plt.gca().xaxis.set_major_locator(plt.MultipleLocator(3))
plt.xlabel('Quarter')
plt.show()
The following is the comparison between line charts and bar charts for the same dataset.
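The same side-by-side comparison can be sketched for the three Y variables. To keep the example self-contained, the code below rebuilds the data frame from the values printed earlier instead of downloading the CSV file. The column names follow my reading of the printed header, and the figure size and titles are my own choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Quarter labels 2017Q1 ... 2021Q4, matching the printed data.
quarters = [f"{year}Q{q}" for year in range(2017, 2022) for q in range(1, 5)]

data = pd.DataFrame(
    {
        "RD Expenses": [3355, 3514, 3574, 3504, 3715, 3933, 3977, 4070, 4316, 4513,
                        4565, 4603, 4887, 5214, 4926, 4899, 5204, 5687, 5599, 5758],
        "Sales and Marketing": [3879, 4356, 3812, 4562, 4335, 4760, 4098, 4588, 4565, 4962,
                                4337, 4933, 4911, 5417, 4231, 4947, 5082, 5857, 4547, 5379],
        "General Admin Expenses": [1202, 1355, 1166, 1109, 1208, 1271, 1149, 1132, 1179, 1425,
                                   1061, 1121, 1273, 1656, 1119, 1139, 1327, 1522, 1287, 1384],
    },
    index=pd.Index(quarters, name="Quarter"),
)

# Two panels sharing the Y-axis: grouped bars on the left, lines on the right.
fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
data.plot(kind="bar", ax=ax_bar, title="Bar chart")
data.plot(ax=ax_line, title="Line chart")
ax_line.tick_params(axis="x", rotation=45)

fig.tight_layout()
plt.show()
```

The bar panel has to fit 60 bars (20 quarters times 3 variables), while the line panel needs only 3 lines for the same data.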