How to Do Scatter Plots in Python

This tutorial shows how to use Pandas, Matplotlib, and Seaborn to make scatter plots in Python, with examples, code, and charts.

There are two common ways to make scatter plots in Python. The core syntax of each is shown below.

  • Pandas: df.plot(kind="scatter", x="column_x", y="column_y")
  • Seaborn: sns.lmplot(x="column_x", y="column_y", data=df, fit_reg=True)
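As a minimal sketch, the Pandas call above can also be written with the DataFrame.plot.scatter accessor, and both spellings draw the same points. The small DataFrame and its column names column_x and column_y are hypothetical placeholders filled with random numbers; the Agg backend is an assumption so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw without opening a window
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical toy data using the placeholder column names from the syntax above
rng = np.random.default_rng(0)
df = pd.DataFrame({"column_x": rng.normal(size=50),
                   "column_y": rng.normal(size=50)})

# two equivalent Pandas spellings of the same scatter plot
ax1 = df.plot(kind="scatter", x="column_x", y="column_y")
ax2 = df.plot.scatter(x="column_x", y="column_y")
plt.show()  # does nothing visible under Agg; shown for parity with the examples
```

Both calls return a Matplotlib Axes, so the choice between them is a matter of taste.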

Example 1: Use Pandas for scatter plots in Python

We can use Pandas to make scatter plots in Python. Pandas plotting is built on top of Matplotlib, which is why the code ends with plt.show().

plt.show() is optional in some Python environments, such as Jupyter notebooks, which display the plot automatically without it.
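Because Pandas draws onto a Matplotlib Axes, you can customize the returned object with ordinary Matplotlib methods and, in a script run without a screen, save it to a file instead of calling plt.show(). The sketch below assumes random stand-in values for the Weight and Height columns (not the NFL data) and a hypothetical output file name, scatter.png.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available, so render to files only
from pathlib import Path
import numpy as np
import pandas as pd

# hypothetical stand-in data; the real example reads the NFL CSV instead
rng = np.random.default_rng(1)
df = pd.DataFrame({"Weight": rng.normal(size=30),
                   "Height": rng.normal(size=30)})

# df.plot() returns a Matplotlib Axes object
ax = df.plot(kind="scatter", x="Weight", y="Height")
ax.set_title("Height vs. Weight")  # standard Matplotlib Axes method

# write the figure to disk instead of showing it
out = Path("scatter.png")
ax.figure.savefig(out, dpi=100)
```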

import pandas as pd
import matplotlib.pyplot as plt

# read data from GitHub
df=pd.read_csv("https://raw.githubusercontent.com/TidyPython/SPSS/main/NFL_2020_Combine_simplified_version.csv")

# plot using the plot() function in pandas
df.plot(kind="scatter",x="Weight",y="Height")
plt.show()

The following is the scatter plot.

Scatter Plots in Python using Pandas and Matplotlib

Example 2: Use Seaborn to add fitted regression lines to Scatter Plots in Python

Like Pandas, Seaborn uses Matplotlib as its underlying plotting package, but it provides a higher-level interface for drawing statistical graphics.

A nice feature of Seaborn is that it can easily add a regression line to the plot: the parameter fit_reg=True in sns.lmplot() draws the fitted line. Since True is also the default, the line appears even if you leave the parameter out.
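To show what that line is, here is a rough sketch that computes a least-squares straight line with np.polyfit and draws it in plain Matplotlib. This is a swapped-in NumPy approach, not Seaborn's own code, and it omits the confidence band that lmplot shades by default. The data are hypothetical points lying exactly on y = 2x + 1, so the fitted slope and intercept are easy to check.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import numpy as np
import matplotlib.pyplot as plt

# hypothetical data lying exactly on y = 2x + 1
x = np.arange(10, dtype=float)
y = 2 * x + 1

# degree-1 least-squares fit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

fig, ax = plt.subplots()
ax.scatter(x, y)                                   # the points
xs = np.linspace(x.min(), x.max(), 100)
ax.plot(xs, slope * xs + intercept, color="red")   # the fitted line
print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # prints slope=2.00, intercept=1.00
```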

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# read data from GitHub
df=pd.read_csv("https://raw.githubusercontent.com/TidyPython/SPSS/main/NFL_2020_Combine_simplified_version.csv")

# scatter plot using seaborn
sns.lmplot(x="Weight",y="Height",data=df,fit_reg=True)
plt.show()

The following is the scatter plot with the regression trend line.

Scatter Plots with Fitted Linear Regression Line in Python using seaborn

Example 3: Use Pandas and Matplotlib for Scatter Plots in Python (with generated data)

If you do not have data at hand, you can also generate some with Python before plotting.

In particular, we need to generate sample data for X and Y. If you already have your own data, you can skip this step (as in Example 1).

With the data ready, we can now make the scatter plot. Note that the call pd_df.plot(kind="scatter",x="x",y="y") means the plot is drawn through Pandas rather than by calling Matplotlib directly.
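For comparison, a minimal sketch of the same plot drawn with Matplotlib directly passes the two columns to Axes.scatter. It regenerates data with the same recipe as the code below (x from a normal distribution with mean 5 and standard deviation 0.1, y = 3x plus normal noise with standard deviation 2); the Agg backend is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# same recipe as Example 3: x ~ N(5, 0.1), y = 3x + e with e ~ N(0, 2)
x = np.random.normal(5, 0.1, 100)
e = np.random.normal(0, 2, 100)
pd_df = pd.DataFrame({"x": x, "y": 3 * x + e})

# Matplotlib directly, without the Pandas plotting wrapper
fig, ax = plt.subplots()
ax.scatter(pd_df["x"], pd_df["y"])
ax.set_xlabel("x")
ax.set_ylabel("y")
```

Pandas adds the axis labels for you; with Matplotlib directly you set them yourself.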

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# mean and standard deviation for x
mu_1, sigma_1 = 5, 0.1 
x = np.random.normal(mu_1, sigma_1, 100)

# mean and standard deviation for y
mu_2, sigma_2 = 0, 2 
e = np.random.normal(mu_2, sigma_2, 100)
y=x*3+e

# combine x and y into a dataframe and print it out
d = {'x': x,'y':y}
pd_df=pd.DataFrame(data=d)
print(pd_df)

# plot the scatter plot
pd_df.plot(kind="scatter",x="x",y="y")
plt.show()

The following is the printed DataFrame. Because no random seed is set, your numbers will differ on each run.

           x          y
0   5.048436  15.017243
1   4.966731  13.614946
2   5.040685  15.690755
3   4.910718  15.683145
4   5.183769  18.028059
..       ...        ...
95  4.896289  15.513315
96  4.919661  15.098267
97  4.906681  16.298011
98  5.032974  16.328086
99  5.266889  12.799115

[100 rows x 2 columns]
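If you want the same numbers on every run, one option is to draw them from a seeded NumPy Generator. The sketch below is an assumption rather than part of the original example: the helper make_data and the seed 42 are hypothetical, and its values will not match the table above, which came from an unseeded run.

```python
import numpy as np
import pandas as pd

def make_data(seed):
    """Hypothetical helper: rebuild Example 3's data from a fixed seed."""
    rng = np.random.default_rng(seed)
    x = rng.normal(5, 0.1, 100)   # mean 5, standard deviation 0.1
    e = rng.normal(0, 2, 100)     # mean 0, standard deviation 2
    return pd.DataFrame({"x": x, "y": x * 3 + e})

df_a = make_data(42)
df_b = make_data(42)
print(df_a.equals(df_b))  # True: same seed, same data
```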

The following is the scatter plot output:

Scatter Plots in Python using Pandas and Matplotlib

Further Reading