This tutorial shows how to use Pandas, Matplotlib, and Seaborn for scatter plots in Python, with examples, code, and charts.
This tutorial covers two methods of making scatter plots in Python. The following shows the core syntax of each.
- Pandas: df.plot(kind="scatter", x="column_x", y="column_y")
- Seaborn: sns.lmplot(x="column_x", y="column_y", data=df, fit_reg=True)
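As a quick, self-contained check of the two calls above, the sketch below runs both on a small synthetic DataFrame. The columns column_x and column_y and the random data are placeholders, not the tutorial's dataset. It also shows one practical difference: df.plot() returns a Matplotlib Axes, while sns.lmplot() is a figure-level function that returns a Seaborn FacetGrid.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical synthetic data standing in for a real dataset
rng = np.random.default_rng(0)
df = pd.DataFrame({"column_x": rng.normal(size=50)})
df["column_y"] = 2 * df["column_x"] + rng.normal(size=50)

# Pandas: draws on a Matplotlib Axes and returns it
ax = df.plot(kind="scatter", x="column_x", y="column_y")

# Seaborn: builds its own figure and returns a FacetGrid
grid = sns.lmplot(x="column_x", y="column_y", data=df, fit_reg=True)

print(type(ax).__name__, type(grid).__name__)
plt.close("all")  # release both figures
```

Because the Pandas call returns an Axes, later Matplotlib calls can target it directly; with lmplot you work through the grid (for example grid.ax for a single-panel plot).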
Example 1: Use Pandas for scatter plots in Python
We can use Pandas to make scatter plots in Python. Pandas is built on top of Matplotlib for plotting, which is why the code below ends with plt.show().
plt.show() is optional in some Python programming environments, such as Jupyter. That is, Jupyter displays the plot even without plt.show().
```python
import pandas as pd
import matplotlib.pyplot as plt

# read data from GitHub
df = pd.read_csv("https://raw.githubusercontent.com/TidyPython/SPSS/main/NFL_2020_Combine_simplified_version.csv")

# plot using the plot() function in pandas
df.plot(kind="scatter", x="Weight", y="Height")
plt.show()
```
The following is the scatter plot.

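Since the Pandas plot returns a Matplotlib Axes, you can keep customizing the chart with ordinary Matplotlib methods, and in environments without a display you can save the figure instead of calling plt.show(). The sketch below uses df.plot.scatter(), the accessor form of the same Pandas call, on synthetic data (hypothetical, so it runs without downloading the CSV; with the NFL data you would pass x="Weight", y="Height"). The title, labels, and file name scatter.png are illustrative choices.

```python
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical synthetic data so the sketch runs offline
rng = np.random.default_rng(2)
df = pd.DataFrame({"x": rng.uniform(0, 10, 80)})
df["y"] = 0.5 * df["x"] + rng.normal(0, 1, 80)

# Same plot as df.plot(kind="scatter", x="x", y="y"); alpha is forwarded to Matplotlib
ax = df.plot.scatter(x="x", y="y", alpha=0.6)

# The returned object is a Matplotlib Axes, so Matplotlib methods apply
ax.set_title("Scatter plot with Pandas")
ax.set_xlabel("x value")
ax.set_ylabel("y value")

# Outside Jupyter (e.g. a script on a server), write the figure to a file
out_path = "scatter.png"  # illustrative file name
ax.get_figure().savefig(out_path, dpi=100)
plt.close(ax.get_figure())  # free the figure once it has been saved
```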
Example 2: Use Seaborn to add fitted regression lines to scatter plots in Python
Similar to Pandas, Seaborn also uses Matplotlib as the underlying plotting package. However, Seaborn provides a higher-level interface for drawing statistical graphics.
A nice feature of Seaborn is that it can easily add a regression line to the plot: the parameter fit_reg=True (the default in sns.lmplot()) draws the fitted regression line, and fit_reg=False omits it.
```python
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# read data from GitHub
df = pd.read_csv("https://raw.githubusercontent.com/TidyPython/SPSS/main/NFL_2020_Combine_simplified_version.csv")

# scatter plot with a fitted regression line using seaborn
sns.lmplot(x="Weight", y="Height", data=df, fit_reg=True)
plt.show()
```
The following is the scatter plot with the regression trend line.

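The line Seaborn draws is an ordinary least-squares fit of y on x, with a bootstrapped confidence band around it. As a rough check, the sketch below fits the same straight line with np.polyfit on synthetic data generated as y = 3x + noise (the data and seed are assumptions for illustration) and draws it with sns.regplot(), the axes-level counterpart of sns.lmplot().

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data with a known linear relationship: y = 3x + noise
rng = np.random.default_rng(3)
df = pd.DataFrame({"x": rng.uniform(0, 10, 200)})
df["y"] = 3 * df["x"] + rng.normal(0, 1, 200)

# Least-squares straight line (degree 1): returns [slope, intercept]
slope, intercept = np.polyfit(df["x"], df["y"], deg=1)
print(f"y = {slope:.2f}x + {intercept:.2f}")

# regplot draws the scatter points plus the fitted line on one Axes
ax = sns.regplot(x="x", y="y", data=df)
plt.show()
```

The estimated slope should land close to the true value of 3. Like lmplot, regplot also accepts fit_reg=False to draw only the points.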
Example 3: Use Pandas and Matplotlib for scatter plots in Python (with generated data)
If you do not have data in hand, you can also generate data in Python before doing the actual plotting.
In particular, we need to generate sample data for X and Y. If you already have your own data, you can skip this step (as in Example 1).
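Because this generation step is random, every run produces different numbers. If you want reproducible output, one option (an addition, not part of the original code) is to draw the samples from a seeded NumPy Generator. The helper make_data and the seed 42 are illustrative; the distribution parameters match the example code (x with mean 5 and standard deviation 0.1, noise e with mean 0 and standard deviation 2).

```python
import numpy as np
import pandas as pd


def make_data(seed, n=100):
    """Hypothetical helper: x ~ Normal(mean 5, sd 0.1), y = 3x + e with e ~ Normal(mean 0, sd 2)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(5, 0.1, n)
    e = rng.normal(0, 2, n)
    return pd.DataFrame({"x": x, "y": 3 * x + e})


df_a = make_data(42)
df_b = make_data(42)
print(df_a.equals(df_b))  # True: the same seed gives the same data
```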
With data in hand, we can now make the scatter plot. Note that the code pd_df.plot(kind="scatter", x="x", y="y") shows that the scatter plot is drawn using Pandas, rather than directly using Matplotlib.
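For comparison, calling Matplotlib directly means passing the two columns to Axes.scatter() yourself and setting the axis labels by hand, whereas the Pandas scatter plot labels the axes with the column names. A minimal sketch using a stand-in DataFrame shaped like pd_df (the seed is an illustrative assumption):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd_df from this example
rng = np.random.default_rng(0)
x = rng.normal(5, 0.1, 100)
pd_df = pd.DataFrame({"x": x, "y": 3 * x + rng.normal(0, 2, 100)})

# Plain Matplotlib: create a Figure and Axes, then scatter the two columns
fig, ax = plt.subplots()
points = ax.scatter(pd_df["x"], pd_df["y"])
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
```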
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# mean and standard deviation for x
mu_1, sigma_1 = 5, 0.1
x = np.random.normal(mu_1, sigma_1, 100)

# mean and standard deviation for the noise term e
mu_2, sigma_2 = 0, 2
e = np.random.normal(mu_2, sigma_2, 100)

# y is a linear function of x plus noise
y = x * 3 + e

# combine x and y into a DataFrame and print it out
d = {'x': x, 'y': y}
pd_df = pd.DataFrame(data=d)
print(pd_df)

# plot the scatter plot
pd_df.plot(kind="scatter", x="x", y="y")
plt.show()
```
The following is the output of print(pd_df). Because no random seed is set, your values will differ from run to run.
```
           x          y
0   5.048436  15.017243
1   4.966731  13.614946
2   5.040685  15.690755
3   4.910718  15.683145
4   5.183769  18.028059
..       ...        ...
95  4.896289  15.513315
96  4.919661  15.098267
97  4.906681  16.298011
98  5.032974  16.328086
99  5.266889  12.799115

[100 rows x 2 columns]
```
The following is the scatter plot output:
