This tutorial shows how to use `Pandas`, `Matplotlib`, and `Seaborn` to create scatter plots in Python, with examples, code, and charts.

This tutorial covers two ways of making scatter plots in Python. The core syntax for each is shown below.

**Pandas: df.plot(kind="scatter", x="column_x", y="column_y")**

**Seaborn: sns.lmplot(x="column_x", y="column_y", data=df, fit_reg=True)**

## Example 1: Use Pandas for scatter plots in Python

We can use `Pandas` to make scatter plots in Python. The `Pandas` plotting functions are built on top of `Matplotlib`, which is why the code ends with the statement `plt.show()`.

`plt.show()` is optional in some Python programming environments, such as Jupyter. That is, Jupyter will display the plot even without `plt.show()`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# read data from GitHub
df = pd.read_csv("https://raw.githubusercontent.com/TidyPython/SPSS/main/NFL_2020_Combine_simplified_version.csv")

# plot using the plot() function in pandas
df.plot(kind="scatter", x="Weight", y="Height")
plt.show()
```

The following is the scatter plot.
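Because the plot is drawn by `Matplotlib`, the object returned by `df.plot()` is a Matplotlib `Axes`, and it can be adjusted with ordinary Matplotlib methods. The following is a minimal sketch of that idea. It assumes a small made-up table in place of the downloaded data (the values are illustrative only, not taken from the real file), a non-interactive backend, and a hypothetical output file name.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit in Jupyter or on a desktop
import matplotlib.pyplot as plt
import pandas as pd

# Small made-up sample standing in for the downloaded data (illustrative values only)
df = pd.DataFrame({
    "Weight": [190, 205, 224, 248, 262, 301, 315],
    "Height": [70.5, 71.0, 72.3, 73.1, 74.0, 75.2, 76.4],
})

# df.plot() returns a Matplotlib Axes object
ax = df.plot(kind="scatter", x="Weight", y="Height")

# Ordinary Matplotlib methods can then be used to customize the plot
ax.set_title("Height vs. Weight")
ax.set_xlabel("Weight")
ax.set_ylabel("Height")

# Save to a file (hypothetical name) instead of calling plt.show()
ax.figure.savefig("pandas_scatter.png")
```

In an interactive session, `plt.show()` can replace the `savefig` call.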

## Example 2: Use Seaborn to add fitted regression lines in Scatter Plots in Python

Like `Pandas`, `Seaborn` uses `Matplotlib` as the underlying plotting package, but `Seaborn` provides a higher-level interface for drawing statistical graphics.

A nice feature of `Seaborn` is that it can easily add a regression line to the plot. In `sns.lmplot()`, the parameter `fit_reg=True` (which is also the default) adds the regression line.

```python
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# read data from GitHub
df = pd.read_csv("https://raw.githubusercontent.com/TidyPython/SPSS/main/NFL_2020_Combine_simplified_version.csv")

# scatter plot using seaborn
sns.lmplot(x="Weight", y="Height", data=df, fit_reg=True)
plt.show()
```

The following is the scatter plot with the regression trend line.
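`sns.lmplot()` draws the fitted line but does not report its coefficients. As a rough sketch, the slope and intercept of a straight-line least-squares fit can be computed separately with `np.polyfit`. The data below is synthetic (generated with an arbitrary seed and an assumed slope of 0.03), the backend is non-interactive, and the output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit in Jupyter or on a desktop
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data for illustration only: Height rises linearly with Weight, plus noise
rng = np.random.default_rng(0)
weight = rng.uniform(180, 330, 50)
height = 65 + 0.03 * weight + rng.normal(0, 1, 50)
df = pd.DataFrame({"Weight": weight, "Height": height})

# Scatter plot with a fitted regression line; lmplot returns a FacetGrid
g = sns.lmplot(x="Weight", y="Height", data=df, fit_reg=True)

# A degree-1 least-squares fit gives the slope and intercept of a straight line
slope, intercept = np.polyfit(df["Weight"], df["Height"], deg=1)
print(f"slope={slope:.3f}, intercept={intercept:.1f}")

# Save to a file (hypothetical name) instead of calling plt.show()
g.savefig("seaborn_lmplot.png")
```

Passing `fit_reg=False` instead draws only the points, without the line.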

## Example 3: Use Pandas and Matplotlib for Scatter Plots in Python (with generated data)

If you do not have data on hand, you can generate some with Python before doing the actual plotting.

In particular, we need to generate sample data for x and y. If you already have your own data, you can skip this step (as in Example 1).
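Randomly generated data changes on every run. The following is a minimal sketch of a reproducible variant. It assumes NumPy's `default_rng` generator with an arbitrary seed, whereas this example's own code calls the unseeded `np.random.normal`. The means, standard deviations, and the relation `y = 3x + e` are the ones used in this example.

```python
import numpy as np

# Seeded random generator; the seed value 42 is arbitrary
rng = np.random.default_rng(seed=42)

# x has mean 5 and standard deviation 0.1; the noise e has mean 0 and standard deviation 2
x = rng.normal(5, 0.1, 100)
e = rng.normal(0, 2, 100)
y = 3 * x + e

# Re-creating the generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(seed=42)
x_again = rng_again.normal(5, 0.1, 100)
print(np.array_equal(x, x_again))  # True
```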

With data in hand, we can now draw the scatter plot. Note that the call `pd_df.plot(kind="scatter",x="x",y="y")` shows that the scatter plot is drawn through `Pandas` rather than by calling `Matplotlib` directly.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# mean and standard deviation for x
mu_1, sigma_1 = 5, 0.1
x = np.random.normal(mu_1, sigma_1, 100)

# mean and standard deviation for the noise term e
mu_2, sigma_2 = 0, 2
e = np.random.normal(mu_2, sigma_2, 100)
y = x * 3 + e

# combine x and y into a dataframe and print it out
d = {'x': x, 'y': y}
pd_df = pd.DataFrame(data=d)
print(pd_df)

# plot the scatter plot
pd_df.plot(kind="scatter", x="x", y="y")
plt.show()
```

The following is the printed data.

```
           x          y
0   5.048436  15.017243
1   4.966731  13.614946
2   5.040685  15.690755
3   4.910718  15.683145
4   5.183769  18.028059
..       ...        ...
95  4.896289  15.513315
96  4.919661  15.098267
97  4.906681  16.298011
98  5.032974  16.328086
99  5.266889  12.799115

[100 rows x 2 columns]
```

The following is the scatter plot output:
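For comparison, the same kind of plot can be drawn by calling `Matplotlib` directly, without going through a DataFrame. The following is a minimal sketch, assuming a seeded NumPy generator (the seed is arbitrary), a non-interactive backend, and a hypothetical output file name.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit in Jupyter or on a desktop
import matplotlib.pyplot as plt
import numpy as np

# Same data-generating recipe as in this example, with an arbitrary seed
rng = np.random.default_rng(seed=0)
x = rng.normal(5, 0.1, 100)
y = 3 * x + rng.normal(0, 2, 100)

# Draw the scatter plot with Matplotlib's own Axes.scatter method
fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xlabel("x")
ax.set_ylabel("y")

# Save to a file (hypothetical name) instead of calling plt.show()
fig.savefig("matplotlib_scatter.png")
```

The `Pandas` version takes column names and labels the axes automatically, while the direct `Matplotlib` version takes the arrays themselves and leaves the axis labels to you.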