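Several Python libraries ship with sample datasets that you can use for practice, so you do not need to find and import data files of your own. The following is a list of sample datasets available through seaborn, statsmodels, and scikit-learn. Note that seaborn's load_dataset and statsmodels' get_rdataset download the data from online repositories, so they need an internet connection, whereas scikit-learn's load_iris reads a copy bundled with the library.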
1. penguins in seaborn
The penguins dataset was collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER. For more information about this dataset, you can refer to this post.
# Built-in sample dataset of penguins in seaborn
import seaborn as sns
# load the penguins dataset from seaborn
penguins = sns.load_dataset("penguins")
# print the penguins dataset
print(penguins)
Output:
    species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \
0    Adelie  Torgersen            39.1           18.7              181.0
1    Adelie  Torgersen            39.5           17.4              186.0
2    Adelie  Torgersen            40.3           18.0              195.0
3    Adelie  Torgersen             NaN            NaN                NaN
4    Adelie  Torgersen            36.7           19.3              193.0
..      ...        ...             ...            ...                ...
339  Gentoo     Biscoe             NaN            NaN                NaN
340  Gentoo     Biscoe            46.8           14.3              215.0
341  Gentoo     Biscoe            50.4           15.7              222.0
342  Gentoo     Biscoe            45.2           14.8              212.0
343  Gentoo     Biscoe            49.9           16.1              213.0

     body_mass_g     sex
0         3750.0    Male
1         3800.0  Female
2         3250.0  Female
3            NaN     NaN
4         3450.0  Female
..           ...     ...
339          NaN     NaN
340       4850.0  Female
341       5750.0    Male
342       5200.0  Female
343       5400.0    Male

[344 rows x 7 columns]
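The printed output shows that some rows (for example, row 3) contain NaN values, so it is usually worth checking for missing data before practicing on the dataset. The following is a minimal sketch of that check. To keep it runnable without an internet connection, it assumes a small hand-built DataFrame named sample (a hypothetical name) holding only the first five rows copied from the output above; with the real data you would use the penguins DataFrame instead.

```python
import numpy as np
import pandas as pd

# First five rows copied from the printed penguins output
# (row 3 has no measurements and no recorded sex).
sample = pd.DataFrame(
    {
        "species": ["Adelie"] * 5,
        "island": ["Torgersen"] * 5,
        "bill_length_mm": [39.1, 39.5, 40.3, np.nan, 36.7],
        "bill_depth_mm": [18.7, 17.4, 18.0, np.nan, 19.3],
        "flipper_length_mm": [181.0, 186.0, 195.0, np.nan, 193.0],
        "body_mass_g": [3750.0, 3800.0, 3250.0, np.nan, 3450.0],
        "sex": ["Male", "Female", "Female", np.nan, "Female"],
    }
)

# Count the missing values in each column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Keep only the rows that have no missing values.
complete_rows = sample.dropna()
print(complete_rows.shape)
```

The same two calls, isna().sum() and dropna(), should work unchanged on the full penguins DataFrame loaded from seaborn.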
2. iris in statsmodels
The Iris flower dataset was introduced by the British statistician and biologist Ronald Fisher in his 1936 paper.
# Built-in sample dataset of iris in statsmodels
import statsmodels.api as sm
# load the iris dataset from statsmodels
iris = sm.datasets.get_rdataset('iris').data
# print the iris dataset
print(iris)
Output:
     Sepal.Length  Sepal.Width  Petal.Length  Petal.Width    Species
0             5.1          3.5           1.4          0.2     setosa
1             4.9          3.0           1.4          0.2     setosa
2             4.7          3.2           1.3          0.2     setosa
3             4.6          3.1           1.5          0.2     setosa
4             5.0          3.6           1.4          0.2     setosa
..            ...          ...           ...          ...        ...
145           6.7          3.0           5.2          2.3  virginica
146           6.3          2.5           5.0          1.9  virginica
147           6.5          3.0           5.2          2.0  virginica
148           6.2          3.4           5.4          2.3  virginica
149           5.9          3.0           5.1          1.8  virginica

[150 rows x 5 columns]
3. iris in sklearn
# Built-in sample dataset of iris in sklearn
from sklearn.datasets import load_iris
# load the iris dataset
iris = load_iris()
# print the iris dataset
print(iris['data'])
The following is a partial output, as the array is quite long. (As shown in the previous section, the iris dataset has 150 rows when loaded as a dataframe.)
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]
 ...
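Printing iris['data'] shows only the raw NumPy array, without column names or species labels. If you would rather work with a dataframe like the one from statsmodels, one possible approach is sketched below. It assumes pandas is installed and scikit-learn is version 0.23 or later (where the as_frame parameter is available); the names iris_df and the added "species" column are choices made for this example.

```python
from sklearn.datasets import load_iris

# as_frame=True asks scikit-learn to return pandas objects
# instead of NumPy arrays (needs pandas installed).
iris = load_iris(as_frame=True)

# iris.frame holds the four feature columns plus a numeric "target" column.
iris_df = iris.frame

# Map the numeric target codes (0, 1, 2) to the species names.
code_to_name = dict(enumerate(iris.target_names))
iris_df["species"] = iris_df["target"].map(code_to_name)

# Show the dimensions and the first few rows.
print(iris_df.shape)
print(iris_df.head())
```

The resulting dataframe has 150 rows, matching the statsmodels version, with the feature names that scikit-learn uses, such as "sepal length (cm)".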
Further Reading
Overview of Python built-in data types