Automated Exploratory Data Analysis

PACKAGE              VERSION
Python               3.8.5
numpy                1.19.2
pandas               1.1.3
matplotlib           3.3.2
seaborn              0.11.0
astropy              4.0.2

1. Dataset overview

     species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex
0    Adelie   Torgersen  39.1            18.7           181.0              3750.0       Male
1    Adelie   Torgersen  39.5            17.4           186.0              3800.0       Female
2    Adelie   Torgersen  40.3            18.0           195.0              3250.0       Female
3    Adelie   Torgersen  NaN             NaN            NaN                NaN          NaN
4    Adelie   Torgersen  36.7            19.3           193.0              3450.0       Female
...  ...      ...        ...             ...            ...                ...          ...
339  Gentoo   Biscoe     NaN             NaN            NaN                NaN          NaN
340  Gentoo   Biscoe     46.8            14.3           215.0              4850.0       Female
341  Gentoo   Biscoe     50.4            15.7           222.0              5750.0       Male
342  Gentoo   Biscoe     45.2            14.8           212.0              5200.0       Female
343  Gentoo   Biscoe     49.9            16.1           213.0              5400.0       Male

344 rows × 7 columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 344 entries, 0 to 343
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   species            344 non-null    object 
 1   island             344 non-null    object 
 2   bill_length_mm     342 non-null    float64
 3   bill_depth_mm      342 non-null    float64
 4   flipper_length_mm  342 non-null    float64
 5   body_mass_g        342 non-null    float64
 6   sex                333 non-null    object 
dtypes: float64(4), object(3)
memory usage: 18.9+ KB

        species  island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex
count   344      344     344             344            344                344          344
unique  3        3       165             81             56                 95           3
top     Adelie   Biscoe  41.1            17.0           190.0              3800.0       Male
freq    152      168     7               12             22                 12           168

VARIABLE             TYPE
species              categorical
island               categorical
bill_length_mm       numerical continuous
bill_depth_mm        numerical continuous
flipper_length_mm    numerical discrete (integers only)
body_mass_g          numerical discrete (integers only)
sex                  binary
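
The type labels above could be derived with a rule set along the following lines. This is a minimal sketch, not the tool's actual logic: the helper name classify_variable, the order of the checks, and the toy frame are assumptions made for illustration. The numeric values in the toy frame are borrowed from the preview above; the species labels are illustrative only.

```python
import pandas as pd
from pandas.api import types as ptypes


def classify_variable(series: pd.Series) -> str:
    """Return a coarse type label for one column (hypothetical helper)."""
    values = series.dropna()  # missing entries do not count as a value
    if ptypes.is_datetime64_any_dtype(values):
        return "datetime"
    if values.nunique() == 2:
        return "binary"
    if ptypes.is_numeric_dtype(values):
        # Floats such as 181.0 are treated as discrete when every value is whole.
        if (values % 1 == 0).all():
            return "numerical discrete (integers only)"
        return "numerical continuous"
    return "categorical"


# Toy frame (assumption): a handful of values, not the full 344-row dataset.
toy = pd.DataFrame({
    "species": ["Adelie", "Chinstrap", "Gentoo", "Adelie", "Gentoo"],
    "bill_length_mm": [39.1, 39.5, 40.3, None, 36.7],
    "flipper_length_mm": [181.0, 186.0, 195.0, None, 193.0],
    "sex": ["Male", "Female", "Female", None, "Female"],
})

labels = {name: classify_variable(toy[name]) for name in toy.columns}
for name, label in labels.items():
    print(f"{name:<20} {label}")
```

Dropping missing values before counting is what lets a column such as sex, which holds Male, Female and NaN, come out as binary.
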
Maximum number of unique values for plotting categorical/discrete variables: 8 

OVERVIEW OF VARIABLES GROUPED ACCORDING TO PLOTS

Histograms
 ['bill_length_mm' 'bill_depth_mm' 'flipper_length_mm' 'body_mass_g'] 

Scatter plots
 ['bill_length_mm' 'bill_depth_mm' 'flipper_length_mm' 'body_mass_g'] 

Numerical discrete variable to visualize like categorical variables
 [] 

Bar charts and hue in scatter plots
 ['species' 'island' 'sex'] 

Categorical variable with too many unique values to be plotted
 [] 

Datetime variable to use as index for time series
 []
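
The grouping above can be reproduced from the type labels and the unique-value threshold of 8. The sketch below is an assumption about how such a grouping might work, not the tool's code: the function name group_for_plots and the group keys are hypothetical. The type labels and unique counts are copied from the tables in section 1 (sex uses the count of 2 that excludes missing values).

```python
MAX_UNIQUE = 8  # threshold printed in the report

# (type label, number of unique values) per variable, copied from the report.
variables = {
    "species": ("categorical", 3),
    "island": ("categorical", 3),
    "bill_length_mm": ("numerical continuous", 165),
    "bill_depth_mm": ("numerical continuous", 81),
    "flipper_length_mm": ("numerical discrete (integers only)", 56),
    "body_mass_g": ("numerical discrete (integers only)", 95),
    "sex": ("binary", 2),
}


def group_for_plots(variables, max_unique=MAX_UNIQUE):
    """Assign each variable to plot groups (hypothetical helper)."""
    groups = {
        "histograms": [],
        "scatter": [],
        "discrete_as_categorical": [],
        "bar_and_hue": [],
        "too_many_categories": [],
        "datetime_index": [],
    }
    for name, (kind, n_unique) in variables.items():
        if kind == "datetime":
            groups["datetime_index"].append(name)
        elif kind.startswith("numerical"):
            if "discrete" in kind and n_unique <= max_unique:
                # Few distinct integers: plot like a categorical variable.
                groups["discrete_as_categorical"].append(name)
            else:
                groups["histograms"].append(name)
                groups["scatter"].append(name)
        elif n_unique <= max_unique:
            groups["bar_and_hue"].append(name)
        else:
            groups["too_many_categories"].append(name)
    return groups


groups = group_for_plots(variables)
for title, names in groups.items():
    print(title, names)
```

Under these assumptions, flipper_length_mm and body_mass_g land with the histograms because their 56 and 95 distinct values exceed the threshold, which matches the empty "visualize like categorical variables" list above.
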

1.1 Numerical variables distributions

       bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g
count  342.00          342.00         342.00             342.00
mean   43.92           17.15          200.92             4201.75
std    5.46            1.97           14.06              801.95
min    32.10           13.10          172.00             2700.00
25%    39.23           15.60          190.00             3550.00
50%    44.45           17.30          197.00             4050.00
75%    48.50           18.70          213.00             4750.00
max    59.60           21.50          231.00             6300.00

Automatically selected histogram bin widths

VARIABLE             BIN WIDTH
bill_length_mm             2.5
bill_depth_mm              1.0
flipper_length_mm            5
body_mass_g              250.0
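
The report does not say how these widths are chosen. One rule that happens to reproduce all four values is the Freedman-Diaconis width, 2 * IQR / n^(1/3), rounded to the nearest "nice" number of the form 1, 2.5 or 5 times a power of ten. The sketch below works through that arithmetic using the quartiles and count from the table in section 1.1. Treat it as an assumption about the method, not a description of the tool; both function names are hypothetical.

```python
import math


def freedman_diaconis_width(q1, q3, n):
    """Freedman-Diaconis bin width: 2 * IQR / n^(1/3)."""
    return 2.0 * (q3 - q1) / n ** (1.0 / 3.0)


def nearest_nice(width):
    """Round to the closest of 1, 2.5, 5, 10 times a power of ten (assumed rule)."""
    exponent = math.floor(math.log10(width))
    candidates = [m * 10.0 ** exponent for m in (1.0, 2.5, 5.0, 10.0)]
    return min(candidates, key=lambda c: abs(c - width))


# 25% and 75% quantiles copied from the summary table; n is the non-null count.
quartiles = {
    "bill_length_mm": (39.23, 48.50),
    "bill_depth_mm": (15.60, 18.70),
    "flipper_length_mm": (190.00, 213.00),
    "body_mass_g": (3550.00, 4750.00),
}
N = 342

widths = {}
for name, (q1, q3) in quartiles.items():
    raw = freedman_diaconis_width(q1, q3, N)
    widths[name] = nearest_nice(raw)
    print(f"{name:<20} raw={raw:8.3f}  nice={widths[name]}")
```

For bill_length_mm, the interquartile range is 9.27 and the cube root of 342 is about 6.99, giving a raw width of about 2.65 that rounds to 2.5.
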

1.2 Categorical variables distributions

Description of categorical/numerical discrete variables with few enough
unique values to be visualized
        species  island  sex
count   344      344     333
unique  3        3       2
top     Adelie   Biscoe  Male
freq    152      168     168

2. Numerical variables relationships

Variables to plot as hue in scatterplot matrix
 ['species' 'island' 'sex']

3. Numerical-categorical relationships

4. Categorical variables relationships

5. Time series

There is no time series in this dataset