In [1]:
%%html
<style>
p {
    max-width: 45em;
    word-wrap: break-word;
    }
li{
    max-width: 40em;
    word-wrap: break-word;
    margin:0 0 5px 0;
    }
.output {
    margin:0px 0px 0px 0px;
    }
.output_png {
    display: table-cell;
    text-align: center;
    vertical-align: middle;
    margin:10px 0px 30px -30px;
    }
</style>



Ecological Footprint and Human Development Index 2016


In [2]:
# Import packages and set package parameters
import os
import requests
from bs4 import BeautifulSoup
import json
import re
from math import sqrt, floor, ceil
import numpy as np
import pandas as pd
pd.set_option('display.precision', 2) # Limit number of decimal places displayed in outputs
# pd.set_option('display.max_rows', None) # Display full pd.DataFrame
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (10,6) # Set default figure size in inches (width, height)
import seaborn as sns
import altair as alt

pyv = !python --version
print(f'{"PACKAGE":<20} VERSION')
print(f'{"Python":<20} {str(pyv[0]).split()[1]}')
print('\n'.join(f'{m.__name__:<20} {m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))
PACKAGE              VERSION
Python               3.8.5
requests             2.24.0
json                 2.0.9
re                   2.2.1
numpy                1.19.2
pandas               1.1.3
seaborn              0.11.0
altair               4.1.0

1. Introduction

The goal of this data analysis is to study the relationship between the ecological footprint (EF) and the Human Development Index (HDI) of countries worldwide, using the latest data available as of October 2020, which dates from 2016.

The EF measures the surface area needed to provide the resources necessary to meet consumer demand in any given country. This includes the surface area used for food crops, fiber production, timber regeneration, absorption of carbon dioxide emissions from fossil fuel burning, and built infrastructure. Imports and exports are taken into account. Further details can be found here. This indicator is provided by the Global Footprint Network (GFN), a non-profit organization whose mission is to help end ecological overshoot by making ecological limits central to decision-making.

The HDI measures the level of development of a country by combining indices that assess three key dimensions: health, education and standard of living. Health is assessed by life expectancy at birth. Education is measured by mean years of schooling for adults aged 25 years and more and by expected years of schooling for children of school-entering age. Standard of living is measured by gross national income per capita. Further details can be found here. The HDI is calculated by the United Nations Development Programme using data from various sources.

This notebook presents the entire data analysis process, starting with the extraction of the raw data from online sources, followed by data preparation, and finishing with a few descriptive graphs. The graph of key interest is the scatterplot showing the relationship between the ecological footprint and the HDI for countries worldwide.

2. Data extraction and preparation

2.1 Import data through the GFN API

The data concerning both the EF and the HDI can be accessed on the GFN website in three ways: as an Excel file containing all the data, as individual datasets through the Ecological Footprint Explorer open data platform, or through the website's API. Access to the API requires an API key, which must be requested using the API Key Request Form. The data is provided by the GFN under a Creative Commons Attribution-ShareAlike 4.0 International License.

For this data analysis, the data will be imported through the API as it makes the whole process easier to reproduce and requires fewer files.

In [3]:
# Set API key and username needed for authentication to access the GFN API
import config # Script file containing confidential authentication details
api_key = config.api_key # API key received from the GFN by email
user_agent = config.user_agent # User name which can be any string
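
The contents of the config script are not shown in this notebook. As a minimal sketch of one way to keep the credentials out of the notebook, the function below reads them from environment variables. The function name and the variable names GFN_API_KEY and GFN_USER_AGENT are hypothetical and chosen for illustration only.

```python
import os


def load_gfn_credentials(env=None):
    """Return (user_agent, api_key) read from environment variables.

    GFN_USER_AGENT and GFN_API_KEY are hypothetical names used for this
    sketch; they are not defined by the GFN.
    """
    env = os.environ if env is None else env
    api_key = env.get('GFN_API_KEY')
    if not api_key:
        raise RuntimeError('GFN_API_KEY is not set')
    # The user name can be any string, so fall back to a placeholder
    user_agent = env.get('GFN_USER_AGENT', 'anonymous')
    return user_agent, api_key
```

Passing a plain dictionary instead of the real environment, for example `load_gfn_credentials({'GFN_API_KEY': 'my-key'})`, makes the function easy to try out without touching the system settings.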

Different datasets can be accessed through the API. The types dataset lists all the indicators that are available.

In [4]:
# Extract list of available types of indicators and check that API
# access is successful
url = 'http://api.footprintnetwork.org/v1/types'
request_types = requests.get(url, auth=(user_agent, api_key))
print(f'API HTTP response code: {request_types.status_code}') # Check that request is successful: HTTP code = 200
API HTTP response code: 200
In [5]:
# Display json file contents listing available indicators
types_json = request_types.json()
df_types = pd.DataFrame(types_json)
df_types
Out[5]:
id version code record note enabled name
0 3 None BCpc BiocapPerCap Biocapacity divided by population in global he... 1 Biocapacity per person
1 4 None BCtot BiocapTotGHA Total biocapacity in global hectares (gha) 1 Biocapacity
2 5 None EFCpc EFConsPerCap Ecological Footprint of consumption in global ... 1 Ecological Footprint per person
3 6 None EFCtot EFConsTotGHA Total Ecological Footprint of consumption in g... 1 Ecological Footprint
4 13 None earth BiocapTotGHA EFConsPerCap of Country / EFConsPerCap of World 1 Earths
5 14 None pop Population Population 1 Population
6 15 None hdi HDI Human Development Index; Source: Trends in the... 1 Human Development Index
7 16 None gdp GDP GDP per capita (constant 2010 US$); Source: Wo... 1 Gross Domestic Product

For this data analysis, these three variables will be extracted: Earths, Population and Human Development Index. Earths measures how many Earths would be required to provide the surface required to meet consumption needs if everyone on the planet were to consume at the same level as the given country.

The HDI can also be accessed through the API of the United Nations Human Development Report Office where the data may be slightly more up to date as it is recalculated retroactively each year with the integration of new methodologies and updated data.

In [6]:
# Extract data on population, ecological footprint (in terms of number
# of Earths), and the Human Development Index
url = 'http://api.footprintnetwork.org/v1/data/all/2016/pop,earth,hdi'
request_ef2016 = requests.get(url, auth=(user_agent, api_key))
print(f'API HTTP response code: {request_ef2016.status_code}')
API HTTP response code: 200
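
The request URL above combines a country selector (all), a year (2016) and a comma-separated list of indicator codes taken from the types table. The helper below is a small sketch that assembles a URL of the same shape. The function name is hypothetical, and whether the API accepts other combinations of country, year and indicators is an assumption that is not verified in this notebook.

```python
def build_gfn_data_url(country, year, records,
                       base='http://api.footprintnetwork.org/v1'):
    """Assemble a data URL following the pattern of the request above."""
    return f"{base}/data/{country}/{year}/{','.join(records)}"


# Rebuild the URL used in this notebook from its three components
url = build_gfn_data_url('all', 2016, ['pop', 'earth', 'hdi'])
```
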
In [7]:
df_ef2016_raw = pd.DataFrame(request_ef2016.json())
df_ef2016_raw
Out[7]:
id version year countryCode countryName shortName isoa2 record cropLand grazingLand forestLand fishingGround builtupLand carbon value score
0 221204 None 2016 1 Armenia Armenia AM Earths 0.33 0.14 0.15 4.62e-03 0.03 0.51 1.17e+00 3A
1 121257 None 2016 1 Armenia Armenia AM HDI NaN NaN NaN NaN NaN NaN 7.51e-01 None
2 125727 None 2016 1 Armenia Armenia AM Population NaN NaN NaN NaN NaN NaN 2.92e+06 None
3 221261 None 2016 2 Afghanistan Afghanistan AF Earths 0.18 0.11 0.04 1.41e-04 0.02 0.09 4.39e-01 3A
4 121285 None 2016 2 Afghanistan Afghanistan AF HDI NaN NaN NaN NaN NaN NaN 4.91e-01 None
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
596 231913 None 2016 2005 Oceania º Oceania None Earths 0.34 0.10 0.57 8.78e-02 0.03 2.82 3.94e+00
597 216927 None 2016 2005 Oceania º Oceania None HDI NaN NaN NaN NaN NaN NaN 8.38e-01 None
598 231970 None 2016 5001 World º World None Earths 0.32 0.09 0.17 5.58e-02 0.04 1.03 1.70e+00 3A
599 125701 None 2016 5001 World º World None HDI NaN NaN NaN NaN NaN NaN 7.27e-01 None
600 135252 None 2016 5001 World º World None Population NaN NaN NaN NaN NaN NaN 7.47e+09 None

601 rows × 16 columns

The record column shows that the values of the three variables of interest are stored on separate rows of the value column. Country names and the ISO 3166-1 alpha-2 country codes are also needed for merging with other datasets later on.

In [8]:
# Select columns of interest
df_ef2016_select = df_ef2016_raw[['shortName', 'isoa2', 'record', 'value']]
df_ef2016_select
Out[8]:
shortName isoa2 record value
0 Armenia AM Earths 1.17e+00
1 Armenia AM HDI 7.51e-01
2 Armenia AM Population 2.92e+06
3 Afghanistan AF Earths 4.39e-01
4 Afghanistan AF HDI 4.91e-01
... ... ... ... ...
596 º Oceania None Earths 3.94e+00
597 º Oceania None HDI 8.38e-01
598 º World None Earths 1.70e+00
599 º World None HDI 7.27e-01
600 º World None Population 7.47e+09

601 rows × 4 columns

In [9]:
df_ef2016_select.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 601 entries, 0 to 600
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   shortName  601 non-null    object 
 1   isoa2      547 non-null    object 
 2   record     601 non-null    object 
 3   value      601 non-null    float64
dtypes: float64(1), object(3)
memory usage: 18.9+ KB

The record column needs to be split into separate columns containing the variables of interest and the columns can be renamed for better readability.

In [10]:
# Pivot table, reset index and rename columns
df_ef2016_pivot = df_ef2016_select.pivot_table(index=['shortName','isoa2'], columns=['record'], values='value').reset_index()
df_ef2016_pivot.columns = ['country', 'code', 'Earths', 'HDI', 'population']
df_ef2016_pivot
Out[10]:
country code Earths HDI population
0 Afghanistan AF 0.44 0.49 3.47e+07
1 Albania AL 1.23 0.79 2.93e+06
2 Algeria DZ 1.48 0.75 4.06e+07
3 Angola AO 0.57 0.57 2.88e+07
4 Antigua and Barbuda AG 2.69 0.77 1.01e+05
... ... ... ... ... ...
183 Venezuela VE 1.55 0.75 3.16e+07
184 Viet Nam VN 1.37 0.69 9.46e+07
185 Yemen YE 0.40 0.48 2.76e+07
186 Zambia ZM 0.60 0.58 1.66e+07
187 Zimbabwe ZW 0.65 0.55 1.62e+07

188 rows × 5 columns
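
The reshaping step can be illustrated on a toy table. The sketch below uses made-up values and shows two effects of the pivot: each record becomes its own column, and rows whose index key is missing (such as the regional aggregates with no isoa2 code) are dropped, which is consistent with the reduction from 601 rows to 188. It renames columns by name rather than by position, a design choice of this sketch that does not depend on the alphabetical ordering of the record values.

```python
import pandas as pd

# Toy long-format data with made-up values; 'World' has no ISO code
long_df = pd.DataFrame({
    'shortName': ['A', 'A', 'A', 'B', 'B', 'World', 'World'],
    'isoa2': ['AA', 'AA', 'AA', 'BB', 'BB', None, None],
    'record': ['Earths', 'HDI', 'Population',
               'Earths', 'Population', 'Earths', 'HDI'],
    'value': [1.2, 0.75, 3.0e6, 0.4, 2.0e7, 1.7, 0.73],
})

wide = (long_df
        .pivot_table(index=['shortName', 'isoa2'], columns='record',
                     values='value')
        .reset_index()
        .rename(columns={'shortName': 'country', 'isoa2': 'code',
                         'Population': 'population'}))
wide.columns.name = None  # Remove the leftover 'record' axis label
```

In the result, country B keeps its row with a missing HDI, while the World rows disappear because their code is missing.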

In [11]:
df_ef2016_pivot.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 188 entries, 0 to 187
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   country     188 non-null    object 
 1   code        188 non-null    object 
 2   Earths      184 non-null    float64
 3   HDI         175 non-null    float64
 4   population  188 non-null    float64
dtypes: float64(3), object(2)
memory usage: 7.5+ KB

The dataset contains data on 188 countries and territories, of which 13 are missing a value for the HDI (shown in a later section) and 4 are missing a value for Earths.

We can now look at how Earths and HDI are distributed.

In [12]:
# Create histogram function
def hist_align_bins(variable, xlabel, title, bin_width):
    """
    Create a histogram with bins of a selected width and overlaid with a
    kernel density estimator.
    
    A list of bins for the histogram 'bins' argument is created based on 
    the range of the variable scores and on the selected width of bins.
    The bins are automatically aligned with round units making the
    histogram more readable.
    
    Parameters
    ----------
    variable: pandas Series with dtype integer or float
        Numerical variable to plot in histogram.
    xlabel: str
        Label x axis.
    title: str
        Figure title.
    bin_width: integer, float
        Width of histogram bins in the units of the variable.
    """
    
    leftmost_bin_edge = variable.min() - variable.min()%bin_width
    rightmost_bin_edge = variable.max() - variable.max()%bin_width + 2*bin_width
    bins_list = np.arange(leftmost_bin_edge, rightmost_bin_edge, bin_width)
    
    ax_hist = sns.histplot(data=variable, bins=bins_list, kde=True, alpha= 0.9,
                           edgecolor='white', linewidth=0.5,
                           line_kws=dict(alpha=0.5, linewidth=1.5,
                           label='kernel density estimator'))
    ax_hist.get_lines()[0].set_color('black') # manually edit line color due to bug with line_kws

    # Additional formatting
    ax_hist.set_xlabel(xlabel, size=12, labelpad=15)
    ax_hist.set_ylabel('Count', size=12, labelpad=15)
    
    plt.title(title, size=14, pad=30)
    plt.legend(frameon=False)
    sns.despine()
In [13]:
# Draw Earths histogram
hist_align_bins(variable=df_ef2016_pivot['Earths'],
                xlabel='Earths',
                title='Distribution of Ecological Footprint scores',
                bin_width=0.5)
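
The bin alignment used in hist_align_bins can be checked in isolation. The sketch below repeats the same arithmetic in a standalone helper (the name aligned_bin_edges is hypothetical) and applies it to three values taken from the Earths column.

```python
import numpy as np


def aligned_bin_edges(values, bin_width):
    """Return bin edges that are multiples of bin_width and cover values."""
    values = np.asarray(values, dtype=float)
    vmin, vmax = np.nanmin(values), np.nanmax(values)
    # Left edge of the bin that contains the minimum
    left = vmin - vmin % bin_width
    # Left edge of the bin that contains the maximum, plus one width for
    # its right edge and one more because np.arange excludes the stop value
    right = vmax - vmax % bin_width + 2 * bin_width
    return np.arange(left, right, bin_width)


edges = aligned_bin_edges([0.31, 1.17, 8.91], bin_width=0.5)
```

The first edge falls on 0 and the last edge lies at or beyond the maximum, so every value is assigned to a bin whose limits are round multiples of the bin width.
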
In [14]:
# Earths top 10 countries
df_ef2016_pivot.nlargest(10, 'Earths')
Out[14]:
country code Earths HDI population
139 Qatar QA 8.91 0.85 2.57e+06
102 Luxembourg LU 7.99 0.90 5.76e+05
179 United Arab Emirates AE 5.52 0.86 9.27e+06
12 Bahrain BH 5.47 0.84 1.43e+06
93 Kuwait KW 5.29 0.81 4.05e+06
181 United States US 5.11 0.92 3.22e+08
33 Canada CA 5.10 0.92 3.63e+07
173 Trinidad and Tobago TT 4.90 0.80 1.36e+06
114 Mongolia MN 4.77 0.73 3.03e+06
19 Bermuda BM 4.72 NaN 6.17e+04

The list contains mainly fossil fuel producing countries, with the exceptions of Luxembourg and Bermuda. As a wealthy country, Luxembourg has a high per capita consumption level and it also experiences a high level of tank tourism. Bermuda is also a wealthy country with a high per capita consumption level, and it additionally relies heavily on fossil fuels for all its energy needs.

In [15]:
# Earths bottom 10 countries
df_ef2016_pivot.nsmallest(10, 'Earths')
Out[15]:
country code Earths HDI population
170 Timor-Leste TL 0.31 0.63 1.27e+06
58 Eritrea ER 0.32 0.43 4.95e+06
185 Yemen YE 0.40 0.48 2.76e+07
78 Haiti HT 0.41 0.50 1.08e+07
29 Burundi BI 0.41 0.43 1.05e+07
143 Rwanda RW 0.43 0.52 1.19e+07
0 Afghanistan AF 0.44 0.49 3.47e+07
131 Palestine PS 0.47 0.69 4.79e+06
118 Mozambique MZ 0.48 0.44 2.88e+07
42 Congo DR CD 0.48 0.45 7.87e+07
In [16]:
# Draw HDI histogram
hist_align_bins(variable=df_ef2016_pivot['HDI'],
                xlabel='HDI',
                title='Distribution of HDI scores',
                bin_width=0.025)
In [17]:
# HDI top 10
df_ef2016_pivot.nlargest(10, 'HDI')
Out[17]:
country code Earths HDI population
128 Norway NO 3.42 0.95 5.25e+06
165 Switzerland CH 2.88 0.94 8.40e+06
69 Germany DE 2.98 0.94 8.19e+07
85 Ireland IE 3.16 0.94 4.73e+06
8 Australia AU 4.02 0.94 2.41e+07
164 Sweden SE 3.84 0.93 9.84e+06
150 Singapore SG 3.96 0.93 5.62e+06
122 Netherlands NL 3.12 0.93 1.70e+07
50 Denmark DK 4.33 0.93 5.71e+06
62 Finland FI 3.85 0.92 5.50e+06
In [18]:
# HDI bottom 10
df_ef2016_pivot.nsmallest(10, 'HDI')
Out[18]:
country code Earths HDI population
125 Niger NE 1.00 0.37 2.07e+07
35 Central African Republic CF 0.73 0.37 4.59e+06
36 Chad TD 1.10 0.40 1.45e+07
28 Burkina Faso BF 0.71 0.42 1.86e+07
107 Mali ML 0.92 0.42 1.80e+07
149 Sierra Leone SL 0.62 0.42 7.40e+06
29 Burundi BI 0.41 0.43 1.05e+07
58 Eritrea ER 0.32 0.43 4.95e+06
118 Mozambique MZ 0.48 0.44 2.88e+07
42 Congo DR CD 0.48 0.45 7.87e+07

To get a better understanding of how much of the world is covered in the dataset, we can compare the countries listed in the dataset to a comprehensive list of countries considered to be independent States.

2.2 Import list of independent States by scraping online HTML table

A list of independent States will be used as reference to better understand the coverage of the available data, with the assumption that countries or territories that are not independent are small in size and therefore have less impact on the global consumption of resources.

The list of Independent States in the World published by the United States Department of State (USDS) will be used as reference. It includes GENC country codes which are based on the ISO 3166 country code standard and will be used to merge with the other dataset. This data is in the public domain and may be copied and distributed without permission.

The USDS allows scraping of its website but for some reason the pandas function pd.read_html(url) does not work on this page so the BeautifulSoup package is used here instead.

In [19]:
# Check that webpage access is successful
url = 'https://www.state.gov/independent-states-in-the-world/'
response = requests.get(url)
print(f'API HTTP response code: {response.status_code}')
API HTTP response code: 200
In [20]:
# Scrape table, code based on pluralsight.com/guides/extracting-data-html-beautifulsoup
# and kite.com/python/examples/4420/beautifulsoup-parse-an-html-table-and-write-to-a-csv

html_content = response.text # Reuse the response fetched in the previous cell
soup = BeautifulSoup(html_content, 'lxml')
table = soup.find('table')

output_rows = []
for table_row in table.find_all('tr'):
    columns = table_row.find_all('td')
    output_row = []
    for column in columns:
        output_row.append(column.text)
    output_rows.append(output_row)

df_USDS_raw = pd.DataFrame(data=output_rows[1:], columns=output_rows[0])
df_USDS_raw
Out[20]:
Short-form name Long-form name GENC 2A Code (see Note 2) GENC 3A Code (see Note 2) Capital
0 Afghanistan *+ Islamic Republic of Afghanistan AF AFG Kabul
1 Albania *+ Republic of Albania AL ALB Tirana
2 Algeria *+ People’s Democratic Republic of Algeria DZ DZA Algiers
3 Andorra *+ Principality of Andorra AD AND Andorra la Vella
4 Angola *+ Republic of Angola AO AGO Luanda
... ... ... ... ... ...
190 Venezuela *+ Bolivarian Republic of Venezuela VE VEN Caracas
191 Vietnam *+ Socialist Republic of Vietnam VN VNM Hanoi
192 Yemen *+ Republic of Yemen YE YEM Sanaa
193 Zambia *+ Republic of Zambia ZM ZMB Lusaka
194 Zimbabwe *+ Republic of Zimbabwe ZW ZWE Harare

195 rows × 5 columns

There are 195 independent States according to the USDS.

In [21]:
df_USDS = df_USDS_raw[['Short-form name', 'GENC 2A Code (see Note 2)']].copy()
df_USDS.columns = ['country_USDS', 'code']
df_USDS[30:60]
Out[21]:
country_USDS code
30 Cambodia *+ KH
31 Cameroon *+ CM
32 Canada *+ CA
33 Central\nAfrican Republic *+ CF
34 Chad *+ TD
35 Chile *+ CL
36 China *+ (see note 3) CN
37 Colombia *+ CO
38 Comoros *+ KM
39 Congo (Brazzaville) *+\n(see note 4) CG
40 Congo (Kinshasa) *+\n(see note 4) CD
41 Costa Rica *+ CR
42 Côte d’Ivoire *+ CI
43 Croatia *+ HR
44 Cuba *+ CU
45 Cyprus *+ CY
46 Czechia *+ CZ
47 Denmark *+ DK
48 Djibouti *+ DJ
49 Dominica *+ DM
50 Dominican Republic *+ DO
51 Ecuador *+ EC
52 Egypt *+ EG
53 El Salvador *+ SV
54 Equatorial Guinea *+ GQ
55 Eritrea *+ ER
56 Estonia *+ EE
57 \n \nEswatini *+ SZ
58 Ethiopia *+ ET
59 Fiji *+ FJ

This sample of the table shows that country names contain extra characters and notes which can be removed for better readability.

In [22]:
# Remove extra characters and notes from country names

# Remove '(see note [digit])'
df_USDS['country_USDS'] = df_USDS['country_USDS']\
                              .str.replace(r'\(see note \d+\)', '', regex=True)

# Replace line breaks and no-break spaces with single regular spaces
# (so that words split across lines stay separated), then remove the
# trailing '*+' markers and any remaining surrounding white space
df_USDS['country_USDS'] = df_USDS['country_USDS']\
                              .str.replace(r'\s+', ' ', regex=True)\
                              .str.rstrip(' *+')\
                              .str.strip()

# Remove no-break space left of Eswatini country code
df_USDS['code'] = df_USDS['code'].str.strip()

# Print subset of column to check that strings have been formatted correctly
df_USDS[30:60]
Out[22]:
country_USDS code
30 Cambodia KH
31 Cameroon CM
32 Canada CA
33 Central African Republic CF
34 Chad TD
35 Chile CL
36 China CN
37 Colombia CO
38 Comoros KM
39 Congo (Brazzaville) CG
40 Congo (Kinshasa) CD
41 Costa Rica CR
42 Côte d’Ivoire CI
43 Croatia HR
44 Cuba CU
45 Cyprus CY
46 Czechia CZ
47 Denmark DK
48 Djibouti DJ
49 Dominica DM
50 Dominican Republic DO
51 Ecuador EC
52 Egypt EG
53 El Salvador SV
54 Equatorial Guinea GQ
55 Eritrea ER
56 Estonia EE
57 Eswatini SZ
58 Ethiopia ET
59 Fiji FJ
In [23]:
# Merge list of independent States with list of countries in EF dataset for comparison
df_ef2016_merge_outer = pd.merge(df_USDS, df_ef2016_pivot[['country', 'code']], how='outer', on='code')
df_ef2016_merge_outer
Out[23]:
country_USDS code country
0 Afghanistan AF Afghanistan
1 Albania AL Albania
2 Algeria DZ Algeria
3 Andorra AD NaN
4 Angola AO Angola
... ... ... ...
202 NaN GP Guadeloupe
203 NaN MQ Martinique
204 NaN MS Montserrat
205 NaN PS Palestine
206 NaN RE Reunion

207 rows × 3 columns

In [24]:
# Check that merge is successful
df_ef2016_merge_outer.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 207 entries, 0 to 206
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   country_USDS  195 non-null    object
 1   code          207 non-null    object
 2   country       188 non-null    object
dtypes: object(3)
memory usage: 6.5+ KB

The following cell displays the independent States not included in the EF dataset.

In [25]:
missing_states = df_ef2016_merge_outer[df_ef2016_merge_outer['country'].isna()]
print(missing_states)
print()
print(f'Number of independent countries not included in EF dataset:\
 {len(missing_states)}')
                       country_USDS code country
3                           Andorra   AD     NaN
74                         Holy See   VA     NaN
77                          Iceland   IS     NaN
90                         Kiribati   KI     NaN
93                           Kosovo   XK     NaN
102                   Liechtenstein   LI     NaN
108                        Maldives   MV     NaN
111                Marshall Islands   MH     NaN
115  Micronesia,Federated States of   FM     NaN
117                          Monaco   MC     NaN
123                           Nauru   NR     NaN
134                           Palau   PW     NaN
146           Saint Kitts and Nevis   KN     NaN
150                      San Marino   SM     NaN
155                      Seychelles   SC     NaN
163                     South Sudan   SS     NaN
181                          Tuvalu   TV     NaN
187                         Uruguay   UY     NaN
189                         Vanuatu   VU     NaN

Number of independent countries not included in EF dataset: 19

It is also interesting to see which 12 of the 188 countries and territories listed in the EF dataset are not on the USDS list of independent States.

In [26]:
additional_territories = df_ef2016_merge_outer[df_ef2016_merge_outer['country_USDS'].isna()]
print(additional_territories)
print()
print(f'Number of dependent territories included in EF dataset:\
 {len(additional_territories)}')
    country_USDS code                 country
195          NaN   AW                   Aruba
196          NaN   BM                 Bermuda
197          NaN   VG  British Virgin Islands
198          NaN   KY          Cayman Islands
199          NaN   CK            Cook Islands
200          NaN   GF           French Guiana
201          NaN   PF        French Polynesia
202          NaN   GP              Guadeloupe
203          NaN   MQ              Martinique
204          NaN   MS              Montserrat
205          NaN   PS               Palestine
206          NaN   RE                 Reunion

Number of dependent territories included in EF dataset: 12
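
The two comparisons above rely on checking which side of the outer merge is missing. As a sketch of an alternative, the indicator parameter of pd.merge adds a _merge column that states the origin of each row explicitly. The two small frames below are toy stand-ins for df_USDS and df_ef2016_pivot.

```python
import pandas as pd

# Toy stand-ins for the USDS list and the EF dataset
states = pd.DataFrame({'country_USDS': ['Albania', 'Andorra'],
                       'code': ['AL', 'AD']})
ef = pd.DataFrame({'country': ['Albania', 'Aruba'],
                   'code': ['AL', 'AW']})

merged = pd.merge(states, ef, how='outer', on='code', indicator=True)

# 'left_only': on the USDS list but absent from the EF dataset
only_in_states = merged.loc[merged['_merge'] == 'left_only', 'code'].tolist()
# 'right_only': in the EF dataset but absent from the USDS list
only_in_ef = merged.loc[merged['_merge'] == 'right_only', 'code'].tolist()
```

This avoids depending on NaN values in the name columns, which would give misleading results if a name were missing for another reason.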

12 territories that are not on the USDS list of independent States are included in the EF dataset, most of them with a computed ecological footprint (the Cook Islands has none). As they are not independent States, 11 of them (all except Palestine) have not been included in the HDI, along with North Korea and Somalia, as shown in the table below.

In [27]:
# Display EF dataset countries/territories with no HDI
df_ef2016_pivot[df_ef2016_pivot['HDI'].isna()]
Out[27]:
country code Earths HDI population
7 Aruba AW 4.11 NaN 1.05e+05
19 Bermuda BM 4.72 NaN 6.17e+04
25 British Virgin Islands VG 2.07 NaN 3.07e+04
34 Cayman Islands KY 3.65 NaN 6.08e+04
43 Cook Islands CK NaN NaN 1.74e+04
64 French Guiana GF 1.50 NaN 2.76e+05
65 French Polynesia PF 2.47 NaN 2.80e+05
73 Guadeloupe GP 2.56 NaN 4.50e+05
109 Martinique MQ 2.80 NaN 3.85e+05
116 Montserrat MS 2.82 NaN 5.15e+03
127 North Korea KP 0.54 NaN 2.54e+07
140 Reunion RE 2.98 NaN 8.70e+05
154 Somalia SO 0.60 NaN 1.43e+07

Now that we better understand the coverage of the dataset, it would be interesting to include information about the region to which each country belongs as it would provide a better overview of the geographical distribution of the scores of both indicators.

2.3 Import list of countries with regions from GitHub repository

This information can be obtained from different places. A list built by GitHub user lukes, last updated on 19 March 2019, will be imported here. This list has been obtained by merging two sources: the Wikipedia ISO 3166-1 article table containing the alpha and numeric country codes, and the United Nations Statistics Division table containing regional and sub-regional names and codes. The information on regions may also be obtained by downloading a CSV or Excel file from the United Nations Statistics Division page.

In [28]:
# Note: set keep_default_na to False to prevent the Namibia code NA from
# being interpreted as NaN, which would break the merge on 'code' later on
url = 'https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv'
df_regions_raw = pd.read_csv(url, keep_default_na=False)
df_regions_raw
Out[28]:
name alpha-2 alpha-3 country-code iso_3166-2 region sub-region intermediate-region region-code sub-region-code intermediate-region-code
0 Afghanistan AF AFG 4 ISO 3166-2:AF Asia Southern Asia 142 034
1 Åland Islands AX ALA 248 ISO 3166-2:AX Europe Northern Europe 150 154
2 Albania AL ALB 8 ISO 3166-2:AL Europe Southern Europe 150 039
3 Algeria DZ DZA 12 ISO 3166-2:DZ Africa Northern Africa 002 015
4 American Samoa AS ASM 16 ISO 3166-2:AS Oceania Polynesia 009 061
... ... ... ... ... ... ... ... ... ... ... ...
244 Wallis and Futuna WF WLF 876 ISO 3166-2:WF Oceania Polynesia 009 061
245 Western Sahara EH ESH 732 ISO 3166-2:EH Africa Northern Africa 002 015
246 Yemen YE YEM 887 ISO 3166-2:YE Asia Western Asia 142 145
247 Zambia ZM ZMB 894 ISO 3166-2:ZM Africa Sub-Saharan Africa Eastern Africa 002 202 014
248 Zimbabwe ZW ZWE 716 ISO 3166-2:ZW Africa Sub-Saharan Africa Eastern Africa 002 202 014

249 rows × 11 columns
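
The effect of keep_default_na can be reproduced on a few lines of CSV text. The sketch below uses a made-up three-row file, including an illustrative row with an empty region cell, and parses it with and without the option.

```python
import io
import pandas as pd

# Made-up CSV text; the last row has an empty region cell
csv_text = (
    "name,alpha-2,region\n"
    "Namibia,NA,Africa\n"
    "Nepal,NP,Asia\n"
    "Somewhere,QQ,\n"
)

default_parse = pd.read_csv(io.StringIO(csv_text))
literal_parse = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

# With the default settings, the code 'NA' is read as a missing value
n_missing_codes = default_parse['alpha-2'].isna().sum()
# With keep_default_na=False, 'NA' stays a string, but empty cells
# become empty strings instead of NaN
namibia_code = literal_parse.loc[0, 'alpha-2']
empty_region = literal_parse.loc[2, 'region']
```

The side effect on empty cells matters later on: a missing region no longer shows up as a null value in info().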

In [29]:
df_regions = df_regions_raw[['alpha-2', 'region']].copy() # Copy to avoid modifying a view of df_regions_raw
df_regions.columns = ['code', 'continent']
df_regions
Out[29]:
code continent
0 AF Asia
1 AX Europe
2 AL Europe
3 DZ Africa
4 AS Oceania
... ... ...
244 WF Oceania
245 EH Africa
246 YE Asia
247 ZM Africa
248 ZW Africa

249 rows × 2 columns

In [30]:
# Merge df_ef2016_pivot with df_regions to get continents for all
# countries and territories
df_ef2016 = pd.merge(df_ef2016_pivot, df_regions, how='left', on='code')
df_ef2016
Out[30]:
country code Earths HDI population continent
0 Afghanistan AF 0.44 0.49 3.47e+07 Asia
1 Albania AL 1.23 0.79 2.93e+06 Europe
2 Algeria DZ 1.48 0.75 4.06e+07 Africa
3 Angola AO 0.57 0.57 2.88e+07 Africa
4 Antigua and Barbuda AG 2.69 0.77 1.01e+05 Americas
... ... ... ... ... ... ...
183 Venezuela VE 1.55 0.75 3.16e+07 Americas
184 Viet Nam VN 1.37 0.69 9.46e+07 Asia
185 Yemen YE 0.40 0.48 2.76e+07 Asia
186 Zambia ZM 0.60 0.58 1.66e+07 Africa
187 Zimbabwe ZW 0.65 0.55 1.62e+07 Africa

188 rows × 6 columns

In [31]:
# Check that merge is successful
df_ef2016.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 188 entries, 0 to 187
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   country     188 non-null    object 
 1   code        188 non-null    object 
 2   Earths      184 non-null    float64
 3   HDI         175 non-null    float64
 4   population  188 non-null    float64
 5   continent   188 non-null    object 
dtypes: float64(3), object(3)
memory usage: 10.3+ KB

The merging appears to be successful as all countries and territories have an attributed continent.
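
Because the regions file was read with keep_default_na=False, an empty region would be stored as an empty string and would still count as non-null in info(). A stricter check, sketched below on toy data with hypothetical codes ZZ and QQ, looks for both unmatched codes (NaN after the left merge) and empty strings. The validate argument additionally makes the merge fail if a code appears more than once in the regions table.

```python
import pandas as pd

# Toy data: 'ZZ' has no match in the regions table, 'QQ' matches a row
# with an empty continent (both codes are hypothetical)
ef = pd.DataFrame({'country': ['Albania', 'Namibia', 'Nowhere', 'Blank'],
                   'code': ['AL', 'NA', 'ZZ', 'QQ']})
regions = pd.DataFrame({'code': ['AL', 'NA', 'QQ'],
                        'continent': ['Europe', 'Africa', '']})

merged = pd.merge(ef, regions, how='left', on='code', validate='m:1')

# Rows with no usable continent: unmatched codes or empty strings
no_continent = merged[merged['continent'].isna() | (merged['continent'] == '')]
```

An empty no_continent frame on the real data would confirm that every country and territory has been assigned a continent.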

3. Data analysis

Now that the dataset is ready for further analysis, it is time to look at the distribution of the country scores for the EF and HDI grouped by continent and to visualize the relationship between these two variables.

3.1 Ecological Footprint by continent

In [32]:
data=df_ef2016
var_cat='continent'
var_num='Earths'

# Draw boxplots overlaid with stripplots of EF scores grouped by continent
# Note: the boxplots are drawn with whiskers that extend to the farthest
# data points within the interval between the 5th and 95th percentiles
fig, ax = plt.subplots(figsize=(7, 7))
sns.boxplot(data=data, x=var_cat, y=var_num, whis=(5, 95), linewidth=1,
            width=0.3, fliersize=False, saturation=1)
sns.stripplot(data=data, x=var_cat, y=var_num, color="k", alpha=0.6, size=3)

# Add horizontal line for one planet limit and additional formatting
ax.axhline(1, color='grey', ls='dashed', alpha=0.5)
ax.set_xlabel('Continent', size=12, labelpad=15)
ax.set_ylabel('Earths', size=12, labelpad=15)
sns.despine();

The dashed line represents the one planet limit. The boxplots are drawn with whiskers that extend to the farthest data points within the interval between the 5th and 95th percentiles. All countries in Europe and Oceania consume in excess of Earth's resources, as do nearly all countries in the Americas, with a few exceptions:

In [33]:
df_ef2016[(df_ef2016['Earths']<=1) & (df_ef2016['continent']=='Americas')]
Out[33]:
country code Earths HDI population continent
78 Haiti HT 0.41 0.50 1.08e+07 Americas
79 Honduras HN 0.96 0.62 9.11e+06 Americas

Many Asian and African countries appear to be within the planet's limits. These can be identified in the scatterplot further below.

African countries are very frugal in their consumption. Very few reach the 2 Earths mark:

In [34]:
df_ef2016[(df_ef2016['Earths']>=2) & (df_ef2016['continent']=='Africa')]
Out[34]:
country code Earths HDI population continent
100 Libya LY 2.24 0.69 6.29e+06 Africa
111 Mauritius MU 2.18 0.79 1.26e+06 Africa
140 Reunion RE 2.98 NaN 8.70e+05 Africa

Asian countries, on the other hand, show a very wide distribution in resource consumption. Their EF ranges from 0.3 Earth equivalents up to 8.9, with a median of 1.3:

In [35]:
df_ef2016[df_ef2016['continent']=='Asia']['Earths'].describe()
Out[35]:
count    47.00
mean      2.12
std       1.78
min       0.31
25%       0.83
50%       1.30
75%       2.91
max       8.91
Name: Earths, dtype: float64
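
The describe() output gives the quartiles but not the 5th and 95th percentiles used for the boxplot whiskers. As a sketch on made-up values, the percentiles can be computed for all continents at once with a grouped quantile.

```python
import pandas as pd

# Made-up values for two continents
toy = pd.DataFrame({
    'continent': ['Asia', 'Asia', 'Asia', 'Africa', 'Africa'],
    'Earths': [0.5, 1.5, 8.5, 0.5, 1.0],
})

# One row per continent, one column per requested quantile
summary = (toy.groupby('continent')['Earths']
              .quantile([0.05, 0.5, 0.95])
              .unstack())
summary.columns = ['p05', 'median', 'p95']
```

Applied to df_ef2016, the same call would return the whisker limits for each continent in a single table.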

3.2 Human Development Index by continent

In [36]:
data=df_ef2016.dropna(axis=0, how='any') # Drop rows with NaN in any column (here Earths or HDI)
var_cat='continent'
var_num='HDI'

# Draw boxplots overlaid with stripplots of HDI scores grouped by continent
fig, ax = plt.subplots(figsize=(7, 7))
sns.boxplot(data=data, x=var_cat, y=var_num, whis=(5, 95), linewidth=1,
            width=0.3, fliersize=False, saturation=1)
sns.stripplot(data=data, x=var_cat, y=var_num, color="k", alpha=0.6, size=3)

# Additional formatting
ax.set_xlabel('Continent', size=12, labelpad=15)
ax.set_ylabel('HDI', size=12, labelpad=15)
sns.despine();

The HDI scores are widely spread within each continent: a few countries stand out as high performers while some others lag behind the rest. The countries in the bottom and top 5% of each continent are the following:

In [37]:
# HDI scores bottom 5% of each continent
df_ef2016.groupby('continent')\
         .apply(lambda df: df[df['HDI'] <= df['HDI'].quantile(0.05)])
Out[37]:
country code Earths HDI population continent
continent
Africa 35 Central African Republic CF 0.73 0.37 4.59e+06 Africa
36 Chad TD 1.10 0.40 1.45e+07 Africa
125 Niger NE 1.00 0.37 2.07e+07 Africa
Americas 78 Haiti HT 0.41 0.50 1.08e+07 Americas
79 Honduras HN 0.96 0.62 9.11e+06 Americas
Asia 0 Afghanistan AF 0.44 0.49 3.47e+07 Asia
166 Syria SY 0.72 0.54 1.84e+07 Asia
185 Yemen YE 0.40 0.48 2.76e+07 Asia
Europe 113 Moldova MD 1.07 0.70 4.06e+06 Europe
178 Ukraine UA 1.74 0.75 4.44e+07 Europe
Oceania 133 Papua New Guinea PG 1.13 0.54 8.08e+06 Oceania
In [38]:
# HDI scores top 5% of each continent
df_ef2016.groupby('continent')\
         .apply(lambda df: df[df['HDI'] >= df['HDI'].quantile(0.95)])
Out[38]:
                country        code  Earths  HDI   population  continent
continent
Africa       2  Algeria        DZ    1.48    0.75  4.06e+07    Africa
           111  Mauritius      MU    2.18    0.79  1.26e+06    Africa
           174  Tunisia        TN    1.35    0.74  1.14e+07    Africa
Americas    33  Canada         CA    5.10    0.92  3.63e+07    Americas
           181  United States  US    5.11    0.92  3.22e+08    Americas
Asia        86  Israel         IL    2.99    0.90  8.19e+06    Asia
            89  Japan          JP    2.80    0.91  1.28e+08    Asia
           150  Singapore      SG    3.96    0.93  5.62e+06    Asia
Europe     128  Norway         NO    3.42    0.95  5.25e+06    Europe
           165  Switzerland    CH    2.88    0.94  8.40e+06    Europe
Oceania      8  Australia      AU    4.02    0.94  2.41e+07    Oceania
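
The bottom and top selections can also be made in a single pass. The sketch below is one possible alternative, not the method used in this notebook: it relies on `groupby(...).transform` to broadcast each continent's quantile cut-offs back onto the rows, which avoids the extra `continent` index level produced by `apply`. The helper name `hdi_tails` and the small sample frame (HDI values copied from the tables above) are assumptions made for illustration, so the cut-offs differ from those of the full dataset.

```python
import pandas as pd

def hdi_tails(df, group_col='continent', value_col='HDI', lower=0.05, upper=0.95):
    """Return the rows lying in the lower or upper quantile tail of each group."""
    grouped = df.groupby(group_col)[value_col]
    # transform broadcasts the per-group quantile back to every row of the group
    low_cut = grouped.transform(lambda s: s.quantile(lower))
    high_cut = grouped.transform(lambda s: s.quantile(upper))
    out = df.copy()
    out['tail'] = 'middle'
    out.loc[out[value_col] <= low_cut, 'tail'] = 'bottom'
    out.loc[out[value_col] >= high_cut, 'tail'] = 'top'
    return out[out['tail'] != 'middle']

# Hypothetical sample: a handful of countries taken from the tables above
sample = pd.DataFrame({
    'country': ['Niger', 'Chad', 'Algeria', 'Mauritius', 'Haiti', 'Honduras', 'Canada'],
    'continent': ['Africa', 'Africa', 'Africa', 'Africa', 'Americas', 'Americas', 'Americas'],
    'HDI': [0.37, 0.40, 0.75, 0.79, 0.50, 0.62, 0.92],
})
tails = hdi_tails(sample)
print(tails)
```

Rows with a missing HDI fail both comparisons and are therefore left out of the result, so no prior `dropna` is needed in this variant.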

3.3 Ecological Footprint vs Human Development Index

In [39]:
# Draw interactive scatterplot of EF vs HDI using Altair library

base = alt.Chart(df_ef2016)

# Draw circles
circle = base.mark_circle(clip=True).encode(
    x=alt.X('HDI', title='Human Development Index 2016'),
    y=alt.Y('Earths', title='Ecological Footprint (Earths equivalent)'),
    color='continent',
    tooltip=['country',
             alt.Tooltip('HDI:Q', format='.2f'),
             alt.Tooltip('Earths:Q', format='.2f'),
             alt.Tooltip('population:Q', format=',')]
).properties(
    width=600,
    height=400
).interactive()

# Draw horizontal dashed line
Earth_limit = alt.Chart(pd.DataFrame({'y': [1]})).mark_rule(
    color="#808080",
    strokeDash=[4,4]
).encode(
    y='y')

# Draw vertical dashed line
HDI_median = base.mark_rule(
    color="#808080",
    strokeDash=[4,4]
).encode(
    x='median(HDI)'
)

circle + Earth_limit + HDI_median
Out[39]:

The horizontal dashed line marks the one planet limit and the vertical dashed line marks the median HDI. Nearly all countries consuming resources within Earth's total biocapacity score low on the HDI. The only exception is Sri Lanka, which has an HDI score of 0.77 while maintaining consumption at a sustainable level. It is followed by Jamaica, whose HDI score of 0.72 is just below the median and whose footprint of 1.01 Earths is marginally above the limit. All other countries in the top half of HDI scores are consuming resources at an unsustainable level.

The duty of policy-makers and other development stakeholders is to shift countries to the bottom-right quadrant. This means that those in high-consumption countries must put in place policies to decrease resource use, while those in low-consumption countries must find alternative ways to increase health, education and standard of living, as they cannot follow the development path of high-consumption countries.
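
The four quadrants delimited by the two dashed lines can be made explicit in code. The following is a minimal sketch under stated assumptions: `classify_quadrant` is a hypothetical helper, the threshold of 0.74 is only a placeholder for the median HDI of the full dataset (which is not recomputed here), and the sample values are copied from the tables in this notebook. The label `high HDI, within limit` corresponds to the bottom-right quadrant of the scatterplot.

```python
import pandas as pd

def classify_quadrant(df, hdi_threshold, earths_limit=1.0):
    """Label each row by its position relative to the HDI threshold and the one planet limit."""
    high_hdi = df['HDI'] >= hdi_threshold
    within_limit = df['Earths'] <= earths_limit
    labels = pd.Series('low HDI, over limit', index=df.index)
    labels.loc[high_hdi & ~within_limit] = 'high HDI, over limit'
    labels.loc[~high_hdi & within_limit] = 'low HDI, within limit'
    labels.loc[high_hdi & within_limit] = 'high HDI, within limit'
    return labels

# Hypothetical sample built from values shown in this notebook
sample = pd.DataFrame({
    'country': ['Sri Lanka', 'Haiti', 'Norway', 'Chad'],
    'Earths': [0.95, 0.41, 3.42, 1.10],
    'HDI': [0.77, 0.50, 0.95, 0.40],
})
# 0.74 is a placeholder threshold, not the actual median HDI of the dataset
sample['quadrant'] = classify_quadrant(sample, hdi_threshold=0.74)
print(sample)
```

On the full dataset, one would presumably pass `df_ef2016['HDI'].median()` as the threshold so that the labels match the vertical dashed line.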

In facing these challenges, it can be interesting to look more closely at which countries are the most efficient in resource consumption relative to their HDI score. Knowing which countries are most resource-efficient per HDI point may bring to light successful policies that may serve as examples for others. The development efficiency indicator is computed in the next section to identify those countries.

3.4 Development efficiency

In [40]:
df_ef2016['development_efficiency'] = df_ef2016['HDI']/df_ef2016['Earths']
df_ef2016['development_efficiency'].describe()
Out[40]:
count    172.00
mean       0.52
std        0.29
min        0.10
25%        0.33
50%        0.45
75%        0.65
max        2.04
Name: development_efficiency, dtype: float64
In [41]:
# Draw development efficiency histogram
hist_align_bins(variable=df_ef2016['development_efficiency'],
                xlabel='Development Efficiency (HDI/Earths)',
                title='Distribution of Development Efficiency scores',
                bin_width=0.1)
In [42]:
# Development efficiency top 10
df_ef2016.nlargest(10, 'development_efficiency')
Out[42]:
     country      code  Earths  HDI   population  continent  development_efficiency
170  Timor-Leste  TL    0.31    0.63  1.27e+06    Asia       2.04
131  Palestine    PS    0.47    0.69  4.79e+06    Asia       1.45
 58  Eritrea      ER    0.32    0.43  4.95e+06    Africa     1.35
 78  Haiti        HT    0.41    0.50  1.08e+07    Americas   1.22
143  Rwanda       RW    0.43    0.52  1.19e+07    Africa     1.22
185  Yemen        YE    0.40    0.48  2.76e+07    Asia       1.19
 13  Bangladesh   BD    0.51    0.60  1.63e+08    Asia       1.18
  0  Afghanistan  AF    0.44    0.49  3.47e+07    Asia       1.12
167  Tajikistan   TJ    0.59    0.65  8.73e+06    Asia       1.10
130  Pakistan     PK    0.52    0.56  1.93e+08    Asia       1.08

Among the top 10 countries in terms of development efficiency, none reaches the median HDI, and only Timor-Leste, Palestine and Tajikistan score clearly above the 0.6 mark. To get a list of potential standard-setters, we can refine the list by setting a minimum HDI.
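
One way to see why low-HDI countries dominate this ranking is to compare the spread of the two variables. Among the countries listed in this section, footprints range from 0.31 to 8.91 Earths while HDI scores range only from 0.37 to 0.95, so the ratio is driven mostly by its denominator. The short sketch below illustrates this with four countries whose values are copied from the tables; because it uses the rounded figures displayed there, the results can differ slightly in the last digit from the notebook's own output.

```python
# (Earths, HDI) pairs copied from the rounded values shown in the tables
countries = {
    'Timor-Leste': (0.31, 0.63),
    'Niger': (1.00, 0.37),
    'Norway': (3.42, 0.95),
    'Qatar': (8.91, 0.85),
}

earths = [e for e, _ in countries.values()]
hdi = [h for _, h in countries.values()]

# Ratio between the largest and smallest value of each variable
earths_spread = max(earths) / min(earths)
hdi_spread = max(hdi) / min(hdi)

# Development efficiency as defined in this section: HDI divided by Earths
efficiency = {name: h / e for name, (e, h) in countries.items()}
ranking = sorted(efficiency, key=efficiency.get, reverse=True)

print(f'Earths spread: {earths_spread:.1f}x, HDI spread: {hdi_spread:.1f}x')
for name in ranking:
    print(f'{name:<12} {efficiency[name]:.2f}')
```

In this sample the ranking simply follows the footprint from smallest to largest, regardless of the HDI score, which is why a minimum HDI is a useful additional filter.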

In [43]:
# Select top 10 countries with near and above median HDI
df_ef2016[df_ef2016['HDI'] >= 0.7].nlargest(10, 'development_efficiency')
Out[43]:
     country             code  Earths  HDI   population  continent  development_efficiency
136  Philippines         PH    0.82    0.70  1.03e+08    Asia       0.86
158  Sri Lanka           LK    0.95    0.77  2.08e+07    Asia       0.82
 88  Jamaica             JM    1.01    0.72  2.88e+06    Americas   0.72
 54  Ecuador             EC    1.06    0.76  1.64e+07    Americas   0.71
 53  Dominican Republic  DO    1.05    0.74  1.06e+07    Americas   0.70
113  Moldova             MD    1.07    0.70  4.06e+06    Europe     0.66
 47  Cuba                CU    1.18    0.77  1.15e+07    Americas   0.65
  6  Armenia             AM    1.17    0.75  2.92e+06    Asia       0.64
  1  Albania             AL    1.23    0.79  2.93e+06    Europe     0.64
 10  Azerbaijan          AZ    1.23    0.75  9.73e+06    Asia       0.61

Of these ten countries, only the Philippines and Sri Lanka are within the one planet limit, with Jamaica marginally above it at 1.01 Earths. As observed in the scatterplot, Sri Lanka is the only one of the three with an above-median HDI. None of the ten reaches the 0.8 HDI mark, Albania coming closest at 0.79. Moldova and Albania are the only European countries in the list and, along with Armenia and Azerbaijan, they are considered economies in transition by the United Nations country classification in the World Economic Situation and Prospects report from 2019. The other six countries are considered developing economies.

Each of these countries would have to be studied further to better understand how they score so high on the HDI in such a resource-efficient manner. Some of them may have policies in place that could serve as examples of good practice for other countries facing similar challenges.

Finally, we can look at which countries are the least resource-efficient relative to their HDI:

In [44]:
# Development efficiency bottom 10
df_ef2016.nsmallest(10, 'development_efficiency')
Out[44]:
     country               code  Earths  HDI   population  continent  development_efficiency
139  Qatar                 QA    8.91    0.85  2.57e+06    Asia       0.10
102  Luxembourg            LU    7.99    0.90  5.76e+05    Europe     0.11
114  Mongolia              MN    4.77    0.73  3.03e+06    Asia       0.15
 93  Kuwait                KW    5.29    0.81  4.05e+06    Asia       0.15
 12  Bahrain               BH    5.47    0.84  1.43e+06    Asia       0.15
179  United Arab Emirates  AE    5.52    0.86  9.27e+06    Asia       0.16
173  Trinidad and Tobago   TT    4.90    0.80  1.36e+06    Americas   0.16
181  United States         US    5.11    0.92  3.22e+08    Americas   0.18
 33  Canada                CA    5.10    0.92  3.63e+07    Americas   0.18
 59  Estonia               EE    4.40    0.88  1.31e+06    Europe     0.20

With the exception of Luxembourg, these are mainly fossil-fuel-producing countries. Each of them would have to be investigated further to understand why it ranks so low.

4. Conclusion

This analysis has shed some light on the situation of countries regarding their ecological footprint and their level of development. Sri Lanka stands out as the only country with a sustainable level of resource consumption and an above-median score on the HDI. None of the countries that the United Nations classifies as developed economies is close to the one planet limit; most are well above the 2 Earths mark, Romania being the exception.

This goes to show that all countries still have a lot of work to do in order to achieve a sustainable level of development on both social and environmental issues. As the UN Sustainable Development Goals are becoming the new benchmark for assessing the level of development of countries, one could question the use of the classification 'developed', 'in transition', and 'developing'. The scores on the ecological footprint clearly show that the so-called developed countries still have a way to go to achieve a sustainable level of consumption while maintaining a high standard of living.