%%html
<style>
p {
max-width: 45em;
word-wrap: break-word;
}
li{
max-width: 40em;
word-wrap: break-word;
margin:0 0 5px 0;
}
.output {
margin:0px 0px 0px 0px;
}
.output_png {
display: table-cell;
text-align: center;
vertical-align: middle;
margin:10px 0px 30px -30px;
}
</style>
Ecological Footprint and Human Development Index 2016
# Import packages and set package parameters
import os
import requests
from bs4 import BeautifulSoup
import json
import re
from math import sqrt, floor, ceil
import numpy as np
import pandas as pd
pd.set_option('display.precision', 2) # Limit number of decimal places displayed in outputs
# pd.set_option('display.max_rows', None) # Display full pd.DataFrame
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (10,6) # Set default figure size in inches (width, height)
import seaborn as sns
import altair as alt
pyv = !python --version
print(f'{"PACKAGE":<20} VERSION')
print(f'{"Python":<20} {str(pyv[0]).split()[1]}')
print('\n'.join(f'{m.__name__:<20} {m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))
The goal of this data analysis is to study the relationship between the ecological footprint (EF) and the Human Development Index (HDI) of countries worldwide using the latest data available which dates from 2016 (as of October 2020).
The EF measures the surface area needed to provide the resources necessary to meet consumer demand in any given country. This includes the surface area used for food crops, fiber production, timber regeneration, absorption of carbon dioxide emissions from fossil fuel burning, and built infrastructure. Imports and exports are taken into account. Further details can be found here. This indicator is provided by the Global Footprint Network (GFN), a non-profit organization whose mission is to help end ecological overshoot by making ecological limits central to decision-making.
The HDI measures the level of development of a country by combining indices that assess three key dimensions: health, education and standard of living. Health is assessed by life expectancy at birth. Education is measured by the mean years of schooling for adults aged 25 years and older and by the expected years of schooling for children of school-entering age. Standard of living is measured by gross national income per capita. Further details can be found here. The HDI is calculated by the United Nations Development Programme using data from various sources.
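To make the combination of the three dimensions concrete, here is a minimal sketch of the HDI calculation as described in the UNDP technical notes, to the best of my understanding: each indicator is rescaled between a minimum and a maximum "goalpost", and the HDI is the geometric mean of the three dimension indices. The goalpost values and the function names below are assumptions for illustration and should be checked against the UNDP documentation before being relied upon.

```python
from math import log

def dimension_index(value, minimum, maximum):
    """Rescale a raw indicator to the 0-1 range, capping the result at 1."""
    return min((value - minimum) / (maximum - minimum), 1.0)

def hdi(life_expectancy, mean_years_schooling, expected_years_schooling, gni_per_capita):
    """Geometric mean of the health, education and income indices (assumed goalposts)."""
    health = dimension_index(life_expectancy, 20, 85)
    # Education is the arithmetic mean of the two schooling indices
    education = (dimension_index(mean_years_schooling, 0, 15)
                 + dimension_index(expected_years_schooling, 0, 18)) / 2
    # Income uses the logarithm of GNI per capita
    income = dimension_index(log(gni_per_capita), log(100), log(75000))
    return (health * education * income) ** (1 / 3)
```

Because the geometric mean is used, a low score in one dimension cannot be fully compensated by a high score in another.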
This notebook presents the entire data analysis process, starting with the extraction of the raw data from online sources, followed by data preparation operations and finishing with a few descriptive graphs. The graph of key interest is the scatterplot showing the relationship between the ecological footprint and the HDI for countries worldwide.
The data concerning both the EF and the HDI can be accessed on the GFN website in three ways: as an Excel file containing all the data, as individual datasets through the Ecological Footprint Explorer open data platform, or through the website's API. Accessing the API requires an API key, which must be requested using their API Key Request Form. The data is provided by the GFN under a Creative Commons Attribution-ShareAlike 4.0 International License.
For this data analysis, the data will be imported through the API as it makes the whole process easier to reproduce and requires fewer files.
# Set API key and username needed for authentication to access the GFN API
import config # Script file containing confidential authentication details
api_key = config.api_key # API key received from the GFN by email
user_agent = config.user_agent # User name which can be any string
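As an alternative to a local config script, the credentials could be read from environment variables. The sketch below is only one possible approach; the variable names GFN_API_KEY and GFN_USER_AGENT and the function name are hypothetical and are not part of the GFN API.

```python
import os

def load_gfn_credentials(env=None):
    """Return (user_agent, api_key) read from an environment-style mapping."""
    env = os.environ if env is None else env
    api_key = env.get('GFN_API_KEY')                      # hypothetical variable name
    user_agent = env.get('GFN_USER_AGENT', 'anonymous')   # any string is accepted as user name
    if not api_key:
        raise RuntimeError('GFN_API_KEY is not set')
    return user_agent, api_key
```

The returned tuple is in the order expected by the auth argument of requests.get, so it could be passed as auth=load_gfn_credentials().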
Different datasets can be accessed through the API. The types dataset lists all the indicators that are available.
# Extract list of available types of indicators and check that API
# access is successful
url = 'http://api.footprintnetwork.org/v1/types'
request_types = requests.get(url, auth=(user_agent, api_key))
print(f'API HTTP response code: {request_types.status_code}') # Check that request is successful: HTTP code = 200
# Display json file contents listing available indicators
types_json = request_types.json()
df_types = pd.DataFrame(types_json)
df_types
For this data analysis, these three variables will be extracted: Earths, Population and Human Development Index. Earths measures how many Earths would be required to provide the surface required to meet consumption needs if everyone on the planet were to consume at the same level as the given country.
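As a rough illustration of what the Earths value represents, it can be thought of as a country's per capita ecological footprint divided by the world's per capita biocapacity, both expressed in global hectares. This is my reading of the GFN definition rather than their exact computation, and the numbers used below are purely illustrative, not actual GFN data.

```python
def number_of_earths(footprint_gha_per_capita, world_biocapacity_gha_per_capita):
    """Earths needed if everyone consumed at the given per capita footprint."""
    return footprint_gha_per_capita / world_biocapacity_gha_per_capita

# Illustrative values only: a footprint twice the available biocapacity per person
example = number_of_earths(3.2, 1.6)
```

A value above 1 means that the consumption level could not be extended to the whole world population without exceeding the planet's biocapacity.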
The HDI can also be accessed through the API of the United Nations Human Development Report Office where the data may be slightly more up to date as it is recalculated retroactively each year with the integration of new methodologies and updated data.
# Extract data on population, ecological footprint (in terms of number
# of Earths), and the Human Development Index
url = 'http://api.footprintnetwork.org/v1/data/all/2016/pop,earth,hdi'
request_ef2016 = requests.get(url, auth=(user_agent, api_key))
print(f'API HTTP response code: {request_ef2016.status_code}')
df_ef2016_raw = pd.DataFrame(request_ef2016.json())
df_ef2016_raw
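The request URL used above appears to follow the pattern base/data/country/year/records. The helper below is a small sketch that rebuilds this URL from its parts; the pattern is inferred only from the single URL used in this notebook, so other combinations of country, year or record codes are assumptions that should be checked against the GFN API documentation. The function name is hypothetical.

```python
def build_data_url(country='all', year=2016, records=('pop', 'earth', 'hdi'),
                   base='http://api.footprintnetwork.org/v1'):
    """Compose a data URL following the pattern used in this notebook."""
    return f"{base}/data/{country}/{year}/{','.join(records)}"

url_2016 = build_data_url()
```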
The record column shows that the values of the three variables of interest are stored on different rows of the value column. Country names and the ISO 3166-1 alpha-2 country codes are also needed for merging with other datasets later on.
# Select columns of interest
df_ef2016_select = df_ef2016_raw[['shortName', 'isoa2', 'record', 'value']]
df_ef2016_select
df_ef2016_select.info()
The record column needs to be split into separate columns containing the variables of interest and the columns can be renamed for better readability.
# Pivot table, reset index and rename columns
df_ef2016_pivot = df_ef2016_select.pivot_table(index=['shortName','isoa2'], columns=['record'], values='value').reset_index()
df_ef2016_pivot.columns = ['country', 'code', 'Earths', 'HDI', 'population']
df_ef2016_pivot
df_ef2016_pivot.info()
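The long-to-wide reshaping can be illustrated on a toy table. The sketch below uses made-up countries and values, and assumes that the record labels are 'Earths', 'HDI' and 'Population'. It renames columns by name with a dictionary rather than by position, which avoids mislabelling columns if the order of the pivoted columns ever differs from what is expected.

```python
import pandas as pd

# Toy long-format data: one row per (country, record) pair; values are made up
long_df = pd.DataFrame({
    'shortName': ['Country A', 'Country A', 'Country A', 'Country B', 'Country B'],
    'isoa2':     ['AA', 'AA', 'AA', 'BB', 'BB'],
    'record':    ['Earths', 'HDI', 'Population', 'Earths', 'Population'],
    'value':     [2.5, 0.80, 1000.0, 0.9, 500.0],
})

wide_df = (long_df
           .pivot_table(index=['shortName', 'isoa2'], columns='record', values='value')
           .reset_index()
           .rename(columns={'shortName': 'country', 'isoa2': 'code',
                            'Population': 'population'}))
wide_df.columns.name = None  # Drop the leftover 'record' label on the column index
```

Country B has no HDI row in the long table, so its HDI is NaN in the wide table, which is how missing HDI values show up after pivoting.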
The dataset contains data on 187 countries, of which 12 are missing a value for the HDI (shown in a later section).
We can now look at how Earths and HDI are distributed.
# Create histogram function
def hist_align_bins(variable, xlabel, title, bin_width):
    """
    Create a histogram with bins of a selected width and overlaid with a
    kernel density estimator.

    A list of bins for the histogram 'bins' argument is created based on
    the range of the variable scores and on the selected width of bins.
    The bins are automatically aligned with round units making the
    histogram more readable.

    Parameters
    ----------
    variable: pandas Series with dtype integer or float
        Numerical variable to plot in histogram.
    xlabel: str
        Label x axis.
    title: str
        Figure title.
    bin_width: integer, float
        Width of histogram bins in the units of the variable.
    """
    leftmost_bin_edge = variable.min() - variable.min()%bin_width
    rightmost_bin_edge = variable.max() - variable.max()%bin_width + 2*bin_width
    bins_list = np.arange(leftmost_bin_edge, rightmost_bin_edge, bin_width)
    ax_hist = sns.histplot(data=variable, bins=bins_list, kde=True, alpha=0.9,
                           edgecolor='white', linewidth=0.5,
                           line_kws=dict(alpha=0.5, linewidth=1.5,
                                         label='kernel density estimator'))
    ax_hist.get_lines()[0].set_color('black') # manually edit line color due to bug with line_kws
    # Additional formatting
    ax_hist.set_xlabel(xlabel, size=12, labelpad=15)
    ax_hist.set_ylabel('Count', size=12, labelpad=15)
    plt.title(title, size=14, pad=30)
    plt.legend(frameon=False)
    sns.despine()
# Draw Earths histogram
hist_align_bins(variable=df_ef2016_pivot['Earths'],
xlabel='Earths',
title='Distribution of Ecological Footprint scores',
bin_width=0.5)
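The bin alignment used in the histogram function can be looked at in isolation. The sketch below reproduces the same edge computation in a standalone helper (the name aligned_bin_edges is hypothetical) and applies it to illustrative minimum and maximum values.

```python
import numpy as np

def aligned_bin_edges(vmin, vmax, bin_width):
    """Return bin edges aligned on multiples of bin_width that cover [vmin, vmax]."""
    # Round the minimum down to the nearest multiple of bin_width
    left = vmin - vmin % bin_width
    # Round the maximum down, add one bin_width to reach the right edge of the
    # last bin, and one more because np.arange excludes its stop value
    right = vmax - vmax % bin_width + 2 * bin_width
    return np.arange(left, right, bin_width)

edges = aligned_bin_edges(0.3, 8.8, 0.5)
```

Because of floating point rounding, np.arange may occasionally return one more edge than strictly needed. This only adds an empty bin on the right and does not affect the counts.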
# Earths top 10 countries
df_ef2016_pivot.nlargest(10, 'Earths')
The list contains mainly fossil fuel producing countries, with the exceptions of Luxembourg and Bermuda. As a wealthy country, Luxembourg has a high per capita consumption level and it also experiences a high level of tank tourism. Bermuda is also a wealthy country with a high per capita consumption level and it additionally relies heavily on fossil fuels for all its energy needs.
# Earths bottom 10 countries
df_ef2016_pivot.nsmallest(10, 'Earths')
# Draw HDI histogram
hist_align_bins(variable=df_ef2016_pivot['HDI'],
xlabel='HDI',
title='Distribution of HDI scores',
bin_width=0.025)
# HDI top 10
df_ef2016_pivot.nlargest(10, 'HDI')
# HDI bottom 10
df_ef2016_pivot.nsmallest(10, 'HDI')
To get a better understanding of how much of the world is covered in the dataset, we can compare the countries listed in the dataset to a comprehensive list of countries considered to be independent States.
A list of independent States will be used as reference to better understand the coverage of the available data, with the assumption that countries or territories that are not independent are small in size and therefore have less impact on the global consumption of resources.
The list of Independent States in the World published by the United States Department of State (USDS) will be used as reference. It includes GENC country codes which are based on the ISO 3166 country code standard and will be used to merge with the other dataset. This data is in the public domain and may be copied and distributed without permission.
The USDS allows scraping of its website but for some reason the pandas function pd.read_html(url) does not work on this page so the BeautifulSoup package is used here instead.
# Check that webpage access is successful
url = 'https://www.state.gov/independent-states-in-the-world/'
response = requests.get(url)
print(f'HTTP response code: {response.status_code}')
# Scrape table, code based on pluralsight.com/guides/extracting-data-html-beautifulsoup
# and kite.com/python/examples/4420/beautifulsoup-parse-an-html-table-and-write-to-a-csv
html_content = response.text # Reuse the response fetched above instead of requesting the page again
soup = BeautifulSoup(html_content, 'lxml')
table = soup.find('table')
output_rows = []
for table_row in table.find_all('tr'):
    columns = table_row.find_all('td')
    output_row = []
    for column in columns:
        output_row.append(column.text)
    output_rows.append(output_row)
df_USDS_raw = pd.DataFrame(data=output_rows[1:], columns=output_rows[0])
df_USDS_raw
There are 195 independent States according to the USDS.
df_USDS = df_USDS_raw[['Short-form name', 'GENC 2A Code (see Note 2)']].copy()
df_USDS.columns = ['country_USDS', 'code']
df_USDS[30:60]
This sample of the table shows that country names contain extra characters and notes which can be removed for better readability.
# Remove extra characters and notes from country names
# Remove '(see note [digit])'
for name in df_USDS.loc[:,'country_USDS']:
    if re.findall(r'[0-9]', name):
        # Assign through a single .loc call to avoid chained indexing
        df_USDS.loc[df_USDS['country_USDS'] == name, 'country_USDS'] = name[:-13]
# Remove '\n' '*+' white spaces and no-break space left of Eswatini
df_USDS.loc[:,'country_USDS'] = df_USDS.loc[:,'country_USDS']\
.str.replace('\n','')\
.str.lstrip('\xa0')\
.str.rstrip(' *+')
# Remove no-break space left of Eswatini country code
df_USDS.loc[:,'code'] = df_USDS.loc[:,'code'].str.lstrip('\xa0 ')
# Print subset of column to check that strings have been formatted correctly
df_USDS[30:60]
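The cleaning steps above can also be expressed as a single function built on a regular expression. This is a sketch of an alternative rather than the method used in this notebook. It removes the note by pattern instead of by a fixed number of characters, so it would also handle notes with two-digit numbers. The sample strings in the usage note are made up apart from Eswatini, and the function name is hypothetical.

```python
import re

# Matches a trailing note such as ' (see note 3)', including the space before it
NOTE_PATTERN = re.compile(r'\s*\(see note \d+\)', flags=re.IGNORECASE)

def clean_country_name(raw):
    """Remove notes, line breaks, no-break spaces and trailing ' *+' markers."""
    name = NOTE_PATTERN.sub('', raw)
    name = name.replace('\n', '')
    name = name.strip(' \xa0')
    return name.rstrip(' *+')
```

For example, clean_country_name('\xa0Eswatini') returns 'Eswatini'. The function could be applied to the whole column with df_USDS['country_USDS'].map(clean_country_name).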
# Merge list of independent States with list of countries in EF dataset for comparison
df_ef2016_merge_outer = pd.merge(df_USDS, df_ef2016_pivot[['country', 'code']], how='outer', on='code')
df_ef2016_merge_outer
# Check that merge is successful
df_ef2016_merge_outer.info()
The following cell displays the independent States not included in the EF dataset.
missing_states = df_ef2016_merge_outer[df_ef2016_merge_outer['country'].isna()]
print(missing_states)
print()
print(f'Number of independent countries not included in EF dataset: {len(missing_states)}')
It is also interesting to see which 10 of the 187 countries listed in the EF dataset are not independent States.
additional_territories = df_ef2016_merge_outer[df_ef2016_merge_outer['country_USDS'].isna()]
print(additional_territories)
print()
print(f'Number of dependent territories included in EF dataset: {len(additional_territories)}')
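Instead of filtering on missing values in each name column, the two groups can also be identified with the indicator argument of pd.merge, which adds a _merge column stating where each row comes from. The sketch below uses fictitious countries and codes to show the idea.

```python
import pandas as pd

# Fictitious reference list and dataset sharing two of their codes
states = pd.DataFrame({'country_USDS': ['Country A', 'Country B', 'Country C'],
                       'code': ['AA', 'BB', 'CC']})
ef = pd.DataFrame({'country': ['Country B', 'Country C', 'Territory D'],
                   'code': ['BB', 'CC', 'DD']})

merged = pd.merge(states, ef, how='outer', on='code', indicator=True)
only_in_states = merged[merged['_merge'] == 'left_only']   # States missing from the dataset
only_in_ef = merged[merged['_merge'] == 'right_only']      # Entries that are not in the reference list
```

This approach does not depend on the name columns containing NaN, so it keeps working even if one of the tables has missing names for other reasons.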
10 dependent territories are included in the EF dataset with a computed ecological footprint. As they are not independent states they have not been included in the HDI, along with North Korea and Somalia, as shown in the table below.
# Display EF dataset countries/territories with no HDI
df_ef2016_pivot[df_ef2016_pivot['HDI'].isna()]
Now that we better understand the coverage of the dataset, it would be interesting to include information about the region to which each country belongs as it would provide a better overview of the geographical distribution of the scores of both indicators.
This information can be obtained from different places. A list built by GitHub user lukes last updated on 19 March 2019 will be imported here. This list has been obtained by merging two sources, the Wikipedia ISO 3166-1 article table containing the alpha and numeric country codes, and the United Nations Statistics Division table containing regional, and sub-regional names and codes. The information on regions may also be obtained by downloading a CSV or Excel file from the United Nations Statistics Division page.
# Note: set keep_default_na to False to avoid Namibia code NA from being
# interpreted as NaN when merging dataframes later on
url = 'https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv'
df_regions_raw = pd.read_csv(url, keep_default_na=False)
df_regions_raw
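The effect of keep_default_na can be checked on a small in-memory CSV. In the sketch below, the three rows are a toy extract written for illustration; the last row has an empty region to show a side effect of the option.

```python
import io
import pandas as pd

csv_text = "name,alpha-2,region\nNamibia,NA,Africa\nNorway,NO,Europe\nAntarctica,AQ,\n"

# Default behaviour: the string 'NA' and empty fields are both parsed as NaN
default_df = pd.read_csv(io.StringIO(csv_text))
# With keep_default_na=False, 'NA' is kept as text and empty fields become ''
literal_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
```

The side effect is worth keeping in mind: with keep_default_na=False, an empty region is an empty string rather than NaN, so it is not reported as missing by info() or isna().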
df_regions = df_regions_raw[['alpha-2', 'region']].copy() # Copy to avoid modifying a view of the raw data
df_regions.columns = ['code', 'continent']
df_regions
# Merge df_ef2016_pivot with df_regions to get continents for all
# countries and territories
df_ef2016 = pd.merge(df_ef2016_pivot, df_regions, how='left', on='code')
df_ef2016
# Check that merge is successful
df_ef2016.info()
The merging appears to be successful as all countries and territories have an attributed continent.
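A slightly stricter check is possible. Since the regions file was read with keep_default_na=False, a code that is absent from the regions table gives NaN after the left merge, whereas a code that is present with a blank region gives an empty string, which info() does not count as missing. The sketch below, on fictitious data, tests for both cases and uses the validate argument of pd.merge to make sure that each code appears only once in the regions table.

```python
import pandas as pd

# Fictitious scores and regions: 'AQ' has a blank region, 'ZZ' is absent from regions
scores = pd.DataFrame({'code': ['NA', 'AQ', 'ZZ'], 'Earths': [1.0, 2.0, 3.0]})
regions = pd.DataFrame({'code': ['NA', 'AQ'], 'continent': ['Africa', '']})

merged = pd.merge(scores, regions, how='left', on='code', validate='many_to_one')
no_continent = merged[merged['continent'].isna() | (merged['continent'] == '')]
```

An empty no_continent table would confirm that every country and territory has been attributed a continent.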
Now that the dataset is ready for further analysis, it is time to look at the distribution of the country scores for the EF and HDI grouped by continent and to visualize the relationship between these two variables.
data=df_ef2016
var_cat='continent'
var_num='Earths'
# Draw boxplots overlaid with stripplots of EF scores grouped by continent
# Note: the boxplots are drawn with whiskers that reach out to farthest
# data point within the interval contained by the 5th and 95th percentiles
fig, ax = plt.subplots(figsize=(7, 7))
sns.boxplot(data=data, x=var_cat, y=var_num, whis=(5, 95), linewidth=1,
width=0.3, fliersize=False, saturation=1)
sns.stripplot(data=data, x=var_cat, y=var_num, color="k", alpha=0.6, size=3)
# Add horizontal line for one planet limit and additional formatting
ax.axhline(1, color='grey', ls='dashed', alpha=0.5)
ax.set_xlabel('Continent', size=12, labelpad=15)
ax.set_ylabel('Earths', size=12, labelpad=15)
sns.despine();
The dashed line represents the one planet limit. The boxplots are drawn with whiskers that reach out to the farthest data point within the interval bounded by the 5th and 95th percentiles. All countries in Europe and Oceania are consuming in excess of Earth's resources, as are nearly all North and South American countries, with a few exceptions:
df_ef2016[(df_ef2016['Earths']<=1) & (df_ef2016['continent']=='Americas')]
Many Asian and African countries appear to be within the planet's limits. These can be identified in the scatterplot further below.
African countries are very frugal in their consumption; very few exceed the 2 Earths mark:
df_ef2016[(df_ef2016['Earths']>=2) & (df_ef2016['continent']=='Africa')]
Asian countries on the other hand show a very wide distribution in resource consumption. Their EF ranges from 0.3 Earth equivalents up to 8.8 with a median of 1.3:
df_ef2016[df_ef2016['continent']=='Asia']['Earths'].describe()
data=df_ef2016.dropna(axis=0, how='any') # Drop rows with NaN (present only in HDI column)
var_cat='continent'
var_num='HDI'
# Draw boxplots overlaid with stripplots of HDI scores grouped by continent
fig, ax = plt.subplots(figsize=(7, 7))
sns.boxplot(data=data, x=var_cat, y=var_num, whis=(5, 95), linewidth=1,
width=0.3, fliersize=False, saturation=1)
sns.stripplot(data=data, x=var_cat, y=var_num, color="k", alpha=0.6, size=3)
# Additional formatting
ax.set_xlabel('Continent', size=12, labelpad=15)
ax.set_ylabel('HDI', size=12, labelpad=15)
sns.despine();
The HDI scores are widely spread within each continent and a few countries stand out as high-performers while some others lag behind the rest. These outliers are the following:
# HDI scores bottom 5% of each continent
df_ef2016.groupby('continent')\
.apply(lambda df: df[df['HDI'] <= df['HDI'].quantile(0.05)])
# HDI scores top 5% of each continent
df_ef2016.groupby('continent')\
.apply(lambda df: df[df['HDI'] >= df['HDI'].quantile(0.95)])
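The same selection can be written with groupby and transform, which returns one threshold per row and keeps the original flat index. This is a sketch of an alternative, shown on fictitious data; it also avoids calling apply on the whole grouped DataFrame, which may emit a warning in recent pandas versions.

```python
import pandas as pd

# Fictitious scores for two groups of five countries each
df = pd.DataFrame({
    'continent': ['X'] * 5 + ['Y'] * 5,
    'country': list('abcdefghij'),
    'HDI': [0.40, 0.50, 0.60, 0.70, 0.80, 0.60, 0.70, 0.80, 0.90, 0.95],
})

def group_quantile(data, q):
    """Return, for each row, the q-th HDI quantile of that row's continent."""
    return data.groupby('continent')['HDI'].transform(lambda s: s.quantile(q))

bottom = df[df['HDI'] <= group_quantile(df, 0.05)]
top = df[df['HDI'] >= group_quantile(df, 0.95)]
```

Rows with a missing HDI are never selected, since comparisons with NaN evaluate to False.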
# Draw interactive scatterplot of EF vs HDI using Altair library
base = alt.Chart(df_ef2016)
# Draw circles
circle = base.mark_circle(clip=True).encode(
x=alt.X('HDI', title='Human Development Index 2016'),
y=alt.Y('Earths', title='Ecological Footprint (Earths equivalent)'),
color='continent',
tooltip=['country',
alt.Tooltip('HDI:Q', format='.2f'),
alt.Tooltip('Earths:Q', format='.2f'),
alt.Tooltip('population:Q', format=',')]
).properties(
width=600,
height=400
).interactive()
# Draw horizontal dashed line
Earth_limit = alt.Chart(pd.DataFrame({'y': [1]})).mark_rule(
color="#808080",
strokeDash=[4,4]
).encode(
y='y')
# Draw vertical dashed line
HDI_median = base.mark_rule(
color="#808080",
strokeDash=[4,4]
).encode(
x='median(HDI)'
)
circle + Earth_limit + HDI_median
The horizontal dashed line marks the one planet limit. The vertical dashed line marks the median HDI. Nearly all countries consuming resources within Earth's total biocapacity score low on the HDI. The only exception is Sri Lanka, which has an HDI score of 0.77 while maintaining consumption at a sustainable level. It is followed by Jamaica with an HDI score of 0.73, just below the median HDI. All other countries in the top half of HDI scores are consuming resources at an unsustainable level.
The duty of policy-makers and other development stakeholders is to shift countries to the bottom-right quadrant. This means that those in high-consumption countries must put in place policies to decrease resource use, while those in low-consumption countries must find alternative ways to increase health, education and standard of living, as they cannot follow the development path of high-consumption countries.
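The four quadrants delimited by the two dashed lines can also be computed as a label for each country. The sketch below uses fictitious countries and values; the bottom-right quadrant of the scatterplot corresponds to the label 'high HDI, within one Earth'.

```python
import pandas as pd

# Fictitious scores; rows without an HDI are dropped before classification
df = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D'],
    'HDI': [0.45, 0.55, 0.85, 0.90],
    'Earths': [0.6, 1.8, 0.9, 3.5],
}).dropna(subset=['HDI'])

hdi_median = df['HDI'].median()
high_hdi = df['HDI'] >= hdi_median      # Right of the vertical dashed line
within_limit = df['Earths'] <= 1        # Below the horizontal dashed line

labels = {
    (True, True): 'high HDI, within one Earth',
    (True, False): 'high HDI, above one Earth',
    (False, True): 'low HDI, within one Earth',
    (False, False): 'low HDI, above one Earth',
}
df['quadrant'] = [labels[(bool(h), bool(w))] for h, w in zip(high_hdi, within_limit)]
```

Counting the labels with df['quadrant'].value_counts() would give the number of countries in each quadrant.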
In facing these challenges, it can be interesting to look more closely at which countries are the most efficient in resource consumption relative to their HDI score. Knowing which countries are most resource-efficient per HDI point may bring to light successful policies that may serve as examples for others. The development efficiency indicator is computed in the next section to identify those countries.
df_ef2016['development_efficiency'] = df_ef2016['HDI']/df_ef2016['Earths']
df_ef2016['development_efficiency'].describe()
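A small worked example may help in reading this indicator. In the sketch below, which uses fictitious values, a country with a modest HDI and a low footprint ranks above a country with a high HDI and a high footprint, and a country without an HDI gets no score at all.

```python
import pandas as pd

# Fictitious values; country C has no HDI
df = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'HDI': [0.8, 0.6, float('nan')],
    'Earths': [4.0, 1.0, 2.0],
})
df['development_efficiency'] = df['HDI'] / df['Earths']

# Rank countries, leaving out those without a score
ranked = (df.dropna(subset=['development_efficiency'])
            .sort_values('development_efficiency', ascending=False))
```

As a ratio, the indicator rewards a low footprint regardless of the HDI level, so a high score alone does not indicate a high level of development.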
# Draw development efficiency histogram
hist_align_bins(variable=df_ef2016['development_efficiency'],
xlabel='Development Efficiency (HDI/Earths)',
title='Distribution of Development Efficiency scores',
bin_width=0.1)
# Development efficiency top 10
df_ef2016.nlargest(10, 'development_efficiency')
Among the top 10 countries in terms of development efficiency, none are even close to the median HDI; only Timor-Leste and Tajikistan make it above the 0.6 mark. To get a list of potential standard-setters, we can further refine the list by setting a minimum HDI.
# Select top 10 countries with near and above median HDI
df_ef2016[df_ef2016['HDI'] >= 0.7].nlargest(10, 'development_efficiency')
As observed in the scatterplot, only Sri Lanka and Jamaica are within the one planet limit, with a near-median HDI. Uruguay stands out as the only country reaching the 0.8 HDI mark. Albania makes it into the list as the only European country and, along with the last three countries in the list, it is classified as an economy in transition by the United Nations country classification in the World Economic Situation and Prospects report from 2019. The other 6 countries are classified as developing economies.
Each of these countries would have to be studied further to better understand how they score so high on the HDI in such a resource-efficient manner. Some of them may have policies in place that could serve as examples of good practice for other countries facing similar challenges.
Finally, we can have a look at which countries are the least resource-efficient relative to the HDI:
# Development efficiency bottom 10
df_ef2016.nsmallest(10, 'development_efficiency')
These are mainly fossil fuel producing countries with the exception of Luxembourg. These countries would have to be investigated further to better understand why they make it into this list.
This analysis has shed some light on the situation of countries regarding their ecological footprint and their level of development. Sri Lanka stands out as being the only country with a sustainable level of resource consumption while achieving an above-median score on the HDI. None of the countries that the United Nations classifies as developed economies are even close to the one planet limit; most are well above the 2 Earths mark, with the exception of Romania.
This goes to show that all countries still have a lot of work to do in order to achieve a sustainable level of development on both social and environmental issues. As the UN Sustainable Development Goals are becoming the new benchmark for assessing the level of development of countries, one could question the use of the classification 'developed', 'in transition', and 'developing', as the scores on the ecological footprint clearly show that the so-called developed countries still have a way to go in order to achieve a sustainable level of consumption while maintaining a high standard of living.