This repository contains data analysis projects stored as notebooks created with Jupyter Notebook.
View notebook with code, without code
Developed in October-December 2020, updated in March 2021
This template notebook is designed to speed up the exploratory data analysis (EDA) stage of data science projects. Once a dataset is imported and cleaned, running all the cells in the notebook produces an overview of the variables it contains. The plots follow the visualization principles of Edward Tufte and Stephen Few and are inspired by the seaborn package, whose pairplot function displays all numerical-numerical relationships in a single figure. However, seaborn does not provide comparable functions for visualizing numerical-categorical and categorical-categorical relationships.
Other packages such as Pandas Profiling, Sweetviz and DataPrep provide useful EDA tools, but they also lack these types of visualizations. This notebook attempts to fill that gap with plotting functions that give a concise visual overview of a dataset within a few seconds.
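To make the idea of a numerical-categorical overview concrete, here is a minimal sketch that draws one box plot per (numerical, categorical) pair of columns. It is not the notebook's own code: the function name `plot_num_cat_grid`, the sample columns and their values are hypothetical, and the sketch assumes that every non-numerical column can be treated as categorical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_num_cat_grid(df: pd.DataFrame):
    """Draw one box plot per (numerical, categorical) pair of columns.

    Assumption: every non-numerical column is treated as categorical.
    """
    num_cols = df.select_dtypes(include="number").columns.tolist()
    cat_cols = df.select_dtypes(exclude="number").columns.tolist()
    if not num_cols or not cat_cols:
        raise ValueError("need at least one numerical and one categorical column")

    # Rows are numerical variables, columns are categorical variables.
    fig, axes = plt.subplots(
        len(num_cols),
        len(cat_cols),
        figsize=(3 * len(cat_cols), 2.5 * len(num_cols)),
        squeeze=False,  # always return a 2-D array of axes
    )
    for i, num in enumerate(num_cols):
        for j, cat in enumerate(cat_cols):
            ax = axes[i, j]
            groups = df.groupby(cat, observed=True)[num]
            labels = [str(name) for name, _ in groups]
            values = [group.dropna().to_numpy() for _, group in groups]
            ax.boxplot(values)
            ax.set_xticks(range(1, len(labels) + 1))
            ax.set_xticklabels(labels)
            ax.set_xlabel(cat)
            ax.set_ylabel(num)
    fig.tight_layout()
    return fig, axes


# Made-up sample data, for illustration only.
sample = pd.DataFrame(
    {
        "height": [1.60, 1.70, 1.80, 1.75, 1.65, 1.90],
        "weight": [55.0, 68.0, 80.0, 72.0, 60.0, 95.0],
        "group": ["a", "b", "a", "b", "a", "b"],
        "smoker": ["no", "no", "yes", "no", "yes", "yes"],
    }
)
fig, axes = plot_num_cat_grid(sample)
```

Box plots are only one possible choice for this kind of grid. Strip plots or violin plots would fit the same layout.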
Currently, this notebook handles datasets containing up to about 15 variables, at least one of which must be numerical. It is designed to be viewed on a white background. It is under regular development as I analyze new datasets. Future updates may include converting the plots to interactive ones using packages such as plotly, adding steps to handle datasets with many variables, and processing time series more thoroughly.
View notebook with code, without code
Developed in September 2020
This notebook analyzes the ecological footprint of countries worldwide in relation to their score on the Human Development Index. Data is imported through the Global Footprint Network API and merged with other data sources for the analysis.
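The merge step described above could look roughly like the following sketch. The API request itself is omitted, and both tables are invented stand-ins: the column names (`country_code`, `footprint_gha_per_capita`, `hdi`) and all values are assumptions for illustration, not the notebook's actual schema or data.

```python
import pandas as pd

# Hypothetical stand-in for records returned by a footprint API (values are invented).
footprint = pd.DataFrame(
    {
        "country_code": ["AAA", "BBB", "CCC"],
        "footprint_gha_per_capita": [1.2, 4.8, 7.5],
    }
)

# Hypothetical stand-in for a separate Human Development Index table.
hdi = pd.DataFrame(
    {
        "country_code": ["AAA", "BBB", "DDD"],
        "hdi": [0.55, 0.82, 0.91],
    }
)

# An outer merge with indicator=True keeps every country and records where each
# row came from; validate="one_to_one" raises if a country code is duplicated.
merged = footprint.merge(
    hdi,
    on="country_code",
    how="outer",
    indicator=True,
    validate="one_to_one",
)

# Countries present in only one source, worth inspecting before analysis.
unmatched = merged[merged["_merge"] != "both"]

# Countries with both a footprint value and an HDI score.
complete = merged[merged["_merge"] == "both"].drop(columns="_merge")
```

Checking `unmatched` before dropping rows helps catch mismatched country identifiers, a common problem when combining sources.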
View notebook with code, without code
Developed in September 2020
This notebook illustrates the process of exploratory data analysis with Jupyter Notebook using Python. A sample dataset from the seaborn package is analyzed using a systematic approach to visualize the distribution of variables and associations between variables. This approach can be applied to any other dataset containing similar types of variables.
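As a rough, non-visual sketch of such a systematic approach, the following function splits the columns by type, then summarizes distributions and pairwise associations. The function name `summarize` and the sample data are hypothetical, and the sketch assumes that every non-numerical column is categorical. The notebook itself works with plots rather than tables.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Summarize distributions and associations by variable type.

    Assumption: every non-numerical column is treated as categorical.
    """
    num = df.select_dtypes(include="number")
    cat = df.select_dtypes(exclude="number")
    return {
        # Distribution of each numerical variable (count, mean, quartiles, ...).
        "numerical": num.describe().T,
        # Distribution of each categorical variable (frequency of each level).
        "categorical": {c: cat[c].value_counts() for c in cat.columns},
        # Numerical-numerical associations (Pearson correlation).
        "num_num": num.corr(),
        # Numerical-categorical associations (group means).
        "num_cat": {
            (n, c): df.groupby(c, observed=True)[n].mean()
            for n in num.columns
            for c in cat.columns
        },
    }


# Made-up sample data, for illustration only.
sample = pd.DataFrame(
    {
        "bill": [10.0, 20.0, 30.0, 40.0],
        "tip": [1.0, 2.0, 3.0, 4.0],
        "day": ["Fri", "Fri", "Sat", "Sat"],
    }
)
overview = summarize(sample)
```

Each entry in `overview` corresponds to one kind of plot in a visual analysis: histograms, bar charts, scatter plots and grouped box plots, respectively.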
The code in this repository is released under an MIT license. Read more here.
The text content contained in the Jupyter Notebook files is released under a Creative Commons Attribution 4.0 International License. Read more at Creative Commons.