
davidschmieg/life-expectancy


## Executive Summary

Life expectancy is a key indicator of a population's overall health and quality of life. It reflects not only the effectiveness of a country's healthcare system but also broader social, economic, and environmental conditions. Yet life expectancy varies significantly across countries and regions, and even between genders within the same nation. To better understand these disparities, our team analyzed a Life Expectancy dataset from Kaggle that draws on data from the World Health Organization (WHO). The dataset includes a wide range of variables, such as economic indicators, healthcare access, education levels, and demographic factors.

Our objective is to answer the question: why do some countries achieve higher life expectancy than others, and which factors most strongly drive these differences? Through data analysis, we explore how economic strength, healthcare investment, education systems, and public health policies influence longevity. The project aims to identify the key drivers of life expectancy and uncover actionable insights that can inform global health strategies and policy interventions. Our findings will help clarify which factors have the greatest impact on life expectancy, offering a data-driven perspective on how nations can improve the well-being and longevity of their populations.

We built a Tableau workbook with six views: a life expectancy heatmap, a life expectancy trend over time, a life expectancy versus GDP scatterplot, and three additional scatterplots comparing life expectancy with alcohol consumption, income composition of resources, and schooling.

- Life Expectancy Heatmap: This map provides a global snapshot of how life expectancy varies by country. It highlights stark geographic disparities, with certain regions in Africa and South Asia lagging behind the rest of the world.
- Life Expectancy Over Time: This line chart shows life expectancy trends across regions and countries over the 2000 to 2015 period covered by the WHO data. It reveals both global improvements and persistent regional gaps, offering a temporal view of progress and stagnation.
- Life Expectancy vs. GDP: This scatterplot explores the relationship between a country's economic output and its average lifespan. The trend suggests a strong positive correlation between GDP per capita and life expectancy, with diminishing returns at higher income levels.
- Life Expectancy vs. Alcohol Consumption: This scatterplot examines the relationship between alcohol consumption and longevity. Consumption levels vary by region, and countries with higher consumption often show reduced life expectancy, pointing to behavioral health as a factor.
- Life Expectancy vs. Income Composition of Resources: This visualization investigates the role of income and resource distribution, as captured by the income composition of resources index, in shaping health outcomes. A higher index value tends to align with longer life expectancy.
- Life Expectancy vs. Schooling: Education is a powerful social determinant of health. This scatterplot shows a clear trend: countries with higher average years of schooling tend to have higher life expectancy, reinforcing the link between education and long-term well-being.

## Basic Info

- Project Title: Life Expectancy Unlocked: Patterns, Correlations, and Global Insights
- Team: The LifeLine Analysts (Harpreet Singh & David Schmieg)

## Data

We use the Life Expectancy data available on Kaggle. It is based on data from a reliable source (WHO) and is available in two versions, one of which splits several variables by gender to allow additional analysis by gender. We planned to use the two versions in combination. The sources are:

- https://www.kaggle.com/datasets/kumarajarshi/life-expectancy-who/data
- https://www.kaggle.com/datasets/maryalebron/life-expectancy-data?resource=download

The data has 25 attributes (columns) and 2,938 rows covering 193 countries. The data dictionary is below.

| Attribute | Description |
|---|---|
| Country | The country to which the data belongs. |
| Year | The year in which the data was collected. |
| Status | Whether the country is classified as "Developing" or "Developed". |
| Life expectancy (combined) | The average life expectancy of men and women in that country for that year. |
| Adult Mortality (men) | The mortality rate among adult men in that country for that year. |
| Adult Mortality (women) | The mortality rate among adult women in that country for that year. |
| Infant deaths | The number of infant deaths in that country for that year. |
| Alcohol | Per capita alcohol consumption (in litres of pure alcohol) in that country for that year. |
| Percentage expenditure | Expenditure on health as a percentage of Gross Domestic Product per capita (%). |
| Hepatitis B (men) | Hepatitis B vaccination coverage in men (%). |
| Hepatitis B (women) | Hepatitis B vaccination coverage in women (%). |
| Measles | Number of reported cases of measles in that country for that year. |
| BMI | Average Body Mass Index of the country's population. |
| Under-five deaths | Number of deaths among children under five years old. |
| Polio | Polio (Pol3) immunization coverage among 1-year-olds (%). |
| Total expenditure | General government expenditure on health as a percentage of total government expenditure (%). |
| Diphtheria | Diphtheria tetanus toxoid and pertussis (DTP3) immunization coverage among 1-year-olds (%). |
| HIV/AIDS | Deaths per 1,000 live births due to HIV/AIDS (0-4 years). |
| GDP | Gross Domestic Product per capita (in USD). |
| Population | Population of the country. |
| thinness 1-19 years | Prevalence of thinness among children and adolescents aged 10 to 19 (%). |
| thinness 5-9 years | Prevalence of thinness among children aged 5 to 9 (%). |
| Income composition of resources | Human Development Index in terms of income composition of resources (index ranging from 0 to 1). |
| Schooling | Number of years of schooling (years). |

We performed several preprocessing steps to ensure the dataset was complete and suitable for analysis. Some countries were missing values for key variables such as GDP per capita and average years of schooling. We filled these gaps using regional averages or interpolation, depending on the availability and pattern of the data. This allowed us to retain more countries in the analysis without compromising the integrity of the overall trends.

The original dataset included separate life expectancy values for men and women. To simplify the analysis and focus on broader trends, we calculated and used combined (total population) life expectancy values rather than analyzing gender-specific life spans.

The dataset was obtained from Kaggle and is based on World Health Organization (WHO) data. It included a wide range of variables from a single data source, so we did not need to join multiple datasets. We used Microsoft Excel and Python for initial data cleaning and preprocessing. Final visualizations and exploratory analysis were performed in Tableau, where we created an interactive dashboard to explore the relationships between life expectancy and other variables.

## Visualization

For our interactive dashboards, we built user-friendly interfaces that enable dynamic exploration of life expectancy data across countries and over time. The central view is a Life Expectancy Heatmap, in which each country is colored according to its average life expectancy.
We used a green color palette to highlight countries with relatively high life expectancy and red to highlight countries with relatively low life expectancy, using both hue and luminance to support intuitive comparisons. The map features a link-and-brush interaction: when a user selects a country, related year-by-year data, such as GDP and alcohol consumption, is automatically highlighted in the linked views, supporting coordinated multi-view analysis.
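
Every view on the dashboard depends on the gap-filling described under Data. The sketch below shows one way the "regional averages or interpolation" rule could be implemented in plain Python; it is an illustrative assumption, not our actual Excel/Python workflow. Rows are hypothetical dicts with `Country` and `Year` keys, `region_of` is a hypothetical country-to-region mapping (the dataset itself has no region column), and the order of the two steps (interpolate within a country first, then fall back to the regional mean for that year) is our own illustrative choice. `combined_life_expectancy` uses a simple unweighted mean, matching the data dictionary's definition of combined life expectancy as the average of men and women; a population-weighted average would be an alternative.

```python
from collections import defaultdict
from statistics import mean


def interpolate_series(years, values):
    """Linearly fill None gaps that lie between two known values.

    `years` must be sorted ascending. Leading and trailing gaps stay None
    because there is no known value on both sides to interpolate from.
    """
    known = [(y, v) for y, v in zip(years, values) if v is not None]
    filled = list(values)
    for i, (year, value) in enumerate(zip(years, values)):
        if value is not None:
            continue
        before = [(ky, kv) for ky, kv in known if ky < year]
        after = [(ky, kv) for ky, kv in known if ky > year]
        if before and after:
            (y0, v0), (y1, v1) = before[-1], after[0]
            filled[i] = v0 + (v1 - v0) * (year - y0) / (y1 - y0)
    return filled


def fill_missing(rows, column, region_of):
    """Fill missing values of `column` in place and return `rows`.

    Step 1: interpolate within each country across years.
    Step 2: fill remaining gaps with the mean of the same region and year.
    """
    by_country = defaultdict(list)
    for row in rows:
        by_country[row["Country"]].append(row)
    for country_rows in by_country.values():
        country_rows.sort(key=lambda r: r["Year"])
        years = [r["Year"] for r in country_rows]
        filled = interpolate_series(years, [r[column] for r in country_rows])
        for row, value in zip(country_rows, filled):
            row[column] = value

    regional = defaultdict(list)
    for row in rows:
        if row[column] is not None:
            regional[(region_of[row["Country"]], row["Year"])].append(row[column])
    for row in rows:
        if row[column] is None:
            peers = regional.get((region_of[row["Country"]], row["Year"]))
            if peers:  # leave the gap if no country in the region has data
                row[column] = mean(peers)
    return rows


def combined_life_expectancy(men, women):
    """Unweighted average of male and female life expectancy."""
    return (men + women) / 2
```

Usage: call `fill_missing(rows, "GDP", region_of)` and `fill_missing(rows, "Schooling", region_of)` before aggregating. Values still `None` afterwards indicate countries with no usable data in their region for that year.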

Users can also filter the data by year, enabling temporal comparisons.
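
The heatmap's coloring and the year filter can be approximated outside Tableau. This is a hypothetical sketch: rows are dicts with `Country`, `Year`, and `Life expectancy` keys, and the red and green endpoint colors are assumed values rather than the exact palette in the workbook. `country_means` averages each country's life expectancy over the selected years (all years when `years` is `None`), and `red_to_green` blends linearly in RGB from red at the lowest value to green at the highest.

```python
from collections import defaultdict
from statistics import mean


def country_means(rows, years=None):
    """Average life expectancy per country, optionally limited to `years`."""
    values = defaultdict(list)
    for row in rows:
        if years is not None and row["Year"] not in years:
            continue  # the year filter
        if row["Life expectancy"] is not None:
            values[row["Country"]].append(row["Life expectancy"])
    return {country: mean(vals) for country, vals in values.items()}


def red_to_green(value, lo, hi):
    """Map `value` in [lo, hi] to a hex color: red at lo, green at hi."""
    if hi == lo:
        t = 1.0  # all countries equal: treat as the top of the scale
    else:
        t = min(max((value - lo) / (hi - lo), 0.0), 1.0)  # clamp to [0, 1]
    red, green = (215, 48, 39), (26, 152, 80)  # assumed endpoint colors
    rgb = [round(r + (g - r) * t) for r, g in zip(red, green)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)
```

Usage: compute `means = country_means(rows, years={2015})`, take `lo, hi = min(means.values()), max(means.values())`, then color each country with `red_to_green(means[country], lo, hi)`. A plain RGB blend passes through muted brownish tones at mid-range; a perceptually uniform diverging palette would give more even steps.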

We chose circle marks for all scatterplots (life expectancy vs. GDP, alcohol, income composition, and schooling) because they are visually distinctive, support high data density, and keep overlapping points legible when many countries are plotted together. In each scatterplot, life expectancy is encoded on the vertical (y) axis and the comparison attribute on the horizontal (x) axis, so position is the primary channel; position is highly effective for showing quantitative relationships. This design helps users identify both linear and non-linear trends. Our visual encoding choices prioritize clarity, comparability, and perceptual effectiveness, in line with best practices in information visualization.

## Reflection

The most enjoyable part of this project was exploring global life expectancy patterns and bringing the data to life through interactive visualizations. Seeing complex relationships, such as the link between education, income, and longevity, emerge clearly in visual form was both satisfying and insightful. The least enjoyable part was cleaning and preprocessing the dataset, particularly handling missing values for key variables like GDP and schooling.

Our project evolved significantly from the initial proposal. At first, we envisioned a basic comparison of life expectancy across regions, but as the project progressed from proposal to first submission to final product, we expanded our scope to include multi-variable comparisons and added interactive features such as filtering by year and link-and-brush highlighting. Our visualization goals shifted from simply presenting data to enabling user-driven exploration of the most influential factors behind life expectancy. Technically, we aimed higher as we became more comfortable with Tableau's features.

While our original proposal was mostly realistic, we did encounter some limitations, particularly in customizing certain interactive behaviors such as conditional formatting based on multiple variables. As a workaround, we focused on coordinated views and clear visual encodings to achieve our goals. If we were to start again, we would structure our data better during preprocessing to streamline integration into Tableau and allow for more advanced calculations and visual interactions. Overall, this project deepened our understanding of both the data and the tools, and it highlighted the value of design iteration in building effective visualizations.

## Project Management & Team Assessment

As the project progressed, some tasks took longer than expected, especially data cleaning and preprocessing, because of missing values and the need for interpolation. We also added new tasks, not in the original plan, for enhancing interactivity and polishing the final dashboard. David and Harpreet collaborated on data cleaning. Harpreet led the development of most visualizations, including the heatmap and scatterplots, while David contributed a few charts and focused on writing the final report and documentation. Below is the updated table:

| Task | Estimated Hours | Actual Hours | Team Member(s) | Notes |
|---|---|---|---|---|
| Project proposal & planning | 4 hrs | 3 hrs | David & Harpreet | Initial project scoping and task planning |
| Dataset exploration & selection | 3 hrs | 2.5 hrs | David & Harpreet | Chose WHO life expectancy dataset from Kaggle |
| Data cleaning & preprocessing | 15 hrs | 30 hrs | David & Harpreet | Filled missing GDP/schooling data, combined life expectancy values |
| Initial visualizations | 8 hrs | 10 hrs | Harpreet | Created heatmap, trend chart, scatterplots |
| Additional visualizations | 6 hrs | 7 hrs | David | Contributed alternate scatterplots |
| Tableau interactivity (filter, link & brush) | 4 hrs | 6 hrs | Harpreet | Added year filter, link & brush to heatmap |
| Visual design refinement | 3 hrs | 4 hrs | Harpreet | Color palette, circle marks, layout tuning |
| Written report and analysis | 6 hrs | 8 hrs | David | Executive summary, visual encoding rationale, reflection |
| Final QA and polish | 2 hrs | 2 hrs | David & Harpreet | Final review of dashboard and report formatting |
| Recorded presentation | – | 20 min | David & Harpreet | Static visualizations and captioning for submission |

## Credits

We used a Life Expectancy dataset from Kaggle, based on data from the World Health Organization (WHO). For inspiration, we consulted Tableau Public dashboards related to health and socioeconomic data. While we did not build on any existing code, we used built-in Tableau functionality, including clustering, forecasting, interactive filtering by year, link-and-brush features, and customized visual encoding using color, shape, and position to enhance clarity and usability.
