Statistical dashboard · real data · real inference

Does national income predict how long people live?

An OLS regression of life expectancy on log₁₀(GDP per capita) across country-year observations from the public Gapminder dataset (1952–2007).

Slope (years of life / 10× income) 95% CI …
p-value two-sided t-test, H₀: slope = 0
variance explained
n observations in current selection

Every number above is computed live, either read from data/findings.json or refit by client-side OLS; nothing on this page is hand-typed.

Continent
1952 – 2007

Income vs. life expectancy

Each point is one country in one year. X-axis is log-scaled GDP per capita. Line is the OLS fit for the current selection.

Scatter plot of GDP per capita versus life expectancy: log-scaled GDP per capita on the x-axis, life expectancy in years on the y-axis, points colored by continent, with an overlaid OLS regression line.
Filtered fit: slope years per 10× income, 95% CI , p , R² , n

Mean life expectancy by continent

Bars show the mean for the current selection; error bars are 95% confidence intervals (t-distribution).

Bar chart of mean life expectancy by continent: five bars, one per continent, showing mean life expectancy with 95 percent confidence interval error bars for the current filter selection.
One-way ANOVA across continents (current selection): F , p
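
For readers who want to see what the F statistic measures, here is a minimal sketch of a one-way ANOVA: it compares the spread of the group means around the grand mean with the spread of observations inside each group. The function name oneWayAnovaF and the toy numbers are hypothetical and are not taken from app.js or the dataset. The p-value additionally requires the F-distribution CDF, which is not shown here.

```javascript
// One-way ANOVA F statistic for an array of groups (each group is an array of numbers).
// Illustrative only: `oneWayAnovaF` is a hypothetical helper, not the page's actual code.
function oneWayAnovaF(groups) {
  const k = groups.length;
  const all = groups.flat();
  const N = all.length;
  const grandMean = all.reduce((a, b) => a + b, 0) / N;

  let ssBetween = 0; // variation of group means around the grand mean
  let ssWithin = 0;  // variation of observations around their own group mean
  for (const g of groups) {
    const groupMean = g.reduce((a, b) => a + b, 0) / g.length;
    ssBetween += g.length * (groupMean - grandMean) ** 2;
    for (const v of g) ssWithin += (v - groupMean) ** 2;
  }

  const dfBetween = k - 1;
  const dfWithin = N - k;
  const F = (ssBetween / dfBetween) / (ssWithin / dfWithin);
  return { F, dfBetween, dfWithin };
}

// Made-up life-expectancy values for three groups, chosen so the arithmetic is easy to follow.
const toyGroups = [[50, 52, 54], [60, 62, 64], [70, 72, 74]];
console.log(oneWayAnovaF(toyGroups));
```

A large F means the group means differ by much more than the within-group scatter would suggest.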

Methodology

Model

Headline model: lifeExp = β₀ + β₁ · log₁₀(gdpPercap) + ε, fit by ordinary least squares. log₁₀ is used because income is heavily right-skewed and its effect on lifespan is plausibly multiplicative, not additive; this is common practice in the development-economics literature.
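
A minimal sketch of the headline model, assuming rows shaped like { gdpPercap, lifeExp }. The function name fitLogIncome and the three toy rows are hypothetical; the real analysis may be organized differently. Because the predictor is log₁₀ of income, the slope reads directly as years of life expectancy per tenfold increase in GDP per capita.

```javascript
// Fit lifeExp = b0 + b1 * log10(gdpPercap) by ordinary least squares.
// Illustrative only: `fitLogIncome` and the row shape are assumptions, not the page's code.
function fitLogIncome(rows) {
  const x = rows.map(r => Math.log10(r.gdpPercap));
  const y = rows.map(r => r.lifeExp);
  const n = rows.length;
  const meanX = x.reduce((a, b) => a + b, 0) / n;
  const meanY = y.reduce((a, b) => a + b, 0) / n;

  let sxx = 0; // sum of squared deviations of x
  let sxy = 0; // sum of cross-products of deviations
  for (let i = 0; i < n; i++) {
    sxx += (x[i] - meanX) ** 2;
    sxy += (x[i] - meanX) * (y[i] - meanY);
  }

  const slope = sxy / sxx;                 // years per 10x income
  const intercept = meanY - slope * meanX; // predicted lifeExp at gdpPercap = 1
  return { intercept, slope };
}

// Made-up rows lying on lifeExp = 20 + 15 * log10(gdpPercap); not Gapminder values.
const toyRows = [
  { gdpPercap: 100, lifeExp: 50 },
  { gdpPercap: 1000, lifeExp: 65 },
  { gdpPercap: 10000, lifeExp: 80 },
];
console.log(fitLogIncome(toyRows));
```

In the toy rows each tenfold income step adds 15 years, so the fitted slope recovers that value.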

A robustness check controlling for survey year (+ β₂·year_centered) is also computed server-side; see data/findings.json → headline_year_controlled. The slope on log GDP stays large and significant after adding year, so the relationship is not purely an artifact of the whole world getting richer and living longer over time together.

What the CI and p-value mean

The 95% confidence interval for the slope comes from a procedure that, under the model's assumptions, would cover the true slope in 95% of repeated samples drawn the same way. The p-value is the probability of observing a slope at least this far from zero, in either direction, if the true slope were exactly zero (the null hypothesis). Small p-values here reflect the very large sample size (n ≈ 1700) as well as a genuinely strong association.
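
To make these definitions concrete, the sketch below turns a slope, its standard error, and the residual degrees of freedom into a two-sided p-value and a 95% CI using the Student t-distribution. It is one possible implementation, not the code in app.js: the t tail probability is computed from the regularized incomplete beta function (continued-fraction evaluation plus a Lanczos log-gamma), and the critical value is found by bisection. All function names are hypothetical.

```javascript
// Log-gamma via the Lanczos approximation (g = 7, 9 coefficients).
function logGamma(z) {
  const c = [0.99999999999980993, 676.5203681218851, -1259.1392167224028,
    771.32342877765313, -176.61502916214059, 12.507343278686905,
    -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7];
  if (z < 0.5) return Math.log(Math.PI / Math.sin(Math.PI * z)) - logGamma(1 - z);
  z -= 1;
  let sum = c[0];
  for (let i = 1; i < 9; i++) sum += c[i] / (z + i);
  const t = z + 7.5;
  return 0.5 * Math.log(2 * Math.PI) + (z + 0.5) * Math.log(t) - t + Math.log(sum);
}

// Continued fraction for the incomplete beta function (modified Lentz method).
function betaCF(x, a, b) {
  const TINY = 1e-300, EPS = 1e-14;
  const clamp = v => (Math.abs(v) < TINY ? TINY : v);
  let c = 1;
  let d = 1 / clamp(1 - (a + b) * x / (a + 1));
  let h = d;
  for (let m = 1; m <= 300; m++) {
    const m2 = 2 * m;
    let aa = m * (b - m) * x / ((a + m2 - 1) * (a + m2));
    d = 1 / clamp(1 + aa * d);
    c = clamp(1 + aa / c);
    h *= d * c;
    aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1));
    d = 1 / clamp(1 + aa * d);
    c = clamp(1 + aa / c);
    const delta = d * c;
    h *= delta;
    if (Math.abs(delta - 1) < EPS) break;
  }
  return h;
}

// Regularized incomplete beta function I_x(a, b).
function regIncBeta(x, a, b) {
  if (x <= 0) return 0;
  if (x >= 1) return 1;
  const lnFront = logGamma(a + b) - logGamma(a) - logGamma(b)
    + a * Math.log(x) + b * Math.log(1 - x);
  // Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
  if (x < (a + 1) / (a + b + 2)) return Math.exp(lnFront) * betaCF(x, a, b) / a;
  return 1 - Math.exp(lnFront) * betaCF(1 - x, b, a) / b;
}

// Two-sided p-value: P(|T| >= |t|) for a Student t variable with df degrees of freedom.
function twoSidedP(t, df) {
  return regIncBeta(df / (df + t * t), df / 2, 0.5);
}

// Critical value t* with P(|T| >= t*) = alpha, found by bisection.
function tCritical(df, alpha = 0.05) {
  let lo = 0, hi = 1000;
  for (let i = 0; i < 100; i++) {
    const mid = (lo + hi) / 2;
    if (twoSidedP(mid, df) > alpha) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}

// Combine a slope estimate and its standard error into t, p, and a 95% CI.
function slopeInference(slope, se, df) {
  const t = slope / se;
  const half = tCritical(df) * se;
  return { t, p: twoSidedP(t, df), ci: [slope - half, slope + half] };
}

// Hypothetical inputs, for illustration only.
console.log(slopeInference(14.8, 0.5, 100));
```

The interval excludes zero exactly when the two-sided p-value is below 0.05, since both come from the same t-distribution.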

Client-side recomputation

When you change the continent or year filters, the regression is not looked up from a table — it is refit in your browser using the closed-form OLS normal equations (β = (XᵀX)⁻¹Xᵀy) with a t-distribution-based 95% CI and two-sided p-value, implemented in app.js. This is the same statistical method the Python analysis uses, just re-derived in JavaScript so the inference for any subset is honest rather than interpolated.
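
As a rough illustration of the normal-equations route described above, the sketch below builds XᵀX and Xᵀy, inverts XᵀX by Gauss-Jordan elimination, and returns coefficients with their standard errors and residual degrees of freedom. It is not the code from app.js; the names olsNormalEquations and invert and the toy numbers are hypothetical.

```javascript
// Invert a small square matrix by Gauss-Jordan elimination with partial pivoting.
function invert(A) {
  const n = A.length;
  // Augment A with the identity matrix: [A | I].
  const M = A.map((row, i) => [...row, ...row.map((_, j) => (i === j ? 1 : 0))]);
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    }
    if (Math.abs(M[pivot][col]) < 1e-12) throw new Error("X'X is singular");
    const tmp = M[col];
    M[col] = M[pivot];
    M[pivot] = tmp;
    const p = M[col][col];
    for (let j = 0; j < 2 * n; j++) M[col][j] /= p;
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const f = M[r][col];
      for (let j = 0; j < 2 * n; j++) M[r][j] -= f * M[col][j];
    }
  }
  return M.map(row => row.slice(n));
}

// Solve beta = (X'X)^-1 X'y; X is an array of rows whose first column is 1 (intercept).
function olsNormalEquations(X, y) {
  const n = X.length;
  const p = X[0].length;
  const XtX = Array.from({ length: p }, () => new Array(p).fill(0));
  const Xty = new Array(p).fill(0);
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < p; j++) {
      Xty[j] += X[i][j] * y[i];
      for (let k = 0; k < p; k++) XtX[j][k] += X[i][j] * X[i][k];
    }
  }
  const inv = invert(XtX);
  const beta = inv.map(row => row.reduce((s, v, k) => s + v * Xty[k], 0));

  // Residual variance and standard errors: se_j = sqrt(sigma^2 * [(X'X)^-1]_jj).
  let rss = 0;
  for (let i = 0; i < n; i++) {
    const fitted = X[i].reduce((s, v, j) => s + v * beta[j], 0);
    rss += (y[i] - fitted) ** 2;
  }
  const df = n - p;
  const sigma2 = rss / df;
  const se = inv.map((row, j) => Math.sqrt(sigma2 * row[j]));
  return { beta, se, df };
}

// Made-up points: columns are [1, log10(gdpPercap)], y is life expectancy.
const toyX = [[1, 2], [1, 3], [1, 4], [1, 5]];
const toyY = [50, 66, 79, 95];
console.log(olsNormalEquations(toyX, toyY));
```

Adding a further column to X (for example a centered year) extends the same computation to a model with more predictors. Explicit inversion is acceptable for two or three columns; with many correlated predictors a QR decomposition is usually the more numerically stable choice.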

Limitations

  • Observational, not experimental. No country was randomly assigned an income level. The association is consistent with income causing longer life (via nutrition, healthcare, sanitation) but also with reverse causation or confounding (e.g., institutions, education, conflict) driving both.
  • Country-level aggregation. GDP per capita and life expectancy are national averages; they say nothing about within-country inequality.
  • Five-year sampling. Data are recorded at five-year intervals (1952–2007), so short-term shocks (wars, epidemics, financial crises) between sample years are invisible.
  • Historical window. The series ends in 2007 and predates recent shifts in global health and income; it should not be extrapolated to the present.
  • Continent is a coarse grouping. It blends very different national trajectories within Africa, Asia, and the Americas in particular.
  • Small-sample subsets. Filtering to a single continent and a narrow year range can leave very few observations (see the live n) — the displayed CI widens accordingly, but extreme filtering can still produce unstable estimates.