
Data Science Infrastructure for Global Health
An R package for global-health data work. Bundles country metadata
and lightweight clients for the WHO Global Health Observatory
and UN Sustainable Development
Goals APIs, plus reusable WHO-style themes for ggplot2
and flextable so that charts and tables produced from this
data look consistent across reports.
Documentation: https://shanlong-who.github.io/DSIR/
# from CRAN
install.packages("DSIR")
# or the development version from GitHub
# install.packages("remotes")
remotes::install_github("shanlong-who/DSIR")
who_countries: a tibble of all 194 WHO
Member States with ISO3, ISO2, UN M49 codes, official and short names,
WHO region, and an is_pic flag for Pacific Island
Countries.
library(DSIR)
library(dplyr)
who_countries
# Filter Member States in the Western Pacific Region
who_countries |>
  filter(who_region == "WPR") |>
  select(iso3, name_short, is_pic)
Regional ISO3 vectors: convenience character
vectors for each WHO region, derived from
who_countries:
wpro_cty # 28 Western Pacific Member States (since May 2025)
afro_cty # 47 African Region Member States
amro_cty # 35 Region of the Americas Member States
searo_cty # 10 South-East Asia Region Member States
euro_cty # 53 European Region Member States
emro_cty # 21 Eastern Mediterranean Region Member States
pic_cty # 14 Pacific Island Country Member States (subset of WPR)
iso3_to_region(): quick lookup from
ISO3 codes to WHO regions. Vectorized; returns NA for codes
not matching a Member State.
iso3_to_region("PHL") # "WPR"
iso3_to_region(c("PHL", "FRA", "ZAF", "XYZ")) # "WPR" "EUR" "AFR" NA
iso3_to_m49() /
m49_to_iso3() — convert between ISO3 codes and UN
M49 numeric codes in either direction. Useful for moving between GHO
(ISO3) and SDG (M49) workflows. ISO3 input is case-insensitive, M49 input
may be zero-padded or bare, and non-Member areas
return NA.
iso3_to_m49("PHL") # "608"
iso3_to_m49(c("PHL", "FRA", "JPN")) # "608" "250" "392"
iso3_to_m49(c("PRI", "PHL")) # NA "608"
m49_to_iso3(c("608", "250", "392")) # "PHL" "FRA" "JPN"
m49_to_iso3(c("076", "76")) # "BRA" "BRA"
m49_to_iso3("900") # NA (world aggregate)
In practice you rarely need to call these directly:
sdg_data() and sdg_coverage() accept ISO3
codes for their area argument and convert via
iso3_to_m49() internally, and sdg_clean()
populates its iso3 column via
m49_to_iso3().
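The conversion rules described above (case-insensitive ISO3, zero-padded or bare M49, NA for anything unmatched) can be sketched in base R with a small lookup table. The `lookup` table and the helpers `to_m49_sketch()` and `to_iso3_sketch()` are hypothetical names used for illustration only; this is not DSIR's implementation, and the three rows stand in for the full who_countries table.

```r
# Hypothetical three-row lookup standing in for who_countries.
lookup <- data.frame(
  iso3 = c("PHL", "FRA", "BRA"),
  m49  = c("608", "250", "076"),
  stringsAsFactors = FALSE
)

# ISO3 -> M49: upper-case the input so "phl" and "PHL" both match.
to_m49_sketch <- function(iso3) {
  lookup$m49[match(toupper(iso3), lookup$iso3)]
}

# M49 -> ISO3: left-pad to three digits so "76" and "076" both match.
to_iso3_sketch <- function(m49) {
  padded <- sprintf("%03d", as.integer(m49))
  lookup$iso3[match(padded, lookup$m49)]
}

to_m49_sketch(c("phl", "PRI"))        # "608" NA
to_iso3_sketch(c("076", "76", "900")) # "BRA" "BRA" NA
```

Using match() keeps both helpers vectorized and returns NA for unmatched codes without any explicit branching.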
theme_dsi() and
theme_dsi_facet() — publication-ready
ggplot2 themes. Use theme_dsi() for
single-panel charts and theme_dsi_facet() for faceted
plots; the facet variant adds panel borders, light strip backgrounds,
and panel spacing tuned for multi-panel layouts.
library(ggplot2)
library(dplyr)
# Single panel — theme_dsi()
who_countries |>
  count(who_region) |>
  ggplot(aes(reorder(who_region, n), n)) +
  geom_col(fill = "#0093D5") +
  coord_flip() +
  scale_y_dsi_col() +
  theme_dsi() +
  labs(title = "WHO Member States by region", x = NULL, y = NULL)
# Faceted — theme_dsi_facet()
who_countries |>
  count(who_region, is_pic) |>
  ggplot(aes(reorder(who_region, n), n, fill = is_pic)) +
  geom_col() +
  coord_flip() +
  scale_y_dsi_col() +
  facet_wrap(~ is_pic, labeller = as_labeller(
    c(`TRUE` = "Pacific Island Countries", `FALSE` = "Other Member States")
  )) +
  scale_fill_manual(values = c(`TRUE` = "#0093D5", `FALSE` = "grey70"),
                    guide = "none") +
  theme_dsi_facet() +
  labs(title = "WHO Member States by region and PIC status",
       x = NULL, y = NULL)
scale_y_dsi_col() and
scale_x_dsi_col() — drop-in replacements for
scale_y_continuous() and scale_x_continuous()
that remove the default lower expansion, so columns in bar charts sit
flush with the axis instead of floating above it. Pick the one that
matches where you mapped the value: scale_y_dsi_col() when
value is the y aesthetic (including horizontal
bars made with coord_flip() — the aesthetic is still
y), and scale_x_dsi_col() when
value is the x aesthetic (e.g.
geom_col(aes(value, category))). Both accept any argument
that scale_*_continuous() accepts.
# Vertical bars
ggplot(mtcars, aes(factor(cyl))) +
  geom_bar(fill = "#0093D5") +
  scale_y_dsi_col() +
  theme_dsi() +
  labs(title = "Cars by cylinder count", x = "Cylinders", y = NULL)
dsi_flextable_defaults(): one-line
setup for flextable formatting (booktabs style, bold
headers, paddings).
dsi_flextable_defaults()
ggpie(): quick pie charts with
automatic percentage labels.
df <- data.frame(
  region = c("AFR", "AMR", "EUR", "WPR", "SEAR", "EMR"),
  countries = c(47, 35, 53, 28, 10, 21)
)
ggpie(df, "region", "countries", .offset = 1.2)
geomean(): geometric mean, with
optional weights. Useful for aggregating ratio-based health indicators
where the composite is multiplicative — e.g. UHC service-coverage
tracers.
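As a point of reference, a weighted geometric mean is exp(sum(w * log(x)) / sum(w)). The sketch below writes that out in base R. `geomean_sketch()` is a hypothetical name, and it may differ from how geomean() handles zeros, missing values, or weight validation.

```r
# Weighted geometric mean: exp of the weighted mean of log(x).
# Assumes all x > 0 and no missing values; equal weights by default.
geomean_sketch <- function(x, w = rep(1, length(x))) {
  stopifnot(length(x) == length(w), all(x > 0), all(w >= 0), sum(w) > 0)
  exp(sum(w * log(x)) / sum(w))
}

geomean_sketch(c(0.6, 0.8, 0.95))                 # about 0.770
geomean_sketch(c(0.6, 0.8, 0.95), w = c(2, 1, 1)) # about 0.723
```

Working on the log scale avoids multiplying many small ratios together, which is why this form is usually preferred over taking the n-th root of a product.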
geomean(c(0.6, 0.8, 0.95)) # ~0.770
geomean(c(0.6, 0.8, 0.95), w = c(2, 1, 1)) # weighted version
Check availability before downloading. GHO has thousands of indicators, but a given indicator may not cover the countries or years you need. Three lightweight helpers ask the server what is available without transferring observations:
# Yes / no for a given indicator + filter (TRUE / FALSE / NA on failure)
gho_has_data("WHOSIS_000001", area = "FRA")
# Row count the same filter would return — useful for sizing a download
gho_count("WHOSIS_000001", area = wpro_cty)
# Per-country year coverage and observation counts
gho_coverage("WHOSIS_000001", area = c("FRA", "DEU", "JPN"))
#>   location year_min year_max n_obs
#> 1      DEU     2000     2021    66
#> 2      FRA     2000     2021    66
#> 3      JPN     2000     2021    66
Fetch and clean. The typical workflow is search → fetch → clean:
# Search indicators by keyword
gho_indicators("mortality")
# Discover available dimensions for an indicator
gho_dimensions("NCDMORT3070")
gho_dimensions("NCDMORT3070", dimension = "Dim1")
# Fetch country-level data
gho_data("NCDMORT3070", spatial_type = "country")
# Fetch with area and year filters
gho_data(
  indicator = "WHOSIS_000001",
  area = wpro_cty,
  year_from = 2015
)
# Tidy the raw GHO response
raw <- gho_data("NCDMORT3070", spatial_type = "country", area = wpro_cty)
gho_clean(raw)
gho_clean() produces the unified DSIR
cleaned-indicator schema, the same 15-column shape produced by
sdg_clean(), so GHO and SDG output can be combined directly
with bind_indicators() (see below). Output columns:
source, id, indicator,
location, iso3, location_name,
year, value, value_num,
low, high, series,
dim1–dim3. Source columns missing from the raw
response (e.g. Low / High for indicators
without confidence intervals) are filled with typed NA.
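Filling absent source columns with typed NA keeps every cleaned table in the same shape, which is what makes stacking safe. A minimal base-R sketch of that idea follows. The `schema` list covers only a few of the 15 columns, and `fill_missing_cols()` is a hypothetical helper, not the code gho_clean() runs.

```r
# Hypothetical subset of the schema: column name -> typed NA of the right class.
schema <- list(
  iso3      = NA_character_,
  year      = NA_integer_,
  value_num = NA_real_,
  low       = NA_real_,
  high      = NA_real_
)

# Add any schema column that is absent, then return columns in schema order.
fill_missing_cols <- function(df, schema) {
  for (col in setdiff(names(schema), names(df))) {
    df[[col]] <- rep(schema[[col]], nrow(df))
  }
  df[names(schema)]
}

# An indicator without confidence intervals: no low / high columns.
raw <- data.frame(iso3 = c("PHL", "JPN"), year = c(2020L, 2020L),
                  value_num = c(24.5, 8.3), stringsAsFactors = FALSE)
clean <- fill_missing_cols(raw, schema)
str(clean)
```

Because low and high come back as numeric NA rather than logical NA, two tables filled this way can be combined with rbind() without any column changing type.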
The SDG client follows the same pattern as GHO: search → fetch → clean.
# Browse goals, targets, indicators, and geographic areas
sdg_goals()
sdg_targets()
sdg_indicators()
sdg_areas()
# Search indicators by keyword — AND semantics, case-insensitive
# substring match on the indicator description (client-side filter)
sdg_indicators("mortality")
sdg_indicators("mortality cancer")
sdg_indicators(c("maternal", "mortality"))
# Fetch indicator data — `area` accepts ISO3 codes (converted internally)
# or UN M49 numeric codes. ISO3 lets DSIR's regional vectors be passed
# directly, the same way they work with the GHO client.
sdg_data("3.2.1", area = "PHL", year_from = 2015, year_to = 2023)
sdg_data("3.4.1", area = wpro_cty)
# M49 also works (e.g. when copy-pasting codes from sdg_areas())
sdg_data("3.2.1", area = "608", year_from = 2015, year_to = 2023)
# Tidy the SDG response
raw <- sdg_data("3.2.1", area = "PHL")
sdg_clean(raw)
sdg_clean() produces the same unified 15-column
schema as gho_clean(): source,
id, indicator, location,
iso3, location_name, year,
value, value_num, low,
high, series,
dim1–dim3. SDG-side fields populate
id (the indicator code, e.g. "3.4.1"),
indicator (the human-readable series description),
location (UN M49 numeric code), iso3 (via
m49_to_iso3() — NA for region / world
aggregates and non-Member areas), location_name, and
series. The GHO-only dim1–dim3
columns are NA for SDG rows. value is kept as
character to preserve non-numeric entries ("<0.1",
aggregate notes); value_num is the numeric coercion
(NA where coercion fails).
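Keeping both columns means the original text is never lost. The base-R lines below show the general idea with made-up values; whether sdg_clean() uses exactly this coercion is an assumption.

```r
# Values as they might arrive: numbers stored as text plus a censored entry.
value <- c("12.4", "<0.1", "7", NA)

# as.numeric() yields NA where the text is not a plain number;
# suppressWarnings() silences the "NAs introduced by coercion" warning.
value_num <- suppressWarnings(as.numeric(value))

data.frame(value, value_num)
```

Rows where value is present but value_num is NA are the ones worth inspecting by hand, since they carry qualifiers such as "<0.1" that a numeric column cannot represent.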
Because both cleaners produce the same schema, you can stack their
output with bind_indicators() and use the
source column to keep track of where each row came
from:
gho <- gho_data("NCDMORT3070", area = wpro_cty) |> gho_clean()
sdg <- sdg_data("3.4.1", area = wpro_cty) |> sdg_clean()
bind_indicators(gho, sdg)
Exploring series. A single SDG indicator often
contains several series — for example different vaccines, sex strata, or
causes of death — each with its own country / year coverage.
sdg_coverage() summarises the year range and observation
count per (location, series) so you can see what is
available before deciding which series to analyse.
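The summary itself is a grouped min / max / count over years. A rough base-R equivalent on made-up observations is sketched here; `obs` and `coverage_sketch()` are hypothetical, and the real sdg_coverage() may compute its result differently, for example on the server side.

```r
# Made-up observations: one row per location, series, and year.
obs <- data.frame(
  location = c("156", "156", "156", "608", "608"),
  series   = c("A", "A", "B", "A", "A"),
  year     = c(2000L, 2001L, 2018L, 2014L, 2016L),
  stringsAsFactors = FALSE
)

# Group years by (location, series) and summarise each group.
coverage_sketch <- function(obs) {
  out <- aggregate(
    year ~ location + series, data = obs,
    FUN = function(y) c(year_min = min(y), year_max = max(y), n_obs = length(y))
  )
  # aggregate() returns the three summaries as a matrix column; flatten it.
  out <- cbind(out[c("location", "series")], as.data.frame(out$year))
  out[order(out$location, out$series), ]
}

coverage_sketch(obs)
```

Note that year_min and year_max describe the span only; a group with n_obs smaller than the span, like location 608 here, has gaps inside its range.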
# 3.b.1 (vaccine coverage) is published as four separate series
sdg_coverage("3.b.1", area = c("156", "608"))
#>   location      series year_min year_max n_obs
#> 1      156 SH_ACS_DTP3     2000     2023    24
#> 2      156  SH_ACS_HPV     2018     2023     6
#> 3      156 SH_ACS_MCV2     2000     2023    24
#> 4      156 SH_ACS_PCV3     2017     2023     7
#> 5      608 SH_ACS_DTP3     2000     2023    24
#> 6      608  SH_ACS_HPV     2017     2023     7
#> 7      608 SH_ACS_MCV2     2000     2023    24
#> 8      608 SH_ACS_PCV3     2014     2023    10
GHO-style has_data() / count() helpers are
intentionally not provided for SDG because SDG data is generally
complete enough that pre-flight checks add little value.
MIT — © 2026 Shanlong Ding