Resizing Polygons by Population Using sf

tidytuesday sf ggplot2 maps

January 19, 2021 Tidy Tuesday submission.

Arthur Gailes
01-25-2021
Resizing Kenya’s county polygons

A quick demonstration of affline transformations with the Kenya Census

Data Source

Thanks as always to the TidyTuesday crew at R4DS.

The data this week comes from rKenyaCensus courtesy of Shelmith Kariuki. Shelmith wrote about these datasets on her blog.

rKenyaCensus is an R package that contains the 2019 Kenya Population and Housing Census results. The results were released by the Kenya National Bureau of Statistics in February 2020, and published in four different pdf files (Volume 1 - Volume 4).

The 2019 Kenya Population and Housing Census was the eighth to be conducted in Kenya since 1948 and was conducted from the night of 24th/25th to 31st August 2019. Kenya leveraged on technology to capture data during cartographic mapping, enumeration and data transmission, making the 2019 Census the first paperless census to be conducted in Kenya

Additional details about Kenya can be found on Wikipedia.

Kenya, officially the Republic of Kenya (Swahili: Jamhuri ya Kenya), is a country in Eastern Africa. At 580,367 square kilometres (224,081 sq mi), Kenya is the world’s 48th largest country by total area. With a population of more than 47.6 million people in the 2019 census, Kenya is the 29th most populous country. Kenya’s capital and largest city is Nairobi, while its oldest city and first capital is the coastal city of Mombasa.

Load the weekly Data

   tidyverse tidytuesdayR         here      ggplot2     ggthemes 
        TRUE         TRUE         TRUE         TRUE         TRUE 
rKenyaCensus           sf    gganimate 
       FALSE         TRUE        FALSE 

Dowload the weekly data and make available in the tt object. Use the rKenyaCensus to do so, adding the polygons for each county.

# grab 3 tables of interest
crops <- rKenyaCensus::V4_T2.21
gender <- rKenyaCensus::V1_T2.2
households <- rKenyaCensus::V1_T2.3
counties <- rKenyaCensus::KenyaCounties_SHP %>% 
  st_as_sf %>% st_transform(4326)

# write them out
households %>% 
  write_csv("households.csv")

gender %>% 
  write_csv("gender.csv")

crops %>% 
  write_csv("crops.csv")

st_write(counties, "counties.geojson", delete_dsn = T)

After saving, load from the directory directly:

households <- read_csv("households.csv")
gender <- read_csv("gender.csv")
crops <- read_csv("crops.csv")
counties <- st_read("counties.geojson")
Reading layer `counties' from data source `/home/kyouma_des/GoogleDrive/AG Resume and Work Stuff/website/_posts/2021-01-19-tidy-tuesday/counties.geojson' using driver `GeoJSON'
Simple feature collection with 47 features and 7 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 33.9105 ymin: -4.679729 xmax: 41.91056 ymax: 5.466978
geographic CRS: WGS 84

Glimpse Data

Take an initial look at the format of the data available.

head(counties)
Simple feature collection with 6 features and 7 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 33.9105 ymin: -1.038021 xmax: 37.93755 ymax: 1.653522
geographic CRS: WGS 84
           County Population    Area     PD  PDR  Factor     CRI
1         BARINGO     666763 11075.3  60.20 0.71 0.00951 0.02902
2           BOMET     875689  1997.9 438.30 5.20 0.06927 0.21127
3         BUNGOMA    1670570  2206.9 756.98 8.98 0.11963 0.36487
4           BUSIA     893681  1628.4 548.81 6.51 0.08673 0.26453
5 ELGEYO/MARAKWET     454480  3049.7 149.02 1.77 0.02355 0.07183
6            EMBU     608599  2555.9 238.12 2.82 0.03763 0.11478
                        geometry
1 MULTIPOLYGON (((35.78413 1....
2 MULTIPOLYGON (((35.45192 -0...
3 MULTIPOLYGON (((34.62083 1....
4 MULTIPOLYGON (((33.91369 0....
5 MULTIPOLYGON (((35.5598 1.2...
6 MULTIPOLYGON (((37.55331 -0...
head(gender)
# A tibble: 6 x 5
  County         Male   Female Intersex    Total
  <chr>         <dbl>    <dbl>    <dbl>    <dbl>
1 Total      23548056 24014716     1524 47564296
2 Mombasa      610257   598046       30  1208333
3 Kwale        425121   441681       18   866820
4 Kilifi       704089   749673       25  1453787
5 Tana River   158550   157391        2   315943
6 Lamu          76103    67813        4   143920

Wrangle

Transform country size based on population, then append to the existing dataset. The polygon transformation is conducted in county_size.

centroids <- counties %>% st_geometry %>% 
  st_transform(3857) %>% st_centroid %>% st_transform(4326)

county_form <- transmute(counties, county=tools::toTitleCase(County), population=as.numeric(Population), 
  pop_scale=scales::rescale(population, c(0.01,1)), centroid = centroids, type='Colored'
  ) %>% st_set_crs(4326)

# shift sizes around centroids
county_size <- mutate(county_form, geometry = (geometry-centroid)*pop_scale+centroid,
  type='Sized') %>% st_set_crs(4326) 

# replicate the original shape with NA for population so the color fill can animate
county_blank <- mutate(county_form, population = NA, type='Default')

# bind together for gganimate so I can switch between fill states
county_anim <- bind_rows(county_blank, county_form, county_size) %>% 
  mutate(type = factor(type, c("Default",'Colored','Sized')))

Visualization

How is Kenya’s population distributed across counties? Resize the polygons from 1% to 100% of original size based on the population size. Gif included at the top; this is the code that generated it.

anim <- ggplot(county_anim) +
  geom_sf(aes(fill=population)) +
  # animation specs
  transition_states(
    type, transition_length=1, state_length = 2
  )+
  ease_aes('cubic-in-out')+
  enter_fade()+
  # back to ggplot; add counties with no fill so outlines stay consistent
  geom_sf(data=counties, fill=NA) +
  scale_fill_steps(n.breaks=4, low="#bdc9e1", high="#045a8d", labels=scales::comma,
    guide='legend', trans='log10') +
  labs(title="Kenya's Counties", fill='Population') +
  theme_map() +
  theme(
    title=element_text(size=16, color='black')
  )
  
animate(anim, nframes = 20)

Save

anim_save('kenya_county_size.gif', anim)

Review

An obvious limitation to this approach is that a polygon can only go up to 100% of its original size without exceeding its original boundaries. In this map, the smallest county has the largest population, so the visual impact of the shifts is really messy. A possible cleanup strategy would be to make the original polygons to make them smaller, and so that the population-scaled polygons could increase to fit to the original boundaries.