How to do Zonal Histograms in R

I couldn’t find documentation on the web on how to create a Zonal Histogram similar to ArcGIS and QGIS efficiently in R but after discovering the exactextractr library, which is a fast method of doing Zonal Statistics, I discovered this library can be used to efficiently generate zonal histograms.

The key is to do group_by and summarize inside of the function, so you never load more un-summarized pixels into memory then each individual zone of the Shapefile. This example is shown with a NLCD GeoTIFF of New York State, but it could be used for any other raster dataset where you want to calculate a Zonal Histogram.

llibrary(sf)
library(tidyverse)
rm(list=ls())

# load raster of NLCD with terra
nlcd <- terra::rast('Documents/GIS.Data/agriculture/nlcd2019_ny.tif')

# load counties in sf using tigris
cos <- tigris::counties('ny')

# exactextractr is a fast way to do zonal statistics and can be used
# for zonal histograms using this code
#
# append_cols = T  ## include columns from shapefile to allow rejoining at end
# summarize_df = T ## pass the dataframecreated by extact_extract to the function
# function = .. calculate the fraction of the area using frac (fraction of area) option in exactexractr

nlcd_co <- exactextractr::exact_extract(nlcd, cos,
                                        append_cols=T,
                                        fun='frac' )
                                    
# this next step is optional, it just takes  long format and pivots_wider
# for an output similiar to what you would get in ArcGIS or QGIS
# then inner join restores spatial data by joining on matching rows
# also we convert to acres, which is more useful then square meters.

nlcd_co %>%
    mutate(area = units::set_units(area,'acres')) %>%
    pivot_wider(names_from = 'nlcd_code', values_from = 'area',
        names_glue = "nlcd{nlcd_code}" ) %>%
    inner_join(cos, .) -> nlcd_co

# Create the Table Shown Below
nlcd_co %>% 
  arrange(NAME) %>%
  select(NAME, starts_with('nlcd'), -nlcdNaN) %>%
  mutate(across(starts_with('nlcd'), ~round(.x))) %>%
  st_drop_geometry()

Here is the output. Totals in acres, as shown from the above code.

NAMEnlcd11nlcd21nlcd22nlcd23nlcd24nlcd31nlcd41nlcd42nlcd43nlcd52nlcd71nlcd81nlcd82nlcd90nlcd95
Albany5,67428,41922,71316,4418,5791,79961,28528,20485,1952,8484,11047,3284,28821,5782,850
Allegany2,92233,0366,3021,929372525312,23644,757104,2953,2941,453117,19223,4918,1071,936
Bronx9,5352,2542,4917,23512,5991051,491329735188247182441
Broome5,96527,30715,0019,4583,5591,264157,98818,589115,5513,5341,66882,4106,2576,3402,651
Cattaraugus8,60835,53110,6414,8501,0291,307415,37433,863142,6444,3492,245121,87034,99323,6585,080
Cayuga110,14420,54012,2884,0871,35227399,6347,6179,7074,04370462,032169,44347,2713,613
Chautauqua281,03236,32117,8127,5022,131667269,36219,16281,4552,5402,525130,21662,87840,6135,526
Chemung1,69815,5109,4544,5151,819552103,5725,05952,7991,70490051,6698,2783,3801,832
Chenango3,76127,1366,4982,188544487215,26336,235114,2183,4901,716119,68819,91721,1682,812
Clinton50,35217,78215,1147,3432,8831,778224,491129,22961,85710,9579,51960,94435,98381,7985,012
Columbia8,19615,28814,3716,3321,513752183,78215,92733,6471,9201,66579,02420,16329,0603,013
Cortland1,82113,5386,0472,540709347136,07512,17329,8892,0161,35178,70926,0708,0701,615
Delaware13,15833,36710,5673,3478551,278557,92828,577145,4785,9563,263118,8913,83210,7801,760
Dutchess15,28034,12930,13119,6505,9611,660263,8235,48315,0532,9073,49573,86810,39342,5313,679
Erie117,24664,29060,61738,09219,7252,806157,07715,10463,1833,0033,419102,59266,33966,0364,010
Essex76,57620,08310,8204,8889362,139404,216372,508205,75811,3116,50033,8073,57966,6416,378
Franklin42,85823,9938,9873,9441,011575432,858218,10575,41013,7599,12060,29527,263160,0418,170
Fulton23,08811,8246,1692,9408581,007103,40971,26240,3441,5141,42131,1013,64739,5452,912
Genesee1,76313,8308,5653,2131,29073344,7028366,30492049433,642137,53457,5415,627
Greene7,14619,7337,3893,3129611,017189,47138,987100,0201,5811,84228,6203,26315,6752,134
Hamilton57,6117,4962,132942141153623,574202,608144,7937,4352,51078477100,9665,770
Herkimer29,52121,2568,0244,7591,052772461,15588,98387,6875,4955,177100,76632,35581,2904,831
Jefferson374,47525,87323,32213,7944,3191,965224,83138,66118,13524,48816,922227,11579,86199,26715,657
Kings17,0601,1382,1248,78829,7133038521019325687288101828
Lewis10,79622,2005,0282,102434543362,686112,38030,16010,2409,19675,50647,355127,4889,443
Livingston6,64820,8118,5542,819830547110,7776,94524,1353,2411,03152,942153,51913,7963,168
Madison4,55622,4457,3213,142817312144,05618,94422,3224,7644,56193,60457,36136,4252,599
Monroe453,56055,58452,79927,21111,7951,08164,30081917,6121,3511,30756,08985,10541,0954,980
Montgomery3,54911,6907,2074,1301,32270535,55213,32532,9281,1251,62499,31230,61616,9822,523
Nassau105,05931,93934,54862,92021,2522,4069,3274408,779251585649311,42010,362
New York6,6179909172,9029,8192324936115318NA652
Niagara394,45822,19420,5599,5224,6721,42747,8821504,8488381,16448,225117,93550,6283,297
Oneida29,51743,32819,15310,9674,277477277,27286,67432,9669,3919,497118,64574,73183,9224,071
Onondaga17,60539,68040,70722,03610,1092,214137,3849,46610,8887,3162,52174,27489,65148,3483,394
Ontario12,33925,41412,6075,1481,779971112,3823,72919,0662,31999359,918140,00025,8111,557
Orange13,94445,34828,79318,3927,2271,002244,1902,69820,0833,6244,14471,75814,47654,1956,692
Orleans273,26710,6956,0141,74656930237,5541754,35834436623,614120,05742,0451,965
Oswego229,60530,68412,2605,0161,798621260,54554,53620,8118,8213,19274,13827,487103,2866,918
Otsego9,68229,2409,6133,622853162252,72036,40090,0102,4972,343143,28023,16642,2874,208
Putnam9,09215,0268,9174,8981,06520599,7455772,9295157253,191579,833686
Queens45,7852,4616,26023,05233,1261,0251,28823314324824844401,773
Rensselaer7,54918,26316,36612,6554,1961,122114,78428,503115,6022,8783,18356,23517,93323,4452,974
Richmond28,7772,9434,92013,9566,0333343,62352025095994372,1771,617
Rockland16,17722,06617,80111,7923,45954646,2161171,0215725281,21915,097825
Saratoga20,28134,66525,22411,3733,6911,683124,41295,69170,8694,2933,91748,48716,58573,4995,367
Schenectady2,02912,61910,3887,2033,00429119,33411,92433,3909941,59419,5591,4739,0401,203
Schoharie3,18317,4375,8662,485578720106,95147,616102,9213,0224,09679,50010,25813,9672,249
Schuyler8,92910,2283,6151,03828317973,3565,36839,7661,75066550,81015,1705,8842,055
Seneca41,51211,7936,6732,27572243330,6309094,8342,2841,20027,00197,11219,3633,128
St. Lawrence90,95852,27215,4418,3842,2612,302745,095162,46976,52730,75718,545198,02159,287315,41727,830
Steuben9,19147,91912,4545,2071,376947316,91229,103169,9694,3002,911202,19677,08213,4295,302
Suffolk933,426118,182120,90282,10825,58613,29887,78332,70425,9692,1327,97423,98011,56710,88321,549
Sullivan15,82933,17110,1734,5341,2071,047277,44942,717174,9884,0313,42638,83278126,3183,181
Tioga2,87617,6855,8962,088701776105,11013,39382,1932,2961,12681,96811,7124,9421,813
Tompkins9,98816,4859,3373,6921,18450587,9307,48950,7822,31184475,22131,92114,7032,197
Ulster21,68039,26616,2727,6712,5741,460357,13128,665156,8292,7562,57747,4697,22048,6812,632
Warren40,45318,6939,7125,0301,629928214,894165,46588,3303,7073,5516,79433232,9973,708
Washington8,21515,23512,1996,1331,3861,207176,80881,08741,8303,3174,488109,33240,79931,9217,081
Wayne499,86023,9918,9033,1321,12345997,4812,06014,0761,37388953,483115,03857,9675,317
Westchester42,32461,07338,77128,33711,463327107,6591,58110,1367741,4225,875359,291803
Wyoming2,67218,3854,8051,62447265109,5738,93728,6272,24874237,180139,71923,8622,711
Yates24,39612,6853,6471,10124511565,5094,24318,5142,19053825,78173,0027,4611,069

4 Comments

  • Ichsan says:

    Thanks for sharing your code

    Now I can analyze the land use for each polygon from European Space Agency (ESA) World cover 2021

  • Gab says:

    Dear Andy, thanks for sharing your code on how to calculate zonal histogram in r. However, i found it to be exremely slow and i have to shut it down several times after waiting for hours. What do you think is responsible for this?

    • Andy says:

      The problem is your dataset is too large. You could use aggregate(fact=2, fun=’modal’) to reduce the size of the raster by 50% to speed things up.

      Also, since I wrote this post I discovered that you don’t have to use the group by function of dplyr you can do this natively in extactextractr using the frac fun parameter:

      nlcd_co <- exactextractr::exact_extract(nlcd, cos, append_cols=T,  fun='frac' )
      

Leave a Reply

Your email address will not be published. Required fields are marked *