R Programming Language

R is a programming language and free software environment for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. It is widely used among statisticians and data miners for developing statistical software and data analysis.

I often think about all the interesting — and time saving things — I’ve learned in R over the past year and a half

I often think about all the interesting — and time saving things — I’ve learned in R over the past year and a half. On Election Night, I wrote a script that used a headless Firefox browser and Selenium to pull Election Results in real-time and process it into an a Google Spreadsheet. I was pleasantly surprised how well it worked, and how with a few extra lines of code I could pipe it into ggplot and make all kinds of maps.

The New York City Gubernatorial maps I posted where much the same way. I took data from several NYC Election Night Results pages, aggregated it into a data frame, cleaned up and formatted the data, then piped it through to ggplot with some nice styling.

Every day, I try to learn and experiment in new directions, build my skills further. As I only become faster and more talented at using R the more I use it and learn the many shortcuts and libraries out there for writing better code, quicker.

How to do Zonal Histograms in R

I couldn’t find documentation on the web on how to create a Zonal Histogram similar to ArcGIS and QGIS efficiently in R but after discovering the exactextractr library, which is a fast method of doing Zonal Statistics, I discovered this library can be used to efficiently generate zonal histograms.

The key is to do group_by and summarize inside of the function, so you never load more un-summarized pixels into memory then each individual zone of the Shapefile. This example is shown with a NLCD GeoTIFF of New York State, but it could be used for any other raster dataset where you want to calculate a Zonal Histogram.

llibrary(sf)
library(tidyverse)
rm(list=ls())

# load raster of NLCD with terra
nlcd <- terra::rast('Documents/GIS.Data/agriculture/nlcd2019_ny.tif')

# load counties in sf using tigris
cos <- tigris::counties('ny')

# exactextractr is a fast way to do zonal statistics and can be used
# for zonal histograms using this code
#
# append_cols = T  ## include columns from shapefile to allow rejoining at end
# summarize_df = T ## pass the dataframecreated by extact_extract to the function
# function = .. calculate the fraction of the area using frac (fraction of area) option in exactexractr

nlcd_co <- exactextractr::exact_extract(nlcd, cos,
                                        append_cols=T,
                                        fun='frac' )
                                    
# this next step is optional, it just takes  long format and pivots_wider
# for an output similiar to what you would get in ArcGIS or QGIS
# then inner join restores spatial data by joining on matching rows
# also we convert to acres, which is more useful then square meters.

nlcd_co %>%
    mutate(area = units::set_units(area,'acres')) %>%
    pivot_wider(names_from = 'nlcd_code', values_from = 'area',
        names_glue = "nlcd{nlcd_code}" ) %>%
    inner_join(cos, .) -> nlcd_co

# Create the Table Shown Below
nlcd_co %>% 
  arrange(NAME) %>%
  select(NAME, starts_with('nlcd'), -nlcdNaN) %>%
  mutate(across(starts_with('nlcd'), ~round(.x))) %>%
  st_drop_geometry()

Here is the output. Totals in acres, as shown from the above code.

NAMEnlcd11nlcd21nlcd22nlcd23nlcd24nlcd31nlcd41nlcd42nlcd43nlcd52nlcd71nlcd81nlcd82nlcd90nlcd95
Albany5,67428,41922,71316,4418,5791,79961,28528,20485,1952,8484,11047,3284,28821,5782,850
Allegany2,92233,0366,3021,929372525312,23644,757104,2953,2941,453117,19223,4918,1071,936
Bronx9,5352,2542,4917,23512,5991051,491329735188247182441
Broome5,96527,30715,0019,4583,5591,264157,98818,589115,5513,5341,66882,4106,2576,3402,651
Cattaraugus8,60835,53110,6414,8501,0291,307415,37433,863142,6444,3492,245121,87034,99323,6585,080
Cayuga110,14420,54012,2884,0871,35227399,6347,6179,7074,04370462,032169,44347,2713,613
Chautauqua281,03236,32117,8127,5022,131667269,36219,16281,4552,5402,525130,21662,87840,6135,526
Chemung1,69815,5109,4544,5151,819552103,5725,05952,7991,70490051,6698,2783,3801,832
Chenango3,76127,1366,4982,188544487215,26336,235114,2183,4901,716119,68819,91721,1682,812
Clinton50,35217,78215,1147,3432,8831,778224,491129,22961,85710,9579,51960,94435,98381,7985,012
Columbia8,19615,28814,3716,3321,513752183,78215,92733,6471,9201,66579,02420,16329,0603,013
Cortland1,82113,5386,0472,540709347136,07512,17329,8892,0161,35178,70926,0708,0701,615
Delaware13,15833,36710,5673,3478551,278557,92828,577145,4785,9563,263118,8913,83210,7801,760
Dutchess15,28034,12930,13119,6505,9611,660263,8235,48315,0532,9073,49573,86810,39342,5313,679
Erie117,24664,29060,61738,09219,7252,806157,07715,10463,1833,0033,419102,59266,33966,0364,010
Essex76,57620,08310,8204,8889362,139404,216372,508205,75811,3116,50033,8073,57966,6416,378
Franklin42,85823,9938,9873,9441,011575432,858218,10575,41013,7599,12060,29527,263160,0418,170
Fulton23,08811,8246,1692,9408581,007103,40971,26240,3441,5141,42131,1013,64739,5452,912
Genesee1,76313,8308,5653,2131,29073344,7028366,30492049433,642137,53457,5415,627
Greene7,14619,7337,3893,3129611,017189,47138,987100,0201,5811,84228,6203,26315,6752,134
Hamilton57,6117,4962,132942141153623,574202,608144,7937,4352,51078477100,9665,770
Herkimer29,52121,2568,0244,7591,052772461,15588,98387,6875,4955,177100,76632,35581,2904,831
Jefferson374,47525,87323,32213,7944,3191,965224,83138,66118,13524,48816,922227,11579,86199,26715,657
Kings17,0601,1382,1248,78829,7133038521019325687288101828
Lewis10,79622,2005,0282,102434543362,686112,38030,16010,2409,19675,50647,355127,4889,443
Livingston6,64820,8118,5542,819830547110,7776,94524,1353,2411,03152,942153,51913,7963,168
Madison4,55622,4457,3213,142817312144,05618,94422,3224,7644,56193,60457,36136,4252,599
Monroe453,56055,58452,79927,21111,7951,08164,30081917,6121,3511,30756,08985,10541,0954,980
Montgomery3,54911,6907,2074,1301,32270535,55213,32532,9281,1251,62499,31230,61616,9822,523
Nassau105,05931,93934,54862,92021,2522,4069,3274408,779251585649311,42010,362
New York6,6179909172,9029,8192324936115318NA652
Niagara394,45822,19420,5599,5224,6721,42747,8821504,8488381,16448,225117,93550,6283,297
Oneida29,51743,32819,15310,9674,277477277,27286,67432,9669,3919,497118,64574,73183,9224,071
Onondaga17,60539,68040,70722,03610,1092,214137,3849,46610,8887,3162,52174,27489,65148,3483,394
Ontario12,33925,41412,6075,1481,779971112,3823,72919,0662,31999359,918140,00025,8111,557
Orange13,94445,34828,79318,3927,2271,002244,1902,69820,0833,6244,14471,75814,47654,1956,692
Orleans273,26710,6956,0141,74656930237,5541754,35834436623,614120,05742,0451,965
Oswego229,60530,68412,2605,0161,798621260,54554,53620,8118,8213,19274,13827,487103,2866,918
Otsego9,68229,2409,6133,622853162252,72036,40090,0102,4972,343143,28023,16642,2874,208
Putnam9,09215,0268,9174,8981,06520599,7455772,9295157253,191579,833686
Queens45,7852,4616,26023,05233,1261,0251,28823314324824844401,773
Rensselaer7,54918,26316,36612,6554,1961,122114,78428,503115,6022,8783,18356,23517,93323,4452,974
Richmond28,7772,9434,92013,9566,0333343,62352025095994372,1771,617
Rockland16,17722,06617,80111,7923,45954646,2161171,0215725281,21915,097825
Saratoga20,28134,66525,22411,3733,6911,683124,41295,69170,8694,2933,91748,48716,58573,4995,367
Schenectady2,02912,61910,3887,2033,00429119,33411,92433,3909941,59419,5591,4739,0401,203
Schoharie3,18317,4375,8662,485578720106,95147,616102,9213,0224,09679,50010,25813,9672,249
Schuyler8,92910,2283,6151,03828317973,3565,36839,7661,75066550,81015,1705,8842,055
Seneca41,51211,7936,6732,27572243330,6309094,8342,2841,20027,00197,11219,3633,128
St. Lawrence90,95852,27215,4418,3842,2612,302745,095162,46976,52730,75718,545198,02159,287315,41727,830
Steuben9,19147,91912,4545,2071,376947316,91229,103169,9694,3002,911202,19677,08213,4295,302
Suffolk933,426118,182120,90282,10825,58613,29887,78332,70425,9692,1327,97423,98011,56710,88321,549
Sullivan15,82933,17110,1734,5341,2071,047277,44942,717174,9884,0313,42638,83278126,3183,181
Tioga2,87617,6855,8962,088701776105,11013,39382,1932,2961,12681,96811,7124,9421,813
Tompkins9,98816,4859,3373,6921,18450587,9307,48950,7822,31184475,22131,92114,7032,197
Ulster21,68039,26616,2727,6712,5741,460357,13128,665156,8292,7562,57747,4697,22048,6812,632
Warren40,45318,6939,7125,0301,629928214,894165,46588,3303,7073,5516,79433232,9973,708
Washington8,21515,23512,1996,1331,3861,207176,80881,08741,8303,3174,488109,33240,79931,9217,081
Wayne499,86023,9918,9033,1321,12345997,4812,06014,0761,37388953,483115,03857,9675,317
Westchester42,32461,07338,77128,33711,463327107,6591,58110,1367741,4225,875359,291803
Wyoming2,67218,3854,8051,62447265109,5738,93728,6272,24874237,180139,71923,8622,711
Yates24,39612,6853,6471,10124511565,5094,24318,5142,19053825,78173,0027,4611,069

A Year with R

A Year with R

A year ago I stumbled upon the R programming language, mostly by accident on YouTube. I wanted a better platform for making graphics and maps and was running up against a lot of limitations in Python with matplotlib. Matplotlib is powerful but it often requires a lot of explicit code to make elegant well thought out graphics.

R has proven to be a very worthwhile skill to learn. While I consider myself to be a fairly experienced Python programmer, R has proven a lot more valuable especially when it comes to making good basic, attractive maps in SVG files. Simply said, R defaults with ggplot just make sense and are attractive. The pipe mechanism in R based around maggitir is fantastic for complicated data wrangling in a single line of code. Pipes are a wonderful thing in Unix and they make a lot of sense for processing data.

R is a werid language to get the hang of at first. It’s not necessarily bad – it’s actually pretty awesome for manipulating data with pipes. But it is different with strange operators and syntax, based around 1 indexing rather than 0 indexing of most C derived languages like Python. But I’ve really gotten the hang of it by doing a lot of reading and watching videos on R and just digging through the commands, reading help files and even the raw R code on objects.

R really excels with automating GIS processes and being a one stop shop from extract, transform, load to render. Interestingly, outside of academia it seems like R doesn’t get the credit it deserves – especially with Census data and tidycensus its a one stop shop from obtaining data to manipulating it to rendering it on a map, often with just a single pipeline of code. It’s pretty neat.

I’m glad I taught myself R and it’s a technology I will probably continue to use daily for exploring my world.