R Programming Language

R is a programming language and free software environment for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. It is widely used among statisticians and data miners for developing statistical software and data analysis.

How to fix RSelenium / wdman unable to start error

If you are getting an error starting geckodriver when using RSelenium, try deleting the LICENSE.chromedriver file, which wdman::geckodriver accidentally attempts to execute. On Linux you can do this from the terminal with the command below. The -r flag tells xargs to run rm only when find matches a file to delete.

find ~/.local/share/ -name LICENSE.chromedriver -print | xargs -r rm
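
If you would rather stay inside R, the same cleanup can be sketched with base R's list.files() and file.remove(). This is a minimal sketch, not something provided by wdman: the helper name remove_stale_license() is made up for illustration, and the default folder simply mirrors the path in the command above.

```r
# Hypothetical helper: delete every LICENSE.chromedriver file found under `root`.
# Returns the paths that were removed (character(0) when nothing matched).
remove_stale_license <- function(root = path.expand("~/.local/share")) {
  stale <- list.files(
    root,
    pattern = "^LICENSE\\.chromedriver$",
    recursive = TRUE,
    full.names = TRUE
  )
  # like xargs -r, only attempt the removal when something matched
  if (length(stale) > 0) {
    file.remove(stale)
  }
  stale
}

remove_stale_license()
```

Restart the R session afterwards before trying to start the driver again.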

There was a time when PHP was my go-to language for everything

Sure, I used other languages (JavaScript, Python, C and even C++), but I really liked PHP because it had so many libraries and was so easy to tie into web apps.

Python was fine, but it was something I used when choices were limited, like for QGIS scripting and automation. Then over summer vacation I got a book about data science out of the library and discovered the benefits of an interactive interpreter with Jupyter. Python’s data frame model with pandas is quite powerful, and I enjoy working with it. But I also found matplotlib and the other graphing and mapping libraries for Python to be often lacking when trying to output quality, professional-looking graphics.

I heard a lot of good things about R so I decided to take it up last autumn, and I haven’t looked back…

R has a weird syntax, but the pipe model that can be used throughout is incredibly powerful, and ggplot2 with only a little bit of tweaking makes amazing graphs and maps that look like they came out of a professional GIS program. Things I used to do in QGIS, I’m increasingly doing in R, and I am even converting a lot of my PHP and Python code over to R. I am really hooked on this language after struggling with it a bit at first.

R Programming – IDW interpolation of Missing / NA Census Tract Data

You can do IDW interpolation of missing Census Tracts fairly easily in R using the gstat library. The key is to make sure you use a projected dataset. Other interpolation methods are covered here: https://rspatial.org/raster/analysis/4-interpolation.html

library(tidycensus)
library(tidyverse)
library(sf)      # needed for st_transform()
library(raster)
library(gstat)

# obtain census data on veteran status by tract and then
# reproject the shapefile geometry into a projected coordinate system
# (EPSG:26918, NAD83 / UTM zone 18N, units in meters)

acs <- get_acs("tract",
  state = 'ny',
  survey = 'acs5',
  variables = c(
    'Total' = 'B21001_001',
    'Veteran' = 'B21001_002'
  ),
  cache_table = TRUE,
  geometry = TRUE,
  resolution = '20m',
  year = 2020,
  output = "wide"
) %>% st_transform(26918)

# calculate the percentage of veterans per census tract
acs <- mutate(acs, vet_per = VeteranE/TotalE)

# create a copy of census tracts, dropping any NA values
# from vet_per field
vetNA <- acs %>% drop_na(vet_per)

# create an empty raster with 1000 m cells to interpolate into
r <- raster(vetNA, res=1000)

# set the formula based on the field (vet_per) that contains
# the veterans percent to interpolate. This uses IDW interpolation
# for all points, weighting farther ones less
gs <- gstat(formula = vet_per~1, locations = vetNA)

# interpolate the data (average based on nearby points)
nn <- interpolate(r, gs)

# extract the median value of the interpolated raster within each
# tract polygon, into a new column named est
acs <- cbind(acs, est=exactextractr::exact_extract(nn, acs, fun='median'))

# replace any NA values with interpolated data so the map doesn't contain
# holes. You should probably mention that missing data was interpolated when
# sharing your map.
acs <- acs %>% mutate(vet_per = ifelse(is.na(vet_per), est, vet_per))
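
To make the weighting concrete, here is a minimal base R sketch of what an IDW estimate is for a single location: a weighted average of the known values, with weights of 1 / distance^power. The function name idw_estimate() and the toy coordinates are invented for illustration, and the power of 2 is an assumption on my part (I believe it matches gstat's default, but check the gstat documentation). In the code above, gstat does the real work.

```r
# Hypothetical helper: IDW estimate at (x0, y0) from known points (x, y, values)
idw_estimate <- function(x, y, values, x0, y0, power = 2) {
  d <- sqrt((x - x0)^2 + (y - y0)^2)
  # a known point exactly at the target location wins outright
  if (any(d == 0)) return(mean(values[d == 0]))
  w <- 1 / d^power
  sum(w * values) / sum(w)
}

# three known points: two nearby, one far away with a much lower value
x <- c(0, 2, 10)
y <- c(0, 0, 0)
values <- c(0.10, 0.20, 0.02)

# estimate halfway between the two nearby points
idw_estimate(x, y, values, x0 = 1, y0 = 0)
```

The result lands just under 0.15, the average of the two nearby values, because the far point's weight (1/81) barely counts.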

Create Mile Points from a LINESTRING or MULTILINESTRING in R

Here is an R function that takes a LINESTRING or touching MULTILINESTRING and returns a series of points along the line at each mile, much like the “tombstone” mile markers along an expressway. This is helpful for making maps when you want to plot distance for hikers, or for drivers watching their car’s odometer. It has the option to “reverse” the linestring so the points run in the opposite direction, such as south to north. The code can be modified for quarter-mile points or whatever spacing you find useful.

library(tidyverse)
library(sf)
library(units)

make_milepoint <- \(linestring, reverse = FALSE) {
  # make sure we're using a projected coordinate system
  linestring <- st_transform(linestring, 5070)
  
  # merge parts together into a multilinestring
  linestring <- linestring %>% group_by() %>% summarise()
  
  # if multiline string, then attempt to join together
  # this will raise an exception if the linestring is not contiguous 
  if (st_geometry_type(linestring) == 'MULTILINESTRING')
    linestring <- st_line_merge(linestring)
  
  # reverse the string if we want to 
  # go from the other end
  if (reverse) 
    linestring <- st_reverse(linestring)
  
  # length of string in miles
  linestring.distance <- st_length(linestring) %>% set_units('mi') %>% 
    drop_units()
  
  # percent of string equals each mile including start and end
  linestring.sample.percent <- seq(0, 1, 1/linestring.distance) %>% c(1)
  
  # sample line string, convert multi-points to points, convert to sf
  # add a column listing the mile-points, round to two digits
  st_line_sample(linestring, sample = linestring.sample.percent) %>%
    st_cast('POINT') %>%
    st_as_sf() %>%
    mutate(
      mile = round(linestring.sample.percent * linestring.distance, digits = 2))
}
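
The sampling step is the part to change for other spacings. As a rough sketch in base R (the helper name sample_fractions() and its interval_mi argument are made up here, they are not part of sf), the fractions passed to st_line_sample() could be computed for any interval like this:

```r
# Hypothetical helper: fractions (0 to 1) along a line of length_mi miles,
# one every interval_mi miles, always ending with the end of the line
sample_fractions <- function(length_mi, interval_mi = 1) {
  fractions <- seq(0, 1, by = interval_mi / length_mi)
  # drop a final value that already sits at the end of the line,
  # then add the end point back exactly once
  fractions <- fractions[fractions < 1 - 1e-9]
  c(fractions, 1)
}

# a 3.5 mile line with one-mile spacing: markers at 0, 1, 2, 3 and 3.5 miles
round(sample_fractions(3.5) * 3.5, digits = 2)

# the same line with quarter-mile spacing
round(sample_fractions(3.5, interval_mi = 0.25) * 3.5, digits = 2)
```

Inside make_milepoint() this would stand in for the seq() line, and the mile column would still be the fractions multiplied by the line length.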

Here is an example of the 1 mile points it outputs along Piseco-Powley Road.

And here is a static (paper) map I created using this code plus adding coordinates and elevation for this hiking map.

Soda Range Trail

An Introduction to Apache Arrow for R Users

I have been learning about Apache Arrow for loading and processing extremely large datasets in R quickly, without having to set up and use something like a PostgreSQL database. DMV vehicle file, you know what I'm thinking.
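
As a small taste of what that looks like, here is a minimal sketch using the arrow package with dplyr. The built-in mtcars data stands in for a genuinely large file, and the temporary folder and the partitioning column are just choices made for this illustration.

```r
library(arrow)
library(dplyr)

# write a toy dataset to disk as Parquet files, one folder per cyl value
dataset_dir <- file.path(tempdir(), "mtcars_parquet")
write_dataset(mtcars, dataset_dir, format = "parquet", partitioning = "cyl")

# open the folder as a dataset; the rows are not read into memory yet
cars <- open_dataset(dataset_dir)

# the filter and select are handed to Arrow, and only collect()
# pulls the matching rows into an ordinary data frame
efficient <- cars %>%
  filter(mpg > 25) %>%
  select(mpg, cyl, wt) %>%
  collect()

efficient
```

With a real multi-gigabyte file the pattern is the same: point open_dataset() at the folder, filter down to what you need, and collect() only the result.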

How to access WMS Servers in R Programming Language

I couldn’t find a good package for reading WMS data into a SpatRaster or for downloading it. However, I discovered that you can use sf’s built-in gdal_utils to run gdalinfo and query the WMS server’s information, then use its built-in gdal_translate to download the WMS image into a temporary file and load it into memory with terra.

library(sf)
library(tidyverse)
library(terra)
rm(list=ls())

# obtain layer info from WMS server using sf built-in version of gdalinfo
wms_url <- 'https://orthos.its.ny.gov/ArcGIS/services/wms/Latest/MapServer/WMSServer?'
ginfo <- sf::gdal_utils('info', str_c('WMS:',wms_url), quiet=TRUE)

# extract layer descriptions and layer urls from the returned layer info,
# then bind them into a two-column matrix (description, url)
ldesc <- ginfo %>% str_match_all('SUBDATASET_(\\d+)_DESC=(.*?)\n')
ldesc <- ldesc[[1]][,3]
lurl <- ginfo %>% str_match_all('SUBDATASET_(\\d+)_NAME=(.*?)\n')
lurl <- lurl[[1]][,3]
wms_layers <- cbind(ldesc, lurl)
rm(ldesc, lurl)

# obtain from census bureau a shapefile of the Town of Cazenovia,
# then find the bounding box around it
bbx<- tigris::county_subdivisions('ny') %>% 
  filter(NAME == 'Cazenovia') %>% 
  st_transform(3857) %>% 
  st_bbox()

# Revise the WMS URL based on the above bounding box.
# Web Mercator (3857) is best to use with WMS servers as
# that is the native projection. As I wanted multiple layers,
# I also revised the URL string to include all layers desired,
# separated by commas (explore the wms_layers table for details)
url <- wms_layers[4,2] %>%
  str_replace('LAYERS=(.*?)&', 'LAYERS=3,2,1,0&') %>%
  str_replace('SRS=(.*?)&', 'SRS=EPSG:3857&') %>%
  str_replace('BBOX=(.*?)$', str_c('BBOX=',paste(bbx, collapse=',')) ) 

# Use sf built-in gdal_utils to download the image of Cazenovia based
# on the URL, with an output size listed below. You can also download
# based on resolution by using the tr option that is commented out below.
# Then load the temporary file with rast() as a SpatRaster for further
# processing. In addition, adding -co COMPRESS=JPEG greatly reduces the size
# of aerial photography with minimal impacts on quality or loading speed.

t <- tempfile()
reso <-  c('-outsize','1920','1080','-co','COMPRESS=JPEG')
#reso <- c('-tr', '1','1','-co','COMPRESS=JPEG') # "meters" in the 3857 projection
sf::gdal_utils('translate', url, t, reso) 
r <- rast(t)

# display the spatial raster or whatever you would like to do with it
# the tiff is stored in the temporary location in the t variable
plot(r)
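
The URL rewriting step can be tried on its own without touching a server. Below is a minimal base R sketch, assuming a made-up subdataset URL of the general shape gdalinfo returns (example.com and the bounding box numbers are placeholders, not real NYS values). It uses sub() in place of str_replace() so it runs without any packages.

```r
# hypothetical subdataset URL, standing in for wms_layers[4,2]
layer_url <- paste0(
  "WMS:https://example.com/wms?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap",
  "&LAYERS=0&SRS=EPSG:4326&BBOX=-180,-90,180,90"
)

# placeholder bounding box in Web Mercator, ordered xmin, ymin, xmax, ymax
bbx <- c(xmin = -8450123.4, ymin = 5290456.7, xmax = -8430987.6, ymax = 5310654.3)

# swap in the layers, projection and bounding box we want
new_url <- layer_url
new_url <- sub("LAYERS=(.*?)&", "LAYERS=3,2,1,0&", new_url, perl = TRUE)
new_url <- sub("SRS=(.*?)&", "SRS=EPSG:3857&", new_url, perl = TRUE)
new_url <- sub("BBOX=(.*?)$",
               paste0("BBOX=", paste(bbx, collapse = ",")),
               new_url, perl = TRUE)

new_url
```

Printing the URL before handing it to gdal_translate is a cheap way to catch a pattern that did not match, since sub() returns the string unchanged in that case.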