R Programming Language

R is a programming language and free software environment for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. It is widely used among statisticians and data miners for developing statistical software and data analysis.

How to Build a Table in R with Fill Color Style Info from an ArcGIS Feature Server

The R code is fairly simple: grab the JSON from the server and use purrr to map over the various styles. The resulting tibble contains each value from the server, along with its label and fill color.

library(tidyverse) # provides map(), list_rbind(), tibble() and %>%

# this builds a data frame of styles/colors from the feature server
feature.info <- jsonlite::read_json('https://services.arcgis.com/cJ9YHowT8TU7DUyn/arcgis/rest/services/AirNowLatestContoursCombined/FeatureServer/0/?f=pjson')
style <- map(feature.info$drawingInfo$renderer$uniqueValueInfos,
    \(x) {
      tibble(value = as.numeric(x$value),
             label = x$label,
             # symbol color is red, green, blue, alpha on a 0-255 scale
             color = rgb(x$symbol$color[[1]], x$symbol$color[[2]], x$symbol$color[[3]], x$symbol$color[[4]], maxColorValue = 255))
    }
  ) %>% list_rbind()
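
To see what the mapping step produces without hitting the server, here is a minimal sketch in base R. The `mock_infos` list is made-up sample data that only imitates the shape of `uniqueValueInfos` (the values, labels and colors are assumptions, not the real server response), and `do.call(rbind, lapply(...))` stands in for `map()` plus `list_rbind()`.

```r
# Hypothetical sample mimicking drawingInfo$renderer$uniqueValueInfos;
# the values, labels and colors are invented for illustration.
mock_infos <- list(
  list(value = "1", label = "Good",
       symbol = list(color = list(0, 228, 0, 255))),
  list(value = "2", label = "Moderate",
       symbol = list(color = list(255, 255, 0, 255)))
)

# Turn one entry into a one-row data frame with a hex fill color.
style_row <- function(x) {
  rgba <- unlist(x$symbol$color)  # red, green, blue, alpha on a 0-255 scale
  data.frame(
    value = as.numeric(x$value),
    label = x$label,
    color = rgb(rgba[1], rgba[2], rgba[3], rgba[4], maxColorValue = 255),
    stringsAsFactors = FALSE
  )
}

# Stack the one-row data frames into a single table.
mock_style <- do.call(rbind, lapply(mock_infos, style_row))
print(mock_style)
```

The eight-digit hex strings carry the alpha channel in the last two digits, so they can be handed to most R plotting functions as-is.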

R isn’t that awful 🗺

I keep telling myself that I should do more Python programming, as it’s the future and R is a dying language. R is nowhere near as popular as Python.

But the thing is, Python remains far behind R when it comes to map making and graphics. And there are a ton of useful packages out there for R, sometimes much better packages than Python has, especially when it comes to graphics and light manipulation of data such as Census data. pandas might be better for heavy lifting than the tidyverse, but for many things the tidyverse is simpler.

Yet I concede that using R is like adopting the Macintosh System 7 platform decades ago in the era of Windows 95. You’re simply not using what the masses are using, and you are somewhat locked out of the benefits of a popular platform. Moreover, the underlying code in R is often slow and inefficient, with a legacy of 50-year-old designs, unlike the relatively modern, clean and elegant design of Python. It’s much like Macintosh System 7 compared to Windows 95: System 7 did a lot of things well in graphics and user interface, but the underpinnings were a hot mess of hacks built on code from the early 1980s. Windows 95 had protected memory and preemptive multitasking, while System 7 was stuck in the era of shared memory and cooperative multitasking.

But R is different than Macintosh System 7. R might be creaky and old, but it’s actively maintained and unlikely to be killed off with a single shot by a corporation, like Apple did to System 7 with the release of Mac OS X. R will stick around even if it eventually loses out to Python, as it’s open source and not controlled by a profit-seeking corporation. Old R code is unlikely to stop working, as there is a large enough existing code base that interpreters are likely to be maintained, just like how GNU Fortran is still a thing despite little new Fortran code being written anymore.

Yet my bigger fear is that every time I use R, not only am I not writing truly future-compatible code, I’m not practicing a skill that is beneficial for my future. I’ve read a lot of books on Python and I’ve written a lot of Python, but the way to be truly good at something is to use it a lot and practice. It’s great to be a skilled R programmer, but if Python is the future it’s all for naught. Yet I constantly find that when I write code in Python, the weakness of its graphics, geospatial and even data wrangling capabilities comes back to bite me compared to what I can easily do in R, no matter how much research I do into libraries and best practices. And it troubles me to keep going back to the second fiddle known as R.

Scraping SeeThroughNY Data using R

Here is the R code I am using to scrape SeeThroughNY.net to download state and local government employment wage data.

library(tidyverse)
library(RSelenium)
library(netstat)
library(rvest)

# Load Selenium browser. This code should automatically open a Firefox window
# from R, downloading the latest GeckoDriver if necessary.
#
# If this doesn't work, you should delete the LICENSE.chromedriver file, which
# sometimes causes RSelenium to not load.
## find ~/.local/share/ -name LICENSE.chromedriver | xargs -r rm

rs <- rsDriver(
  remoteServerAddr = "localhost",
  port = free_port(random = TRUE),
  browser = "firefox",
  verbose = FALSE
)

rsc <- rs$client
rsc$navigate("https://seethroughny.net/payrolls")

# STOP !!!
# While you could automate this step, you should now manually choose your
# search items in the SeeThroughNY browser window that has opened. Then
# you should execute the following lines.

# Next you want to load all of the results. We limit it to 30 attempts,
# which will pull most reasonably sized queries. Too big and you could crash
# your browser due to excessive memory needed.

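for (i in seq_len(30)) {
  # click the "load more" control to append the next batch of rows
  rsc$findElement(using = 'css', '#data_loader')$clickElement()

  # the control is hidden once every result has been loaded
  if (rsc$findElement(using = 'css', '#data_loader')$getElementAttribute('style')[[1]] == 'display: none;')
    break

  Sys.sleep(2)
}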

# Next you need to pull and clean the HTML table that
# contains the data
rsc$getPageSource() %>%
  unlist() %>%
  read_html() %>%
  html_table() %>%
  .[[1]] %>% 
  janitor::clean_names() -> employees

# Some of the data is located in the (+) tab, but this is just a
# table field located in every other row, which we split up into the
# appropriate field values

employees %>%
  filter(row_number() %% 2 == 0) %>%
  select(name) %>%
  separate(name, sep='\n', into=c(NA,'subagency',NA,NA,NA,'title',NA,NA,NA,'rateofpay',NA,NA,NA,'payyear',NA,NA,NA,'paybasis',NA,NA,NA,'branch') ) %>%
  cbind(employees %>% filter(row_number() %% 2 != 0), .) %>%
  mutate(across(everything(), str_trim),
         total_pay = parse_number(total_pay)) %>%
  select(-x, -x_2, -subagency_type) -> employees
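
The every-other-row trick above can be tried on a toy table without a browser. This is a minimal sketch in base R under stated assumptions: the `scraped` data frame, its names and its pay values are invented for illustration, and `strsplit()` plus `vapply()` stand in for `separate()`.

```r
# Toy table imitating the scraped layout: odd rows hold the visible columns,
# even rows hold the expanded detail as newline-separated text.
# All names and values here are invented for illustration.
scraped <- data.frame(
  name = c("Doe, Jane", "Title\nClerk\nPay Year\n2022",
           "Roe, Richard", "Title\nAnalyst\nPay Year\n2021"),
  total_pay = c("$50,000", "", "$62,500", ""),
  stringsAsFactors = FALSE
)

# TRUE for the even (detail) rows, FALSE for the odd (main) rows.
is_detail <- seq_len(nrow(scraped)) %% 2 == 0

# Split each detail cell on newlines and keep the 2nd and 4th pieces,
# skipping the label pieces much like NA does in separate(into = ...).
pieces <- strsplit(scraped$name[is_detail], "\n", fixed = TRUE)
details <- data.frame(
  title   = vapply(pieces, `[`, character(1), 2),
  payyear = vapply(pieces, `[`, character(1), 4),
  stringsAsFactors = FALSE
)

# Glue the detail columns onto the matching main rows and parse the pay.
combined <- cbind(scraped[!is_detail, ], details)
combined$total_pay <- as.numeric(gsub("[$,]", "", combined$total_pay))
rownames(combined) <- NULL
print(combined)
```

The real page pads its cells with extra whitespace and blank lines, which is why the scraper above skips several pieces between fields and trims every column afterwards.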

### Then you can pipe this data into ggplot or any other program.
### Or export it to CSV or Excel file
employees %>% write_csv('/tmp/employee_data.csv')

R 4.3.0 Was Released

With R 4.3.0 released on Friday, you can now use an “underscore” with the built-in pipe in more places, much like you could use a “period” in the magrittr pipe. (The underscore placeholder itself arrived in R 4.2.0; what 4.3.0 adds is using it at the head of an extraction such as _$coef.) While there are still some reasons to use magrittr, like tee pipes, assignment pipes and exposition pipes, I’ve never used them and they aren’t exported by default in the tidyverse.

For example, in magrittr you could do:

df %>% inner_join(states, .)

And with the native pipe you can do the same thing, as long as the placeholder is passed as a named argument:

df |> inner_join(states, y = _)

That was a major oversight when they created the native pipe. I’m not sure why a placeholder wasn’t implemented when the native pipe first came out in 4.1.0, but it wasn’t.
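
One detail worth knowing about the native pipe placeholder is that it has to be passed as a named argument; a bare positional `_` is rejected by the parser. Here is a minimal sketch using base R's `merge()` in place of `inner_join()` so it runs without dplyr. Both data frames are invented for illustration, and the example assumes R 4.2.0 or later.

```r
# Requires R >= 4.2.0, where the `_` placeholder was introduced.
# Both data frames are invented for illustration.
states <- data.frame(abbr = c("NY", "VT"), state = c("New York", "Vermont"))
df <- data.frame(abbr = c("NY", "VT", "NY"), population = c(10, 20, 30))

# The placeholder must be named (here `y`), so this is the same as
# merge(states, y = df).
joined <- df |> merge(states, y = _)
print(joined)

# An anonymous function is one way to pipe into an unnamed position.
joined_lambda <- df |> (\(d) merge(states, d))()
```

Both forms put the piped data frame in the second position, which is the job the “period” does in magrittr.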

Also, you can use _$value to extract an element from the piped result:

mtcars |> lm(mpg ~ disp, data = _) |> _$coef

Although, I’m not totally sure why you would want to use a pipe like that when you can wrap the pipeline in parentheses and put the extractor directly on the result:

(mtcars |> lm(mpg ~ disp, data = _))$coef
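
As a quick check of how the extraction placeholder behaves, here is a small sketch comparing it with a plain, pipe-free call. It assumes R 4.3.0 or later and uses the full element name `coefficients` rather than relying on the partial matching that makes `$coef` work.

```r
# Requires R >= 4.3.0 for `_` at the head of an extraction chain.
fit_coef <- mtcars |> lm(mpg ~ disp, data = _) |> _$coefficients

# The chain can continue with further extractions.
slope <- mtcars |> lm(mpg ~ disp, data = _) |> _$coefficients[["disp"]]

# Same result without any pipe.
direct <- lm(mpg ~ disp, data = mtcars)$coefficients

print(fit_coef)
print(identical(fit_coef, direct))
```

The extraction form mostly earns its keep at the end of a longer pipeline, where wrapping everything in parentheses would be harder to read.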

Learn more about the changes in R 4.3.0: https://www.jumpingrivers.com/blog/whats-new-r43/

Get R version 4.3.0 (Already Tomorrow) which was released on 2023-04-21.

And here is the full list of changes in R 4.3.0.