R Programming Language

R is a programming language and free software environment for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. It is widely used among statisticians and data miners for developing statistical software and data analysis.

How to Build a Table in R with Fill Color Style Info from an ArcGIS Feature Server

The R code is fairly simple: just grab the JSON from the server and use purrr to map over the various styles. The returned tibble contains the value returned from the server, along with the label and fill color.

# This builds a data frame of styles/colors from the feature server.
# feature.info is assumed to hold the layer's JSON, already parsed into nested lists.
library(tidyverse)

style <- map(feature.info$drawingInfo$renderer$uniqueValueInfos,
    \(x) {
      tibble(value = x$value %>% as.numeric,
             label = x$label,
             color = rgb(x$symbol$color[[1]], x$symbol$color[[2]], x$symbol$color[[3]], x$symbol$color[[4]], maxColorValue = 255))
    }
  ) %>% list_rbind()
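
To try the mapping step without a live server, here is a minimal sketch that parses a small hand-written JSON string shaped like a unique-value renderer. The sample values, labels and colors are made up for illustration, and the use of jsonlite with simplifyVector = FALSE is my assumption about how the JSON gets parsed; the original post does not show that step.

```r
library(jsonlite)
library(purrr)
library(tibble)

# Hypothetical stand-in for the JSON a feature server layer would return.
layer_json <- '{
  "drawingInfo": {"renderer": {"type": "uniqueValue", "uniqueValueInfos": [
    {"value": "1", "label": "Forest", "symbol": {"color": [56, 168, 0, 255]}},
    {"value": "2", "label": "Water",  "symbol": {"color": [0, 112, 255, 128]}}
  ]}}
}'

# simplifyVector = FALSE keeps uniqueValueInfos as a list of lists,
# so map() visits one style entry at a time.
feature.info <- fromJSON(layer_json, simplifyVector = FALSE)

style <- map(feature.info$drawingInfo$renderer$uniqueValueInfos, \(x) {
  tibble(
    value = as.numeric(x$value),
    label = x$label,
    # Red, green, blue and alpha on a 0-255 scale become "#RRGGBBAA".
    color = rgb(x$symbol$color[[1]], x$symbol$color[[2]],
                x$symbol$color[[3]], x$symbol$color[[4]],
                maxColorValue = 255)
  )
}) |> list_rbind()

print(style)
```

With jsonlite's default simplification, uniqueValueInfos would likely come back as a data frame instead, and map() would then walk its columns rather than its rows.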

R isn’t that awful 🗺

I keep telling myself that I should do more Python programming, as it’s the future and R is a dying language. R is nowhere near as popular as Python.

But the thing is, Python remains far behind R when it comes to map making and graphics. There are a ton of useful packages for R, some of them much better than their Python counterparts, especially for graphics and light manipulation of data such as Census data. pandas might be better than the tidyverse for heavy lifting, but for many things the tidyverse is simpler.

Yet I concede that using R is like adopting the Macintosh System 7 platform decades ago in the era of Windows 95. You’re simply not using what the masses are using, and you are somewhat locked out of the benefits of a popular platform. Moreover, the underlying code in R is often slow and inefficient, with a legacy of 50-year-old designs, unlike the relatively modern, clean and elegant design of Python. That is much like Macintosh System 7 compared to Windows 95. System 7 did a lot of things well in graphics and user interface, but the underpinnings were a hot mess of hacks built on code from the early 1980s. Windows 95 had protected memory and preemptive multitasking, while System 7 was stuck in the era of shared memory and cooperative multitasking.

But R is different from Macintosh System 7. R might be creaky and old, but it’s actively maintained and unlikely to be killed off with a single shot by a corporation, as Apple did to Macintosh System 7 with the release of Mac OS X. R will last forever, even if it eventually loses out to Python, as it’s open source and not controlled by a profit-seeking corporation. Old R code is unlikely to stop working, as there is a large enough existing code base that interpreters are likely to be maintained, just like how GNU Fortran is still a thing despite little new Fortran code being written anymore.

Yet my bigger fear is that every time I use the R programming language, not only am I not writing truly future-compatible code, I’m not practicing a skill that is beneficial for my future. I’ve read a lot of books on Python and I’ve written a lot of Python, but the way to be truly good at something is to use it a lot and practice. It’s great to be a skilled R programmer, but if Python is the future, it’s all for naught. Yet I constantly find that when I write code in Python, the weakness of its graphics, geospatial and even data wrangling capacities comes back to bite me compared to what I can easily do in R, no matter how much research I do into libraries and best practices. And it troubles me to keep going back to the second fiddle known as R.

Scraping SeeThroughNY Data using R

Here is the R code I am using to scrape SeeThroughNY.net to download state and local government employment wage data.

library(tidyverse)
library(RSelenium)
library(netstat)
library(rvest)

# Load Selenium browser. This code should automatically open a Firefox window
# from R, downloading the latest GeckoDriver if necessary.
#
# If this doesn't work, you should delete the LICENSE.chromedriver file, which
# sometimes causes RSelenium to not load.
## find ~/.local/share/ -name LICENSE.chromedriver | xargs -r rm

rs <- rsDriver(
  remoteServerAddr = "localhost",
  port = free_port(random = TRUE),
  browser = "firefox",
  verbose = FALSE
)

rsc <- rs$client

# STOP !!!
# While you could automate this step, you should now manually choose your
# search items in the SeeThroughNY browser window that has opened. Then
# you should execute the following lines.

# Next you want to load all of the results. We limit it to 30 attempts,
# which will pull most reasonably sized queries. Too big and you could crash
# your browser due to excessive memory needed.

for (i in seq(1, 30)) {
  rsc$findElement(using = 'css', '#data_loader')$clickElement()

  # The "load more" button is hidden once every result has been loaded
  if (rsc$findElement(using = 'css', '#data_loader')$getElementAttribute('style')[[1]] == 'display: none;')
    break

  Sys.sleep(2)
}

# Next you need to pull and clean the HTML table that
# contains the data
rsc$getPageSource() %>%
  unlist() %>%
  read_html() %>%
  html_table() %>%
  .[[1]] %>%
  janitor::clean_names() -> employees

# Some of the data is located in the (+) tab, but this is just a
# table field located every other row, which is split up into the appropriate
# field values

employees %>%
  filter(row_number() %% 2 == 0) %>%
  select(name) %>%
  separate(name, sep = '\n', into = c(NA, 'subagency', NA, NA, NA, 'title', NA, NA, NA, 'rateofpay', NA, NA, NA, 'payyear', NA, NA, NA, 'paybasis', NA, NA, NA, 'branch')) %>%
  cbind(employees %>% filter(row_number() %% 2 != 0), .) %>%
  mutate(across(everything(), str_trim),
         total_pay = parse_number(total_pay)) %>%
  select(-x, -x_2, -subagency_type) -> employees

### Then you can pipe this data into ggplot or any other program.
### Or export it to a CSV or Excel file
employees %>% write_csv('/tmp/employee_data.csv')
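
The every-other-row cleanup is the least obvious part of the script, so here is a minimal sketch of the same idea on a toy table that needs no browser. The names, agencies and pay figures are invented, and the detail rows are much shorter than the real ones, so the `into` vector only has four slots.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the scraped table: odd rows are the employee
# record, even rows hold the extra (+) details packed into `name`.
raw <- tibble(
  name = c("Doe, Jane", "\nParks\n\nClerk",
           "Roe, Rich", "\nDOT\n\nEngineer"),
  total_pay = c("$50,000", "", "$72,500", "")
)

# Even rows: split the packed text on newlines; NA entries in `into`
# discard the empty pieces between the fields.
details <- raw |>
  filter(row_number() %% 2 == 0) |>
  select(name) |>
  separate(name, sep = "\n", into = c(NA, "subagency", NA, "title"))

# Odd rows: the record itself, with the details bound on as new columns.
employees <- raw |>
  filter(row_number() %% 2 != 0) |>
  bind_cols(details) |>
  mutate(total_pay = readr::parse_number(total_pay))

print(employees)
```

The position of each field inside the packed text is what the NA entries encode, so if the site changes its layout the `into` vector is the first thing to re-check.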

R 4.3.0 Was Released

With R 4.3.0 released on Friday, the “underscore” placeholder in the built-in pipe gets closer to what the “period” does in the magrittr pipe. The underscore itself arrived in R 4.2.0, where it has to be passed as a named argument; 4.3.0 adds the ability to start an extraction with it. While there are still some reasons to use magrittr, like tee pipes, assignment pipes and exposition pipes, I’ve never used them and they aren’t exported by default in the tidyverse.

For example, in magrittr you could do:

df %>% inner_join(states, .)

And with the native pipe you can do the same thing, as long as the placeholder is passed as a named argument:

df |> inner_join(states, y = _)
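
Here is a small sketch to confirm that the two forms give the same join. The `states` and `df` tables are toy data I made up, and I added `by = "abbr"` so dplyr does not print its joining message. The native pipe only accepts the underscore as a named argument, hence `y = _`; this needs R 4.2.0 or later.

```r
library(dplyr)

# Hypothetical toy tables standing in for `states` and `df`.
states <- tibble(abbr = c("NY", "VT"), name = c("New York", "Vermont"))
df <- tibble(abbr = c("NY", "VT", "NY"), pop = c(10, 2, 5))

# magrittr: the period puts df in the second position.
joined_magrittr <- df %>% inner_join(states, ., by = "abbr")

# Native pipe: the underscore must be named, so df is passed as y.
joined_native <- df |> inner_join(states, y = _, by = "abbr")

print(identical(joined_magrittr, joined_native))
```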

That was a major oversight when they created the native pipe. I’m not sure why the placeholder wasn’t implemented when the pipe first came out in 4.1.0, but it wasn’t.

Also, you can use _$value to extract an element from the piped result:

mtcars |> lm(mpg ~ disp, data = _) |> _$coef

Note that you cannot put the extractor directly on the lm call inside the pipe, as the native pipe does not allow $ as the right-hand-side function. Without a pipe, the equivalent is:

lm(mpg ~ disp, data = mtcars)$coef
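
To check the extraction form for yourself, here is a short sketch using the built-in mtcars data. It assumes R 4.3.0 or later, since older versions reject `_$` when parsing, and it compares the piped result against a plain coef() call.

```r
# Requires R >= 4.3.0: older versions reject `_$...` at parse time.
coefs <- mtcars |> lm(mpg ~ disp, data = _) |> _$coefficients

# The same values computed without any pipe.
direct <- coef(lm(mpg ~ disp, data = mtcars))

print(names(coefs))
print(all.equal(coefs, direct))
```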

Learn more about the changes in R 4.3.0: https://www.jumpingrivers.com/blog/whats-new-r43/

Get R version 4.3.0 (Already Tomorrow) which was released on 2023-04-21.

And here is the full list of changes in R 4.3.0.