R Programming Language

R is a programming language and free software environment for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. It is widely used among statisticians and data miners for developing statistical software and data analysis.

Why am I doing a new R or Python graph, chart or map every morning?


Over the past few months, I have been trying to post a new graph, chart or map that I’ve created in the R programming language or Python, using public data and other sources from the Interwebs. The more you do anything, the better you get at it: more speed, precision and beauty. Doing it over and over again means I’m learning a valuable skill that can benefit me both personally and professionally. Not every computer graphic has to have the default, somewhat cringe-worthy look of ggplot or matplotlib.

I don’t have the computing resources or the skills to get into big data, but even simpler data processing and visualizations can be valuable. Being able to query and plot tax rolls, census data, economic data, and agricultural data has its uses, even if it is far below the processing levels that a professional programmer at Google or IBM might reach. And it’s often fascinating to gain insights into the world that are not filtered through the media, which often reports just the toplines fed to it by the government without diving deep into the data.

It’s hard to dispute that many visualizations contain biases, intentional or otherwise. But that’s the nature of data when you graph or map it out. When you project the data onto paper, you have to omit certain characteristics, which biases the data one way or another. Still, I think it is a good idea to find my own insights in the data, and present it how I think it looks best.

Setting boundaries in ggplot2

I was trying to figure out how to set the boundaries of maps created in ggplot2 to focus on a region of the state or a few counties. It’s actually not that hard to do in ggplot2 and R.

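Basically, you create a bounding box based on a shape, like a few counties. Here counties() comes from the tigris package, st_bbox() from sf, and filter() and %>% from dplyr.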

county <- counties(state='ny', cb=TRUE)

bbox <- county %>% filter(NAME %in% c('Oswego','Madison')) %>% st_bbox()

Then add this line to the ggplot statement to set the boundaries of the plot. Simple enough.

coord_sf(xlim=c(bbox[1], bbox[3]), ylim=c(bbox[2], bbox[4])) +
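For a version that runs without downloading anything, here is a minimal sketch of the same idea. It assumes the sf, dplyr and ggplot2 packages are installed, and it swaps in the North Carolina counties shapefile that ships with sf as a stand-in for the tigris data. The two county names are arbitrary picks for illustration.

```r
library(sf)
library(dplyr)
library(ggplot2)

# North Carolina counties shapefile bundled with the sf package
# (a stand-in for the tigris county data; an assumption for this sketch).
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Bounding box around two example counties (arbitrary picks).
bbox <- nc %>%
  filter(NAME %in% c("Ashe", "Alleghany")) %>%
  st_bbox()

# Draw every county, then crop the view to the bounding box.
# st_bbox() returns a named vector: xmin, ymin, xmax, ymax.
p <- ggplot(nc) +
  geom_sf() +
  coord_sf(
    xlim = c(bbox[["xmin"]], bbox[["xmax"]]),
    ylim = c(bbox[["ymin"]], bbox[["ymax"]])
  )

print(p)
```

Using the names instead of bbox[1] through bbox[4] does the same thing, but makes it harder to mix up the x and y limits.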

I’ve been brushing up on my statistics lately, and realizing it’s kind of important to include on my graphs with linear models the linear equation, p-value and r^2-value

I’ve been brushing up on my statistics lately, and realizing that my graphs with linear models really should include the linear equation, p-value and r^2 value. I was hoping to find a simple R library to produce this text, but I ended up using a StackOverflow answer, along with some R code I wrote myself.

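library(dplyr)  # provides %>%, mutate(), mutate_if() and summarise()

# Build a three-line label for a simple linear model: equation, p-value, r^2
get_equation <- function(model) {
  # p-value of the slope (row 2 of the coefficient table), not the intercept
  pValue <- paste('p_value=', format(summary(model)$coefficients[2,4], scientific=TRUE), sep='')
  r2Value <- paste('r^2_value=', round(summary(model)$r.squared,4), sep='')

  formula <- broom::tidy(model)[, 1:2] %>%
    mutate(sign = ifelse(estimate < 0, '-', '+')) %>% #coeff signs
    mutate_if(is.numeric, ~ abs(round(., 7))) %>% #for improving formatting
    # keep the minus sign on a negative intercept; the slope always shows its sign
    mutate(a = ifelse(term == '(Intercept)',
                      paste0('y=', ifelse(sign == '-', '-', ''), estimate),
                      paste0(sign, estimate, 'x'))) %>%
    summarise(formula = paste(a, collapse = '')) %>%
    as.character

  # one line each for the equation, p-value and r^2
  paste(formula, pValue, r2Value, sep='\n')
}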
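To see where each piece of that label comes from, here is a minimal base-R sketch that builds the same kind of text without broom or dplyr. It uses the built-in mtcars data purely as an example (it is not the data from my graph), and the variable names are made up for illustration.

```r
# Fit a simple linear model on the built-in mtcars data (example data only).
model <- lm(mpg ~ wt, data = mtcars)
fit <- summary(model)

intercept <- coef(model)[["(Intercept)"]]
slope <- coef(model)[["wt"]]

# Row 2 of the coefficient table is the slope; column 4 is its p-value.
p_value <- fit$coefficients[2, 4]
r_squared <- fit$r.squared

# Assemble the three-line label; "\n" is a line break.
label <- paste(
  sprintf("y=%.4f%s%.4fx", intercept, ifelse(slope < 0, "-", "+"), abs(slope)),
  paste0("p_value=", format(p_value, scientific = TRUE, digits = 3)),
  paste0("r^2_value=", round(r_squared, 4)),
  sep = "\n"
)

cat(label, "\n")
```

This only handles a model with one predictor, which is all I need for a trend line on a scatter plot.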

To add this text to the upper right-hand corner of the graph, you can use this code in ggplot2:

annotate('text', x=Inf, y=Inf, hjust=1, vjust=1, label=get_equation(lm(comb$avg ~ comb$Value))[1])

Here is a rather silly example: