May 17, 2023 | Andy Arthur.org

Pond Along The Windhall River

Andy | May 17, 2023

Saturday May 26, 2018 — Stewart's

Scraping SeeThroughNY Data using R

Andy | May 17, 2023

Here is the R code I use to scrape SeeThroughNY.net and download state and local government employee wage data.

library(tidyverse)
library(RSelenium)
library(netstat)
library(rvest)

# Load Selenium browser. This code should automatically open a Firefox window
# from R, downloading the latest GeckoDriver if necessary.
#
# If this doesn't work, you should delete the LICENSE.chromedriver file, which
# sometimes causes RSelenium to not load.
## find ~/.local/share/ -name LICENSE.chromedriver | xargs -r rm

rs <- rsDriver(
  remoteServerAddr = "localhost",
  port = free_port(random = TRUE),
  browser = "firefox",
  verbose = FALSE
)

rsc <- rs$client
rsc$navigate("https://seethroughny.net/payrolls")

# STOP !!!
# While you could automate this step, you should now manually choose your
# search items in the SeeThroughNY browser window that has opened. Then
# you should execute the following lines.

# Next you want to load all of the results. We limit it to 30 attempts,
# which will pull most reasonably sized queries. Too big and you could crash
# your browser due to excessive memory needed.

for (i in seq_len(30)) {
  # Click the "load more" control, then stop once the page hides it
  rsc$findElement(using = 'css', '#data_loader')$clickElement()

  if (rsc$findElement(using = 'css', '#data_loader')$getElementAttribute('style')[[1]] == 'display: none;')
    break

  Sys.sleep(2)
}

# Next you need to pull and clean the HTML table that
# contains the data (clean_names() comes from the janitor package)
rsc$getPageSource() %>%
  unlist() %>%
  read_html() %>%
  html_table() %>%
  .[[1]] %>%
  janitor::clean_names() -> employees

# Some of the data is located in the (+) tab, but this is just a
# table cell on every other row, which we split up into the appropriate
# field values

employees %>%
  filter(row_number() %% 2 == 0) %>%
  select(name) %>%
  separate(name, sep = '\n', into = c(NA, 'subagency', NA, NA, NA, 'title', NA, NA, NA, 'rateofpay', NA, NA, NA, 'payyear', NA, NA, NA, 'paybasis', NA, NA, NA, 'branch')) %>%
  cbind(employees %>% filter(row_number() %% 2 != 0), .) %>%
  mutate(across(everything(), str_trim),
         total_pay = parse_number(total_pay)) %>%
  select(-x, -x_2, -subagency_type) -> employees

### Then you can pipe this data into ggplot or any other program.
### Or export it to a CSV or Excel file
employees %>% write_csv('/tmp/employee_data.csv')
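
If you want to see how the every-other-row cleanup works without opening a browser, here is a minimal sketch on a made-up two-person table. The names, agencies and pay figures are invented for illustration, and the detail rows are shortened to three fields. The real SeeThroughNY table has more fields and may be laid out differently.

```r
library(dplyr)
library(tidyr)
library(readr)

# Invented stand-in for the scraped table: odd rows are the visible summary
# rows, even rows hold the expanded (+) details packed into `name`,
# separated by newlines.
toy <- tibble(
  name = c(
    "Doe, Jane",
    "\nParks\n\n\n\nClerk\n\n\n\n$20.00",
    "Roe, Richard",
    "\nHighways\n\n\n\nLaborer\n\n\n\n$18.50"
  ),
  total_pay = c("$41,600", "", "$38,480", "")
)

# Even rows: split the packed cell on newlines. NA entries in `into`
# drop the empty pieces between the real values.
details <- toy %>%
  filter(row_number() %% 2 == 0) %>%
  select(name) %>%
  separate(name, sep = "\n",
           into = c(NA, "subagency", NA, NA, NA, "title", NA, NA, NA, "rateofpay"))

# Odd rows: the summary line for each person.
summary_rows <- toy %>% filter(row_number() %% 2 != 0)

# Put the two halves side by side and turn the pay strings into numbers.
tidy <- bind_cols(summary_rows, details) %>%
  mutate(total_pay = parse_number(total_pay),
         rateofpay = parse_number(rateofpay))

print(tidy)
```

Each person ends up on a single row, with the detail fields as their own columns next to the summary columns.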

It’s wonderful…

Andy | May 17, 2023

To be walking down the street and run into random people who say, “You’ve lost a lot of weight.” Literal strangers, but also long-term acquaintances, are noticing.

That said, what really feels wonderful is how much better I feel these days and how I’ve learned to eat much healthier and more diverse foods, things that are interesting but not overcooked and loaded with fat, salt and sugar.

There is always more to do, but I think I am making permanent changes in my life. Probably the hardest part remains friends, colleagues and family: when you find a good way to live, others want to pull you back because they don’t understand your new way of living.

I’m reminded of these lyrics, which ring true when you are doing so much of the right thing in your life.

My buddies tell me that I should have waited
They say I’m missing a whole world of fun
But I am happy and I sing with pride
I like the Christian life

I won’t lose a friend by heeding God’s call
For what is a friend who’d want me to fall
Others find pleasure in things I despise
I like the Christian life

My buddies shun me since I turned to Jesus
But I am happy though it burdens my soul
And I’ll try to lead them to walk in the night
I like the Christian life