Sister Counties

A county's sister is the county whose land cover is most similar by percentage, i.e. with similar shares of urban development, cropland, hay fields, and trees. Often, but not always, sister counties border each other. The match is sometimes mutual (Albany and Schenectady pick each other), but not always: Delaware's closest match is Cattaraugus, while Cattaraugus's closest match is Allegany. Fun with R.
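The similarity score in the R script at the end of this post is the sum of absolute differences between two counties' land-cover percentages, which is a Manhattan (L1) distance. In the notation below, which is introduced here for illustration, n_{c,k} is the number of NLCD pixels of class k in county c:

$$
p_{c,k} = 100 \cdot \frac{n_{c,k}}{\sum_j n_{c,j}}, \qquad
d(a, b) = \sum_k \left| p_{a,k} - p_{b,k} \right|, \qquad
\operatorname{sister}(a) = \arg\min_{b \neq a} d(a, b)
$$

The distance itself is symmetric, d(a, b) = d(b, a). However, each county takes its own arg min, so sister(a) = b does not force sister(b) = a.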
 
Albany – Schenectady
Allegany – Cattaraugus
Bronx – Queens
Broome – Tioga
Cattaraugus – Allegany
Cayuga – Seneca
Chautauqua – Oswego
Chemung – Chenango
Chenango – Otsego
Clinton – Fulton
Columbia – Cortland
Cortland – Columbia
Delaware – Cattaraugus
Dutchess – Orange
Erie – Onondaga
Essex – Warren
Franklin – Lewis
Fulton – Clinton
Genesee – Seneca
Greene – Sullivan
Hamilton – Herkimer
Herkimer – St. Lawrence
Jefferson – Chautauqua
Kings – New York
Lewis – St. Lawrence
Livingston – Wyoming
Madison – Oneida
Monroe – Niagara
Montgomery – Tompkins
Nassau – Richmond
New York – Kings
Niagara – Wayne
Oneida – Madison
Onondaga – Ontario
Ontario – Livingston
Orange – Dutchess
Orleans – Niagara
Oswego – Chautauqua
Otsego – Chenango
Putnam – Dutchess
Queens – Bronx
Rensselaer – Schoharie
Richmond – Nassau
Rockland – Westchester
Saratoga – Fulton
Schenectady – Albany
Schoharie – Tioga
Schuyler – Steuben
Seneca – Cayuga
St. Lawrence – Lewis
Steuben – Schuyler
Suffolk – Monroe
Sullivan – Greene
Tioga – Broome
Tompkins – Schuyler
Ulster – Greene
Warren – Essex
Washington – Oneida
Wayne – Niagara
Westchester – Rockland
Wyoming – Livingston
Yates – Livingston
 
 
The zonal histogram (a count of pixels in each land-cover class within each county) was created in QGIS from the 2019 National Land Cover Database (NLCD). Here is the R script:
library(tidyverse)
rm(list=ls())

# read the zonal histogram exported from QGIS: one row per county,
# one HISTO_ column of pixel counts per land-cover class
hist <- read_csv('Desktop/county.csv')

# convert each county's pixel counts into percentages of its total
hist <- hist %>%
  mutate(total = rowSums(across(contains('HISTO_')))) %>%
  mutate(across(contains('HISTO_'), ~ .x / total * 100))

# keep only the columns being compared: county name and land-cover percentages
hist <- hist %>% select(NAME10, contains('HISTO_'))


# go through each county
for (county in sort(hist$NAME10)) {
  searchCounty <- hist %>% filter(NAME10 == county)
  
  
  # calculate distance between search county and others
  # make our searchCounty dataframe the same size as the histogram table
  # subtract from histogram dataframe, taking absolute value 
  # sum rows to calculate the distance from the county
  # bind to histogram dataframe
  # better explanation: https://stackoverflow.com/questions/55681573/how-can-i-find-the-record-from-a-data-set-that-is-most-similar-to-a-test-record
  
  bd <- cbind(hist, dist=rowSums(abs(hist[,-1] - searchCounty[rep(1, nrow(hist)), -1]))) %>%
    arrange(dist)

  # row 1 is the county itself (distance 0), so row 2 is its closest match
  print(paste(county, '-', bd[2, 1]))
}
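The following is an alternative sketch, not the author's code. Base R's `dist()` with `method = "manhattan"` computes the same sum of absolute differences for every pair of counties at once. This avoids the per-county loop and makes it easy to flag which pairs pick each other. The helper name `sister_counties()` and the three-county toy data are hypothetical and exist only for illustration.

```r
# Hypothetical helper, not from the original script. For every row, it finds
# the closest other row by Manhattan (L1) distance, i.e. the summed absolute
# difference in land-cover percentages that the loop above uses.
sister_counties <- function(pct, county_names) {
  # pct: numeric matrix, one row per county, one column per land-cover class
  d <- as.matrix(dist(pct, method = "manhattan"))
  diag(d) <- Inf                             # a county cannot be its own sister
  nearest <- unname(apply(d, 1, which.min))  # row index of each closest match
  data.frame(
    county = county_names,
    sister = county_names[nearest],
    # mutual when the sister's own closest match points back to this county
    mutual = nearest[nearest] == seq_along(nearest),
    stringsAsFactors = FALSE
  )
}

# Hypothetical toy percentages (forest, crops, urban) for three counties
toy <- rbind(
  A = c(forest = 60, crops = 30, urban = 10),
  B = c(forest = 50, crops = 35, urban = 15),
  C = c(forest = 45, crops = 37, urban = 18)
)
res <- sister_counties(toy, rownames(toy))
print(res)
```

In the toy data, A's closest match is B (distance 20), but B's closest match is C (distance 10). This is the same one-way pattern as Delaware and Cattaraugus in the list. With the script's `hist` table, the equivalent call would presumably be `sister_counties(as.matrix(hist[, -1]), hist$NAME10)`. Ties could resolve differently from the loop, because `which.min()` keeps the first minimum.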
