Mapping

Show Only ...
Maps - Photos - Videos

The Real Powers of R Statistical Language for Map Making

There are two major — and distinct ways that the R Statistical language is a powerful tool for map making — either paired with QGIS or without.

When you should use R Statistical Language with QGIS

If you are looking for a more complicated map, one of the best ways to process and wrangle map data, Shapefiles and other geographic data is within R Studio. Much of the prep work for making a map is best down in R as you can type commands quickly, pipe and mutate data, dissolve and buffer shapes easily with R and the sf package. No mouse clicks required. It’s easy to save repeated processes in an R script.

Then once the data is ready to be mapped, export it to a geopackage or shapefile, usually which I store in the /tmp directory, as I don’t need to save the data as I can quickly re-create it with the R script. From there, I can load it into QGIS, then make the map look pretty, manipulate the styling and labeling in QGIS.

When you should use R Statistical Language Alone

R with ggplot is good for simple maps, and those you want to run off frequently with updated data, and aren’t concerned with pretty labeling. Basic, clear thematic maps where labels aren’t needed or the automatic labeling of ggrepel can be tolerated due to relatively few label overlaps. In many cases, for basic maps, ggplot makes a cleaner output, lines things up better and produces better output out of the box then what you might quickly throw together in QGIS from exported data in R.

Why I don’t use R scripting in QGIS directly

While it’s possible to do R scripting in QGIS, I would rather work in RStudio with it’s preview windows, code completion and other tools rather the use QGIS. It’s just easier to do the data wrangling and manipulation of shapefiles in RStudio, and then export to a Geopackage and load the final data in QGIS.

Python code, takes a list of addresses, runs them through NYS Address Management system, then uses GeoPandas to calculate the municipality and Assembly district from GeoPackages then writes out to a CSV file

Python, the State Geocoder and Outputting Assembly Districts

Python code, takes a list of addresses, runs them through NYS Address Management system, then uses GeoPandas to calculate the municipality and Assembly district from GeoPackages then writes out to a CSV file. Nothing fancy, but it does a job. I could use it with any other shapefile I want. GeoPandas is nice because it’s fast and doesn’t require loading QGIS.

#!/usr/bin/python

import requests,sys,json,os,csv

import pandas as pd
import geopandas as gpd

lines=[]

# read list of addresses
with open(sys.argv[-1], newline='') as csvfile:	
	for line in csv.DictReader(csvfile):
		lines.append(line)
		
# build address query
query = '{"records": ['
i=0
for line in lines:
	query += '{ "attributes": { "OBJECTID":'+str(i)+', "SINGLELINE": "'+line['Address'].rstrip()+'"} },'+"\n"
	i+=1	
query += ']}'

post = { 'f':'pjson', 'outSR': 4326, 'addresses': query }
url = 'https://gisservices.its.ny.gov/arcgis/rest/services/Locators/Street_and_Address_Composite/GeocodeServer/geocodeAddresses'

# send request to state geocoder
req = requests.post(url, data = post)
locations = json.loads(req.text)['locations']

# parse response
for loc in locations:
	i = loc['attributes']['ResultID']
	lines[i]['y'] = loc['location']['y']
	lines[i]['x'] = loc['location']['x']
	lines[i]['Match_addr'] = loc['attributes']['Match_addr']
	
	# hackish, might cause problems but keeps joins from erroring
	if (lines[i]['x'] == 'NaN'):
		lines[i]['x'] = 0
		lines[i]['y'] = 0

# convert to pandas
locPd = pd.DataFrame(lines,columns=lines[0].keys())
locPd = gpd.GeoDataFrame(locPd,  geometry=gpd.points_from_xy(locPd.x.astype('float32'), locPd.y.astype('float32')))

# add county municipality column
cosub = gpd.read_file(r'/home/andy/Documents/GIS.Data/geocode/cosub.gpkg')
locPd = gpd.sjoin(locPd, cosub, op="within")

del locPd['index_right']

# add ads column
ad = gpd.read_file(r'/home/andy/Documents/GIS.Data/geocode/ad.gpkg')
locPd = gpd.sjoin(locPd, ad, op="within")

del locPd['index_right']

# add sd column
sd = gpd.read_file(r'/home/andy/Documents/GIS.Data/geocode/sd.gpkg')
locPd = gpd.sjoin(locPd, sd, op="within")

del locPd['index_right']

# add cd column
sd = gpd.read_file(r'/home/andy/Documents/GIS.Data/geocode/cd.gpkg')
locPd = gpd.sjoin(locPd, sd, op="within")


# remove added geometery and index columns
del locPd['geometry']
del locPd['index_right']

# write pandas back to out csv
locPd.to_csv (os.path.splitext(sys.argv[-1])[0]+'-output.csv', index = False, header=True)