Mapping

I decide to create a PostGIS table with the complete NY State Tax database on my computer

I decide to create a PostGIS table with the complete NY State Tax database on my computer. πŸ€–

Querying the state’s ArcGIS server is fine, but it can be slow or impossible when you are trying to do calculations with thousands or millions of records. The ESRI Geodatabase file is fine, but nothing beats PostGIS for speed with a large dataset like that, especially on a solid state drive.

http://gis.ny.gov/gisdata/inventories/details.cfm?DSID=1300

Are you looking for FIPS or GeoID codes to match against NY SWIS codes?

You can find them in this shapefile here: http://gis.ny.gov/gisdata/inventories/details.cfm?DSID=927 or using the state’s ArcMap server: https://gisservices.its.ny.gov/arcgis/rest/services/NYS_Civil_Boundaries/FeatureServer/6

The relevant fields as SWIS and FIPS_CODE, the later which is the GeoID used by the Census bureau.

To link state tax data to that file, you may need to zero the last two digits of SWIS codes with code like this in R or your favorite language: taxroll$swis <- gsub('.{2}$','00', taxroll$swis)

When you shouldn’t use GIS software for mapping

When you shouldn’t use GIS software for mapping

Many GIS professionals and hobbyists will turn to their favorite GIS program, namely QGIS or ArcGIS for making maps. I did that for many years. But I’m seeing the real disadvantages to using GIS software for creating basic cholorpath maps and diagrams compared to using a programming language like R or Python.

GIS software is point and click. You don’t need to learn a programming language, although certainly knowing Python can help automate map creation. But the problem with GIS software is its point and click. You can create maps, but it’s often involves many extra steps including going to a website to download data, then manipulate data in an external program like Microsoft Excel or Open Office Calc, or maybe Python and Pandas then open and join the data in your GIS program. Sure you can save your data, your project and templates for reuse but it’s a lot of extra steps. Atlases and scripting can automate part of the process but still it’s a lot of pointing and clicking and manual work.

The other problem with GIS created maps is it so tempting to lard them up with excessive layers, formatting and details that are totally unnecessary for basic cholorpath maps that are about readability and not styling. Just because it’s easy to add a ton of layers and details with your GIS program doesn’t mean you should. Often good chloropath maps are simple, easy to read at a glance. They don’t need a million layers.

QGIS can export maps as SVG vector files for fast and compact web display. But the SVG files it creates isn’t nearly as fast or simple as those created by ggplot2. It’s easy to mess up layouts slightly in the layout composer in QGIS and not find it out until later. Often maps created by R and ggplot2 just look good out of the box with minimal tweaking. And it’s super easy to reuse code between R projects by just cutting and pasting.

For a lot of mapping and diagrams you can’t beat using R and ggplot2. But it’s not for everything. Complicated topographic maps with many layeds or those that involve custom feature creation and plotting are best done in GIS software. Advanced labeling, formatting or 3D rendering is best done in GIS software. But for basic map plotting for figure creation nothing really beat R and ggplot2. 

Another Example of GeoPandas and Python Spatial Joins

Another Example of GeoPandas and Python Spatial Joins πŸ—Ί

Yesterday, I posted a much more complicated piece of code that pulled addresses from the SAM (State Address Management) database and did a spatial join to add a column to the file with Assembly District and Municipality. This was a bit too complex, so I made a simpler one for other purposes that doesn’t require the coordinates to be obtained from SAM.

This python script takes two parameters:

  • Path to a CSV file that contains an X and Y coordinate
  • Path to a Shapefile or Geopackage to Join Against

Then the code will create a new CSV file with the spatially joined attributes pulled from the Shapefile. I have only run it on a few large data sets, but I found it took roughly 1 second to join 1,000 records from call to end of end of script.

#!/usr/bin/python

import requests,sys,json,os,csv

import pandas as pd
import geopandas as gpd

lines=[]

# read list of addresses from parameter 1
with open(sys.argv[-2], newline='') as csvfile:	
	for line in csv.DictReader(csvfile):
		lines.append(line)

# convert to pandas
locPd = pd.DataFrame(lines,columns=lines[0].keys())
locPd.convert_dtypes()

locPd = gpd.GeoDataFrame(locPd,  geometry=gpd.points_from_xy(locPd.x.astype('float32'), locPd.y.astype('float32')))

# run spatial joins against parameter 2
ad = gpd.read_file(sys.argv[-1])
locPd = gpd.sjoin(locPd, ad, op="within")

# remove added geometery and index columns
del locPd['geometry']
del locPd['index_right']

# write pandas back to out csv
locPd.to_csv (os.path.splitext(sys.argv[-2])[0]+'-output.csv', index = False, header=True)