Population Maths

2020 Population Maths!

These scripts require Python, pandas, and GeoPandas. You will also need the PL 94-171 redistricting files, specifically the 2020 TIGER/Line shapefiles and the geographic header file, nygeo2020.pl, which comes inside a zip file. The nygeo2020.pl file contains the population, housing unit count, and area from the 2020 census, among other things, for all census summary levels. It’s really handy to have.

This document is very helpful for understanding the Census files when you load them into pandas: 2020 Census State (P.L. 94-171) Redistricting Summary File Technical Documentation.
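To make the column positions concrete, here is a minimal sketch of the filtering step using only the standard library, so it runs without the real Census file. The column numbers (2 for summary level, 9 for the join key, 90 for population) are the ones the scripts on this page use; the records and the helper names `make_row` and `population_by_geoid` are made up for illustration and are not real Census data.

```python
import csv
import io

# Column positions as used in the scripts on this page (0-based).
SUMLEV_COL = 2   # summary level
GEOID_COL = 9    # identifier joined to GEOID20 in the shapefile
POP_COL = 90     # 2020 census population


def make_row(sumlev, geoid, pop, width=91):
    """Build one synthetic pipe-delimited record (not real Census data)."""
    fields = [""] * width
    fields[SUMLEV_COL] = sumlev
    fields[GEOID_COL] = geoid
    fields[POP_COL] = str(pop)
    return "|".join(fields)


def population_by_geoid(handle, summary_level):
    """Return {geoid: population} for the records at one summary level."""
    result = {}
    for fields in csv.reader(handle, delimiter="|"):
        if fields[SUMLEV_COL] == summary_level:
            result[fields[GEOID_COL]] = int(fields[POP_COL])
    return result


# Hypothetical records: one county, two block groups, one block.
sample = "\n".join([
    make_row("050", "36001", 1000),
    make_row("150", "360010001001", 600),
    make_row("150", "360010001002", 400),
    make_row("750", "360010001001000", 25),
])

block_groups = population_by_geoid(io.StringIO(sample), "150")
print(block_groups)
```

Unlike pandas, the csv module keeps every field as text, so the identifier stays a string and the summary level is compared as "150" rather than the integer 150.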

For all of these scripts, you will need to adjust the path variables to point to where the files are saved on your computer. The overlay shapefile can be anything, but you will need to update catField to match the field in the shapefile whose areas you want to calculate the population for.

Population of an Area

The code below calculates the population of each area in an overlay layer, in this case an overlay shapefile with a series of rings extending out from the NYS Capitol. As this covers a large area, we use block group sums to calculate the population of each ring, and then take the cumulative sum as the rings move outward.

import pandas as pd
import geopandas as gpd

# path to overlay shapefile
overlayshp = r'/tmp/dis_to_albany.gpkg'

# summary level -- 750 is tabulation block, 150 is blockgroup
# large areas over about 50 miles much faster to use bg
summaryLevel = 150
#summaryLevel = 750

# path to block or blockgroup file
if summaryLevel == 150:
    blockshp = r'/home/andy/Documents/GIS.Data/census.tiger/36_New_York/tl_2020_36_bg20.shp.gpkg'
else:
    blockshp = r'/home/andy/Documents/GIS.Data/census.tiger/36_New_York/tl_2020_36_tabblock20.shp.gpkg'

# path to PL 94-171 redistricting geoheader file
pl94171File = '/home/andy/Desktop/nygeo2020.pl'

# field to categorize on (such as Ward -- required!)
catField = 'Name'

# geo header contains 2020 census population in column 90
# per PL 94-171 documentation, low memory chunking disabled
# as it causes issues with the geoid column being mixed types
df = pd.read_csv(pl94171File, delimiter='|', header=None, low_memory=False)

# column 2 is summary level
population = df[df.iloc[:, 2] == summaryLevel][[9, 90]]

# load overlay
overlay = gpd.read_file(overlayshp).to_crs(epsg=3857)

# shapefile of nys 2020 blocks, IMPORTANT (!) mask by output file for speed
blocks = gpd.read_file(blockshp, mask=overlay).to_crs(epsg=3857)

# geoid for linking to shapefile is column 9
joinedBlocks = blocks.set_index('GEOID20').join(population.set_index(9))

# store the size of unbroken blocks
# in case overlay lines break blocks into two
joinedBlocks['area'] = joinedBlocks.area

# run union
unionBlocks = gpd.overlay(overlay, joinedBlocks, how='union')

# drop blocks outside of overlay
unionBlocks = unionBlocks.dropna(subset=[catField])

# create population projection when a block crosses
# an overlay line -- avoid double counting -- this isn't perfect
# as we lose about 0.15 percent due to floating point errors
unionBlocks['sublock'] = unionBlocks[90] * (unionBlocks.area / unionBlocks['area'])

# sum blocks in category (select the column first so the
# geometry column is never summed)
unionBlocks = unionBlocks.groupby(catField)['sublock'].sum().to_frame()

# rename columns
unionBlocks = unionBlocks.rename({'sublock': '2020 Census Population'}, axis=1)

# calculate cumulative sum as you go out each ring
unionBlocks['millions'] = unionBlocks['2020 Census Population'].cumsum() / 1000000

# each ring is 50 miles
unionBlocks['miles'] = unionBlocks.index * 50

# output
unionBlocks
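The arithmetic behind the 'sublock' column and the cumulative sum may be easier to see with small numbers. The following is a rough sketch in plain Python rather than GeoPandas; the populations, areas, ring totals, and the helper name `apportion` are all hypothetical and chosen only to illustrate the idea.

```python
from itertools import accumulate


def apportion(population, whole_area, piece_area):
    """Share of a block's population assigned to one piece, by area."""
    return population * piece_area / whole_area


# Hypothetical block group of 1,000 people with an area of 10 units,
# cut by the boundary between ring 1 and ring 2:
# 3 units of area fall in ring 1 and 7 units fall in ring 2.
piece_areas = {1: 3.0, 2: 7.0}
shares = {ring: apportion(1000, 10.0, area)
          for ring, area in piece_areas.items()}
print(shares)

# Hypothetical totals for each ring on its own (ring number -> population).
ring_totals = {1: 500_000, 2: 250_000, 3: 1_250_000}

# Running total moving outward, the same idea as cumsum() in the script.
rings = sorted(ring_totals)
running = list(accumulate(ring_totals[r] for r in rings))
for ring, total in zip(rings, running):
    print(f"within {ring * 50} miles: {total / 1_000_000:.2f} million")
```

Because each piece is divided by the area of the whole, unbroken block, a piece that falls outside the overlay simply contributes nothing, which is why the script stores the original area before running the union.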

Redistricting / Discrepancy from Ideal Districts

This variant of the above script calculates how far each district's population departs from the size of an ideal district. Because it covers a small area, it uses block-level data. See the comments in the code for details.

import pandas as pd
import geopandas as gpd

# path to overlay shapefile
overlayshp = r'/home/andy/Documents/GIS.Data/election.districts/albany wards 2015.gpkg'

# summary level -- 750 is tabulation block, 150 is blockgroup
# large areas over about 50 miles much faster to use bg
#summaryLevel = 150
summaryLevel = 750

# path to block or blockgroup file
if summaryLevel == 150:
    blockshp = r'/home/andy/Documents/GIS.Data/census.tiger/36_New_York/tl_2020_36_bg20.shp.gpkg'
else:
    blockshp = r'/home/andy/Documents/GIS.Data/census.tiger/36_New_York/tl_2020_36_tabblock20.shp.gpkg'

# path to PL 94-171 redistricting geoheader file
pl94171File = '/home/andy/Desktop/nygeo2020.pl'

# field to categorize on (such as Ward -- required!)
catField = 'Ward'

# geo header contains 2020 census population in column 90
# per PL 94-171 documentation, low memory chunking disabled
# as it causes issues with the geoid column being mixed types
df = pd.read_csv(pl94171File, delimiter='|', header=None, low_memory=False)

# column 2 is summary level
population = df[df.iloc[:, 2] == summaryLevel][[9, 90]]

# load overlay
overlay = gpd.read_file(overlayshp).to_crs(epsg=3857)

# shapefile of nys 2020 blocks, IMPORTANT (!) mask by output file for speed
blocks = gpd.read_file(blockshp, mask=overlay).to_crs(epsg=3857)

# geoid for linking to shapefile is column 9
joinedBlocks = blocks.set_index('GEOID20').join(population.set_index(9))

# store the size of unbroken blocks
# in case overlay lines break blocks into two
joinedBlocks['area'] = joinedBlocks.area

# run union
unionBlocks = gpd.overlay(overlay, joinedBlocks, how='union')

# drop blocks outside of overlay
unionBlocks = unionBlocks.dropna(subset=[catField])

# create population projection when a block crosses
# an overlay line -- avoid double counting -- this isn't perfect
# as we lose about 0.15 percent due to floating point errors
unionBlocks['sublock'] = unionBlocks[90] * (unionBlocks.area / unionBlocks['area'])

# sum blocks in category (select the column first so the
# geometry column is never summed)
unionBlocks = unionBlocks.groupby(catField)['sublock'].sum().to_frame()

# rename columns
unionBlocks = unionBlocks.rename({'sublock': '2020 Census Population'}, axis=1)

# calculate ideal ward based on 15 districts, 2020 albany population 99,224
unionBlocks['Ideal'] = 99224 / 15

# calculate departure from ideal
unionBlocks['Departure'] = unionBlocks['2020 Census Population'] - unionBlocks['Ideal']

# calculate percent departure, relative to the ideal ward size
unionBlocks['Percent Departure'] = unionBlocks['Departure'] / unionBlocks['Ideal'] * 100

# output
unionBlocks
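As a quick check on the arithmetic, here is a small sketch of the departure calculation in plain Python. The city population and ward count come from the script above; the two ward populations and the helper name `deviation` are hypothetical. This sketch divides by the ideal ward size, which is the usual convention when reporting deviation from an ideal district.

```python
# Figures from the script above: 2020 Albany population and 15 wards.
CITY_POPULATION = 99_224
WARDS = 15
ideal = CITY_POPULATION / WARDS


def deviation(ward_population, ideal_population):
    """Return (absolute, percent) departure from the ideal ward size."""
    departure = ward_population - ideal_population
    return departure, departure / ideal_population * 100


# Hypothetical ward populations, for illustration only.
wards = {"A": 7_000, "B": 6_300}
for name, pop in wards.items():
    departure, percent = deviation(pop, ideal)
    print(f"Ward {name}: {departure:+.0f} people ({percent:+.1f}%)")
```

A positive figure means the ward is over the ideal size and a negative figure means it is under.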
