Population Maths

2020 Population Maths!

These scripts require Python, pandas and GeoPandas. You will also need the PL 94-171 redistricting files, specifically the 2020 TIGER/Line Shapefiles and the nygeo2020.pl geoheader file, which comes inside a zip file. That nygeo2020.pl file contains the population, housing units, and area from the 2020 census, among other things, for all census summary levels. It’s really handy to have.

This document is very helpful in understanding the Census files when you load them into pandas: 2020 Census State (P.L. 94-171) Redistricting Summary File Technical Documentation.
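As a rough illustration of how the geoheader positions used in the scripts on this page line up with field names, here is a minimal sketch that reads a few synthetic geoheader-style rows with pandas. The names SUMLEV, GEOCODE and POP100 are my reading of the technical documentation, the 97-field row width is an assumption, and every value in the sample rows is made up.

```python
import io
import pandas as pd

# Positions used by the scripts on this page; the field names are an
# assumption based on the technical documentation.
SUMLEV, GEOCODE, POP100 = 2, 9, 90
N_FIELDS = 97  # assumed width of a geoheader row


def fake_row(sumlev, geocode, pop):
    """Build one synthetic pipe-delimited geoheader row (made-up values)."""
    fields = [''] * N_FIELDS
    fields[SUMLEV] = str(sumlev)
    fields[GEOCODE] = geocode
    fields[POP100] = str(pop)
    return '|'.join(fields)


sample = '\n'.join([
    fake_row(150, '360010001001', 1200),    # block group
    fake_row(150, '360010001002', 800),     # block group
    fake_row(750, '360010001001000', 45),   # tabulation block
])

# Reading the geocode as text keeps leading zeros and avoids mixed types
df = pd.read_csv(io.StringIO(sample), delimiter='|', header=None,
                 dtype={GEOCODE: str})

# Keep only block groups (summary level 150) and give the columns names
blockgroups = (
    df.loc[df[SUMLEV] == 150, [GEOCODE, POP100]]
      .rename(columns={GEOCODE: 'GEOID20', POP100: 'population'})
)
print(blockgroups)
```

Naming the positions once at the top makes the later filtering and joining steps easier to read than bare numbers like 9 and 90.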

For all of these scripts, you will need to adjust the path variables to point to where the files are saved on your computer. The overlay shapefile can be anything, but you will need to update catField to match the field in the shapefile that you want to calculate population for.

Population of an Area

The code below calculates the population within each feature of an overlay layer. Here the overlay is a shapefile with a series of rings extending out from the NYS Capitol. As this covers a large area, we sum block groups rather than blocks, and then take the cumulative sum across the rings.

import pandas as pd
import geopandas as gpd

# path to overlay shapefile
overlayshp = r'/tmp/dis_to_albany.gpkg'

# summary level -- 750 is tabulation block, 150 is block group
# for large areas (over about 50 miles) it is much faster to use block groups
summaryLevel = 150
#summaryLevel = 750

# path to block or blockgroup file
if summaryLevel == 150:
    blockshp = r'/home/andy/Documents/GIS.Data/census.tiger/36_New_York/tl_2020_36_bg20.shp.gpkg'
else:
    blockshp = r'/home/andy/Documents/GIS.Data/census.tiger/36_New_York/tl_2020_36_tabblock20.shp.gpkg'

# path to PL 94-171 redistricting geoheader file
pl94171File = '/home/andy/Desktop/nygeo2020.pl'

# field to categorize on (such as Ward -- required!)
catField = 'Name'

# geo header contains 2020 census population in column 90
# per PL 94-171 documentation; low-memory chunking is disabled
# because it causes the geoid column to end up with mixed types
df = pd.read_csv(pl94171File, delimiter='|', header=None, low_memory=False)

# column 2 is summary level 
population=df[(df.iloc[:,2] == summaryLevel)][[9,90]]

# load overlay
overlay = gpd.read_file(overlayshp).to_crs(epsg='3857')

# shapefile of nys 2020 blocks, IMPORTANT (!) mask by output file for speed
blocks = gpd.read_file(blockshp,mask=overlay).to_crs(epsg='3857')

# geoid for linking to shapefile is column 9
joinedBlocks=blocks.set_index('GEOID20').join(population.set_index(9))

# store the size of unbroken blocks
# in case overlay lines break blocks into two
joinedBlocks['area']=joinedBlocks.area

# run union
unionBlocks=gpd.overlay(overlay, joinedBlocks, how='union')

# drop blocks outside of overlay
unionBlocks=unionBlocks.dropna(subset=[catField])

# estimate the population of each piece when a block crosses
# an overlay line, in proportion to its share of the block's area,
# to avoid double counting -- this isn't perfect, as we lose
# about 0.15 percent due to floating point errors
unionBlocks['sublock']=unionBlocks[90]*(unionBlocks.area/unionBlocks['area'])

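# sum blocks in category -- select the column before summing so that
# the geometry and text columns are not summed along with it
unionBlocks = unionBlocks.groupby(catField)['sublock'].sum().to_frame()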

# rename columns
unionBlocks=unionBlocks.rename({'sublock': '2020 Census Population'},axis=1)

# calculate cumulative sum, in millions, as you go out each ring
unionBlocks['millions'] = unionBlocks['2020 Census Population'].cumsum() / 1_000_000

# each ring is 50 miles (this relies on the catField values,
# now the index, being the numeric ring number)
unionBlocks['miles'] = unionBlocks.index * 50

# output
unionBlocks
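
The area-weighting step in the script can be easier to follow with plain numbers. Below is a minimal sketch, not the script itself: the helper name apportion, the ring numbers, the areas and the populations are all hypothetical. It splits one block's population across the rings it straddles in proportion to area, adds the pieces to per-ring totals, and then takes a running sum outward.

```python
from itertools import accumulate


def apportion(block_population, whole_area, piece_areas):
    """Split a block's population across its pieces by share of area.

    whole_area is the area of the unbroken block, stored before the
    overlay, so a piece that falls outside the overlay is simply dropped
    rather than redistributed.
    """
    return {ring: block_population * area / whole_area
            for ring, area in piece_areas.items()}


# Hypothetical block of 120 people: 30% of its area in ring 1, 70% in ring 2
pieces = apportion(120, 10000.0, {1: 3000.0, 2: 7000.0})

# Hypothetical totals from blocks that sit wholly inside each ring
ring_totals = {1: 500.0, 2: 300.0, 3: 200.0}
for ring, people in pieces.items():
    ring_totals[ring] += people

# Running total as you move outward ring by ring
cumulative = dict(zip(ring_totals, accumulate(ring_totals.values())))
print(pieces)
print(cumulative)
```

This assumes population is spread evenly within a block, which is the same simplification the script makes. It is a weaker assumption for block groups than for blocks, since block groups are larger.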


Redistricting / Discrepancy from Ideal Districts

This is a variant of the script above that calculates each district's deviation in population from an ideal district. As this covers a small area, we use block-level data. See the code and its comments below.

import pandas as pd
import geopandas as gpd

# path to overlay shapefile
overlayshp = r'/home/andy/Documents/GIS.Data/election.districts/albany wards 2015.gpkg'

# summary level -- 750 is tabulation block, 150 is block group
# for large areas (over about 50 miles) it is much faster to use block groups
#summaryLevel = 150
summaryLevel = 750

# path to block or blockgroup file
if summaryLevel == 150:
    blockshp = r'/home/andy/Documents/GIS.Data/census.tiger/36_New_York/tl_2020_36_bg20.shp.gpkg'
else:
    blockshp = r'/home/andy/Documents/GIS.Data/census.tiger/36_New_York/tl_2020_36_tabblock20.shp.gpkg'

# path to PL 94-171 redistricting geoheader file
pl94171File = '/home/andy/Desktop/nygeo2020.pl'

# field to categorize on (such as Ward -- required!)
catField = 'Ward'

# geo header contains 2020 census population in column 90
# per PL 94-171 documentation; low-memory chunking is disabled
# because it causes the geoid column to end up with mixed types
df = pd.read_csv(pl94171File, delimiter='|', header=None, low_memory=False)

# column 2 is summary level 
population=df[(df.iloc[:,2] == summaryLevel)][[9,90]]

# load overlay
overlay = gpd.read_file(overlayshp).to_crs(epsg='3857')

# shapefile of nys 2020 blocks, IMPORTANT (!) mask by output file for speed
blocks = gpd.read_file(blockshp,mask=overlay).to_crs(epsg='3857')

# geoid for linking to shapefile is column 9
joinedBlocks=blocks.set_index('GEOID20').join(population.set_index(9))

# store the size of unbroken blocks
# in case overlay lines break blocks into two
joinedBlocks['area']=joinedBlocks.area

# run union
unionBlocks=gpd.overlay(overlay, joinedBlocks, how='union')

# drop blocks outside of overlay
unionBlocks=unionBlocks.dropna(subset=[catField])

# estimate the population of each piece when a block crosses
# an overlay line, in proportion to its share of the block's area,
# to avoid double counting -- this isn't perfect, as we lose
# about 0.15 percent due to floating point errors
unionBlocks['sublock']=unionBlocks[90]*(unionBlocks.area/unionBlocks['area'])

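# sum blocks in category -- select the column before summing so that
# the geometry and text columns are not summed along with it
unionBlocks = unionBlocks.groupby(catField)['sublock'].sum().to_frame()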

# rename columns
unionBlocks=unionBlocks.rename({'sublock': '2020 Census Population'},axis=1)

# calculate ideal ward size based on 15 wards and
# Albany's 2020 population of 99,224
unionBlocks['Ideal'] = 99224 / 15

# calculate departure from ideal
unionBlocks['Departure']=unionBlocks['2020 Census Population']-unionBlocks['Ideal']

# calculate percent departure, relative to the ideal district size
unionBlocks['Percent Departure'] = unionBlocks['Departure'] / unionBlocks['Ideal'] * 100

# output
unionBlocks
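
To make the deviation arithmetic concrete, here is a small sketch with three hypothetical wards. The function name deviation_from_ideal and the ward populations are invented for illustration. The ideal size is the total population divided by the number of districts, just as the script uses 99224/15 for Albany. The percentage here is taken relative to the ideal size, which is the usual redistricting convention.

```python
def deviation_from_ideal(populations):
    """Return ideal size, departure and percent departure per district."""
    ideal = sum(populations.values()) / len(populations)
    table = {}
    for district, people in populations.items():
        departure = people - ideal
        table[district] = {
            'population': people,
            'ideal': ideal,
            'departure': departure,
            # percent is relative to the ideal, not the district's own size
            'percent': departure / ideal * 100,
        }
    return table


# Hypothetical ward populations, chosen so the ideal comes out to 1,000
wards = {'Ward 1': 900, 'Ward 2': 1000, 'Ward 3': 1100}
table = deviation_from_ideal(wards)

for ward, row in table.items():
    print(f"{ward}: {row['departure']:+.0f} people ({row['percent']:+.1f}%)")
```

When the districts cover the whole city, the departures sum to zero. That makes a handy sanity check on the real output.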
SVGZ Graphic:  Real Median Household Income, 1984-2020
SVGZ Graphic: 2020 Center of Population
SVGZ Graphic: 2020 Mississippi Household Income
SVGZ Graphic: 2020 Mississippi Population
SVGZ Graphic: 2020 State Population
SVGZ Graphic: 2020 State Population Density
SVGZ Graphic: 2020 US Population by Census Region
SVGZ Graphic: 2024 Population Born Out of State [Expires July 10 2026]
SVGZ Graphic: 25 Counties with the Fewest College Graduates (Bachelors or Post-Graduate)
SVGZ Graphic: 25 Most Densely Populated Municipalites and CCDs in America
SVGZ Graphic: 25 Poorest Counties in America
SVGZ Graphic: African Americans - Percent of County Population
SVGZ Graphic: Area of US Counties
SVGZ Graphic: Classifying Similiar States for Income Distribution
SVGZ Graphic: Counties with fewer then 100 people per sq mile
SVGZ Graphic: County Population Estimates, from 2020 to 2025 [Expires July 10 2026]
SVGZ Graphic: Domestic Migration, June 2020 to 2022
SVGZ Graphic: Half of All Americans Live Within 79 MIles of the Ocean
SVGZ Graphic: Highest Population Density Counties in Each State
SVGZ Graphic: Hispanic Population in Texas
SVGZ Graphic: How Asians Voted in NYS in 2020
SVGZ Graphic: How Many People Reside in Each Time Zone
SVGZ Graphic: Kentucky - 2020 Population
SVGZ Graphic: Land Area of States
SVGZ Graphic: Median Age by State
SVGZ Graphic: Median Household Income in Michigan
SVGZ Graphic: Median Household Income, NY and Surrounding States
SVGZ Graphic: Median Income by State
SVGZ Graphic: Median Income Distribution by State
SVGZ Graphic: Median Value of Owner Occupied Homes
SVGZ Graphic: Most Similiar Income Distributions to New York
SVGZ Graphic: Most White Counties in America
SVGZ Graphic: New York State 1790 Census
SVGZ Graphic: Ohio US Senate GOP Primary
SVGZ Graphic: Percent Male - US Congressional District
SVGZ Graphic: Percent of America's Population by State
SVGZ Graphic: Percentage of Irish Residents per State
SVGZ Graphic: Percentage of the Population that Was Born In-State
SVGZ Graphic: Population Born Out of State
SVGZ Graphic: Population Change by State, 2020 to 2022
SVGZ Graphic: Population Density in Texas
SVGZ Graphic: Population of State Capital Metro Regions
SVGZ Graphic: Population Reporting Somalian Ancestry
SVGZ Graphic: State Population Change, 2020-2025
SVGZ Graphic: Texas County Population
SVGZ Graphic: The Rich Men North of Richmond (Median Household Income)
SVGZ Graphic: Where New York State Residents Moved To in 2015-20
Thematic Map: 2020 County Population Dot Density
Thematic Map: Albany Population Density
Thematic Map: BOCES Districts
Thematic Map: Census Blocks With More then 100 People Per Square Mile
Thematic Map: Census Tracts in America with Fewer then 100 People Per Square Mile
Thematic Map: County Population Under 25k
Thematic Map: Each dot represents a census block with 100 New Yorkers
Thematic Map: Households Making Less then $100,000/yr in 2023
Thematic Map: Households Making Less then $50,000 in Each State
Thematic Map: Metro Areas in New York
Thematic Map: NY Population Change 1970 to 2022
Thematic Map: Untitled
Thematic Map: US Median Household Income by Census Tract
Thematic Map: US Population Change - 2020 to 2024
Thematic Map: US Population Density
Thematic Map: US Population Non Contiguous Cartogram
Thematic Map: us-state-pop-density-1900