Here is a list of the ten most Hispanic counties in New York State from the 2020 US Census.

| County | Percent Hispanic |
|---|---|
| Bronx | 54.7625579396111 |
| Queens | 27.7643315385306 |
| Westchester | 26.8138904900857 |
| New York | 23.7650737700612 |
| Orange | 22.3627619545987 |
| Suffolk | 21.8202133794694 |
| Rockland | 19.6409412140254 |
| Richmond | 19.5583634394157 |
| Kings | 18.8747087980808 |
| Nassau | 18.3715271956635 |

Here is how you can create this list using PANDAS. You will need to download the P.L. 94-171 Redistricting Data and the Legacy File Format Header Records, expand the ZIP files, and place the files in the directory set as path in the code below.

```python
import pandas as pd

# Directory holding 2020_PLSummaryFile_FieldNames.xlsx and the state files
# XXgeo2020.pl and XX000012020.pl through XX000032020.pl
# (XX = two-letter state code)
path = '/home/andy/Desktop/2020pl-94-171/'

# state code
state = 'ny'

# header file: open every tab, giving a dictionary of DataFrames keyed by tab name
field_names = pd.read_excel(path + '2020_PLSummaryFile_FieldNames.xlsx', sheet_name=None)

# load the geoheader; read every field as str because some fields have mixed
# types, and so codes such as GEOID and SUMLEV keep their leading zeros
gh = pd.read_csv(path + state + 'geo2020.pl', delimiter='|',
                 header=None,
                 names=field_names['2020 P.L. Geoheader Fields'].columns,
                 index_col='LOGRECNO',
                 dtype=str)

# load segment 1 of the 2020 P.L. 94-171 data, which holds the race and
# Hispanic origin tables
segNum = 1
seg = pd.read_csv(path + state + '0000' + str(segNum) + '2020.pl', delimiter='|',
                  header=None,
                  names=field_names['2020 P.L. Segment ' + str(segNum) + ' Fields'].columns,
                  index_col='LOGRECNO',
                  # read LOGRECNO as str so its type matches the geoheader index
                  dtype={'LOGRECNO': str})
# discard FILEID, STUSAB, CHARITER, CIFSN as duplicative after the join
seg = seg.iloc[:, 4:]

# join seg to geoheader on LOGRECNO
seg = gh.join(seg)

# select counties, which have summary level SUMLEV == '050' (see Census docs)
counties = seg.query("SUMLEV == '050'")

# Create a DataFrame with the County and Percent Hispanic.
# You can get the fields list from 2020_PLSummaryFile_FieldNames.xlsx
# under the 2020 P.L. Segment 1 Definitions tab:
# P0020001 = total population, P0020002 = Hispanic or Latino
his = pd.DataFrame({'County': counties['BASENAME'],
                    'Percent Hispanic': counties['P0020002'] / counties['P0020001'] * 100})

# Sort, print the ten most Hispanic counties, and save them to CSV
top10 = his.sort_values(by='Percent Hispanic', ascending=False).head(10)
print(top10)
top10.to_csv('/tmp/hispanics.csv')
```
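If you want to check the join-and-filter logic before downloading the Census files, you can run the same steps on a tiny synthetic geoheader and segment file. This is a minimal sketch under stated assumptions: the helper top_hispanic_counties is hypothetical, the column lists are cut down to the few fields the calculation needs (the real files have many more), and the counts are invented, not Census data. It reads LOGRECNO as a string in both files so the two indexes share a type before the join.

```python
import io

import pandas as pd

# Hypothetical, heavily trimmed column lists: the real files have many more
# fields, whose names come from 2020_PLSummaryFile_FieldNames.xlsx.
GEO_COLS = ["FILEID", "STUSAB", "SUMLEV", "LOGRECNO", "BASENAME"]
SEG_COLS = ["FILEID", "STUSAB", "CHARITER", "CIFSN", "LOGRECNO",
            "P0020001", "P0020002"]

# Invented toy records (NOT Census figures): one state row, three counties.
GEO_TEXT = (
    "PLST|NY|040|0000001|New York\n"
    "PLST|NY|050|0000002|County A\n"
    "PLST|NY|050|0000003|County B\n"
    "PLST|NY|050|0000004|County C\n"
)
SEG_TEXT = (
    "PLST|NY|000|01|0000001|1000|200\n"
    "PLST|NY|000|01|0000002|400|100\n"
    "PLST|NY|000|01|0000003|300|150\n"
    "PLST|NY|000|01|0000004|200|20\n"
)


def top_hispanic_counties(geo_file, seg_file, geo_cols, seg_cols, n=10):
    """Hypothetical helper: rank counties by percent Hispanic (table P2)."""
    # Every geoheader field as str, so SUMLEV keeps its leading zero.
    gh = pd.read_csv(geo_file, delimiter="|", header=None,
                     names=geo_cols, dtype=str).set_index("LOGRECNO")
    # LOGRECNO as str in the segment too, so both indexes share a type.
    seg = pd.read_csv(seg_file, delimiter="|", header=None,
                      names=seg_cols, dtype={"LOGRECNO": str}).set_index("LOGRECNO")
    # Drop FILEID, STUSAB, CHARITER, CIFSN so the join has no clashing columns.
    seg = seg.iloc[:, 4:]
    # Join on LOGRECNO, then keep county rows (summary level 050) only.
    counties = gh.join(seg).query("SUMLEV == '050'")
    pct = counties["P0020002"] / counties["P0020001"] * 100
    ranked = pd.DataFrame({"County": counties["BASENAME"],
                           "Percent Hispanic": pct})
    return ranked.sort_values(by="Percent Hispanic", ascending=False).head(n)


top = top_hispanic_counties(io.StringIO(GEO_TEXT), io.StringIO(SEG_TEXT),
                            GEO_COLS, SEG_COLS)
print(top.to_string(index=False))
```

To try it on the real data, you would pass the file paths instead of the StringIO objects, and the full column lists from the FieldNames workbook (for example field_names['2020 P.L. Geoheader Fields'].columns) instead of the trimmed ones.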