Want to be able to work with American Community Survey data offline using your own local copy of the ACS 5-year Summary File? It’s pretty easy to do with PANDAS. If you are planning a lot of Census queries, this can be a very fast way to extract data.
Before you can use this script, you will need to download some data:
Geography Table Lookup File (Links Logical Record Number to Geography): 5_year_Mini_Geo.xlsx
Summary File (Actual Data): 5 Year Sequence by State Summary File – Each state has two files, one with all geographies except Blockgroups, and one with just Census Tracts and Blockgroups
import pandas as pd
path = '/home/andy/Desktop/acs-summary-file/'
# list of geography
geo = pd.read_excel(path+'5_year_Mini_Geo.xlsx', sheet_name='ny',index_col='Logical Record Number')
# load headers
header = pd.read_excel(path+'ACS_5yr_Seq_Table_Number_Lookup.xlsx')
# create a column with census variable headers
header['COL_NAME'] = header['Table ID'] + '_' + header['Line Number'].apply(lambda a: "{0:.0f}".format(a).zfill(3))
# segment id, along with ACS year and state
segId = 135
year = 2019
state = 'ny'
# create a list of headers for segment file
segHead = ['FILEID','FILETYPE','STUSAB','CHARITER','SEQUENCE','LOGRECNO'] \
+ header.query('`Sequence Number` == '+str(segId)).dropna(subset=['Line Number'])['COL_NAME'].to_list()
# read the segment file, including column names above
seg = pd.read_csv(path+'e'+str(year)+'5'+state+(str(segId).zfill(4))+'000.txt',header=None, names=segHead, index_col=5)
# join the segment file to geography using Logical Record number
seg = geo.join(seg)
# calculate percentage of households with internet subscriptions -- codes from ACS_5yr_Seq_Table_Number_Lookup.xlsx
seg['Internet Subscription']=seg['B28011_002']/seg['B28011_001']*100
# output the percentage of households by county with internet subscriptions
seg[seg['Geography ID'].str.startswith('050')][['Geography Name','Internet Subscription']]
Geography Name
Internet Subscription
Logical Record Number
13
Albany County, New York
83.888889
14
Allegany County, New York
76.248050
15
Bronx County, New York
75.917821
16
Broome County, New York
82.222562
17
Cattaraugus County, New York
72.431480
…
…
…
70
Washington County, New York
80.224036
71
Wayne County, New York
81.508715
72
Westchester County, New York
86.371288
73
Wyoming County, New York
78.387887
74
Yates County, New York
75.916583
# alternatively you can display human readable columns automatically
seg.rename(dict(zip(header['COL_NAME'],header['Table Title'])),axis=1)
State
Geography ID
Geography Name
FILEID
FILETYPE
STUSAB
CHARITER
SEQUENCE
Total:
Has one or more types of computing devices:
Logical Record Number
1
NY
04000US36
New York
ACSSF
201900000.0
ny
0.0
135.0
7343234.0
6581493.0
2
NY
04001US36
New York — Urban
ACSSF
201900000.0
ny
0.0
135.0
6433524.0
5771681.0
3
NY
04043US36
New York — Rural
ACSSF
201900000.0
ny
0.0
135.0
909710.0
809812.0
4
NY
040A0US36
New York — In metropolitan or micropolitan st…
ACSSF
201900000.0
ny
0.0
135.0
7189902.0
6449723.0
5
NY
040C0US36
New York — In metropolitan statistical area
ACSSF
201900000.0
ny
0.0
135.0
6796057.0
6109882.0
…
…
…
…
…
…
…
…
…
…
…
28400
NY
97000US3631920
Yonkers City School District, New York
ACSSF
201900000.0
ny
0.0
135.0
74897.0
65767.0
28401
NY
97000US3631950
York Central School District, New York
ACSSF
201900000.0
ny
0.0
135.0
2116.0
1964.0
28402
NY
97000US3631980
Yorktown Central School District, New York
ACSSF
201900000.0
ny
0.0
135.0
7068.0
6751.0
28403
NY
97000US3632010
Cuba-Rushford Central School District, New York
ACSSF
201900000.0
ny
0.0
135.0
2629.0
2186.0
28404
NY
97000US3699999
Remainder of New York, New York
ACSSF
201900000.0
ny
0.0
135.0
79779.0
75425.0
Too much work or don’t want to download the summary file yourself? You can query the Census API directly using PyPI’s censusdata library from PIP. For infrequent queries where you are online, for those with Internet at home, you would be much better off just querying the API directly.
import pandas as pd
import censusdata as cd
# attributes to load
cdcol=['B28011_001','B28011_002']
cdf = cd.download('acs5', 2019,
cd.censusgeo([('state', '36'),
('county','*')]),
cdcol)
# seperate out the geoid and geography name
geoid=[]
geoname=[]
for index in cdf.index.tolist():
geopart=''
for part in index.geo:
geopart = geopart + part[1]
geoid.append(geopart)
geoname.append(index.name)
cdf['geoid']=geoid
cdf['geoname']=geoname
# calculate percentage with internet subscriptions
cdf['Internet Subscription']=cdf['B28011_002']/cdf['B28011_001']*100
# output a similar table as above
cdf