US Census
2020 Mississippi Household Income
US Counties – Square Miles
Which counties are the largest counties in an area? This interactive map, which pulls data from the Census TIGER files is colored using Natural Breaks Jenks coloring to compare the size various counties in America. Blue counties are smaller, red counties are larger.
Data Source: Population estimates, shown in the based on the 2016 US Census Population estimates. American Fact Finder. https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?src=bkmk
Households That Make Less then $20k in America π΅
If Puerto Rico was a state, it would be by far the poorest in America. Significant poverty continues to exist in the south, indeed the most impoverished states are Mississippi, West Virginia, Louisiana, New Mexico and Alabama according to the 2020 American Community Survey 5 year averages.
State | Households Making Under 20k |
Puerto Rico | 48.2 |
Mississippi | 22.6 |
West Virginia | 20.6 |
Louisiana | 20.1 |
New Mexico | 19.6 |
Alabama | 19.3 |
Arkansas | 18.9 |
Kentucky | 18.8 |
South Carolina | 17.0 |
Tennessee | 16.6 |
Oklahoma | 16.2 |
North Carolina | 15.8 |
Ohio | 15.6 |
Georgia | 15.2 |
Missouri | 15.2 |
District of Columbia | 15.2 |
Montana | 15.1 |
New York | 15.0 |
Maine | 15.0 |
Michigan | 14.9 |
Florida | 14.9 |
Rhode Island | 14.6 |
Indiana | 14.6 |
Pennsylvania | 14.2 |
Texas | 13.9 |
Arizona | 13.7 |
Vermont | 13.7 |
Illinois | 13.7 |
Kansas | 13.6 |
North Dakota | 13.5 |
South Dakota | 13.5 |
Iowa | 13.4 |
Nevada | 13.4 |
Idaho | 13.3 |
Oregon | 13.1 |
Wyoming | 13.0 |
Wisconsin | 12.9 |
Nebraska | 12.7 |
Massachusetts | 12.5 |
California | 11.9 |
Delaware | 11.7 |
Connecticut | 11.6 |
Virginia | 11.3 |
New Jersey | 11.0 |
Minnesota | 10.9 |
Colorado | 10.8 |
Washington | 10.6 |
Hawaii | 10.0 |
Maryland | 10.0 |
New Hampshire | 9.9 |
Alaska | 9.9 |
Utah | 9.2 |
Here is the R code for making these statistics:
library(tidycensus)
income <- get_acs(
geography = “state”,
table = ‘B19001’,
year = 2020,
output = ‘wide’,
survey = “acs5”,
geometry = F);
perincome <- income %>% select(ends_with(‘E’), -ends_with(‘001E’)) %>%
rowwise() %>% mutate(total = sum(across(matches(‘dE’)))) %>%
mutate(across(matches(‘dE’), ~(./total)*100 )) %>% select(-total)
perincome %>% rowwise %>% mutate(under20k = sum(across(c(B19001_002E, B19001_003E, B19001_004E)))) %>% select(NAME, under20k) %>% arrange(-under20k) %>% write_csv(‘/tmp/hhunder20k.csv’)
Median Home Value
The Eastern Seaboard from Washington DC to Boston MA has some of the United States' highest home values, along with Southern California from San Francisco down to Los Angeles. Seattle, Colorado, and parts of Southern Florida are also expensive places to live. Surprisingly, Chicago IL is relatively inexpensive place to buy a home. Areas that are blue and green are less expensive to buy a home, yellows are about average, while oranges and reds are the most expensive places to own a home.
Data Source: Median Home Value, 2011-2015 American Community Survey 5-Year Estimates. https://factfinder.census.gov/faces/nav/jsf/pages/searchresults.xhtml?refresh=t
Working PANDAS and American Community Survey Summary File
Want to be able to work with American Community Survey data offline using your own local copy of the ACS 5-year Summary File? It’s pretty easy to do with PANDAS. If you are planning a lot of Census queries, this can be a very fast way to extract data.
Before you can use this script, you will need to download some data:
- Sequence Table Lookup File (Data Headers): ACS_5yr_Seq_Table_Number_Lookup.xlsx
- Geography Table Lookup File (Links Logical Record Number to Geography): 5_year_Mini_Geo.xlsx
- Summary File (Actual Data): 5 Year Sequence by State Summary File – Each state has two files, one with all geographies except Blockgroups, and one with just Census Tracts and Blockgroups
import pandas as pd
path = '/home/andy/Desktop/acs-summary-file/'
# list of geography
geo = pd.read_excel(path+'5_year_Mini_Geo.xlsx', sheet_name='ny',index_col='Logical Record Number')
# load headers
header = pd.read_excel(path+'ACS_5yr_Seq_Table_Number_Lookup.xlsx')
# create a column with census variable headers
header['COL_NAME'] = header['Table ID'] + '_' + header['Line Number'].apply(lambda a: "{0:.0f}".format(a).zfill(3))
# segment id, along with ACS year and state
segId = 135
year = 2019
state = 'ny'
# create a list of headers for segment file
segHead = ['FILEID','FILETYPE','STUSAB','CHARITER','SEQUENCE','LOGRECNO'] \
+ header.query('`Sequence Number` == '+str(segId)).dropna(subset=['Line Number'])['COL_NAME'].to_list()
# read the segment file, including column names above
seg = pd.read_csv(path+'e'+str(year)+'5'+state+(str(segId).zfill(4))+'000.txt',header=None, names=segHead, index_col=5)
# join the segment file to geography using Logical Record number
seg = geo.join(seg)
# calculate percentage of households with internet subscriptions -- codes from ACS_5yr_Seq_Table_Number_Lookup.xlsx
seg['Internet Subscription']=seg['B28011_002']/seg['B28011_001']*100
# output the percentage of households by county with internet subscriptions
seg[seg['Geography ID'].str.startswith('050')][['Geography Name','Internet Subscription']]
Geography Name | Internet Subscription | |
---|---|---|
Logical Record Number | ||
13 | Albany County, New York | 83.888889 |
14 | Allegany County, New York | 76.248050 |
15 | Bronx County, New York | 75.917821 |
16 | Broome County, New York | 82.222562 |
17 | Cattaraugus County, New York | 72.431480 |
… | … | … |
70 | Washington County, New York | 80.224036 |
71 | Wayne County, New York | 81.508715 |
72 | Westchester County, New York | 86.371288 |
73 | Wyoming County, New York | 78.387887 |
74 | Yates County, New York | 75.916583 |
# alternatively you can display human readable columns automatically
seg.rename(dict(zip(header['COL_NAME'],header['Table Title'])),axis=1)
State | Geography ID | Geography Name | FILEID | FILETYPE | STUSAB | CHARITER | SEQUENCE | Total: | Has one or more types of computing devices: | |
---|---|---|---|---|---|---|---|---|---|---|
Logical Record Number | ||||||||||
1 | NY | 04000US36 | New York | ACSSF | 201900000.0 | ny | 0.0 | 135.0 | 7343234.0 | 6581493.0 |
2 | NY | 04001US36 | New York — Urban | ACSSF | 201900000.0 | ny | 0.0 | 135.0 | 6433524.0 | 5771681.0 |
3 | NY | 04043US36 | New York — Rural | ACSSF | 201900000.0 | ny | 0.0 | 135.0 | 909710.0 | 809812.0 |
4 | NY | 040A0US36 | New York — In metropolitan or micropolitan st… | ACSSF | 201900000.0 | ny | 0.0 | 135.0 | 7189902.0 | 6449723.0 |
5 | NY | 040C0US36 | New York — In metropolitan statistical area | ACSSF | 201900000.0 | ny | 0.0 | 135.0 | 6796057.0 | 6109882.0 |
… | … | … | … | … | … | … | … | … | … | … |
28400 | NY | 97000US3631920 | Yonkers City School District, New York | ACSSF | 201900000.0 | ny | 0.0 | 135.0 | 74897.0 | 65767.0 |
28401 | NY | 97000US3631950 | York Central School District, New York | ACSSF | 201900000.0 | ny | 0.0 | 135.0 | 2116.0 | 1964.0 |
28402 | NY | 97000US3631980 | Yorktown Central School District, New York | ACSSF | 201900000.0 | ny | 0.0 | 135.0 | 7068.0 | 6751.0 |
28403 | NY | 97000US3632010 | Cuba-Rushford Central School District, New York | ACSSF | 201900000.0 | ny | 0.0 | 135.0 | 2629.0 | 2186.0 |
28404 | NY | 97000US3699999 | Remainder of New York, New York | ACSSF | 201900000.0 | ny | 0.0 | 135.0 | 79779.0 | 75425.0 |
Too much work or don’t want to download the summary file yourself? You can query the Census API directly using PyPI’s censusdata library from PIP. For infrequent queries where you are online, for those with Internet at home, you would be much better off just querying the API directly.
import pandas as pd
import censusdata as cd
# attributes to load
cdcol=['B28011_001','B28011_002']
cdf = cd.download('acs5', 2019,
cd.censusgeo([('state', '36'),
('county','*')]),
cdcol)
# seperate out the geoid and geography name
geoid=[]
geoname=[]
for index in cdf.index.tolist():
geopart=''
for part in index.geo:
geopart = geopart + part[1]
geoid.append(geopart)
geoname.append(index.name)
cdf['geoid']=geoid
cdf['geoname']=geoname
# calculate percentage with internet subscriptions
cdf['Internet Subscription']=cdf['B28011_002']/cdf['B28011_001']*100
# output a similar table as above
cdf
Learn how to load into PANDAS the PL 94-171 2020 Redistricting Data, a process that is similar but different then ACS data.
Also, calculate the population of an area and it’s average demographics, including areas that don’t have Census demographics such as Election Districts or County Legislative districts.