Python — Andy Arthur.org

comparing code and outputs for vector data

Andy | September 7, 2023

geocompx – Geographic data analysis in R and Python: comparing code and outputs for vector data

PANDAS and PDF Tables

Andy | August 17, 2023

PANDAS and PDF Tables

I discovered the tabula-py library, which is an alternative to converting PDFs externally to Python using pdftotext -layout from the poppler set of utilities. This might be a good alternative to having to install and run a seperate command line program.

import tabula
import pandas as pd

pdf="/home/andy/nov16-Enrollment/AlbanyED_nov16.pdf"
df=pd.concat(tabula.read_pdf(pdf,pages='all', stream=True))

df.query("STATUS=='Active'")

	COUNTY	ELECTION DIST	STATUS	DEM	REP	CON	GRE	WOR	IND	WEP	REF	OTH	BLANK	TOTAL	COUNTY ELECTION DIST	Unnamed: 0
0	Albany	ALBANY 001001	Active	67	19	1	0	0	1	0	0	0	13	101	NaN	NaN
3	Albany	ALBANY 001002	Active	286	20	3	0	1	19	0	0	0	45	374	NaN	NaN
6	Albany	ALBANY 001003	Active	468	32	9	3	1	19	0	0	0	81	613	NaN	NaN
9	Albany	ALBANY 001004	Active	449	25	6	1	4	15	1	0	1	74	576	NaN	NaN
12	Albany	ALBANY 001005	Active	5	0	0	0	0	1	0	0	0	0	6	NaN	NaN
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…
7	NaN	NaN	Active	291	59	4	0	0	31	0	1	1	89	476	Albany WATERVLIET 004004	NaN
10	NaN	NaN	Active	339	213	29	4	3	57	0	0	0	202	847	Albany WESTERLO 000001	NaN
13	NaN	NaN	Active	348	119	17	1	5	43	0	0	0	160	693	Albany WESTERLO 000002	NaN
16	NaN	NaN	Active	314	137	26	2	2	35	0	0	0	196	712	Albany WESTERLO 000003	NaN
19	NaN	NaN	Active	91,658	35,821	2,991	525	608	9,612	49	14	198	41,926	183,402	Albany County Total	NaN

There also poppler bindings for python, which you can use after installing the poppler-cpp developer library. This should work but currently it’s not parsing things correctly.

from poppler import load_from_file, PageRenderer

pdf_document = load_from_file(pdfFile)
page_1 = pdf_document.create_page(0)
page_1_text = page_1.text()

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(page_1_text),
   skiprows=3,
   sep='\s{2,}',
   engine='python',
   error_bad_lines=False)

df

				COUNTY ELECTION DIST	STATUS	DEM	REP	CON	GRE	WOR	IND	WEP REF OTH BLANK TOTAL
Albany ALBANY 001001	Active	67.0	19.0	1.0	0.0	0.0	1.0	0.0	0.0	0.0	13.0	101.0
Inactive	5.0	2.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	2.0	9.0
Total	72.0	21.0	1.0	0.0	0.0	1.0	0.0	0.0	0.0	15.0	110.0
Albany ALBANY 001002	Active	286.0	20.0	3.0	0.0	1.0	19.0	0.0	0.0	0.0	45.0	374.0
Inactive	27.0	8.0	0.0	0.0	0.0	3.0	0.0	0.0	0.0	12.0	50.0
Total	313.0	28.0	3.0	0.0	1.0	22.0	0.0	0.0	0.0	57.0	424.0
Albany ALBANY 001003	Active	468.0	32.0	9.0	3.0	1.0	19.0	0.0	0.0	0.0	81.0	613.0
Inactive	56.0	5.0	0.0	0.0	0.0	2.0	0.0	0.0	0.0	19.0	82.0
Total	524.0	37.0	9.0	3.0	1.0	21.0	0.0	0.0	0.0	100.0	695.0
Albany ALBANY 001004	Active	449.0	25.0	6.0	1.0	4.0	15.0	1.0	0.0	1.0	74.0	576.0
Inactive	64.0	0.0	1.0	0.0	0.0	4.0	0.0	0.0	0.0	13.0	82.0
Total	513.0	25.0	7.0	1.0	4.0	19.0	1.0	0.0	1.0	87.0	658.0
Albany ALBANY 001005	Active	5.0	0.0	0.0	0.0	0.0	1.0	0.0	0.0	0.0	0.0	6.0
Total	5.0	0.0	0.0	0.0	0.0	1.0	0.0	0.0	0.0	0.0	6.0
Albany ALBANY 001006	Active	462.0	11.0	5.0	2.0	4.0	14.0	0.0	0.0	1.0	57.0	556.0
Inactive	63.0	3.0	0.0	0.0	0.0	5.0	0.0	0.0	1.0	12.0	84.0
Total	525.0	14.0	5.0	2.0	4.0	19.0	0.0	0.0	2.0	69.0	640.0
Albany ALBANY 001007	Active	435.0	19.0	1.0	0.0	2.0	19.0	0.0	0.0	0.0	62.0	538.0
Inactive	60.0	3.0	0.0	0.0	2.0	3.0	0.0	0.0	0.0	7.0	75.0
Total	495.0	22.0	1.0	0.0	4.0	22.0	0.0	0.0	0.0	69.0	613.0
Albany ALBANY 001008	Active	160.0	7.0	1.0	0.0	2.0	2.0	0.0	0.0	1.0	37.0	210.0
Page 1 of 46	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

Thematic Map: Census Tracts in America with Fewer then 100 People Per Square Mile

Thematic Map: US Median Household Income by Census Tract

Been playing around a bit with PANDAS and ArcGIS Feature Servers for Tax Data

Andy | August 5, 2023

Been playing around a bit with PANDAS and ArcGIS Feature Servers for Tax Data … 📜

For example, to find the most valuable parcels in Albany County.

from arcgis.features import FeatureLayer
lyr_url = 'https://gisservices.its.ny.gov/arcgis/rest/services/NYS_Tax_Parcel_Centroid_Points/MapServer/0'
layer = FeatureLayer(lyr_url)
query_result1 = layer.query(where="COUNTY_NAME='Albany' AND FULL_MARKET_VAL > 100000000", 
                                    out_fields='PARCEL_ADDR,CITYTOWN_NAME,FULL_MARKET_VAL,OWNER_TYPE', out_sr='4326')

df=query_result1.sdf.sort_values(by='FULL_MARKET_VAL', ascending=False)
df['Full Market Value'] = df['FULL_MARKET_VAL'].map('${:,.0f}'.format)

df

	OBJECTID	PARCEL_ADDR	CITYTOWN_NAME	FULL_MARKET_VAL	OWNER_TYPE	SHAPE	Full Market Value
11	26652	64 Eagle St	Albany	1204254925	2	{“x”: -73.75980312511581, “y”: 42.650469918250…	$1,204,254,925
3	9150	1200 Washington Ave	Albany	886298715	2	{“x”: -73.81092293494828, “y”: 42.679257168282…	$886,298,715
4	10208	1400 Washington Ave	Albany	642398287	2	{“x”: -73.82369286130952, “y”: 42.685845700657…	$642,398,287
0	885	251 Fuller Rd	Albany	440042827	2	{“x”: -73.83559002316825, “y”: 42.690208093507…	$440,042,827
5	18164	632 New Scotland Ave	Albany	377568201	8	{“x”: -73.80381341626146, “y”: 42.655758957669…	$377,568,201
1	906	141 Fuller Rd	Albany	321199143	2	{“x”: -73.83323986150171, “y”: 42.693189748928…	$321,199,143
19	108087	See Card 1067	Watervliet	280898876	1	{“x”: -73.70670724174552, “y”: 42.719628647232…	$280,898,876
15	65380	737 Alb Shaker Rd	Colonie	263916100	3	{“x”: -73.80365248218001, “y”: 42.747956678125…	$263,916,100
9	21923	304 Madison Ave	Albany	234265418	2	{“x”: -73.76227373289564, “y”: 42.648000674457…	$234,265,418
2	907	201 Fuller Rd	Albany	203426124	2	{“x”: -73.83362605353057, “y”: 42.692609131686…	$203,426,124
16	69999	515 Loudon Rd	Colonie	166065600	8	{“x”: -73.74958475282632, “y”: 42.719321807666…	$166,065,600
7	20592	47 New Scotland Ave	Albany	162276338	8	{“x”: -73.77597163421673, “y”: 42.653565689693…	$162,276,338
6	20574	132 S Lake Ave	Albany	146296360	2	{“x”: -73.77970918544908, “y”: 42.654390366929…	$146,296,360
8	20597	113 Holland Ave	Albany	143498501	2	{“x”: -73.77306688593143, “y”: 42.650762742870…	$143,498,501
17	78203	Mannsville	Colonie	142570400	1	{“x”: -73.71245452369443, “y”: 42.718124477080…	$142,570,400
18	95509	1 Crossgates Mall Rd	Guilderland	130554700	8	{“x”: -73.84702700595471, “y”: 42.687699053797…	$130,554,700
10	24521	86 S Swan St	Albany	128436403	2	{“x”: -73.75980563770365, “y”: 42.653931892804…	$128,436,403
13	46883	1916 US 9W	Coeymans	110000000	8	{“x”: -73.83388475575597, “y”: 42.488730743021…	$110,000,000
12	35152	380 River Rd	Bethlehem	105263158	8	{“x”: -73.76445503554325, “y”: 42.595925419330…	$105,263,158
14	65097	15 Wolf Rd	Colonie	101967213	8	{“x”: -73.81423716588279, “y”: 42.709939498581…	$101,967,213