Python 📍

Python is an interpreted high-level general-purpose programming language. Python’s design philosophy emphasizes code readability with its notable use of significant indentation.

📽️ Videos

PANDAS and PDF Tables

PANDAS and PDF Tables

I discovered the tabula-py library, which is an alternative to converting PDFs externally to Python using pdftotext -layout from the poppler set of utilities. This might be a good alternative to having to install and run a seperate command line program.

import tabula
import pandas as pd

pdf="/home/andy/nov16-Enrollment/AlbanyED_nov16.pdf"
df=pd.concat(tabula.read_pdf(pdf,pages='all', stream=True))

df.query("STATUS=='Active'")
COUNTYELECTION DISTSTATUSDEMREPCONGREWORINDWEPREFOTHBLANKTOTALCOUNTY ELECTION DISTUnnamed: 0
0AlbanyALBANY 001001Active6719100100013101NaNNaN
3AlbanyALBANY 001002Active286203011900045374NaNNaN
6AlbanyALBANY 001003Active468329311900081613NaNNaN
9AlbanyALBANY 001004Active449256141510174576NaNNaN
12AlbanyALBANY 001005Active50000100006NaNNaN
7NaNNaNActive291594003101189476Albany WATERVLIET 004004NaN
10NaNNaNActive339213294357000202847Albany WESTERLO 000001NaN
13NaNNaNActive348119171543000160693Albany WESTERLO 000002NaN
16NaNNaNActive314137262235000196712Albany WESTERLO 000003NaN
19NaNNaNActive91,65835,8212,9915256089,612491419841,926183,402Albany County TotalNaN

There also poppler bindings for python, which you can use after installing the poppler-cpp developer library. This should work but currently it’s not parsing things correctly.

from poppler import load_from_file, PageRenderer

pdf_document = load_from_file(pdfFile)
page_1 = pdf_document.create_page(0)
page_1_text = page_1.text()

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(page_1_text),
   skiprows=3,
   sep='\s{2,}',
   engine='python',
   error_bad_lines=False)

df
COUNTY ELECTION DISTSTATUSDEMREPCONGREWORINDWEP REF OTH BLANK TOTAL
Albany ALBANY 001001Active67.019.01.00.00.01.00.00.00.013.0101.0
Inactive5.02.00.00.00.00.00.00.00.02.09.0
Total72.021.01.00.00.01.00.00.00.015.0110.0
Albany ALBANY 001002Active286.020.03.00.01.019.00.00.00.045.0374.0
Inactive27.08.00.00.00.03.00.00.00.012.050.0
Total313.028.03.00.01.022.00.00.00.057.0424.0
Albany ALBANY 001003Active468.032.09.03.01.019.00.00.00.081.0613.0
Inactive56.05.00.00.00.02.00.00.00.019.082.0
Total524.037.09.03.01.021.00.00.00.0100.0695.0
Albany ALBANY 001004Active449.025.06.01.04.015.01.00.01.074.0576.0
Inactive64.00.01.00.00.04.00.00.00.013.082.0
Total513.025.07.01.04.019.01.00.01.087.0658.0
Albany ALBANY 001005Active5.00.00.00.00.01.00.00.00.00.06.0
Total5.00.00.00.00.01.00.00.00.00.06.0
Albany ALBANY 001006Active462.011.05.02.04.014.00.00.01.057.0556.0
Inactive63.03.00.00.00.05.00.00.01.012.084.0
Total525.014.05.02.04.019.00.00.02.069.0640.0
Albany ALBANY 001007Active435.019.01.00.02.019.00.00.00.062.0538.0
Inactive60.03.00.00.02.03.00.00.00.07.075.0
Total495.022.01.00.04.022.00.00.00.069.0613.0
Albany ALBANY 001008Active160.07.01.00.02.02.00.00.01.037.0210.0
Page 1 of 46NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
Terrain Map: Mahanoy Mountain
Map: Tenant Creek Falls Trail

Been playing around a bit with PANDAS and ArcGIS Feature Servers for Tax Data

Been playing around a bit with PANDAS and ArcGIS Feature Servers for Tax Data … 📜

For example, to find the most valuable parcels in Albany County.

from arcgis.features import FeatureLayer
lyr_url = 'https://gisservices.its.ny.gov/arcgis/rest/services/NYS_Tax_Parcel_Centroid_Points/MapServer/0'
layer = FeatureLayer(lyr_url)
query_result1 = layer.query(where="COUNTY_NAME='Albany' AND FULL_MARKET_VAL > 100000000",
                                    out_fields='PARCEL_ADDR,CITYTOWN_NAME,FULL_MARKET_VAL,OWNER_TYPE', out_sr='4326')

df=query_result1.sdf.sort_values(by='FULL_MARKET_VAL', ascending=False)
df['Full Market Value'] = df['FULL_MARKET_VAL'].map('${:,.0f}'.format)

df
  OBJECTID PARCEL_ADDR CITYTOWN_NAME FULL_MARKET_VAL OWNER_TYPE SHAPE Full Market Value
11 26652 64 Eagle St Albany 1204254925 2 {“x”: -73.75980312511581, “y”: 42.650469918250… $1,204,254,925
3 9150 1200 Washington Ave Albany 886298715 2 {“x”: -73.81092293494828, “y”: 42.679257168282… $886,298,715
4 10208 1400 Washington Ave Albany 642398287 2 {“x”: -73.82369286130952, “y”: 42.685845700657… $642,398,287
0 885 251 Fuller Rd Albany 440042827 2 {“x”: -73.83559002316825, “y”: 42.690208093507… $440,042,827
5 18164 632 New Scotland Ave Albany 377568201 8 {“x”: -73.80381341626146, “y”: 42.655758957669… $377,568,201
1 906 141 Fuller Rd Albany 321199143 2 {“x”: -73.83323986150171, “y”: 42.693189748928… $321,199,143
19 108087 See Card 1067 Watervliet 280898876 1 {“x”: -73.70670724174552, “y”: 42.719628647232… $280,898,876
15 65380 737 Alb Shaker Rd Colonie 263916100 3 {“x”: -73.80365248218001, “y”: 42.747956678125… $263,916,100
9 21923 304 Madison Ave Albany 234265418 2 {“x”: -73.76227373289564, “y”: 42.648000674457… $234,265,418
2 907 201 Fuller Rd Albany 203426124 2 {“x”: -73.83362605353057, “y”: 42.692609131686… $203,426,124
16 69999 515 Loudon Rd Colonie 166065600 8 {“x”: -73.74958475282632, “y”: 42.719321807666… $166,065,600
7 20592 47 New Scotland Ave Albany 162276338 8 {“x”: -73.77597163421673, “y”: 42.653565689693… $162,276,338
6 20574 132 S Lake Ave Albany 146296360 2 {“x”: -73.77970918544908, “y”: 42.654390366929… $146,296,360
8 20597 113 Holland Ave Albany 143498501 2 {“x”: -73.77306688593143, “y”: 42.650762742870… $143,498,501
17 78203 Mannsville Colonie 142570400 1 {“x”: -73.71245452369443, “y”: 42.718124477080… $142,570,400
18 95509 1 Crossgates Mall Rd Guilderland 130554700 8 {“x”: -73.84702700595471, “y”: 42.687699053797… $130,554,700
10 24521 86 S Swan St Albany 128436403 2 {“x”: -73.75980563770365, “y”: 42.653931892804… $128,436,403
13 46883 1916 US 9W Coeymans 110000000 8 {“x”: -73.83388475575597, “y”: 42.488730743021… $110,000,000
12 35152 380 River Rd Bethlehem 105263158 8 {“x”: -73.76445503554325, “y”: 42.595925419330… $105,263,158
14 65097 15 Wolf Rd Colonie 101967213 8 {“x”: -73.81423716588279, “y”: 42.709939498581… $101,967,213
Map: Granger State Forest
Map: Lookout State Forest