Only nine percent of New Yorkers live at elevations above 1,000 ft - probably even less if you were to do this at the block rather than the block group level
Burlington Flats is a hamlet in the Town of Burlington in Otsego County, New York, United States. It's a few miles west of Cooperstown, in a deep agricultural valley not that far from Basswood State Forest.
I like to describe myself as a data scientist, at least on the blog. I think it's an accurate term to describe what I do professionally and as a hobbyist: I put together data, tease insights out of it, and use it to create outputs. I link names and addresses together from various government records, clean addresses and data, do spatial calculations and render things as Excel files, CSV files, and database updates.
A data scientist is not a programmer or a database administrator. He or she doesn't fix computers. If anything, I break them sometimes by pushing them a bit too hard. Instead, I work to get insights out of data, taking one form of data and transforming it into another. You might say a big portion of my work, outside of data cleaning both manual and automated, is extract, transform and load. Often I'll pull data out of the db2 database, work on it and join it in R, and then upload it using a different program that was custom written for my needs.
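To make the extract, transform and load idea concrete, here is a minimal sketch in Python. It is only an illustration and not the actual workflow described above: an in-memory SQLite database stands in for db2, the join happens in plain Python rather than R, and the table name `people`, the `parcels` lookup, the sample rows and the suffix rules are all made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical suffix abbreviations; a real address cleaner needs far more rules.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}


def normalize_address(raw):
    """Uppercase, drop periods and commas, collapse whitespace, abbreviate suffixes."""
    words = raw.upper().replace(".", "").replace(",", "").split()
    return " ".join(SUFFIXES.get(word, word) for word in words)


def extract(conn):
    """Extract: pull the raw rows out of the source database."""
    return conn.execute("SELECT name, address FROM people").fetchall()


def transform(rows, parcels):
    """Transform: clean each address, then join it to a parcel lookup table."""
    records = []
    for name, address in rows:
        key = normalize_address(address)
        records.append(
            {"name": name, "address": key, "parcel_id": parcels.get(key, "")}
        )
    return records


def load(records, handle):
    """Load: write the joined records out as CSV."""
    writer = csv.DictWriter(
        handle, fieldnames=["name", "address", "parcel_id"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)


def run_demo():
    """Run the three steps end to end on invented sample data."""
    conn = sqlite3.connect(":memory:")  # stand-in for the real source database
    conn.execute("CREATE TABLE people (name TEXT, address TEXT)")
    conn.executemany(
        "INSERT INTO people VALUES (?, ?)",
        [("Pat Doe", "12 main street"), ("Sam Roe", " 4  Elm Avenue.")],
    )
    parcels = {"12 MAIN ST": "P-001"}  # invented lookup keyed by cleaned address
    buffer = io.StringIO()
    load(transform(extract(conn), parcels), buffer)
    conn.close()
    return buffer.getvalue()


if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is that the cleaning step is what makes the join work: "12 main street" only matches the parcel table once it has been normalized to "12 MAIN ST", while the address with no match keeps an empty parcel ID instead of being dropped.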
Sometimes I wish I was a computer programmer by training. Everything I know was learned mostly by reading and practical use, outside of a few classes I took twenty years ago in college on Data Structures and Statistics. But I don't really need that training, in the sense that I don't write lengthy C/C++ programs, nor do I worry about user-facing interfaces. Instead, I just extract value from data using common tools like SQL, R and some Bash and Python scripts. While I use some AWK, I don't use it nearly as much as my predecessor did. AWK is good for simple things, but it doesn't hold a candle to modern Python and R.
Data science is an interesting field, and one that is surprisingly accessible thanks to powerful, relatively easy-to-use tools like R and Python. And it's actually a lot of fun, as you're not getting into the weeds of computer programming, memory allocation and the like. A lot of the work is relatively simple, clever scripts, and teasing out the value of what's out there but may not be obvious until you join the data together.
It was only in 2021 that I really got interested in Python, after a friend suggested I give it a second look for doing data processing for GIS. I had also gotten tired of the sometimes clumsy and slow processing in QGIS, and while I had used some Python to automate things in QGIS, I became quite interested in pandas and Python for working with data. I got every book I could get my hands on about writing Python code, with a particular focus on data science. Later that year, on Labor Day actually, I stumbled upon the R programming language, tidyverse and ggplot, and with its strong graphics capacity and ability to quickly process geospatial data, I was hooked.
Since then I've been using RStudio every day. That's not to say that I don't occasionally use Python or other languages, or mapping tools like QGIS. But R has such a rich universe of data manipulation tools, and it is so powerful and quick for processing data, manipulating spatial data, and querying and exporting Census data. RStudio is the tool I use the most at work, for the blog and for many other purposes. And it was all something I taught myself, at first just by watching a few YouTube videos while lying in a hammock, drinking a beer at the Perkins Clearing Conservation Easement in the Adirondacks.
Maybe it was just dumb luck that the Data Services position opened up when the former director retired and I was a good fit for it. But I really love being able to clean, process and manipulate data every day using powerful tools, and to generate new insights that are powering government forward.