Hello again! I’m back with a new post and this time it is about Python. I’ve dabbled in some MOOCs out there and recently completed the Introduction to Programming Nanodegree offered by Udacity. It was exciting, fun, and rewarding. I look forward to more MOOC classes in my future.
In my current role with a hotel company, I’m working with geospatial data much more than I have in the past. We have over 5,000 hotels in the US, so a lot of our analysis is based on location. My task at hand is to join two separate data sets. One data set contains Zip Codes but the other does not. The second data set does, however, have latitude and longitude coordinates. So before I can join on Zip Code, I must first populate the second data set with Zip Codes based on its lat/lon values. This is where Python comes in.
After some googling, I discovered a Python library called Geocoder. This library provides geocoding services by leveraging various providers such as Google Maps, Bing, MapQuest, etc. One of its capabilities is reverse geocoding: you provide a lat/lon and it returns the Zip Code for that coordinate. Perfect!
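To make the idea concrete, here is a minimal sketch of a single reverse lookup with Geocoder's Google provider. The function name `reverse_zip` and the `provider` parameter are made up for this example (the parameter only exists so the function can be tried without a network call), and it assumes a Google API key is available, for instance in the `GOOGLE_API_KEY` environment variable.

```python
def reverse_zip(lat, lon, provider=None):
    """Return the postal code for a coordinate, or None if the lookup fails.

    `provider` is a hypothetical hook for this sketch: any callable that
    accepts ([lat, lon], method="reverse") and returns an object with
    `.ok` and `.postal` attributes. By default it is geocoder.google.
    """
    if provider is None:
        import geocoder  # third-party package: pip install geocoder
        provider = geocoder.google
    # method="reverse" asks the provider to turn a coordinate into an address.
    result = provider([lat, lon], method="reverse")
    # .ok is False when the provider returned no usable result.
    return result.postal if result.ok else None
```

Something like `reverse_zip(33.7490, -84.3880)` would then return the Zip Code for that point as a string, provided the request succeeds.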
So as a test, I wrote the following program, which submits a list of coordinates.
I have a CSV file that contains the lat/lon values. I pass each pair into Geocoder and it returns the Zip Code. The program prints each lat/lon and its Zip Code. Here’s a view of what it looks like when I run it from the command line.
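A script along these lines could be sketched as follows. This is a minimal illustration rather than the exact program: it assumes a CSV file with no header row, latitude in the first column and longitude in the second. The names `lookup_rows`, `print_zip_codes`, and `lookup` are hypothetical, and `lookup` stands for any function that takes a lat and lon and returns a Zip Code.

```python
import csv


def lookup_rows(lines, lookup):
    """Yield (lat, lon, zip_code) for each lat/lon row.

    `lines` is any iterable of CSV text lines (an open file works);
    `lookup` is a caller-supplied function: lookup(lat, lon) -> zip code.
    """
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        lat, lon = float(row[0]), float(row[1])
        yield lat, lon, lookup(lat, lon)


def print_zip_codes(path, lookup):
    """Print every coordinate in the CSV at `path` with its Zip Code."""
    with open(path, newline="") as f:
        for lat, lon, zip_code in lookup_rows(f, lookup):
            print(lat, lon, zip_code)
```

With Geocoder installed, `lookup` could be something like `lambda lat, lon: geocoder.google([lat, lon], method="reverse").postal`, and the file name passed to `print_zip_codes` is whatever the CSV happens to be called.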
In my test script, I’m only looking up 3 coordinates and, as you can see, it works! Yay!! My first Python program written for something in the real world. So what is next? Well, ultimately my task involves over 100k coordinates. Google’s API caps its free usage, so running this program over the full data set is going to cost a little money. Secondly, I need to adjust the program so that it writes back to a CSV file with three columns (lat/lon/zip) instead of just printing to the command line. Let me know if you can help; otherwise, back to Googling and Stack Overflow I go!
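One possible shape for that next step, as a rough sketch only: read the lat/lon rows, look up each Zip Code, and write a three-column CSV. The function name `write_zip_csv` and the `lookup` parameter are invented for this example, the input is assumed to have no header row, and the small cache is an added assumption meant to avoid paying for the same coordinate twice.

```python
import csv


def write_zip_csv(src, dst, lookup):
    """Copy lat/lon rows from `src` to `dst`, adding a zip column.

    `src` is an iterable of CSV lines, `dst` is a writable text file,
    and `lookup` is a caller-supplied function: lookup(lat, lon) -> zip.
    Returns the number of data rows written.
    """
    cache = {}  # (lat, lon) text -> zip code, so repeats cost no extra request
    written = 0
    writer = csv.writer(dst)
    writer.writerow(["lat", "lon", "zip"])  # header for the output file
    for row in csv.reader(src):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        lat_text, lon_text = row[0].strip(), row[1].strip()
        key = (lat_text, lon_text)
        if key not in cache:
            cache[key] = lookup(float(lat_text), float(lon_text))
        # Write an empty cell when the provider found no Zip Code.
        writer.writerow([lat_text, lon_text, cache[key] or ""])
        written += 1
    return written
```

Both files would typically be opened with `newline=""`, as the `csv` module documentation recommends, for example `open("output.csv", "w", newline="")` for the destination. For 100k coordinates it may also be worth adding a short pause between requests, depending on the provider's rate limits.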