SQL Join Daily and Monthly

This week I was doing some analysis and came across a common issue when using multiple data sets. I work with a variety of tables in our data warehouse, which vary in granularity. My task required blending a table that contains daily records with a table that contained monthly records. If I just LEFT JOIN on month(date) and year(date), and the foreign key, I would end up with duplicate records because the join would pull in monthly records at the daily level. My solution to this was to leverage SQL Analytic Functions.

Analytic Functions vary between systems and versions. At my office, we use HP Vertica as our data warehouse. But the one I used should be available on most, if not all, SQL products. ROW_NUMBER is an easy way to create a column that will count rows based on your input. It takes two parameters. The first, PARTITION BY, is optional. You can reset the counter on one or multiple columns or partitions. The second parameter is ORDER BY, which structures the results so your partition counter is as you desire.

As I work in the hotel industry, my daily table contains consumed room nights for our properties. The monthly table has other key metrics at the property level.

Here’s a basic representation of my tables:

ScreenHunter_29 Jun. 09 14.20

I needed to pull one of those metrics to the daily level so I could compare two values Year over Year, aggregated monthly. I used ROW_NUMBER to initiate a counter that resets for every unique year(date), month(date) and property code. The actual syntax for this is as follows:

ROW_NUMBER() OVER (PARTITION BY year(date), month(date), property_code ORDER BY date, property_code)

If I use this statement on my daily table, the result would look as follows:

ScreenHunter_23 Jun. 09 13.47

The result is a counter column that resets every distinct year/month and property code. This can be used to mark where I pull in the value from my monthly table, knowing it will not duplicate anywhere else in the result set. Taking it one step further, I used this function inside a CASE statement. In my SELECT query, I wrote:

CASE WHEN ROW_NUMBER() OVER (PARTITION BY year(date), month(date), property_code ORDER BY date, property_code) = 1 then monthly.value END AS column_name

Using my sample data from above, here’s how the results would appear:

ScreenHunter_30 Jun. 09 14.22

The key here is the equal to 1. Each time the counter resets to 1, I pull in the desired value from my monthly table, else the column is null. My results are still at the daily level. Remember, I still need to LEFT JOIN the tables on month(date) and year(date), and property_code. My data is now ready to bring it all into Tableau and I can start my visualization.

Share Button

Tabcmd Part 2

Last year I wrote about Tableau Server’s tabcmd feature here. Well just this week, I’ve been working on a solution that leverages tabcmd. The problem is that we need a PDF copy of one dashboard with more than 5000 options on one filter. Our company has over 5000 franchise hotels in the US, so one per property. Luckily, this can be achieved rather painlessly with tabcmd.

The best way to handle this would be to create a batch file that contains all the commands. This is as easy as launching a text editor and saving the file with a .bat extension. Start by login into your server.

tabcmd login -s http://localhost -t site -u username -p p@ssw0rd!

The -t site is optional. It allows a users to log into a specific site within the tableau server installation. If you’re sharing this code and don’t want your password sent around (good InfoSec practice), you can create just a text file with your password and refer to that file with –password-file pwd.txt.

Next is the meat of the script, all 5000+ iterations. Tableau Server allows for filters to be passed into the URL. I used the export tabcmd to generate a PDF and save to a location.

tabcmd export “workbook/view?Filter=val” –pdf –pagelayout landscape -f “C:\filename.pdf”

There are some options that can be included in the export command, like landscape. Be sure to check the documentation for other possibilities. I used Excel to quickly create each line of the tabcmd, changing the Filter value.

The last thing to remember is to log out of the server. So close the batch file with tabcmd logout. Simply double click the .bat file that was created and your script will start. Thanks for reading and please let me know what you’ve been able to do with tabcmd.

Share Button

Python Texting

So I’m going to get a little personal with this post. I’ve been having fun getting more into Python and found a great use case for a pet project. By the way, two Python posts in a row?!? Hack all the things! Ok, my lovely wife has a bad habit which I just don’t understand. She doesn’t like drinking water and forgets to do it. I know, right! Very strange. I drink water constantly so it’s very foreign concept to me.

In order to help her out, I decided to write a program to remind her to drink water. The best way to do this was to send her a text message. There’s a fantastic service called Twilio that provides messaging capabilities. If you’ve ever ordered an Uber, you’ve gotten a text via Twilio. So I created an account and got an API key (free) and my own Twilio phone number ($1). You can do testing for free though.

Next, I installed Twilio’s Python library on my machine. Their library makes its super simple to write programs.  I’m using python 2.7.10 so I simply installed with pip.

ScreenHunter_42 Feb. 15 10.52

Check out Twilio’s API libraries here for your desired language. After reviewing Twilio’s documentation, I was able to quickly send myself a test text message. Now I just needed to script out what I’d like to send to my wife. I created a Python List that contains the messages and the program will randomly select an item from that list to text her. I keep adding to this list as I think of new witty comments. :)

You can view the full code on my GitHub page.

So how is this automated you may ask? Every Windows machine has a Task Scheduler. You can create your own tasks and assign the schedule desired. I created a task that launches Python and executes the .py file for this program every two hours. Yes, it only runs when my computer is on. The code also checks for day of week and time of day so it will execute a text message if it is Monday – Friday, between 10am and 5 pm. When we’re at home, I can just verbally remind her to drink water.

This was a very fun project and even was well received when my wife Facebooked about it. It got many Likes and even other users who were interested in the same. Perhaps I can take this even further and offer it as a full-fledged app. Looking forward to more code adventures!

Share Button

My First Python Program

Hello again! I’m back with a new post and this time it is about Python. I’ve dabbled in some MOOCs out there and recently completed the Introduction to Programming Nanodegree offered by Udacity. It was exciting, fun, and rewarding. I look forward to more MOOC classes in my future.

In my current role with a hotel company, I’m working with geospatial data much more than I have in the past. We have over 5000 hotels in the US and thus, perform analysis based on location. My task at hand is to join two separate data sets. One data set contains Postal Zip Codes but the other does not. The second data set does have Latitude and Longitude coordinates. So before I can join on Zip Codes, I must first populate the second data set with Zip Codes based on lat/lon values. This is where Python comes in.

After some googling, I discovered a Python library called Geocoder. This library provides geocoding services by leveraging various providers such as Google Maps, Bing, Mapquest, etc. One of its capabilities is the user can provide a lat/lon and it can return the Zip Code of that coordinate. Perfect!

So as test, I wrote the following program which submits a list of coordinates.

I have a CSV file that contains the lat/lon values. I pass this into Geocoder and it returns the Zip Code. The program prints each lat/lon and its Zip Code. Here’s a view of what it looks like when I run it in command line.

ScreenHunter_37 Dec. 23 09.50

In my test script, I’m only looking up 3 coordinates and you can see, it works! Yay!! My first Python program written for something in the real world. So what is next? Well, ultimately my task involves over 100k coordinates. Google’s API caps its free offerings so if I’d like to maintain this program, it is going to cost a little money to complete. Secondly, I need to adjust the program so that it writes back to a CSV file with three columns (lat/lon/zip) instead of just printing on command line. Let me know if you can help :) otherwise, back to Googling and Stack Overflow I go!

Share Button

Org Chart in Tableau

Since starting my new position, I’ve been talking about my work and department with my wife. I set out to create an organizational chart to give her more context when I speak about certain things. There are plenty of tools that facilitate org charts. Microsoft alone has Visio and Powerpoint. Heck, you could even use Excel or Word. But hey, I’m a Tableau user, so let make this more interesting.

Org charts in Tableau are not native. I searched the web and came across this thread on Tableau’s Ideas section. It would be interesting if Tableau added this as a Show Me feature. Perhaps they will someday. So I used this thread as a starting point on how to accomplish my goal.

The key to an org chart is connecting points. Let’s also not forget that the data structure is important when it comes to Tableau. Each data point needs two rows of data. You’re drawing a line to connect dots, so you need to have one row for the starting location (direct report) and one row for the end location (manager). Here’s how I set up my data:

ScreenHunter_01 Sep. 12 10.47

I have three individuals in this sample. Name1 is the top of the chart, so note the blank X, Y coordinates in the repeated row. This is because Name1 does not have a line to connect to its manager. I assigned each person an ID and mapped their manager ID. Only records in Order 1 have a Display Name and Display Title. This is used in the Label shelf so the records aren’t displayed twice. The hardest part of this is the X and Y coordinates. Depending on the size and branches of your org chart, you have to play around with their values. On a simple chart, you could probably automate the generation of these with Excel formulas.

The entire data file is stored here.

On to building the chart. Place the X values on the column shelf and the Y values on the row shelf. Go to Analysis and deselect Aggregate Measures. You should have something like this.

ScreenHunter_02 Sep. 12 11.06

Add the Y values to the rows again and enable dual axis with synchronized axis. On the Marks card, I set the first Y values to Line and the second to Square. Now you have something like this. It’s close but obviously needs a little more work.

ScreenHunter_06 Sep. 12 16.48

Each pair of records needs an unique ID. This ID will connect the hierarchy in the proper manner, instead of the mess above. We have this unique value as the ID column. Place the ID column in the Details shelf. From here you’re pretty much done. All that is left is formatting and labeling the values.

ScreenHunter_07 Sep. 12 16.53

Here’s my finished product. I placed this single sheet into a dashboard and set it to legal landscape. Click the image to navigate to the Tableau Public version if you wish to download the workbook.

Organizational Chart

Thanks for tuning in. Hopefully one day, I’ll be the at the top of this chart :).

Share Button

Update

It has been a few months since I last posted. I wanted to take a quick minute to give an update. I took some time off from blogging as I began a job search. Like many others in the data blogging community, I’ve been very lucky to have turned my data passion into a career. I have moved to Choice Hotels International as a Tableau Developer and Analyst in the Business Intelligence Group. After 10 years with Discovery Communications, I decided it was time for a new challenge. I will miss many Discovery colleagues but have been making new friends at Choice. Discovery has so many talented individuals so I know everything I was part of there is in great hands.

It’s been a little intimidating moving industries. You take for granted the institutional knowledge that is acquired after 10 years of employment in one location. Now moving to the hospitality industry, I have so much to learn. RevPar, RevPar Index, ADR and the like are the foundations of hospitality reporting and analytics. It’s a been a fun ride so far and I look forward to much more.

So that is my quick update. I plan on being more active again now that things are settling down. Thanks for listening.

Share Button

Tableau in the office – episode 3

I’ve been revising some old dashboards at work that definitely are due for a refresh. One of the updates is to demonstrate Year over Year for a given metric. I wanted the calculation for current year to compare Jan 1 to date against previous year. But if the user selects to view YoY of previous years, I wanted to show full year difference, not Jan 1 to date. Below was my approach.

For sample data, I created an Excel file with random values per date from 1/1/2011 to 5/31/2015. You can download a copy of it here. To get my desired result, it will take one parameter and three calculated fields in the Filter shelf.

First, I created a Parameter for Year.

ScreenHunter_01 May. 20 16.07

Then I created a formula to only show the year selected in the Parameter and the year before it. This is a boolean expression and I place it on the Filter shelf, selecting the True result.

[Year] = year([Date]) or [Year] – 1 = year([Date])

In my scenario, I only want to show YoY of a year at a time. And since YoY is created with a table calculation for percent difference from previous, I needed a way to keep the result but hide the previous year in the view since it is blank. I used a simple index() calculation and placed this in the Filter shelf, selecting the value “2”.

Next, the following formula creates a calculated field which again is used in the Filter shelf. I’ve broken down the formula with the //comments.

//If a data point’s date is this year and in past months, keep it.
if [Year] = datepart(‘year’, today())
and datepart(‘month’,[Date]) < datepart(‘month’, today())
then ‘keep’
//if a data point’s date is in this year and in current or future months, filter it.
elseif [Year] = datepart(‘year’, today())
and datepart(‘month’,[Date]) > datepart(‘month’, today())
then ‘filter’
//if a data point’s date is in past years, keep it.
elseif [Year] < datepart(‘year’, today())
then ‘keep’
else ‘filter’
end

Place this calculated field in Filter shelf and select “keep”. Here are the three calculated fields in the Filter shelf. Any field I create strictly for filtering, I name with a “f.” prefix so I know its intended purpose.

ScreenHunter_02 May. 21 09.57

Now I place the Units measure in the Text shelf, and change the Sum(Units) to a Percent Difference table calculation.

ScreenHunter_03 May. 21 10.02

I also created a “label” field so the text changes depending on the Year parameter. This is to make it clear to the audience that YoY for current year is a Jan 1 to date comparison over previous year. But previous year to previous year minus 1 comparisons are Full Year over Full Year.

if [Year] = datepart(‘year’,today()) then “YoY” else “FYoFY” END

Add this “label” field to the Text shelf and lastly, add Year(Date) to the Text shelf too. Once all my text fields are in place, I edit the label as follows:

ScreenHunter_06 May. 21 10.09

The final result is below. Change the Parameter to view the differing YoY or FYoFY calculations. I included a table with the sample data.

Thanks for tuning in!

Share Button

Tableau server tip – Email Action

A quick and easy way to provide your user base the ability to email any questions about a given dashboard is to use a URL action. If you’re unfamiliar with actions, definitely check out this Knowledge Base article.

First, I decided to use an image of a question mark as a flag for the user. After a quick google image search, I found the one below that I liked. Download the image and save it to your …\My Tableau Repository\Shapes folder. I created a “Custom” sub-folder in this path to keep my custom shapes together.

ScreenHunter_01 Mar. 31 14.38

In your workbook, create a new sheet. The data source used is unimportant as you’re just going to create a viz with this shape in it.

Change the Marks to Shape, click the Shape pill, click More Shapes and choose the Custom folder (or what ever you named it) from the Select Shape Palette drop down. Click your shape and click OK.

ScreenHunter_02 May. 19 16.19

Now drop the omnipresent Number of Records measure into the Shape pill. You should now have a sheet that looks like this:

ScreenHunter_03 May. 19 16.23

You can enlarge the view by dragging along the sides of the sheet. Use the Size pill to make the shape larger or smaller as desired. I also format the sheet to remove Row and Column Dividers. In the Tooltip, replace existing text with something like “If you have questions about this data, please email:”.

Place this sheet anywhere in your dashboard. I prefer mine at the upper corner, right of the dashboard title.

ScreenHunter_04 May. 19 16.30

Create a URL Action. Name it your desired receiving email address. In the URL section, type: “mailto: email_address”

ScreenHunter_06 May. 19 16.36

Now when a user hovers over the image, the Tooltip will appear with the text and hyperlink. In my organization we use MS Outlook. Clicking the link launches a new email which contains your receiving email address already populated.

ScreenHunter_07 May. 19 16.40

Pretty simple trick. Hope you enjoyed it.

 

Share Button

Tableau Server Tip – Tabcmd

It’s been a little while since I have been able to post. But here I am back for a quick tip. I’m embarrassed to say that I only recently learned that Tabcmd (I pronounce it “Tab Command”) utilities can be installed and used from other machines, not just the one running your Tableau Server installation. If you’re unfamiliar with Tabcmd, these are a set of command line utilities that allow for automation and batch processing of tasks. This can be very handy when compiled into a BAT file and scheduled as a task.

Or just fun to play with, but be careful. There are delete commands that can be used. For the full list of Tabcmd options, check out the following link.

For you Alteryx users, Tabcmd can be implemented into your module. Interworks published a great article on how to take an output of a TDE and publish it to Tableau Server with Tabcmd. This is where being able to install Tabcmd on your machine shines. You don’t need to have Alteryx installed on your machine running Tableau Server. Check it out here.

Thanks for tuning in.

Share Button

March Madness Upsets

March Madness is one of my favorite sporting events. The excitement, the format, the emotion, the buzzer beaters, it encapsulates everything that is great about sports. So as I was watching the games these past two days, I started to think about all the upsets I have seen in my life. Naturally, a viz idea was born.

The below is summary of all March Madness upsets since 1985, the current format. I grabbed the data from this wikipedia page using Kimono, a great and simple tool to extract structured data from the web. Please note, the lowest seed considered an upset is 11. So no 10 vs. 7 or 9 vs. 8 match ups are present.

 

As for my personal brackets this year, the word of the day is #busted.

Share Button