Category Archives: Case Statement

SQL Join Daily and Monthly

This week I was doing some analysis and came across a common issue when using multiple data sets. I work with a variety of tables in our data warehouse, which vary in granularity. My task required blending a table that contains daily records with a table that contained monthly records. If I just LEFT JOIN on month(date) and year(date), and the foreign key, I would end up with duplicate records because the join would pull in monthly records at the daily level. My solution to this was to leverage SQL Analytic Functions.

Analytic Functions vary between systems and versions. At my office, we use HP Vertica as our data warehouse. But the one I used should be available on most, if not all, SQL products. ROW_NUMBER is an easy way to create a column that will count rows based on your input. It takes two parameters. The first, PARTITION BY, is optional. You can reset the counter on one or multiple columns or partitions. The second parameter is ORDER BY, which structures the results so your partition counter is as you desire.

As I work in the hotel industry, my daily table contains consumed room nights for our properties. The monthly table has other key metrics at the property level.

Here’s a basic representation of my tables:

ScreenHunter_29 Jun. 09 14.20

I needed to pull one of those metrics to the daily level so I could compare two values Year over Year, aggregated monthly. I used ROW_NUMBER to initiate a counter that resets for every unique year(date), month(date) and property code. The actual syntax for this is as follows:

ROW_NUMBER() OVER (PARTITION BY year(date), month(date), property_code ORDER BY date, property_code)

If I use this statement on my daily table, the result would look as follows:

ScreenHunter_23 Jun. 09 13.47

The result is a counter column that resets every distinct year/month and property code. This can be used to mark where I pull in the value from my monthly table, knowing it will not duplicate anywhere else in the result set. Taking it one step further, I used this function inside a CASE statement. In my SELECT query, I wrote:

CASE WHEN ROW_NUMBER() OVER (PARTITION BY year(date), month(date), property_code ORDER BY date, property_code) = 1 then monthly.value END AS column_name

Using my sample data from above, here’s how the results would appear:

ScreenHunter_30 Jun. 09 14.22

The key here is the equal to 1. Each time the counter resets to 1, I pull in the desired value from my monthly table, else the column is null. My results are still at the daily level. Remember, I still need to LEFT JOIN the tables on month(date) and year(date), and property_code. My data is now ready to bring it all into Tableau and I can start my visualization.

Share Button

Manual UTC Conversion

During a project last year I was creating a FACT table from one of many databases in my organization used to track program information. Using SQL Server, one of the fields that I wanted to pull was Air Date/Time of a program. When I first created my fact table, I noticed that the Air Date/Time field was in UTC. I suspect this database was created with a time OFFSET field but I didn’t have schema documentation for it.

I decided create my own conversion calculation to East Coast time. I’ve been working with SQL for a little time now but I’m no expert. So the challenge I faced was adjusting for Standard versus Day Light Savings because UTC does not change. I looked up the dates when we adjust our clocks from 2011 – 2025. The rest was the following SQL CASE statement:

CASE
WHEN [Table].[Date Field] BETWEEN ‘2011-03-13 07:00:00:000’ AND ‘2011-11-06 06:00:00:000’ THEN DATEADD(hour,-4,[Table].[Date Field])
WHEN [Table].[Date Field] BETWEEN ‘2012-03-11 07:00:00:000’ AND ‘2012-11-04 06:00:00:000’ THEN DATEADD(hour,-4,[Table].[Date Field])
WHEN [Table].[Date Field] BETWEEN ‘2013-03-10 07:00:00:000’ AND ‘2013-11-03 06:00:00:000’ THEN DATEADD(hour,-4,[Table].[Date Field])
WHEN [Table].[Date Field] BETWEEN ‘2014-03-09 07:00:00:000’ AND ‘2014-11-02 06:00:00:000’ THEN DATEADD(hour,-4,[Table].[Date Field])
WHEN [Table].[Date Field] BETWEEN ‘2015-03-08 07:00:00:000’ AND ‘2015-11-01 06:00:00:000’ THEN DATEADD(hour,-4,[Table].[Date Field])
WHEN [Table].[Date Field] BETWEEN ‘2016-03-13 07:00:00:000’ AND ‘2016-11-06 06:00:00:000’ THEN DATEADD(hour,-4,[Table].[Date Field])
WHEN [Table].[Date Field] BETWEEN ‘2017-03-12 07:00:00:000’ AND ‘2017-11-05 06:00:00:000’ THEN DATEADD(hour,-4,[Table].[Date Field])
WHEN [Table].[Date Field] BETWEEN ‘2018-03-11 07:00:00:000’ AND ‘2018-11-04 06:00:00:000 THEN DATEADD(hour,-4,[Table].[Date Field])
WHEN [Table].[Date Field] BETWEEN ‘2019-03-10 07:00:00:000’ AND ‘2019-11-03 06:00:00:000’ THEN DATEADD(hour,-4,[Table].[Date Field])
WHEN [Table].[Date Field] BETWEEN ‘2020-03-08 07:00:00:000’ AND ‘2020-11-01 06:00:00:000’ THEN DATEADD(hour,-4,[Table].[Date Field])
WHEN [Table].[Date Field] BETWEEN ‘2021-03-14 07:00:00:000’ AND ‘2021-11-07 06:00:00:000’ THEN DATEADD(hour,-4,[Table].[Date Field])
WHEN [Table].[Date Field] BETWEEN ‘2022-03-13 07:00:00:000’ AND ‘2022-11-06 06:00:00:000’ THEN DATEADD(hour,-4,[Table].[Date Field])
WHEN [Table].[Date Field] BETWEEN ‘2023-03-12 07:00:00:000’ AND ‘2023-11-05 06:00:00:000’ THEN DATEADD(hour,-4,[Table].[Date Field])
WHEN [Table].[Date Field] BETWEEN ‘2024-03-10 07:00:00:000’ AND ‘2024-11-03 06:00:00:000’ THEN DATEADD(hour,-4,[Table].[Date Field])
WHEN [Table].[Date Field] BETWEEN ‘2025-03-09 07:00:00:000’ AND ‘2025-11-02 06:00:00:000’ THEN DATEADD(hour,-4,[Table].[Date Field])
ELSE DATEADD(hour,-5,[Table].[Date Field]) END AS [Date Name]

I first created a parameter in Tableau to let the user decide to view the data between standard and daylight savings. But I wanted the users to avoid any confusion between the viz and any other area where one can see program air date/time. Secondly, for Tableau best data practices, do heavy data lifting calculations before connection to Tableau. Software like SQL Server are built for this type of data manipulation. Especially when working with millions of records, long winded calculations can slow Tableau performance.

Thanks to my manager for encouraging this post. I hope it saves anyone time who needs to convert UTC date fields. 

UPDATE: To perform the same in Tableau click here.

Share Button