March Madness Upsets

March Madness is one of my favorite sporting events. The excitement, the format, the emotion, the buzzer beaters, it encapsulates everything that is great about sports. So as I was watching the games these past two days, I started to think about all the upsets I have seen in my life. Naturally, a viz idea was born.

The below is summary of all March Madness upsets since 1985, the current format. I grabbed the data from this wikipedia page using Kimono, a great and simple tool to extract structured data from the web. Please note, the lowest seed considered an upset is 11. So no 10 vs. 7 or 9 vs. 8 match ups are present.

 

As for my personal brackets this year, the word of the day is #busted.

Share Button

You’re Welcome

This past week I was at a happy hour for a departing co-worker. While there I started talking to someone who I recently pointed her to a server viz for data. In our Post Production facility, we have an application called ScheduALL. It is a resource management system used by many media companies. The system has reporting capabilities via Crystal Reports. Our users generally provide volume/cost reports of network activity to others who do not have access to ScheduALL. Last year I had setup a variety of server vizzes for anyone to view network spend in our facility based on their network assignment. Our application runs on SQL Server and we have setup nightly jobs to create FACT tables of transaction data.

So back to the happy hour. The lady I was speaking this week had asked for assistance on how to generate these reports out of the system. This is a task where users run the data, export to Excel and create a pivot table of the results. The problem is our network reps are asked for this constantly. I instead sent her a link to a server viz that already contained the results and filters for more precise output. She now uses the tableau server viz instead of generating her own report. She also forwarded the link to her customer who was requesting the data.

I am not exaggerating when in her effusive Thank You to me, she said this new method of providing data to clients was “life changing.” Both her and her client were truly grateful for what I had provided with the power of Tableau Server. I love seeing newcomers to our world just be utterly wow’d.

So to her and everyone else who has thanked me for doing something I love, you’re welcome!

Share Button

Alteryx in the office – episode 2

I had a task the other week where I needed to load Excel data into a datamart. The problem was that the Excel data was not formatted like my destination table in the datamart. So I fired up Alteryx and started playing around.

The Excel sheet was structured with total quantity by week, like so:

ScreenHunter_01 Feb. 25 20.11

(sample data shown)

But I needed it to appear as follows:

ScreenHunter_02 Feb. 27 11.15

The goal is to have total distinct job_name rows equal the QTY value for the corresponding Week of. So 1/3/2011 would have a total of 117 rows.

My result was the following module.

ScreenHunter_03 Feb. 27 11.18

First, I bring in the data with an Input tool. Then I used a Formula tool to create a string field as “jobname”. The next part is where it gets fun. There’s a Generate Rows tool in the Preparation section. I set it up to create a new field, RowCount, initialized at value 1. You can then set the Condition Expression, which for me was RowCount<=[QTY]. The loop iteration is simply RowCount +1 in order to increment rows by 1 each time.

ScreenHunter_04 Feb. 27 11.21

This created rows that match the QTY value for each Week of. Next came the a Record ID tool followed by another Formula tool so that I could create distinct job_names. This is simply a concatenation of the original jobname field with Record ID. I also added a Quantity formula of 1 for quick summation purposes.

ScreenHunter_05 Feb. 27 11.36

A final Select tool was used so I only outputted my desired columns. You can download the sample input, Alteryx module, and sample output files here.

Hope you enjoyed it!

Share Button

Tableau in the office – episode 2

Time for another Tableau in the Office post. A quick viz that I worked on was for a department head that wanted to see how much a service had declined in recent years. The decline is due to advances in technology, therefore the service becoming obsolete. I took a look at the data (sample data used here) and created a simple line chart by plotting Quarter (Continuous) with Quantity.

ScreenHunter_01 Jan. 31 15.22

This is pretty straightforward. But I wanted to show the total percent decrease for 2014 Q4 since 2013 Q1. This is where I employed a Table Calculation. Table calcs in Tableau are extremely powerful. They can range for very simple to exceptionally complex. I try to read as much as I can about table calcs and encourage others as well.

I created a calculated field that used the LOOKUP function. This is a great way to isolate exactly what is needed as you can specify the offset. The formula is written as LOOKUP(expression, offset). So LOOKUP(SUM(QTY), FIRST() + 3) would return the fourth value in my partition.

Since I only needed the first and last values, I didn’t require an offset. Here’s how my formula looks:

ScreenHunter_02 Jan. 31 15.33

In pseudocode: (QTY at 2014 Q4 – QTY at 2013 Q1)/ QTY at 2013 Q1

I formatted the value as a percentage and placed this calculated field in the Detail shelf so I can use it in the chart.

ScreenHunter_03 Jan. 31 16.02

Right click on the last mark, select Annotate > Mark

ScreenHunter_04 Jan. 31 16.05

Edit the annotation as desired.

ScreenHunter_05 Jan. 31 16.06

Here’s my final viz. Simple and effective in illustrating the decrease of a service over the last two years.

ScreenHunter_06 Jan. 31 16.08

Thanks for tuning in!

Share Button

One year in

One year ago, Tableau declared January as Data Blogging Month. As a fan of the data community, I was inspired by that declaration to join in and thus, Data Knight Rises was born with this first post.

So a year in and I must say, “Wow!” Time sure does fly when you’re being a data geek. I have really enjoyed being part of this community. This blog has allowed me to e-meet some great people out there, with hopes of getting to meet them in person as well. The data community further inspires me to do more. I want to say Thank You to everyone who have supported me, taught me, and participated with me during this past year. Some of my most satisfying moments have been a simple thanks comment left on a post. It’s great knowing that what I have produced has made someone’s day easier in their data needs.

These site stats might not blow you away, but I’m really proud of what I’ve been able to accomplish this past year.

ScreenHunter_01 Jan. 25 13.24

So what’s next? There’s so much I want to accomplish, which I believe is a very common sentiment among other data enthusiasts. In order to be more organized, I feel like I should layout some goals for 2015.

  1. Continue learning and sharing Tableau and Alteryx tips/tricks/wins that I come across.
  2. Dedicate time to Python. I really want to get more into this language as its potential is vast in my professional and personal life.
  3. Community involvement. There are a plethora of data related gatherings in the DC area that I’ve been meaning to check out. The ones I see are from Data Community DC. Let me know if you know of others.
  4. Have fun! I never want this to feel like a chore or homework.

Thank you again for being a part of my journey and may your 2015 be as fulfilling as you intend!

Share Button

Quick Alteryx Win

So I first wrote about Alteryx last year, with the following post, when they invited me to speak at a local joint event with Tableau. If you’re unfamiliar with Alteryx, I suggest hitting the Google because it can make your life very easy.

At my company, we use a lot of MS SQL Server and all that it bundles. This includes SQL Server Integration Services. SSIS is a powerful ETL tool that we leverage to move large amounts of data between systems. But every time I launch Visual Studio and try to make a simple task, I inherently receive error messages that I have no idea what they mean. Yes, SSIS can do a great deal, but it feels like unless you have months and months of training, it is not a friendly and intuitive tool.

My task was to update tables in my SQL datamart that were once stored on a Lotus Notes database but now are in SharePoint as lists. I don’t know why I even opened Visual Studio to attempt this. But needless to say, it quickly frustrated me and then I thought, “Hmm, wonder if Alteryx can connect to SharePoint lists.” Sure enough, it can! I can’t stress how easy it was for me to build a module that would download the data from the each list and push it to my SQL datamart.

So here’s what I did. First, go to the Connectors tools and drag SharePoint List Input tool to your canvas.

ScreenHunter_02 Jan. 15 16.04

Next, complete the Properties for the tool. The SharePoint URL is everything before the /Lists in your url. Input your User Name and Password. Then pick the List you want from the drop down.

ScreenHunter_04 Jan. 15 16.09

I recommend sticking a Browse tool to check you’re correctly downloading what you need.

I then used the Select tool to only capture desired data, rename columns and change their data types. The Select tool is fantastic. It let’s you really narrow down what you need and how you need it. Have you ever done a Transformation task in SSIS? What a pain in the butt.

ScreenHunter_06 Jan. 15 16.23

Finally, each list has an Output tool to my SQL datamart. I really enjoy the flexibility outputting to SQL provides. Notice how you can add Pre and Post Create SQL statements into your module to save you time elsewhere.

ScreenHunter_05 Jan. 15 16.19

I repeated the process above six times, one for each list. Here’s what my module looks like when it was done. Clean and easy!

Alteryx SAP

This was such a great quick win for me, I wanted to share with you all.

Share Button

Tableau in the office

I’m going to start sharing some things I do with Tableau in my professional life. I’ve done this once with my Time Duration post. Hopefully, “Tableau in the Office” becomes a recurring series and provides ideas/tips/tricks that you can employ in your daily Tableau routine.

This week I was asked by a colleague for some data on their jobs queue. As my division is the content supply chain for Discovery, incoming deliveries must be QC’d to ensure they meet our technical specifications. We call this process, Technical Evaluation or TE. Our media management system tracks these TE jobs with multiple date fields. We have job created date and job completed date. This is a high volume operation so new jobs are created every day as the team completes jobs in the queue.

To illustrate how I solved this, I’ve mocked up data in the following excel file. This file contains three columns: Created Date, Completed Date and Job Number. I randomly generated 1500 dates between January 1, 2014 and October 31, 2014 to populate the Created Date column. Then I populated the Completed Date column by randomly adding 1 to 15 days to the Created Date value.

I connected to my data source using the following custom SQL technique. For excel file, this would be:

SELECT
[Sheet1$].[created date] AS [date],
‘CREATED’ AS [SOURCE],
[Sheet1$].[job_num] AS [job_num]
FROM [Sheet1$]

UNION SELECT
[Sheet1$].[completed date] AS [date],
‘COMPLETED’ AS [SOURCE],
[Sheet1$].[job_num] AS [job_num]
FROM [Sheet1$]

The important thing to note here is that the first select statement labels the Created date as Date and the second labels Completed Date as Date. This way I have both date values in the same column when I bring the data into Tableau. Also, I added a SOURCE field to designate where the data is coming from.

Once in Tableau, I made a bar chart with Stack Marks turned off. Placed SOURCE in the color and size shelf. This is a great way to see how many jobs are being created versus completed for a given date range. I’m filtering out November in the view below.

job dates

This was my take on this problem. Let me know if you have done something similar in a different manner.

Share Button

Python – Here I Come

My tech journey has brought me to want to learn a programming language. I’ve picked up some SQL for my professional life and really enjoyed writing queries, digging into data and solving problems. I want to learn a language that I can leverage both at work and for personal use. I did some research and decided that Python would be a great fit for my goals. As a data geek, I’ve been seeing Python used for many things I’d like to be able to do. From web scrapping, data processing, statistical analysis to even things that weren’t on my radar like web development.

So where do I start? I looked at the familiar free tools available out there. I started with Codeacademy. but I didn’t really like the format. I tried using the forum but people were just posting the solution code so I wasn’t learning much. I switched over to Udacity and began Intro to Computer Science. This is a class that is taught in Python and the ultimate goal is to build your own search engine. This project sounded fun but again I struggled with the format. The self-paced aspect had me putting it off constantly. Life happens and it became an after thought. I need deadlines.

So now comes Coursera. I spoke to colleague at work who was about to finish the Programming for Everyone (Python) class. He enjoyed to the experience so I decided to check it out. Coursera classes differ in length. But they follow a weekly schedule with deadlines. I believe this will assist me in satisfying my goal. The #PR4E class is taught by a professor at the University of Michigan, Dr. Chuck. Once you start watching the class videos, his passion for teaching and educational technology is clearly evident…and he has the tattoos to prove it. Check out his TEDx talk on MOOCs here.

MOOC stands for Massive Open Online Course. I love how technology has enabled MOOCs to grow and continue to mature. I wish these were available when I was younger and had more free time. I look forward to continue learning in my life from MOOCs.

The Programming for Everyone class started on October 6th and it runs 10 weeks long. But there is still time to join so if you’re interested, click here.

– Future Pythonista

Share Button

Tableau UTC Conversion

In one of my earlier posts, I wrote about how to manually convert UTC time to Eastern time, adjusting for the Standard vs Daylight Savings. I’ve found this formula useful for some of SQL work but recently needed to incorporate it into Tableau.

So I decided to share the following calculated field. Simply cut and paste the formula below into your favorite text editor and Replace All the [date_field] with your date column name.

if date([date_field]) > #13-03-2011# and date([date_field]) < #06-11-2011# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #11-03-2012# and date([date_field]) < #04-11-2012# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #10-03-2013# and date([date_field]) < #03-11-2013# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #09-03-2014# and date([date_field]) < #02-11-2014# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #08-03-2015# and date([date_field]) < #01-11-2015# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #13-03-2016# and date([date_field]) < #06-11-2016# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #12-03-2017# and date([date_field]) < #05-11-2017# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #11-03-2018# and date([date_field]) < #04-11-2018# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #10-03-2019# and date([date_field]) < #03-11-2019# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #08-03-2020# and date([date_field]) < #01-11-2020# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #14-03-2021# and date([date_field]) < #07-11-2021# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #13-03-2022# and date([date_field]) < #06-11-2022# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #12-03-2023# and date([date_field]) < #05-11-2023# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #10-03-2024# and date([date_field]) < #03-11-2024# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #09-03-2025# and date([date_field]) < #02-11-2025# then DATEADD(‘hour’,-4,[date_field])
else DATEADD(‘hour’,-5,[date_field]) end

Please note, this converts UTC to US Eastern time zone. So if you’re in another time zone, be sure to adjust the DATEADD values. I hope this helps.

Share Button

Time Duration in Tableau

Time conversion seems to be pretty popular topic for Tableau. I recently worked on a project where the users wanted to display media duration in hh:mm:ss format. I looked around the Tableau forum and saw others have had this issue too. As long as the duration does not exceed 24 hours, this formula works just fine DATETIME(SUM([total dur secs]/84600)). And then use Tableau Date formatting to change to hh:mm:ss. But if duration will exceed 24 hours, this formula will not work. In my case, duration will exceed 24 hours. So here’s how I tackled this problem.

First, I needed to calculate duration in seconds before I could create the required format. Working in media, time codes go down to the frames level. So first I converted frames into aggregated seconds based on the asset frame rate, [total dur secs] =  SUM([PROGRAM_DURATION]/29.97)

Now that my [total dur secs] field is in total seconds we can begin. There are probably many ways to perform this but this was the approach I took.

First, I created a [total hours] calculated field.

IF INT([total dur secs]/60/60) >=1 THEN
STR(IIF(INT[total dur secs]/60/60)> 9, “”, “0”)) + STR(INT[total dur secs]/60/60))
ELSE “”
END

The first line tests if the duration is greater than or equal to one hour. If that is true, then the second line tests if the duration is greater than 9 hours. If that is true, then create a blank, if false, then pad with a zero. Since I’m concatenating, the duration is converted to strings. If the total duration is not greater than an hour, then blank.

Next, I created a [total minutes : seconds] calculated field.

IF INT([total dur secs]/60) >=60 THEN
STR(IIF((INT([total dur secs]/60)-INT([total hours])*60)> 9, “”, “0”)) + STR(INT([total dur secs]/60)-INT([total hours])*60)
+ “:” +
STR(IIF(ROUND([total dur secs]-INT([total dur secs]/60)*60)> 9, “”, “0”)) +
STR(ROUND([total dur secs]-INT([total dur secs]/60)*60))
ELSEIF INT([total dur secs]/60) >=1 THEN
STR(IIF(INT([total dur secs]/60)> 9, “”, “0”)) + STR(INT([total dur secs]/60))
+ “:” +
STR(IIF(ROUND([total dur secs]-INT([total dur secs]/60)*60)> 9, “”, “0”)) +
STR(ROUND([total dur secs]-INT([total dur secs]/60)*60))
ELSE “00:” + STR(IIF(ROUND([total dur secs])> 9, “”, “0”)) + STR(ROUND([total dur secs]))
END

Let’s break down what is happening in the calculated field. The following portion below checks if [total dur secs] is greater than or equal to 60 minutes. Because I want time formatted as hh:mm:ss and I don’t want the minutes portion to exceed 60. So I subtract out [total hours] as minutes. Again, using the IIF to pad the extra zero as needed.

IF INT([total dur secs]/60) >=60 THEN
STR(IIF((INT([total dur secs]/60)-INT([total hours])*60)> 9, “”, “0”)) + STR(INT([total dur secs]/60)-INT([total hours])*60)

The next section below is calculating the seconds portion of the duration.

STR(IIF(ROUND([total dur secs]-INT([total dur secs]/60)*60)> 9, “”, “0”)) +
STR(ROUND([total dur secs]-INT([total dur secs]/60)*60))

I have to round [total dur secs] because I don’t want decimals. Then similar to the minutes calculation, I want to subtract out minutes that exceed 60. The IIF again pads for zeros as necessary. 

The ELSEIF and ELSE basically repeat the same calculation, checking if minutes are greater or equal to 1 or zero respectively.

Lastly, I bring [total hours] and [total minutes : seconds] together in a new calculated field, [Total Duration].

[total hours]+ IIF([total hours]=””,[total minutes : seconds],”:”+ [total minutes : seconds])

So just as I’m about to finish this set of calculations, a new message appears in my inbox. I subscribe to Tableau Zen Master – Jonathan Drummey’s blog, and he just posted a much simpler version of time duration on, Drawing with Numbers. I guess I just prefer the scenic route. Check it out his work though, it is great.

How have you handled time duration formatting in Tableau?

 

Share Button