Alteryx in the office – episode 2

I had a task the other week where I needed to load Excel data into a datamart. The problem was that the Excel data was not formatted like my destination table in the datamart. So I fired up Alteryx and started playing around.

The Excel sheet was structured with total quantity by week, like so:

ScreenHunter_01 Feb. 25 20.11

(sample data shown)

But I needed it to appear as follows:

ScreenHunter_02 Feb. 27 11.15

The goal is to have total distinct job_name rows equal the QTY value for the corresponding Week of. So 1/3/2011 would have a total of 117 rows.

My result was the following module.

ScreenHunter_03 Feb. 27 11.18

First, I bring in the data with an Input tool. Then I used a Formula tool to create a string field as “jobname”. The next part is where it gets fun. There’s a Generate Rows tool in the Preparation section. I set it up to create a new field, RowCount, initialized at value 1. You can then set the Condition Expression, which for me was RowCount<=[QTY]. The loop iteration is simply RowCount +1 in order to increment rows by 1 each time.

ScreenHunter_04 Feb. 27 11.21

This created rows that match the QTY value for each Week of. Next came the a Record ID tool followed by another Formula tool so that I could create distinct job_names. This is simply a concatenation of the original jobname field with Record ID. I also added a Quantity formula of 1 for quick summation purposes.

ScreenHunter_05 Feb. 27 11.36

A final Select tool was used so I only outputted my desired columns. You can download the sample input, Alteryx module, and sample output files here.

Hope you enjoyed it!

Share Button

Tableau in the office – episode 2

Time for another Tableau in the Office post. A quick viz that I worked on was for a department head that wanted to see how much a service had declined in recent years. The decline is due to advances in technology, therefore the service becoming obsolete. I took a look at the data (sample data used here) and created a simple line chart by plotting Quarter (Continuous) with Quantity.

ScreenHunter_01 Jan. 31 15.22

This is pretty straightforward. But I wanted to show the total percent decrease for 2014 Q4 since 2013 Q1. This is where I employed a Table Calculation. Table calcs in Tableau are extremely powerful. They can range for very simple to exceptionally complex. I try to read as much as I can about table calcs and encourage others as well.

I created a calculated field that used the LOOKUP function. This is a great way to isolate exactly what is needed as you can specify the offset. The formula is written as LOOKUP(expression, offset). So LOOKUP(SUM(QTY), FIRST() + 3) would return the fourth value in my partition.

Since I only needed the first and last values, I didn’t require an offset. Here’s how my formula looks:

ScreenHunter_02 Jan. 31 15.33

In pseudocode: (QTY at 2014 Q4 – QTY at 2013 Q1)/ QTY at 2013 Q1

I formatted the value as a percentage and placed this calculated field in the Detail shelf so I can use it in the chart.

ScreenHunter_03 Jan. 31 16.02

Right click on the last mark, select Annotate > Mark

ScreenHunter_04 Jan. 31 16.05

Edit the annotation as desired.

ScreenHunter_05 Jan. 31 16.06

Here’s my final viz. Simple and effective in illustrating the decrease of a service over the last two years.

ScreenHunter_06 Jan. 31 16.08

Thanks for tuning in!

Share Button

One year in

One year ago, Tableau declared January as Data Blogging Month. As a fan of the data community, I was inspired by that declaration to join in and thus, Data Knight Rises was born with this first post.

So a year in and I must say, “Wow!” Time sure does fly when you’re being a data geek. I have really enjoyed being part of this community. This blog has allowed me to e-meet some great people out there, with hopes of getting to meet them in person as well. The data community further inspires me to do more. I want to say Thank You to everyone who have supported me, taught me, and participated with me during this past year. Some of my most satisfying moments have been a simple thanks comment left on a post. It’s great knowing that what I have produced has made someone’s day easier in their data needs.

These site stats might not blow you away, but I’m really proud of what I’ve been able to accomplish this past year.

ScreenHunter_01 Jan. 25 13.24

So what’s next? There’s so much I want to accomplish, which I believe is a very common sentiment among other data enthusiasts. In order to be more organized, I feel like I should layout some goals for 2015.

  1. Continue learning and sharing Tableau and Alteryx tips/tricks/wins that I come across.
  2. Dedicate time to Python. I really want to get more into this language as its potential is vast in my professional and personal life.
  3. Community involvement. There are a plethora of data related gatherings in the DC area that I’ve been meaning to check out. The ones I see are from Data Community DC. Let me know if you know of others.
  4. Have fun! I never want this to feel like a chore or homework.

Thank you again for being a part of my journey and may your 2015 be as fulfilling as you intend!

Share Button

Quick Alteryx Win

So I first wrote about Alteryx last year, with the following post, when they invited me to speak at a local joint event with Tableau. If you’re unfamiliar with Alteryx, I suggest hitting the Google because it can make your life very easy.

At my company, we use a lot of MS SQL Server and all that it bundles. This includes SQL Server Integration Services. SSIS is a powerful ETL tool that we leverage to move large amounts of data between systems. But every time I launch Visual Studio and try to make a simple task, I inherently receive error messages that I have no idea what they mean. Yes, SSIS can do a great deal, but it feels like unless you have months and months of training, it is not a friendly and intuitive tool.

My task was to update tables in my SQL datamart that were once stored on a Lotus Notes database but now are in SharePoint as lists. I don’t know why I even opened Visual Studio to attempt this. But needless to say, it quickly frustrated me and then I thought, “Hmm, wonder if Alteryx can connect to SharePoint lists.” Sure enough, it can! I can’t stress how easy it was for me to build a module that would download the data from the each list and push it to my SQL datamart.

So here’s what I did. First, go to the Connectors tools and drag SharePoint List Input tool to your canvas.

ScreenHunter_02 Jan. 15 16.04

Next, complete the Properties for the tool. The SharePoint URL is everything before the /Lists in your url. Input your User Name and Password. Then pick the List you want from the drop down.

ScreenHunter_04 Jan. 15 16.09

I recommend sticking a Browse tool to check you’re correctly downloading what you need.

I then used the Select tool to only capture desired data, rename columns and change their data types. The Select tool is fantastic. It let’s you really narrow down what you need and how you need it. Have you ever done a Transformation task in SSIS? What a pain in the butt.

ScreenHunter_06 Jan. 15 16.23

Finally, each list has an Output tool to my SQL datamart. I really enjoy the flexibility outputting to SQL provides. Notice how you can add Pre and Post Create SQL statements into your module to save you time elsewhere.

ScreenHunter_05 Jan. 15 16.19

I repeated the process above six times, one for each list. Here’s what my module looks like when it was done. Clean and easy!

Alteryx SAP

This was such a great quick win for me, I wanted to share with you all.

Share Button

Tableau in the office

I’m going to start sharing some things I do with Tableau in my professional life. I’ve done this once with my Time Duration post. Hopefully, “Tableau in the Office” becomes a recurring series and provides ideas/tips/tricks that you can employ in your daily Tableau routine.

This week I was asked by a colleague for some data on their jobs queue. As my division is the content supply chain for Discovery, incoming deliveries must be QC’d to ensure they meet our technical specifications. We call this process, Technical Evaluation or TE. Our media management system tracks these TE jobs with multiple date fields. We have job created date and job completed date. This is a high volume operation so new jobs are created every day as the team completes jobs in the queue.

To illustrate how I solved this, I’ve mocked up data in the following excel file. This file contains three columns: Created Date, Completed Date and Job Number. I randomly generated 1500 dates between January 1, 2014 and October 31, 2014 to populate the Created Date column. Then I populated the Completed Date column by randomly adding 1 to 15 days to the Created Date value.

I connected to my data source using the following custom SQL technique. For excel file, this would be:

SELECT
[Sheet1$].[created date] AS [date],
‘CREATED’ AS [SOURCE],
[Sheet1$].[job_num] AS [job_num]
FROM [Sheet1$]

UNION SELECT
[Sheet1$].[completed date] AS [date],
‘COMPLETED’ AS [SOURCE],
[Sheet1$].[job_num] AS [job_num]
FROM [Sheet1$]

The important thing to note here is that the first select statement labels the Created date as Date and the second labels Completed Date as Date. This way I have both date values in the same column when I bring the data into Tableau. Also, I added a SOURCE field to designate where the data is coming from.

Once in Tableau, I made a bar chart with Stack Marks turned off. Placed SOURCE in the color and size shelf. This is a great way to see how many jobs are being created versus completed for a given date range. I’m filtering out November in the view below.

job dates

This was my take on this problem. Let me know if you have done something similar in a different manner.

Share Button

Python – Here I Come

My tech journey has brought me to want to learn a programming language. I’ve picked up some SQL for my professional life and really enjoyed writing queries, digging into data and solving problems. I want to learn a language that I can leverage both at work and for personal use. I did some research and decided that Python would be a great fit for my goals. As a data geek, I’ve been seeing Python used for many things I’d like to be able to do. From web scrapping, data processing, statistical analysis to even things that weren’t on my radar like web development.

So where do I start? I looked at the familiar free tools available out there. I started with Codeacademy. but I didn’t really like the format. I tried using the forum but people were just posting the solution code so I wasn’t learning much. I switched over to Udacity and began Intro to Computer Science. This is a class that is taught in Python and the ultimate goal is to build your own search engine. This project sounded fun but again I struggled with the format. The self-paced aspect had me putting it off constantly. Life happens and it became an after thought. I need deadlines.

So now comes Coursera. I spoke to colleague at work who was about to finish the Programming for Everyone (Python) class. He enjoyed to the experience so I decided to check it out. Coursera classes differ in length. But they follow a weekly schedule with deadlines. I believe this will assist me in satisfying my goal. The #PR4E class is taught by a professor at the University of Michigan, Dr. Chuck. Once you start watching the class videos, his passion for teaching and educational technology is clearly evident…and he has the tattoos to prove it. Check out his TEDx talk on MOOCs here.

MOOC stands for Massive Open Online Course. I love how technology has enabled MOOCs to grow and continue to mature. I wish these were available when I was younger and had more free time. I look forward to continue learning in my life from MOOCs.

The Programming for Everyone class started on October 6th and it runs 10 weeks long. But there is still time to join so if you’re interested, click here.

– Future Pythonista

Share Button

Tableau UTC Conversion

In one of my earlier posts, I wrote about how to manually convert UTC time to Eastern time, adjusting for the Standard vs Daylight Savings. I’ve found this formula useful for some of SQL work but recently needed to incorporate it into Tableau.

So I decided to share the following calculated field. Simply cut and paste the formula below into your favorite text editor and Replace All the [date_field] with your date column name.

if date([date_field]) > #13-03-2011# and date([date_field]) < #06-11-2011# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #11-03-2012# and date([date_field]) < #04-11-2012# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #10-03-2013# and date([date_field]) < #03-11-2013# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #09-03-2014# and date([date_field]) < #02-11-2014# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #08-03-2015# and date([date_field]) < #01-11-2015# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #13-03-2016# and date([date_field]) < #06-11-2016# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #12-03-2017# and date([date_field]) < #05-11-2017# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #11-03-2018# and date([date_field]) < #04-11-2018# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #10-03-2019# and date([date_field]) < #03-11-2019# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #08-03-2020# and date([date_field]) < #01-11-2020# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #14-03-2021# and date([date_field]) < #07-11-2021# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #13-03-2022# and date([date_field]) < #06-11-2022# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #12-03-2023# and date([date_field]) < #05-11-2023# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #10-03-2024# and date([date_field]) < #03-11-2024# then DATEADD(‘hour’,-4,[date_field])
elseif date([date_field]) > #09-03-2025# and date([date_field]) < #02-11-2025# then DATEADD(‘hour’,-4,[date_field])
else DATEADD(‘hour’,-5,[date_field]) end

Please note, this converts UTC to US Eastern time zone. So if you’re in another time zone, be sure to adjust the DATEADD values. I hope this helps.

Share Button

Time Duration in Tableau

Time conversion seems to be pretty popular topic for Tableau. I recently worked on a project where the users wanted to display media duration in hh:mm:ss format. I looked around the Tableau forum and saw others have had this issue too. As long as the duration does not exceed 24 hours, this formula works just fine DATETIME(SUM([total dur secs]/84600)). And then use Tableau Date formatting to change to hh:mm:ss. But if duration will exceed 24 hours, this formula will not work. In my case, duration will exceed 24 hours. So here’s how I tackled this problem.

First, I needed to calculate duration in seconds before I could create the required format. Working in media, time codes go down to the frames level. So first I converted frames into aggregated seconds based on the asset frame rate, [total dur secs] =  SUM([PROGRAM_DURATION]/29.97)

Now that my [total dur secs] field is in total seconds we can begin. There are probably many ways to perform this but this was the approach I took.

First, I created a [total hours] calculated field.

IF INT([total dur secs]/60/60) >=1 THEN
STR(IIF(INT[total dur secs]/60/60)> 9, “”, “0”)) + STR(INT[total dur secs]/60/60))
ELSE “”
END

The first line tests if the duration is greater than or equal to one hour. If that is true, then the second line tests if the duration is greater than 9 hours. If that is true, then create a blank, if false, then pad with a zero. Since I’m concatenating, the duration is converted to strings. If the total duration is not greater than an hour, then blank.

Next, I created a [total minutes : seconds] calculated field.

IF INT([total dur secs]/60) >=60 THEN
STR(IIF((INT([total dur secs]/60)-INT([total hours])*60)> 9, “”, “0”)) + STR(INT([total dur secs]/60)-INT([total hours])*60)
+ “:” +
STR(IIF(ROUND([total dur secs]-INT([total dur secs]/60)*60)> 9, “”, “0”)) +
STR(ROUND([total dur secs]-INT([total dur secs]/60)*60))
ELSEIF INT([total dur secs]/60) >=1 THEN
STR(IIF(INT([total dur secs]/60)> 9, “”, “0”)) + STR(INT([total dur secs]/60))
+ “:” +
STR(IIF(ROUND([total dur secs]-INT([total dur secs]/60)*60)> 9, “”, “0”)) +
STR(ROUND([total dur secs]-INT([total dur secs]/60)*60))
ELSE “00:” + STR(IIF(ROUND([total dur secs])> 9, “”, “0”)) + STR(ROUND([total dur secs]))
END

Let’s break down what is happening in the calculated field. The following portion below checks if [total dur secs] is greater than or equal to 60 minutes. Because I want time formatted as hh:mm:ss and I don’t want the minutes portion to exceed 60. So I subtract out [total hours] as minutes. Again, using the IIF to pad the extra zero as needed.

IF INT([total dur secs]/60) >=60 THEN
STR(IIF((INT([total dur secs]/60)-INT([total hours])*60)> 9, “”, “0”)) + STR(INT([total dur secs]/60)-INT([total hours])*60)

The next section below is calculating the seconds portion of the duration.

STR(IIF(ROUND([total dur secs]-INT([total dur secs]/60)*60)> 9, “”, “0”)) +
STR(ROUND([total dur secs]-INT([total dur secs]/60)*60))

I have to round [total dur secs] because I don’t want decimals. Then similar to the minutes calculation, I want to subtract out minutes that exceed 60. The IIF again pads for zeros as necessary. 

The ELSEIF and ELSE basically repeat the same calculation, checking if minutes are greater or equal to 1 or zero respectively.

Lastly, I bring [total hours] and [total minutes : seconds] together in a new calculated field, [Total Duration].

[total hours]+ IIF([total hours]=””,[total minutes : seconds],”:”+ [total minutes : seconds])

So just as I’m about to finish this set of calculations, a new message appears in my inbox. I subscribe to Tableau Zen Master – Jonathan Drummey’s blog, and he just posted a much simpler version of time duration on, Drawing with Numbers. I guess I just prefer the scenic route. Check it out his work though, it is great.

How have you handled time duration formatting in Tableau?

 

Share Button

Beer!

Beer is awesome! What else is new, right? I’ve been thinking about doing a beer post and started looking around the internet for data. If you’re a fan of beer, you’ve noticed the rise in new brewers over the last decade. In fact, Denizens Brewing Co. just opened across the street from my office. I’ve heard rumors our engineering team is looking at ways to boost our wifi signal, allegedly of course. The explosion of craft beer has been great for beer nerds like myself. I love trying a new beer with friends and I keep finding new favorites. I prefer the IPAs but will try just about anything.

Off to the internet I went and I found data from the Beer Institute. The site has a downloadable excel with a wealth of beer information. The bar chart below shows active brewer permits by state and year. Use the slide filter to see the growth in permits issued.

Looking at the growth a little closer you can see YoY for the US as a whole and for each state. My home state of Maryland needs to step it up, while 40% is great, I know we can do better. :) Some states on the other hand are growing fast since 2004. Keep it up!

But the growth isn’t necessarily a great thing if you’re trying to break in the market. I’ve talked to the brewer of a local restaurant in Bethesda, MD and he says we’re going to start seeing many brewers close up shop. Many individuals are entering this space and not all will survive. But in the meantime, I’ll sample their beer.

Clearly California has the most active brewer permits, but it is the most populous state. How does 2012 brewer permits look when population is factored in? I found some data on adult population and blended it with the brewer data.  Note that adult here means 18 years of age or older, not 21, the legal drinking age in the US. I won’t go into it here much but I will say that I believe the drinking age should be 18, but the driving age should be 21.

Vermont has the least adults per active brewer permit and if you live in Mississippi, I’m sorry. I hope your state is importing other craft brews to help you out.

So now that I got you thinking about beer, you’re probably about to stop reading and head for the fridge. Wait!!! In order to help you make a selection, I scraped the web for prize winning beers. Using import.io, a fantastic web scraping tool, I downloaded a table of the Great American Beer Festival winners from 2013.

Use the filter to select the Category to view the Gold, Silver, and Bronze winners. I’m going to have to use this chart next time I’m on a beer run.

Now if you’ll excuse me, I just opened one of my favorites, Loose Cannon from Heavy Seas out of Baltimore. Have a great day!

loose cannon

 

 

Share Button

Tableau Server Tip

Tip time again. And this one focuses on Tableau Server. One of my favorite features of Tableau Server is the ability to publish data sources. I work for a global company and we constantly share data with teams in international offices. This data can range from SQL Server/Oracle data to Excel files stored on a multitude of share drives. Publishing data sources allows our global teams to all access data needed for a given project or report, without needing access to where the underlying data is stored. This has been advantage and allowed collaboration between regions, all while working with a single version of the truth.

Ok, now getting to the tip. Let’s say you’re given a workbook that already contains charts, tables, dashboards, and of course, calculated fields. The workbook is connected to a published data source on your server. You try to edit a calculated field and the only option is Edit Copy, which makes a new calculated field.

ScreenHunter_02 Jul. 24 16.45

This is not ideal because the calculated field you’d like to edit is already embedded in several locations of the workbook. So how do you edit the original field?

Easy, simply right click on the data source name and select Create Local Copy.

ScreenHunter_03 Jul. 24 16.48

Save the data source to your desired folder. You’ll now have a new data source.

ScreenHunter_04 Jul. 24 16.50

Next, swap data sources so the exists workbook sources from the local copy.

Select Data > Replace Data Source…

ScreenHunter_01 Jul. 24 19.59

Now when you right click to edit a calculated field, you’ll be able to edit the original field that is already being used throughout the workbook.

ScreenHunter_05 Jul. 24 16.51

From here, you can proceed with the needed task. Just remember that if you republish the data source, your calculated field edits will be available for others. So depending on your ultimate goal, know if you should overwrite the original published data source or not.

Thanks! And happy Tableau’ing!

Share Button