Exploring Data Exploration with Tableau

Introduction

Data exploration is a vital step of the data mining process that allows the analyst to gain a deeper understanding of the dataset being mined. Data exploration can reveal data quality issues that need to be addressed, such as missing values or poorly formatted data, to ensure that insights generated from the dataset are usable (Ponniah, 2010). While it is part of the overall data mining process, sometimes data exploration can provide useful business insights and actionable intelligence, making it a vital skill for an analyst.

Tableau is a robust data visualization application that allows data from multiple sources to be easily combined and used in interactive reports, charts, maps, dashboards, and other objects. It allows the data analyst to efficiently clean and format data, and it offers a large variety of tools for exploring a dataset for business insights. The following is an excerpt from a larger project on data exploration, which discusses using Tableau to explore data and gain insight into a business issue.

  • At the end of this essay is a link to the Tableau workbook containing the tables and figures used to complete this project.

Tableau Data Exploration Overview

As part of a larger project on data exploration, Tableau was used to explore a sample dataset and show how the software helps an analyst quickly dive into data, find interesting information to drill into, and come away with useful insights. This essay uses the Superstore sample dataset provided with Tableau, and all tables and figures were generated within Tableau. Covering both a tabular report and multiple charts, this essay highlights the robust functionality that makes Tableau a useful data analytics tool.

Exploration of the Superstore Dataset

Table 1 - Superstore product category profits for 2018 through 2021, broken down by sub-category.

Understanding what is happening within an organization is vital to the success of that organization’s strategic planning efforts. Software like Microsoft Excel and Tableau allows an analyst to quickly aggregate data into information while also providing tools and pathways to drill deeper into a specific dataset. The Superstore sample dataset represents a fictional office supplies retailer operating within the United States of America, which will be referred to as Superstore. Table 1 shows a tabular report of Superstore’s profits for the years 2018-2021, broken down by product category and sub-category. This report gives the analyst a “bird’s eye view” of how the various product sub-categories have performed, and it can act as a starting point for determining which areas may deserve a deeper look.
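The kind of aggregation behind Table 1 can also be sketched outside of Tableau. The following is a minimal illustration in plain Python, not the method used to build the table: the order rows are hypothetical values invented for the example (they are not Superstore figures), and the function names `profit_pivot` and `print_pivot` are likewise made up here.

```python
from collections import defaultdict

# Hypothetical order rows, NOT real Superstore figures:
# (year, category, sub-category, profit in dollars)
ORDERS = [
    (2018, "Furniture", "Tables", -120.0),
    (2018, "Furniture", "Chairs", 300.0),
    (2019, "Furniture", "Tables", -80.0),
    (2019, "Technology", "Phones", 450.0),
    (2019, "Furniture", "Chairs", 150.0),
    (2018, "Technology", "Phones", 200.0),
    (2018, "Furniture", "Tables", 40.0),
]


def profit_pivot(orders):
    """Sum profit into one row per (category, sub-category) and one column per year."""
    pivot = defaultdict(lambda: defaultdict(float))
    for year, category, sub_category, profit in orders:
        pivot[(category, sub_category)][year] += profit
    return pivot


def print_pivot(pivot):
    """Print the pivot as a simple fixed-width text table."""
    years = sorted({year for row in pivot.values() for year in row})
    header = f"{'Category':<12}{'Sub-Category':<14}"
    print(header + "".join(f"{year:>10}" for year in years))
    for (category, sub_category), row in sorted(pivot.items()):
        cells = "".join(f"{row.get(year, 0.0):>10.2f}" for year in years)
        print(f"{category:<12}{sub_category:<14}{cells}")


pivot = profit_pivot(ORDERS)
print_pivot(pivot)
```

A negative cell in a table like this is the kind of signal that prompts the drill-down described in the next section.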

Digging into the Data

Figure 1 - Bar charts showing annual Tables profits/losses for the years 2018 through 2021, broken down by region.

An important part of developing a strategic plan for an organization is identifying areas that are underperforming. From there, the analyst can drill down into the data to determine potential causes of the poor or negative performance. Table 1 shows that the Tables sub-category of Furniture is by far the worst-performing product sub-category within the entire organization. In the four-year period analyzed, the Furniture product category made a profit of $18,451. During that same period, the losses from the Tables sub-category totaled $17,725, an amount equal to approximately 96% of the overall Furniture profits for that time. Another alarming trend is that the losses from Tables sales more than doubled from 2020 to 2021. This poor performance of the Tables sub-category should raise red flags and warrant further investigation.
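The 96% figure can be checked with a few lines of arithmetic. The sketch below uses only the two dollar amounts quoted above. The second calculation rests on an assumption that the essay does not confirm, namely that the $18,451 Furniture profit is already net of the Tables losses.

```python
# Figures quoted in the text for the four-year period (2018-2021).
furniture_profit = 18_451  # overall Furniture profit, in dollars
tables_loss = 17_725       # losses from the Tables sub-category, in dollars

# Tables losses expressed as a share of overall Furniture profit.
loss_share = tables_loss / furniture_profit
print(f"Tables losses relative to Furniture profit: {loss_share:.1%}")

# ASSUMPTION (not stated in the essay): the Furniture profit above is net of
# the Tables losses. Under that reading, the rest of Furniture earned the
# reported profit plus the amount that Tables lost.
profit_without_tables = furniture_profit + tables_loss
erased_share = tables_loss / profit_without_tables
print(f"Furniture profit excluding Tables: ${profit_without_tables:,}")
print(f"Share of that profit erased by Tables: {erased_share:.1%}")
```

Under that assumption, Tables would have erased roughly half of what the rest of the Furniture category earned.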

Drilling down into the Tables data, one approach is to break down the profits and losses by region. This can help determine whether the losses are coming from across the organization or are localized to a certain area of operation. Figure 1 shows that while every region Superstore operates in has seen losses, the East region has posted the largest losses. In fact, only the West region has an overall positive record for Tables profits, while every other region has lost more than it has made on Tables sales. This indicates that there may be a widespread issue within Superstore that must be addressed in its strategic plan to ensure that losses do not continue to grow. Determining the causes of the losses would be vital information for Superstore.
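The regional question, whether losses are localized or widespread, comes down to grouping the same orders by region. The following is a rough sketch in plain Python rather than a reproduction of Figure 1. The rows are hypothetical values chosen only to mirror the pattern described above, and the function names are invented for the example.

```python
from collections import defaultdict

# Hypothetical Tables orders, NOT real Superstore figures:
# (region, year, profit in dollars)
TABLE_ORDERS = [
    ("East", 2020, -300.0),
    ("East", 2021, -650.0),
    ("West", 2020, 220.0),
    ("West", 2021, -90.0),
    ("Central", 2021, -180.0),
    ("Central", 2020, 60.0),
    ("South", 2020, -40.0),
    ("South", 2021, -110.0),
]


def net_profit_by_region(orders):
    """Sum all profits and losses per region."""
    totals = defaultdict(float)
    for region, _year, profit in orders:
        totals[region] += profit
    return dict(totals)


def gross_loss_by_region(orders):
    """Sum only the loss-making orders per region."""
    losses = defaultdict(float)
    for region, _year, profit in orders:
        if profit < 0:
            losses[region] += profit
    return dict(losses)


net = net_profit_by_region(TABLE_ORDERS)
losses = gross_loss_by_region(TABLE_ORDERS)

# Regions whose Tables sales lost more than they made overall.
losing_regions = sorted(region for region, total in net.items() if total < 0)
# Region with the largest (most negative) total losses.
worst_region = min(losses, key=losses.get)

print("Net profit by region:", net)
print("Regions with a net loss:", losing_regions)
print("Largest losses:", worst_region)
```

If most regions appear in the net-loss list, as they do in this made-up data, the problem is unlikely to be a purely local one.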

Finding Potential Business Insights

Figure 2 - Scatterplot comparing Tables sales and sales discounts. Color and annotation indicate a profit/loss, and shape indicates the region the order was from.

Drilling down even further into the Tables sales data, Figure 2 shows a scatterplot of all Tables orders based on the sale amount and the discount given. The shape of each mark represents the region the sale occurred in, while the shade (and annotation) indicates whether the order produced a loss (red) or a gain (green). Most profitable sales had a discount of 20% or less, and many of them were made in the West region, which agrees with what was discovered in Figure 1. Losses begin to pick up at a 30% discount for sales over $500 and continue to grow as the discount goes over 40%. Additionally, the East region routinely gives discounts of 30-50%, resulting in large losses on these orders. Importantly, the region is providing these large discounts on small orders as well as large orders, potentially exacerbating the losses.
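The relationship that Figure 2 shows visually can also be summarized by grouping orders into discount bands. The sketch below is one possible way to do this in plain Python. The order rows are hypothetical, the band boundaries (20% and 40%) are assumptions loosely based on the thresholds mentioned above, and the function names are invented for the example.

```python
# Hypothetical Tables orders, NOT real Superstore figures:
# (region, sale amount in dollars, discount as a fraction, profit in dollars)
ORDERS = [
    ("West", 800.0, 0.00, 160.0),
    ("West", 450.0, 0.20, 30.0),
    ("East", 900.0, 0.30, -120.0),
    ("East", 150.0, 0.40, -45.0),
    ("East", 1200.0, 0.50, -600.0),
    ("Central", 600.0, 0.30, -70.0),
    ("South", 300.0, 0.20, 15.0),
    ("South", 700.0, 0.45, -260.0),
]

BANDS = ["up to 20%", "over 20% to 40%", "over 40%"]


def discount_band(discount):
    """Assign a discount to one of three bands (boundaries are assumptions)."""
    if discount <= 0.20:
        return BANDS[0]
    if discount <= 0.40:
        return BANDS[1]
    return BANDS[2]


def summarize_by_band(orders):
    """Count orders, count loss-making orders, and sum profit per discount band."""
    summary = {band: {"orders": 0, "losing": 0, "net_profit": 0.0} for band in BANDS}
    for _region, _sales, discount, profit in orders:
        stats = summary[discount_band(discount)]
        stats["orders"] += 1
        stats["losing"] += 1 if profit < 0 else 0
        stats["net_profit"] += profit
    return summary


summary = summarize_by_band(ORDERS)
for band in BANDS:
    stats = summary[band]
    print(f"{band:<16} orders={stats['orders']} "
          f"losing={stats['losing']} net_profit={stats['net_profit']:.2f}")
```

A summary like this makes it easy to see at which discount level orders stop being profitable, which is useful input when considering discount guidelines.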

Considering the information gathered, excessive discounting appears to be a potential cause of the large losses in the Tables sub-category. Taken together, Table 1 and Figures 1 and 2 can be used by decision makers at Superstore in their strategic planning to determine how to handle the losses from Tables. Based on this information, setting discount guidelines for the Tables sub-category could help limit losses for the company. Ensuring that a company has sustainable pricing policies is vital to the long-term viability of any organization.

Conclusion

While technologies such as machine learning and artificial intelligence provide powerful analytical functionality, it should not be forgotten that efficient and effective data exploration can reveal actionable business intelligence before a model is ever trained. Data exploration can potentially answer the business questions being asked in a data mining project, so the exploration of the target dataset should not be skipped (Shmueli, Bruce, & Patel, 2016). Developing a firm understanding of the data being analyzed allows analysts to better understand the task at hand.

Below is a link to the Tableau Workbook containing the tables and figures from this project sample:

References

Ponniah, P. (2010). Data Warehousing Fundamentals for IT Professionals (2nd ed.). Hoboken, NJ: Wiley.

Shmueli, G., Bruce, P., & Patel, N. (2016). Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner (3rd ed.). Hoboken, NJ: Wiley.