I’m not a philosopher, but when three of humanity's greatest thinkers all say the same thing, it’s probably important.
“The best is the enemy of the good.” - Voltaire
“Better a diamond with a flaw than a pebble without.” - Confucius
“Striving to better, oft we mar what’s well.” - Shakespeare
The core thought from this illustrious group is that striving for perfection at the expense of “done” (or “good enough”) is often not worth the effort.
In the world of analytics, however, there is enormous pressure to release a dashboard or report only when we are absolutely certain that it is perfect.
It’s very common to hear:
“We are deploying our <<insert cloud data warehouse>> at the moment, the data is very messy. We need to get our data quality and governance 100% before we can even think about exposing the data to the business.”
I was chatting with a colleague recently about this statement and we were both surprised by how widespread this approach is. It prompted me to look at the Google search trends over time for the term “Data Governance.”
You can see in the chart above that interest in the term “Data Governance” has been increasing steadily since around 2007 and really started to pick up around 2015.
Out of interest, I searched for a number of other terms and found that searches for “Cloud Data Warehouse” had a strikingly similar trend. It got me thinking: is this somehow related to marketing messaging from cloud data warehouse vendors?
In business, data is valuable when it can be used to drive positive business outcomes, e.g. reducing customer churn, optimising cost or increasing sales. To put it another way, data is useful only when it is used. A cloud data warehouse is an expense, not an asset, unless the data within it is being used.
So when is the right time to start using the data in the warehouse and sharing reports and dashboards with your business users to assist in their decision making?
The answer is that it depends on what the data is being used for. For example, if we are using data in a safety-critical system, then data quality needs to be as close to perfect as possible. If, on the other hand, we are looking at reducing customer churn by identifying at-risk customers and implementing an intervention, then data that isn’t perfect may be acceptable.
Let me explain that a bit further. Let’s say we know that when we call a customer and are able to identify a saving for them, they are likely to remain a customer for the next year. We have a limited number of call-centre staff who can make these calls, so we want them to focus on the most at-risk customers. If we make our customer data available to our call-centre team now, even with some data quality issues, they will be able to identify some or maybe most of our at-risk customers. They might end up calling some customers who were not actually at risk, and they might miss some customers who are. However, it is likely that the majority of their calls will be to customers who are at risk. Each of those calls reduces churn and increases profitability. In this type of situation, I’d argue that getting the data into the hands of the call-centre staff sooner is more important than focussing on the data quality.
“The best time to plant a tree is 20 years ago, the second best time is now” - Chinese Proverb
Each day that you delay has a cumulative impact on the potential financial savings as shown below.
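To put some rough numbers on the cost of delay, here is a minimal Python sketch. Every figure in it is hypothetical and chosen only for illustration: the number of calls per day, the share of calls that reach a genuinely at-risk customer (70% with today's imperfect data, 95% with cleaned-up data), the saving per retained customer, and the 90 days assumed for the clean-up. It also assumes, for simplicity, that every call to an at-risk customer retains that customer.

```python
def cumulative_savings(days, calls_per_day, hit_rate, saving_per_retained, start_day=0):
    """Return the running total of savings at the end of each day.

    hit_rate is the assumed share of calls that reach a genuinely
    at-risk customer; no calls are made before start_day.
    """
    totals = []
    running = 0.0
    for day in range(days):
        if day >= start_day:
            running += calls_per_day * hit_rate * saving_per_retained
        totals.append(running)
    return totals


# Hypothetical inputs, for illustration only.
HORIZON_DAYS = 180
CALLS_PER_DAY = 50
SAVING_PER_RETAINED = 100.0

# Option 1: start today with imperfect data (assumed 70% hit rate).
start_now = cumulative_savings(
    HORIZON_DAYS, CALLS_PER_DAY, hit_rate=0.70,
    saving_per_retained=SAVING_PER_RETAINED,
)

# Option 2: spend an assumed 90 days on data quality first,
# then start with better data (assumed 95% hit rate).
wait_for_quality = cumulative_savings(
    HORIZON_DAYS, CALLS_PER_DAY, hit_rate=0.95,
    saving_per_retained=SAVING_PER_RETAINED, start_day=90,
)

print(f"Start now:        {start_now[-1]:,.0f}")
print(f"Wait for quality: {wait_for_quality[-1]:,.0f}")
```

Under these made-up assumptions, the team that starts now is still ahead after six months, because the better hit rate has not had time to make up for 90 days of calls that were never made. Different numbers will give a different answer, which is exactly the conversation to have with the business.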
With each decision there is, of course, compromise. In the above example, we are trading quality for speed. Finding the optimum balance requires communication between the data team and the business to identify:
What is the most valuable data in the warehouse?
Can this data be used, even if the quality is not perfect or potentially unknown?
Is the business okay with this data changing over time as the quality is improved?
How does the business want to consume this data?
Getting data into the hands of the user early, and using feedback to drive continuous improvement, is a proven approach to focusing investment and optimising business outcomes. Data quality and data governance are important; however, we must ensure that we strive for data quality that is “fit for purpose” rather than perfect. This requires effective two-way communication between the analytics teams and the business, as well as an understanding that in some cases data is not perfect, that it will change over time, and that this is okay. The challenge is to find a balance between imperfect data that can deliver a better business outcome today and perfect data that would only be available at some point in the future.