We All Need a Little Validation
Daniël Hussem, a Forbes Council Member and Troparé Inc. Director of Marketing & Product, penned a January 2022 article that really got Team Standard Co’s gears turning. Titled “Four Data Onboarding Trends to Enhance the Customer Experience in 2022,” the article underscores the importance of proper data onboarding; namely, data validation. As true data nerds, we naturally respect anyone touting the significance of good, clean data. However, while the points Hussem outlines surrounding data validation, data differencing, machine learning, and converting ETL to EVTL are phenomenal housekeeping tactics, these steps often needn’t be leveraged at all if proper data collection strategies are employed well before data onboarding occurs.
Here at Standard Co, we strive to enhance our clients’ relationship with data. To do this, we strongly believe in good data standardization from pre-planning to training to collection to analysis, thereby reducing or eliminating the data validation process altogether. Enter: the Data Experience.
The Data Experience
Standard Data is a powerful toolkit capable of making all your data dreams come true. However, before we hand over the keys to any of our partners, before any data is collected or uploaded, and before we even draft an agreement with said partner, we engage them with some fundamental questions to ensure we’re all on the same page for a successful data adventure.
Establishing Real Data Objectives
Working with organizations in over 60 countries worldwide, we often encounter the same answer when asking what the leadership hopes to accomplish with data: “we want to measure success”. Furthermore, many organizations focus on quantity of data over true data quality, concerning themselves first and foremost with capturing as much data as possible and worrying about synthesizing said data at some other point in time. Pro tip: don’t do this. Establishing goals and objectives for data in advance of a project kickoff is paramount – have a plan for the data and capture what you need to execute against those objectives, period.
“Building a comprehensive Data Experience is much like building a home. You know you need 3 bedrooms, 2 baths, and a functional kitchen – that’s great. But we want you to also consider the neighborhood: traffic patterns, school zones, crime statistics, access to amenities, commute times, etc. before you start pouring the foundation,” says TJ Muehleman, Standard Co’s CEO and Founder.
Let’s look at a real-world example. One of our favorite clients, The Kula Project, works with budding entrepreneurs in Rwanda with a mission of helping them grow their businesses sustainably. The clearest metric for defining the success of their programs is the overall rise in net income per entrepreneur; however, we found that this data point historically was not captured. Because there was no plan for measuring the organization’s success with data, the captured data could not show whether entrepreneurs were making more money, and therefore whether The Kula Project’s programs were working.
Tracking Progress with Good Data
Less is most definitely more, particularly in the aforementioned case surrounding The Kula Project. We sought to answer a few simple questions that directly addressed increases in net income. However, since income isn’t doubled or tripled overnight (unfortunately…), we also recognized that we’d need to survey a set cohort of entrepreneurs at different points in time over the course of the project’s lifespan. Meaning, we needed longitudinal data on a subset of entrepreneurs. And, as many a data nerd knows, it is quite common to corrupt data in longer-term studies if you don’t have a clear process from the start. Changing up questions or surveying different people won’t yield usable trend data; rather, you end up with a pile of incongruent data points that even a phenomenal data cleaning and/or data validation system cannot salvage.
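To make that concrete, here’s a minimal sketch of what longitudinal cohort data buys you. The table, column names, and figures are all hypothetical; the point is that a stable entrepreneur ID and an unchanged question set are what make the trend computable at all.

```python
import pandas as pd

# Hypothetical longitudinal survey data: the same entrepreneurs
# (stable entrepreneur_id) answering the same questions at each wave.
surveys = pd.DataFrame({
    "entrepreneur_id": [101, 102, 103, 101, 102, 103],
    "wave":            ["baseline"] * 3 + ["year_1"] * 3,
    "net_income_usd":  [120.0, 95.0, 140.0, 150.0, 90.0, 210.0],
})

# Pivot so each row is one entrepreneur and each column one survey wave.
by_wave = surveys.pivot(index="entrepreneur_id",
                        columns="wave",
                        values="net_income_usd")

# The trend only exists because IDs and questions stayed constant.
by_wave["change_usd"] = by_wave["year_1"] - by_wave["baseline"]
by_wave["change_pct"] = by_wave["change_usd"] / by_wave["baseline"] * 100

print(by_wave)
```

Swap out an ID or reword the income question between waves and the pivot above falls apart; there is nothing for the rows to line up on.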
“A usable data system is a simple data system. If you come up with 75 things you want to measure, you’re concocting a recipe for disaster. Aim to answer a few key questions, gather the relevant data to answer those questions, and, if needed, add more data points later,” remarks Muehleman.
Collecting & Storing Data
The mechanics of collecting data and managing the results (data validation, removing outliers, fixing data, and data accessibility) are undoubtedly an important part of the Data Experience, but one must first ask oneself: what data does the organization already have, and what still needs to be collected?
In the case of needing to collect more data, pre-planning is imperative. We work with clients in some of the most remote locales around the globe, so determining the appropriate data collection tools is a big part of how we craft the Data Experience on behalf of our clients (e.g., a commodity smartphone with long battery life vs. a laptop connected to reliable internet). Further, we determine who is responsible for collecting the data (e.g., highly trained surveyors leveraging complex forms vs. self-assessment tools that are clean and intuitive).
Data Standardization & Data Cleaning
To Hussem’s point, data validation is monumentally important in reducing errors – but data validation needn’t wait until after data is collected. We work with our clients to ensure fields such as age are appropriately bound by a realistic lifespan (e.g., no person lives to be 2,000 years old). Oftentimes, we find that replacing free-form user input with a restricted set of options yields better data.
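As a concrete illustration, here’s a minimal sketch of entry-time validation in Python. The field names, bounds, and option list are all made up for this example; the idea is that rules like these run while the data is being captured, not after upload.

```python
# Hypothetical bounds and option lists for a survey form.
MAX_REALISTIC_AGE = 120
SECTORS = {"agriculture", "retail", "crafts", "services"}  # closed list, not free text

def validate_response(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []

    # Bound numeric fields by a realistic range.
    age = record.get("age")
    if age is None or not (0 <= age <= MAX_REALISTIC_AGE):
        problems.append(f"age {age!r} is outside 0-{MAX_REALISTIC_AGE}")

    # Restrict optionality: force a choice from a fixed set
    # instead of accepting arbitrary user-typed text.
    if record.get("sector") not in SECTORS:
        problems.append(f"sector {record.get('sector')!r} is not a known option")

    return problems

print(validate_response({"age": 2000, "sector": "farming"}))
# -> ["age 2000 is outside 0-120", "sector 'farming' is not a known option"]
```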
We employ machine learning much like Hussem outlines: crawling data to find and eliminate outliers, duplicates, missing data points, etc. Again, though, this process can occur simultaneously with data collection, not at the completion of a survey, so that issues are identified and removed earlier in the process. By working hand-in-hand with our clients to learn which data isn’t useful to them, we can use machine learning to stop problems before they grow, rather than sorting through an enormous number of data points irrelevant to the project’s predefined goals after the fact.
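Here’s a simplified sketch of those checks, with hypothetical field names and a crude rule of thumb standing in for a trained model. In practice, the same checks would run on each sync from the field rather than once at the end of the survey.

```python
import pandas as pd

# Hypothetical incoming batch of survey submissions.
batch = pd.DataFrame({
    "entrepreneur_id": [101, 102, 102, 104],
    "net_income_usd":  [150.0, 90.0, 90.0, 99999.0],
    "sector":          ["retail", None, None, "crafts"],
})

# Flag exact duplicate submissions (e.g., a form synced twice).
dupes = batch[batch.duplicated(keep="first")]

# Flag missing required fields.
missing = batch[batch["sector"].isna()]

# Flag implausible values with a crude rule of thumb (far above the
# cohort median); a production pipeline might use a trained model here.
income = batch["net_income_usd"]
outliers = batch[income > income.median() * 10]

print(f"{len(dupes)} duplicates, {len(missing)} missing, {len(outliers)} outliers")
```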
When data nerds say “80% of data science is actually data cleaning,” they’re not wrong. Most of what data scientists do is ensure collected data is functional and prepared for use with a data analytics engine like Metabase or Tableau. Our toolkit, Standard Data, automates much of the data standardization process on behalf of our clients and alerts us, the data experts, when more manual effort is required.
Data Accessibility
We live in the cloud era, meaning data can and should be accessible anywhere, anytime (unless, of course, your data has PII or other sensitivities, in which case, let’s talk…). Features like downloading to CSV or Excel, direct connection to an analytics tool, and a JSON API, which allows systems to communicate programmatically with one another, are all essential in this day and age. Data isn’t useful if it cannot be easily accessed whenever and wherever stakeholders need it.
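For illustration, here’s a minimal sketch of those two access paths using Flask; the dataset, routes, and field names are invented for this example, not a description of any particular system.

```python
from flask import Flask, Response, jsonify

app = Flask(__name__)

# A stand-in dataset; in a real system this would come from storage.
RESULTS = [
    {"entrepreneur_id": 101, "wave": "year_1", "net_income_usd": 150.0},
    {"entrepreneur_id": 102, "wave": "year_1", "net_income_usd": 90.0},
]

@app.route("/api/results")
def results_json():
    # JSON lets other systems consume the data programmatically.
    return jsonify(RESULTS)

@app.route("/api/results.csv")
def results_csv():
    # CSV download for analysts who want raw rows in Excel or Tableau.
    header = ",".join(RESULTS[0].keys())
    rows = [",".join(str(v) for v in r.values()) for r in RESULTS]
    return Response("\n".join([header, *rows]), mimetype="text/csv")

if __name__ == "__main__":
    app.run(port=5000)
```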
Data Visualization
Data visualization is the “art” of data presentation and arguably the most important aspect of a better Data Experience. Not everyone is a data genius, so we place a strong emphasis on ensuring our clients understand who will be the consumer of the data. Defining the audience is key in establishing how the data will be consumed or processed and, in our experience, simple visualizations unlock powerful comprehension, regardless of the data’s complexity. As with pre-planning, using data to underscore a few key observations is best in measuring success, but ultimately, one must present this data in a format that is palatable to the intended audience (e.g., leadership typically likes simple charts, whereas a data analyst will want to see a full CSV of the raw data).
We find that decision-makers often operate best with a clear “before” and “after” picture: a quick background on what has occurred, juxtaposed with the trends one can expect in the future. Presentations are best for helping folks formulate a point of view on a certain topic, so factual, compelling charts help to clearly articulate the good and the bad. However, the biggest issue we see with our clients is an amalgamation of charts, graphs, and tables showcasing too much data minutiae (we’re data nerds, after all) and not enough substantive, high-level bullets. Most non-data-lovers just need the top-line bullets to make a decision – the answers, if you will, without “showing one’s work” in arriving at them.
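As a sketch of what that “before and after” picture might look like in practice: one chart, one number, one takeaway. The figures here are invented for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical headline numbers: median monthly net income for the
# cohort at baseline and one year into the program.
labels = ["Before (baseline)", "After (year 1)"]
median_income = [110.0, 165.0]

fig, ax = plt.subplots(figsize=(5, 4))
ax.bar(labels, median_income, color=["#b0b0b0", "#2e7d32"])
ax.set_ylabel("Median net income (USD/month)")
ax.set_title("Cohort net income, before vs. after")

# One substantive takeaway, stated on the chart itself.
ax.annotate("+50%", xy=(1, 165), ha="center", va="bottom")

plt.tight_layout()
plt.show()
```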
Crafting a Better Data Experience for Your Organization
Essentially, we want organizations to begin thinking about Data Experience as an equal to Customer Experience and User Experience. By repositioning the importance of data in an organization’s prioritization structure, we’ll work towards a world in which data serves as a highly effective throughline in all organizational decision-making, whereas in many cases today, data is either too cumbersome or too much of an afterthought to be truly useful to a broad swath of employees. Much of this can be attributed to poor data onboarding procedures, as Hussem accurately reports, but much of this can also be avoided with a strong Data Experience plan.
Hussem’s main point: “You can realize a smooth, efficient and streamlined data onboarding process in conjunction with adamant validation components so that you can spend more time utilizing your data rather than spending valuable time and effort trying to figure out how to fix it.” This couldn’t be more true. We’d simply go one step further: before collecting any data whatsoever, define a clear plan for said data. This, friends, is the essence of the Data Experience.