by Samuel Ward-Riggs, Principal Consultant – Altis London
If there’s one concept in Data & Analytics that has frustrated and divided experts and businesses alike, yet may still outlive us all, it’s the notion of a Single Version of the Truth (SVOT), defined simply as a single point of access that can supply all relevant data.
Some argue that accepting the existence of SVOT leaves no room for alternative answers; others believe the notion has become hackneyed or misunderstood and may even be too sensitive to use in some organisations at all.
To investigate, let’s explore why SVOT originated in the first place, then apply the ingredients needed to make it a useful addition to an organisation’s Data & Analytics toolkit.
Many versions of the truth
Sally is the (fictitious) CEO of the (fictitious) Norwich Semiconductor Manufacturing Co. (NSMC), a specialised manufacturing business. The industry is highly competitive, and to grow revenues and improve profit margins, Sally needs access to timely, accurate data. She has asked her management team to report on NSMC’s customer base, and at the next board meeting she is presented with three reports.
The first report is from the Finance Director. It shows that NSMC has received payments from 300 customers this financial year. Sally inspects the supporting data carefully before announcing, “300 customers is correct.”
There is impassioned disagreement from the GM of Operations, who has come directly from the regional fulfilment centre to inform Sally that NSMC has, in fact, shipped products to 450 customers since January. Sally considers the data’s lineage, checking for accuracy, and then corrects herself: “Actually, it appears we have 450 customers.”
A murmur from one end of the boardroom draws Sally’s attention. The Director of Sales insists that the board consider the real data (from the CRM, of course!), which shows that 2100 customers have engaged with NSMC this year, whether through website inquiries, quote requests, or attendance at conventions. The customer names, phone numbers, and email addresses are all present: NSMC does indeed have 2100 customers.
All three of the management team’s reports are in conflict, yet all could be considered correct. What the data is missing, and what Sally needs, is context.
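To see how three conflicting answers can all be correct, here is a minimal Python sketch. It assumes a hypothetical `Party` record carrying three flags, one per team’s view of the world, and a hypothetical `count_by_definition` helper; the three definition labels are illustrative assumptions, not NSMC’s actual terminology. The same records produce three different counts, and only the named definition says which count you are looking at.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    """A hypothetical record of one organisation NSMC has dealt with this year."""
    name: str
    has_paid: bool     # a payment was received (Finance's view)
    was_shipped: bool  # a product was shipped to them (Operations' view)
    has_engaged: bool  # any CRM touchpoint: inquiry, quote, convention (Sales' view)


# Illustrative, context-qualified definitions; the labels are assumptions.
DEFINITIONS = {
    "Paying Customer": lambda p: p.has_paid,
    "Shipped-To Customer": lambda p: p.was_shipped,
    "Engaged Contact": lambda p: p.has_engaged,
}


def count_by_definition(parties):
    """Count the same records once per named definition."""
    return {term: sum(1 for p in parties if rule(p)) for term, rule in DEFINITIONS.items()}


# Three made-up parties, purely for illustration.
parties = [
    Party("Acme", has_paid=True, was_shipped=True, has_engaged=True),
    Party("Bolt", has_paid=False, was_shipped=True, has_engaged=True),
    Party("Coil", has_paid=False, was_shipped=False, has_engaged=True),
]
print(count_by_definition(parties))
# {'Paying Customer': 1, 'Shipped-To Customer': 2, 'Engaged Contact': 3}
```

Nothing about the underlying data changes between the three counts; only the rule applied to it does, which is exactly the missing context.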
How context affects data
Without context, the jumble of numbers, letters, and symbols which we call data could never be understood. Consider these two simple equations:
$$1 + 1 = 2 \tag{1}$$
$$1 + 1 = 10 \tag{2}$$
Both are correct… in context. In fact, of every billion times these equations are evaluated, equation (2) is correct in almost all of them, because nearly all of those evaluations happen inside computers.
Not convinced? Here’s an alternative view:
$$1_{10} + 1_{10} = 2_{10} \tag{3}$$
$$1_{2} + 1_{2} = 10_{2} \tag{4}$$
Plot twist! Context shows that the terms of the two equations use different number systems. Equation (3) specifies the base 10 number system we are taught at school when learning arithmetic: we count from 1 to 9 before carrying over one place, from 9 to 10, from 99 to 100, and so on. Equation (4) specifies the base 2 (binary) number system used by computers. Since binary has only two digits (0 and 1), we must carry over as soon as we exceed 1, so one plus one is written as 10.
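The same string of symbols can be true or false depending on the base it is read in. The minimal Python sketch below uses a hypothetical helper, `holds`, to check the statement `1 + 1 = 10` under each base; it relies only on the built-in `int(text, base)` parser and `format()`.

```python
def holds(a: str, b: str, result: str, base: int) -> bool:
    """Return True if a + b == result when every term is read in `base`."""
    return int(a, base) + int(b, base) == int(result, base)


# Read in base 10, "1 + 1 = 10" is false: one plus one is 2, not ten.
print(holds("1", "1", "10", base=10))  # False
# Read in base 2, the same symbols are true, because 10 in binary is 2.
print(holds("1", "1", "10", base=2))   # True
# "1 + 1 = 2" is true in base 10.
print(holds("1", "1", "2", base=10))   # True
# The same quantity rendered as binary ("b") and decimal ("d") digits.
print(format(1 + 1, "b"), format(1 + 1, "d"))  # 10 2
```

Note that the `base` argument is the context: drop it, and the function has no way to decide which answer is right.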
Some important lessons have emerged already, directly applicable to data:
- Context is often assumed rather than specified or defined: it’s very easy for humans to only care about (1) and for computers to only care about (2), each assuming their answer is manifestly correct.
- Context resolves conflict: while (1) and (2) can each be valid individually, only (3) and (4) are valid together, because their specified context removes any conflict.
The Glossary, Catalogue, and Dictionary
Back in the NSMC boardroom, Sally realised it wasn’t possible for her team’s reports to be true at the same time and in the same place. For that to happen, she needed to apply context. Without changing the data, applying context to NSMC’s boardroom reports would allow a SVOT to unfold. To help, Sally reached out to Altis. Pamela Biggs, Altis consultant, data boffin, and all-around wunderkind, engaged with NSMC stakeholders to create a Business Glossary, defining terms for day-to-day activities in simple, common language. Here’s a snippet of the results:
The Business Glossary helped Sally reconcile her management team’s competing claims that each of their definitions of customer was correct. The hard part was understanding what the term Customer means to NSMC as a whole rather than to any one stakeholder group. Here’s how the Finance, Operations, and Sales teams’ definitions of Customer may have differed from that of NSMC:
With the Business Glossary established, it was possible to create more advanced, data-related guidance for NSMC’s analysts. A Data Catalogue is an index of the datasets available within NSMC’s data platform, allowing stakeholders to quickly find existing data relating to their analytics needs. A Data Dictionary provides technical definitions of those datasets and their attributes, giving a detailed view of data lineage and transformation rules.
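How the Catalogue and Dictionary relate can be sketched as two linked record types. The Python below is a minimal sketch under stated assumptions: the classes `Dataset` and `AttributeDefinition`, the helper `search_catalogue`, and every dataset, column, and system name in the sample data are hypothetical, invented for illustration rather than taken from NSMC’s platform or any particular catalogue product.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeDefinition:
    """One Data Dictionary entry: the technical definition of a single attribute."""
    name: str
    data_type: str
    description: str
    source: str          # upstream system/column the value comes from (lineage)
    transformation: str  # rule applied on the way into the platform


@dataclass
class Dataset:
    """One Data Catalogue entry, linked to glossary terms and dictionary attributes."""
    name: str
    description: str
    glossary_terms: list[str]
    attributes: list[AttributeDefinition] = field(default_factory=list)


def search_catalogue(catalogue, keyword):
    """Return datasets whose name, description or glossary terms mention keyword."""
    kw = keyword.lower()
    return [
        d for d in catalogue
        if kw in d.name.lower()
        or kw in d.description.lower()
        or any(kw in term.lower() for term in d.glossary_terms)
    ]


# Hypothetical sample entries, for illustration only.
catalogue = [
    Dataset(
        name="finance.customer_payments",
        description="Payments received, one row per invoice settled",
        glossary_terms=["Paying Customer"],
        attributes=[
            AttributeDefinition(
                name="customer_id",
                data_type="string",
                description="Identifier of the paying party",
                source="erp.invoices.account_ref",
                transformation="trimmed and upper-cased",
            )
        ],
    ),
    Dataset(
        name="sales.crm_contacts",
        description="Contacts from inquiries, quote requests and conventions",
        glossary_terms=["Engaged Contact"],
    ),
]

# Find data (Catalogue), then inspect where it came from (Dictionary).
for d in search_catalogue(catalogue, "customer"):
    print(d.name, [a.source for a in d.attributes])
```

The design choice worth noting is the link between the two: each catalogue entry points both up to Business Glossary terms and down to dictionary attributes, so an analyst can move from a business question to a dataset to the lineage of an individual column.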
On seeing the agreed definitions in the Business Glossary, backed by a searchable Data Catalogue and a readily available Data Dictionary, Sally exclaimed, “Finally, a Single Version of the Truth!”
I hope that you won’t shy away from the ambition to have consistent, data-driven answers to critical questions within your organisation. And to help others understand how this can be achieved, I hope you won’t shy away from fundamentally important data management concepts like a Single Version of the Truth either. Just remember, to avoid ruffling feathers or confusing stakeholders, context is king!