Authored by Dr. Jan van Niekerk
In computer science, garbage in, garbage out (GIGO) is the concept that flawed inputs will yield flawed outputs, or ‘garbage’. This principle applies to all analysis and logic: an argument is unsound if its premises are flawed. In data analytics, it is a mammoth challenge.
A famous example of garbage data from history is the Mars Climate Orbiter, launched by NASA in December 1998. The mission was to learn more about Mars, its climate, atmosphere, and surface conditions – but one piece of bad data caused the probe to fire its thrusters incorrectly. The problem was that one piece of software, supplied by the manufacturer, reported thruster force in the imperial unit, pound-force, while a second piece of software, supplied by NASA, took in the data assuming it was in the metric unit, newtons. This resulted in the craft dipping 170 km closer to the planet than expected – and the $327.6 million mission was lost.
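The unit mix-up above can be illustrated with a minimal Python sketch. It is not the mission software: the `Force` type and the `thrust_command_newtons` function are hypothetical names invented for this example, and the 100 pound-force reading is made up. The only assumption is that a value which carries its unit with it can be converted explicitly, rather than being silently treated as newtons.

```python
from dataclasses import dataclass

# Exact conversion factor: 1 pound-force = 4.4482216152605 newtons.
LBF_TO_NEWTON = 4.4482216152605


@dataclass(frozen=True)
class Force:
    """A force value that always carries its unit (hypothetical helper type)."""
    value: float
    unit: str  # "N" for newtons or "lbf" for pound-force

    def to_newtons(self) -> float:
        if self.unit == "N":
            return self.value
        if self.unit == "lbf":
            return self.value * LBF_TO_NEWTON
        # Refuse to guess: an unknown unit is an error, not a default.
        raise ValueError(f"unknown force unit: {self.unit!r}")


def thrust_command_newtons(force: Force) -> float:
    """Consumer side: convert explicitly instead of assuming newtons."""
    return force.to_newtons()


# Illustrative reading only; not real mission data.
reading = Force(100.0, "lbf")
naive = reading.value                      # what a unit-blind consumer would use
checked = thrust_command_newtons(reading)  # what a unit-aware consumer would use
print(f"naive: {naive} N, checked: {checked:.2f} N")
```

In this sketch the unit-blind reading understates the force by a factor of roughly 4.45, which is the kind of gap a simple check at the hand-over point between two systems is meant to catch.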
Building trust in your data is interwoven with how you source your data (for more information, please see our article Why a centralised analytics function, rather than distributed, makes sense). Garbage data often arises when businesses either don’t have the capacity or have not set up specific processes to acquire and clean data before it is analysed. Naturally, one would like to remove the human element from data acquisition and cleaning, enabling efficiencies through automation.
Yet how could this be achieved? Firstly, a sound Master Data Management (MDM) strategy is needed. MDM is the core process used to manage, centralise, organise, categorise, localise, synchronise, and enrich data. It is also a key enabler for providing a single, trustworthy view of critical business information. Trusted data sources help reduce the costs of application integration, improve customer experience, and yield actionable analytic insight.
Secondly, automation is needed, because human beings tend to make mistakes and take shortcuts that can dirty your data. With automation and an MDM strategy in place, clean, trustworthy data will be one step closer.
Much like the ‘Sit. Crawl. Walk. Run.’ principle explored in the previous article in this series, which drives data and analytical maturity, ensuring that clean data is produced is an incremental process. Building organisational trust in data is not a quick task; money, time and effort are needed to ensure that the insights received are based on clean data.
Step one is having a data strategy. One of Stephen Covey’s habits in his book, “The 7 Habits of Highly Effective People,” is to begin with the end in mind. When you are creating a data strategy, know why you want those analytics. Equally important is having a data champion within the organisation who knows where all the data silos live. In a perfect world, this data champion, or the head of the data team, should be sitting at Exco or tactical level within the organisation, building a rapport with business unit stakeholders and presenting how data can have a high impact on the business, fulfilling the role of an organisational enabler.
In the data world, however, perfect data does not exist. Not all data is created equal; it is often incomplete and inconsistent, with many little intricacies, so analysts must always be clear about the potential caveats in their findings. People can be lazy too, especially when it comes to capturing data. Is there an opportunity within your organisation to automate data acquisition, cleaning and transfer?
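As one illustration of what automated cleaning might look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the field names (`customer_id`, `email`, `signup_date`), the date format, the validation rules and the sample rows are all hypothetical, and a real pipeline would take its rules from the organisation's own data strategy.

```python
from datetime import datetime
from typing import Optional

# Hypothetical schema: fields every record is expected to have.
REQUIRED_FIELDS = ("customer_id", "email", "signup_date")


def clean_record(raw: dict) -> tuple[Optional[dict], list[str]]:
    """Return (cleaned_record, problems); cleaned_record is None if rejected."""
    problems: list[str] = []
    # Trim stray whitespace from every text field.
    record = {k: (v.strip() if isinstance(v, str) else v) for k, v in raw.items()}

    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing {field}")

    email = record.get("email") or ""
    if email and "@" not in email:  # deliberately crude check for the sketch
        problems.append("invalid email")

    date_text = record.get("signup_date") or ""
    if date_text:
        try:
            # Assumed format: ISO-style dates such as 2021-03-04.
            record["signup_date"] = datetime.strptime(date_text, "%Y-%m-%d").date()
        except ValueError:
            problems.append("invalid signup_date")

    if problems:
        return None, problems
    record["email"] = email.lower()
    return record, []


# Made-up sample rows: one usable, one with several problems.
rows = [
    {"customer_id": "C001", "email": " Ana@Example.com ", "signup_date": "2021-03-04"},
    {"customer_id": "", "email": "bob.example.com", "signup_date": "04/03/2021"},
]

clean, rejected = [], []
for row in rows:
    record, problems = clean_record(row)
    if record is not None:
        clean.append(record)
    else:
        rejected.append((row, problems))

print(len(clean), "clean;", len(rejected), "rejected")
```

Rejected rows are kept together with the reasons they failed rather than being silently dropped, so that analysts can still state the caveats in their findings.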
If you fuel your business engines with bad data, then, much like the Mars Climate Orbiter, it will not be long until you crash and burn. Be an enabler for data analytics in your organisation. Develop a strategy and execute it; don’t rely on humans to capture and clean data; and introduce automation to capture data efficiently. It is imperative that an organisation collectively realises that data can be used as a strategic asset. And the starting point is clean data. Without that, it is garbage in, garbage out…