In our many travels, conferences, speaking engagements, and other interactions with customers, technology vendors, press, and others, we keep hearing the same refrain: “Data is the new oil,” as if that’s supposed to mean something profound. The first time we heard this expression (over a decade ago, we should add), it was an interesting point to make about how “important” and “strategic” data is. But with every repetition since, we’ve grown more tired of it, and more surprised that people are comparing a dwindling, dirty resource we’re increasingly running away from to a resource that grows almost without limit and that we can’t get enough of. So, why is this expression used, and can we honestly, finally, please kill it?
STOP SAYING DATA IS THE NEW OIL. PLEASE
The expression “Data is the new oil” dates back to 2006, when Clive Humby from Tesco in the UK said that “Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value.” Since then, this perhaps throwaway analogy has been latched onto by more and more people with louder bullhorns, including Ginni Rometty, IBM CEO, and Peter Sondergaard, SVP of Gartner.
The analogy is meant to make three points. First, like oil, data has strategic value: whoever owns it controls a lot of resources. Second, data by itself has limited value, much as oil must be refined before its value can be captured. Third, oil has commodity value that can be exchanged for goods and money, in much the same way that data can be exchanged, traded, and treated as a commodity with inherent value. Those points are good ones. But you can make the same points about any natural resource, including solar energy.
But the value of the analogy stops there. Oil is a finite natural resource that dwindles in quantity and availability over time. It’s called a fossil fuel because it’s made of long-dead things. Oil is dirty, and it takes huge amounts of effort to extract it from the earth, refine it, and transport it around the globe. It’s stored and sits idle, often for long periods of time. And once it’s used, it’s gone. Oil wealth is concentrated in a few nations with spotty political and social track records, and is the subject of wars and international disputes. We’re trying to free ourselves of the oil economy.
Data, on the other hand, is for all intents and purposes limitless in its volume and availability. It grows even when you don’t want it to grow. Go to sleep with 1 terabyte of data and wake up with 2 terabytes of data. Data is easy to generate and cheap to transport. Data can be reused and repurposed, and new insights can continue to be gleaned from old data. In the information economy, data is the byproduct of our advancement. Data is both the inhale and the exhale of our technology organism. Like oil, data can be dirty, but unlike oil, it can be cleansed with more data.
The Real Untapped Resource: Unstructured Data
The only positive aspect of the analogy we can leverage is the idea that a raw resource should be refined to extract more value. For data, most of that refining still has to happen on the unstructured side. Defined simply, structured data is information stored in data stores that maintain a schema, with defined types and often defined relationships between data items. Unstructured data has no such predefined structure: images, video, emails, documents, text files, and many other sources of data.
By all accounts, most of the data that companies collect is unstructured. For most organizations, over 80% of their data is unstructured. For some organizations, it can be closer to 90% of their total data. Some call this unstructured data “dark data” because much of its value remains to be extracted. The true value therefore lies in extracting insight not only from structured information, but also from unstructured data. Like the oil barons of yesteryear, the data barons of the information economy are those who are doing the best job of data extraction and refinement: the Googles, Amazons, Metas, and their ilk.
Artificial intelligence and machine learning are very data hungry. Huge amounts of data are needed to train AI models. The challenge comes from turning big data assets into valuable trained machine learning models. Just as wealth in the oil economy was concentrated among a few oligarchs and state-run organizations, so too are we starting to see big data-powered AI concentrated among a few large companies that have amassed tremendous quantities of data.
But unlike oil, big data-powered AI is available to any organization that can create a strategy for collecting enough unstructured and structured data suitable for machine learning, and for refining that data to extract more and more value from it over time. Doing this right is part of what will power companies to the next evolution in their digital transformation strategies.
Many organizations are becoming “AI first”, but to be AI first, you need to be “data first”. Regardless of where you stand in this data-centric world, you should skip the backwards-looking (and very 20th century) perspective that data is the new oil, because it’s not. Data is an almost limitless resource, and it’s up to you to extract its value. Data is potential, and it’s up to you to realize that potential in full, especially in the context of AI.