Take a minute to increase awareness and understanding of the terminology, context, and application of data and analytics.
Part three of a series.
A common phrase tossed around is “big data.” But what does it mean?
If you Google the term, “big data” is defined as “extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.”
Some will describe big data as something so large or so different that most traditional computers and applications cannot effectively deal with it. Big data is also often credited as a key ingredient of analytics success.
If we consider the immense growth of processing power through the early 2000s, big data must be pretty impressive. And, it is. So, what makes big data different?
Let’s Explore Big Data
Big data is typically differentiated from “data” by four Vs: Volume, Velocity, Variety, and Veracity. (Note that there is some dispute over the number of Vs. I say four, some say three and leave out Veracity. Others say four is right, but include Value rather than Veracity. Others say there are seven to 10 Vs – way too many Vs for me.)
Let’s use the four main Vs to explore big data:
#1 – Volume:
If you think about the pure quantity of data, big data is an insane amount of data.
We now regularly discuss data stored in petabytes, and a number of estimates indicate that business transactions will be measured in zettabytes by 2020.
In terms of storage, a byte of data is the smallest unit and generally represents a single character of text. A kilobyte is 1,024 bytes, a megabyte is 1,024 kilobytes, and a gigabyte is 1,024 megabytes. The pattern continues through terabytes, petabytes, exabytes, and zettabytes. (I have a great story describing petabytes in terms of the stars in the Milky Way galaxy. Just ask me, and I would be happy to share!)
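As a quick sketch, that unit ladder can be expressed in a few lines of Python. (This follows the 1,024-based convention used above; the `bytes_in` helper is just an illustrative name.)

```python
# Binary storage units, each 1,024x the previous (illustrative sketch).
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte",
         "terabyte", "petabyte", "exabyte", "zettabyte"]

def bytes_in(unit: str) -> int:
    """Number of bytes in one of the named units (1,024-based)."""
    return 1024 ** UNITS.index(unit)

print(f"{bytes_in('petabyte'):,} bytes in a petabyte")
# 1,125,899,906,842,624 bytes in a petabyte
```

A single petabyte is already a 16-digit byte count, which is why the jump to zettabytes is hard to picture.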
#2 – Velocity:
This qualifier defines the rate data comes at us.
If we think about this in terms of big data, not only is there a lot of it, but it arrives quickly and demands equally fast processing.
For example, Walmart handles more than one million customer transactions every hour. Facebook estimates that 900 million photos are uploaded every day. We helped one of our clients process 75 million records in 30 minutes every day.
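To put those rates on a common footing, here is some back-of-envelope Python arithmetic using only the figures quoted above:

```python
# Convert the quoted velocity figures to per-second rates (illustrative only).
walmart_tx_per_hour = 1_000_000
fb_photos_per_day = 900_000_000
client_records, client_minutes = 75_000_000, 30

print(round(walmart_tx_per_hour / 3_600))            # ~278 transactions/sec
print(round(fb_photos_per_day / 86_400))             # ~10,417 photos/sec
print(round(client_records / (client_minutes * 60))) # ~41,667 records/sec
```

Even the "modest" client workload works out to tens of thousands of records per second, which is why velocity shapes infrastructure choices as much as volume does.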
#3 – Variety:
If we think about computer processing as recently as 10 years ago, we were almost exclusively talking about structured data that could be stored and queried in a database.
Structured data actually makes up a mere 20% of all data. With big data, we include unstructured or multi-structured data that doesn’t fit in the defined fields of a database. This includes videos, audio, web stream data, images, and much more.
Big data capabilities give you the ability to process that data to detect patterns, trends, and more. As examples, think about facial recognition or voice-activated systems. (Did you know that many customer-service chat agents are actually automated bots?)
#4 – Veracity:
This refers to the accuracy or truthfulness of the data.
There is a lot of data coming in, but how much is reliable? Do you have the level of detail needed to draw conclusions and patterns? Are the quality and accuracy fit for purpose so that you could reliably predict a trend or a specific outcome?
I also see this as a “value” factor. There is some data that is more valuable than other data due to its granularity or quality.
From the very definition, you can probably understand why organizations are overwhelmed with data, and excited by the possibilities of analyzing “big data.”
To get the most value, we need to first understand what we are analyzing and how to access it.
Big data is different, and the way we want to store and analyze it is also different. It will impact the underlying infrastructure, the tools, the skills, and the overall approach to how insight is delivered to the business or generated by the business.