Big data introduction
Instructor’s edit: On the origins of “Big Data”
So it turns out that my association of the term “Big Data” with Doug Laney is not entirely accurate. Having been corrected on this point, I thought it was worth doing some digging. Here’s an article by the New York Times that dives deep into the origins of the term and gives credit where it’s due.
It seems Doug himself credits John Mashey with coining the term. As the article points out, Mr Mashey isn’t particularly bothered either way:
“I was using one label for a range of issues, and I wanted the simplest, shortest phrase to convey that the boundaries of computing keep advancing.”
– John Mashey
This doesn’t take away from Doug’s credit for coining the 3 V’s that we talk about in the video. These have since been expanded on (almost to death) by people desperate to simplify things, yet seemingly making the world ever more complex.
Rather than re-record the video and airbrush out my own error, I thought it would be an interesting aside for anyone keen to go further down this particular rabbit-hole.
Data, master data, metadata, now BIG data?
What’s all the fuss about big data? Well, big data is a term that came into common use in the early 2000s. I believe it was coined (edit – popularised) by Doug Laney, who started talking about the explosive growth in the amount of data we were able to capture once people started carrying around smart devices and other things that connect us to the Internet.
So what was Doug coining (edit – thinking of) when he came up with the idea of big data? Well, he defined it using three “V”s, starting with volume.
Historically, you would have had large transactional data sets. These days, however, everybody’s carrying around smart devices, there are sensors of all kinds, and there are social media interactions. Together these allow organisations to compile and pull in much larger volumes of data than they did in the past. They can also store that data at scale relatively cheaply with cloud storage and data lakes.
Then there’s the second “V”, the velocity of data. Velocity looks at the speed with which data comes in and how timely it is. Think about that in the context of transactional data. As the data comes in, you can see what transactions are happening across your organisation. Can you detect anything in real time, perhaps to issue coupons or price codes to your shoppers at checkout?
Lastly, let’s look at the variety of data. Previously, we discussed structured data. In the next lesson, we’ll talk about unstructured data, which covers things like emails and videos. There’s a whole wealth of different data types that can now be pulled in, and you can run analyses and comparisons across these different data sets, hence the “variety” of data.
To sum it up:
If it helps, Cognopia’s logo happens to be a nice Venn diagram, and our slogan is “Complex data, simplified”. A happy accident.
Origins of big data: