Big Data Simplified – Part 2

March 8, 2012 3:50 pm
Big Data

It's Data and it's growing Big

In Part 1 of this 3 part post we tried to understand, if not answer, the question: What is Big Data? In Part 2 we shall try to understand the issues around Big Data, or Data more broadly. We shall also try to understand how technology has evolved around Data and Databases, and what this means in the Big Data context.

Issues with Data?

The need for faster access to information, for storage of larger and larger volumes of information, and for crunching it faster to gain better insights led to most of the advancements in the past few decades of computing.

e.g. Email and the Internet were developed largely so that researchers could communicate faster (the need to increase the Velocity with which information changed hands).

e.g. The Data Warehousing and Storage industries emerged and grew at a skyrocketing pace, fueled by the need to store larger and larger volumes of information. Velocity (demand) creates its own Volume (supply), and cheaper technology means more people can use it: your 3 year old uses an iPad. Velocity (demand for Cars) also creates its own Variety (demand for Luxury Cars).

e.g. The Business Intelligence and Analytics industries emerged and grew significantly in their ability to crunch, analyze and produce insights. The increasing Volume and Variety of information created the opportunity to use all this Data to gain better insights.

Volume, Velocity and Variety of Data can therefore be summed up as the key issues to consider when looking at data in general.

It should be no surprise that these same issues were first noted by industry analyst Doug Laney over a decade ago. (5)

Let us look at the Volume, Velocity and Variety of Data in 2 different contexts.

a) Evolution of Technology to address the Volume, Velocity and Variety
b) Impact and Importance for business (due to the Volume, Velocity and Variety)

Evolution of Technology around Data

The chart below summarizes the key issues with Big Data and shows how the Velocity, Variety and Volume of data have shaped its evolution.

Evolution of Big Data Technology

The 1960s to 1990s

Data was stored on Punch Cards along with the Program that would process it. However, this was not fast enough. People using computing resources demanded increased Velocity of Data (the speed with which we wanted answers from computers).

This led to software Programs being separated from the Data they processed.

(Recall: fragmentation of layers. Those who attended my lecture at NYU last month will remember that we discussed this under the topic Innovation Curve.)

A further increase in the demand for Velocity led to the evolution of Databases as a means to store Data, because Programs at that time could not process this data fast enough.

The 1990s to 2010s

The demand for faster Data continued to grow and so did the types of applications and programs processing this data.

Adoption of Data (systems) by Businesses led to further increased demand for multiple types of Data and Data processing Applications.

This vortex grew in a spiral, fuelled by the advent of Email, EDI and Supply Chain systems that integrated businesses, and further fuelled by the growth of the Internet.

While the Variety of Data increased significantly in this period, we could easily sum up all variety into 2 main buckets, structured and unstructured data:

a) Structured Data:

A set of Data that has a repeating pattern, which makes it much easier and faster for a program to sort, read and process than data with no specific repeating pattern.

All Relational Data, Streaming or Time Series data would fall in this category.

While the nature of Database technology and structures may vary immensely between normal Relational and Time Series or Streaming Data, the inherent commonality is that the Data has an underlying pattern which is easily recognizable, and hence any processing or analysis of such data can be done more easily.

b) Unstructured Data:

A set of data that may or may not have a repeating pattern but is an aggregate or a complex structure.

Such data usually contains a lot of metadata (data about data) that is stored together with the data.

This data is usually stored as files, e.g. Text, Audio, Video or Image files. Such data therefore requires specific Applications or Programs to be viewed and processed accurately without any loss of information.

e.g. A Microsoft Word file that is neatly formatted with bullets and indents will lose all its richness when opened in Notepad or another Text Editor that does not support bullets and indents.
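To make the two buckets concrete, here is a minimal sketch in Python. The sample records, the sentence and the regular expression are all made up for illustration. It shows the same three facts stored once with a repeating pattern and once as free text.

```python
import csv
import io
import re

# Structured: every row repeats the same pattern (name, city, amount),
# so a generic reader can sort and total it without any custom logic.
structured = io.StringIO(
    "name,city,amount\n"
    "Asha,Pune,120\n"
    "Ben,Boston,75\n"
    "Chen,Austin,200\n"
)
rows = list(csv.DictReader(structured))
rows_by_amount = sorted(rows, key=lambda r: int(r["amount"]), reverse=True)
total = sum(int(r["amount"]) for r in rows)

# Unstructured: the same facts written as free text. There are no columns
# to rely on, so the program has to guess where the amounts are.
unstructured = (
    "Asha from Pune paid 120. Later Ben (Boston) sent 75, "
    "and Chen in Austin gave 200."
)
# This pattern is only a guess: it would also pick up a date, a phone
# number or any other digits that happened to appear in the text.
numbers = [int(n) for n in re.findall(r"\d+", unstructured)]

print(rows_by_amount[0]["name"], total)
print(numbers)
```

The structured half needs no knowledge of what the data means, only of its pattern. The unstructured half works here only because the guess happens to fit this one sentence.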

The 2010s to 2030s

(We are, sort of, peering through the looking glass here…)

In the previous 20 years the systems matured in their ability to handle Variety (various data types). This led to further demand for storing and processing more and more of this data. The concept to understand here is that Variety (supply) creates its own Volume (demand).

E.g. Your boss asked you to create a proposal and submit it for an RFP deadline at 7 a.m., and your Microsoft PowerPoint crashed at 10 p.m. Now you will use a different editor (likely Google Presentation or an Open Office presentation tool) and create a presentation in a format that is not .ppt. Hence Variety (the fact that other applications exist) creates its own Volume (new files in various formats).

Clearly, the growth of social media in the first 3 years of this decade is a great indication of how Variety (number of social media networks) is creating its own Volume (number of tweets, likes, Facebook posts, etc.).

Hence, it would be logical to conclude that, while the Velocity and Variety challenges continue to grow, the Volume challenge is growing much faster than either.

The above conclusion can be further substantiated.

On Velocity: We already have High Performance In-Stream Processing Systems (Wall Street has been using such solutions for a while for stock market trading, known as High Frequency Trading).

On Variety: The Variety challenges have been largely solved by the advent of NoSQL (non-SQL) databases that can handle any kind of structured data (streaming or relational) stored as Key Value pairs, as well as Unstructured data (files that are automatically fragmented and reconstructed on demand).
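The Key Value idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain dictionary stands in for the database, and the key naming scheme, the chunk size and the function names (`put_record`, `put_file`, `get_file`) are all hypothetical and do not come from any real product.

```python
CHUNK_SIZE = 4  # hypothetical and tiny, so the fragments are easy to see

store = {}  # a plain dict standing in for a key-value database


def put_record(table, row_id, row):
    """Store one relational-style row under a single key."""
    store[f"{table}:{row_id}"] = row


def put_file(name, data):
    """Fragment a file into fixed-size chunks, one key per chunk."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    for index, chunk in enumerate(chunks):
        store[f"file:{name}:{index}"] = chunk
    # Remember how many chunks there are so the file can be rebuilt.
    store[f"file:{name}:count"] = len(chunks)


def get_file(name):
    """Reconstruct the file on demand by joining its chunks in order."""
    count = store[f"file:{name}:count"]
    return b"".join(store[f"file:{name}:{i}"] for i in range(count))


put_record("customers", 42, {"name": "Asha", "city": "Pune"})
put_file("notes.txt", b"Big Data Simplified")
restored = get_file("notes.txt")
print(restored)
```

Both the row and the file end up as nothing more than keys and values, which is why one store can hold structured and unstructured data side by side.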

On Volume: The one challenge that Big Data solution providers need to continue to address more than anything else is the Volume challenge.

The Volume of data continues to grow at an exponential rate. The challenge of handling that volume, whether to store it (storage) or to compute on it (analysis), remains open to all.

Growth of Data Traffic

Those in doubt can read the detailed Google FCC filing here.

That said, it can still be argued that the issue of the Volume of Data always existed. One could always implement a Cluster or a Grid of Data to tackle the volume problem. There is no doubt that the underlying technologies required to solve these problems exist today.
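One common way a Cluster spreads Volume across machines is hash partitioning. The sketch below is a simplified illustration and not a description of any particular product: the node names are made up, and plain dictionaries stand in for the machines.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical machines in the cluster

# One dict per node stands in for that machine's local storage.
cluster = {node: {} for node in NODES}


def node_for(key):
    """Pick a node from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


def put(key, value):
    cluster[node_for(key)][key] = value


def get(key):
    # The same hash sends the read to the node that holds the key.
    return cluster[node_for(key)].get(key)


for i in range(1000):
    put(f"record-{i}", i)

sizes = {node: len(data) for node, data in cluster.items()}
print(sizes)
print(get("record-7"))
```

No single node has to hold all 1000 records, and adding capacity means adding nodes. Note that with this simple modulo scheme, changing the number of nodes moves most keys to a different node, which is one reason real systems use more elaborate schemes.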

However, as Dr. Ramasami, the Director of the Department of Science and Technology, India, would say, it is all about Innovation, i.e. making the Useable (Technology) into something Useful (cost efficient and economic ways of dealing with Big Data).

This brings us to the key questions:
1) What does Big Data mean for business?
2) Do the returns from the use and processing of Big Data justify the costs?


Coming Soon: Part 3 of this 3 Part Post, which will help us understand the key question around Big Data: What does Big Data mean for Business, and why should we care about Big Data?

Other Posts in this Series:
Big Data Simplified – Part 1
