Big Data Simplified – Part 1

February 20, 2012 2:33 am
Big Data

It's Data and it's growing Big

If you read the book (Cloud Basics – The Paradigm Shift), you may recall I said it was really meant to be a Blog Post or a Series of Posts that ended up becoming a book.

As I write my first post on Big Data, I can clearly see the need for a Book, or maybe even a series of Books, that can help clear up the mist around Big Data.

Moreover, there is a need to clarify what Big Data means in relation to the Cloud.

We start today on a 3 part post to help simplify the concept of Big Data. This first part tries to answer the question: What is Big Data?

In June 2011, McKinsey came out with a 156-page report called “Big data: The next frontier for innovation, competition, and productivity”.

If you have the time, you can download and read that report and spend some time combing through the web. Otherwise, just continue reading this post, for the crunching has already been done!

What is Big Data?

People say:
a) Big data is data that exceeds the processing capacity of conventional database systems. (1)
b) Large Organizations need to maintain large amounts of Structured and Unstructured Data. (2)
c) Real Time Data Storage, Correlation & Analysis. (3)
d) Virtual Scale Out for Data Storage & Analytics. (4)

All of the above may be somewhat true or not so true depending on who is asking the question on Big Data.

Dimensions of Big Data

Big Data is data that is ‘Too Big’

Yes, the quantum of data is constantly increasing; however, ‘Too Big’ is a relative term. It can mean anything from a Terabyte (1 TB = 1000 GB) to a Petabyte (1 PB = 1000 TB) to an Exabyte (1 EB = 1000 PB). Per a McKinsey study, in the year 2009 an average Enterprise had over 100 TB of data and some Enterprises had Petabytes of Data.

We cannot ignore the exponential growth of Data and the rise in revenues of Storage firms from 2009 to 2012 (today). Hence, the average enterprise clearly has over 100 TB of data, and that’s big enough compared to an average user’s 500 GB Hard Drive. One would need 200 such Hard Drives (100 TB / 500 GB) to store all that data!

Data Storage Statistics: McKinsey Study 2009

At the high end of the spectrum, a large securities or banking firm on Wall Street could easily have 4000 TB of Data. One would need 8000 Hard Drives of 500 GB each to store that data. Now that sure is Big Data!
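For readers who like to check the numbers, the hard drive arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `drives_needed` is made up for this post, and it uses the decimal units quoted earlier (1 TB = 1000 GB).

```python
import math

GB_PER_TB = 1000  # decimal units, as used in the post (1 TB = 1000 GB)


def drives_needed(data_tb: float, drive_gb: float = 500) -> int:
    """Return how many drives of size drive_gb are needed to hold data_tb.

    Rounds up, since a partly filled drive is still a whole drive.
    """
    return math.ceil(data_tb * GB_PER_TB / drive_gb)


if __name__ == "__main__":
    print(drives_needed(100))   # the average enterprise with 100 TB
    print(drives_needed(4000))  # a large Wall Street firm with 4000 TB
```

With 500 GB drives this gives 200 drives for 100 TB and 8000 drives for 4000 TB, matching the figures in the text.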

Big Data is about Unstructured Data

Yes, databases evolved over the past few decades mainly geared to handling Structured (Relational) data. However, we must also recognize that Streaming Data (e.g. videos), Time Series data (e.g. stock tickers) and other forms of non-relational data have existed for at least the past decade.

This streaming or time series data can primarily be thought of as unstructured data. However, this data was ‘effectively stored’ in some database or storage, and it did carry some structure. A few examples of efficient storage and structure are:

1. Streaming Caches used for Caching Real Time Market Data so an Algorithmic Trade can be made.

2. A CDN (Content Delivery Network, a set of streaming servers) that stores video files and can efficiently stream a single video file on demand to millions of users.

3. Time Series Databases that store Tick Data (the Stock Last Traded Price data that you see at the bottom of your TV News Channel), which is used for Back Testing of Trading Strategies by Hedge Funds.
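To make the point about structure concrete, here is a minimal sketch of how tick data might be kept in time order so that a back test can pull out a time window quickly. It is a toy, not how any real Time Series Database works internally: the class `TickStore` and its methods are hypothetical names, timestamps are plain integers, and everything lives in memory.

```python
from bisect import bisect_left, bisect_right


class TickStore:
    """Hypothetical in-memory tick store, kept sorted by timestamp."""

    def __init__(self):
        self._times = []   # timestamps, in ascending order
        self._prices = []  # last traded price for each timestamp

    def append(self, timestamp: int, price: float) -> None:
        # Assumption: ticks arrive in time order, as they would on a live feed.
        if self._times and timestamp < self._times[-1]:
            raise ValueError("ticks must arrive in time order")
        self._times.append(timestamp)
        self._prices.append(price)

    def between(self, start: int, end: int) -> list:
        """Return prices with start <= timestamp <= end, using binary search."""
        lo = bisect_left(self._times, start)
        hi = bisect_right(self._times, end)
        return self._prices[lo:hi]


if __name__ == "__main__":
    store = TickStore()
    for ts, price in [(1, 100.0), (2, 100.5), (5, 101.0), (9, 99.5)]:
        store.append(ts, price)
    print(store.between(2, 5))  # the ticks a back test would replay for this window
```

The data looks like an endless stream, yet the time ordering is exactly the kind of structure that makes it cheap to store and query.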

Big Data is about Real-Time

Yes, traditional databases and data warehouse models meant data had to be stored for subsequent processing and analysis. The Extract, Transform, Load process, which is how most Data Warehouses are primed with data, is almost always a Batch (as opposed to Real-Time) process. Insights were often drawn from what could be thought of as historical data.

Getting more real time insight really meant moving away from the traditional database model. Business demands in the past decade for Real Time insights led to ‘in stream’ processing database products. These ‘in stream’ databases have been in use on Wall Street for over a decade now.
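The difference between Batch and ‘in stream’ processing can be sketched with something as simple as an average price. The names below (`batch_average`, `RunningAverage`) are invented for illustration and do not come from any particular product. The batch version needs all the data stored first, while the streaming version updates its answer as each value arrives and never keeps the history.

```python
def batch_average(prices):
    """Batch style: all prices are already stored, then processed in one pass."""
    return sum(prices) / len(prices)


class RunningAverage:
    """Stream style: the average is updated per tick, without storing the ticks."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, price: float) -> float:
        # Incremental mean: shift the old mean towards the new value by 1/count.
        self.count += 1
        self.mean += (price - self.mean) / self.count
        return self.mean


if __name__ == "__main__":
    prices = [10.0, 20.0, 30.0]
    print(batch_average(prices))  # available only after all data is loaded

    stream = RunningAverage()
    for p in prices:
        print(stream.update(p))   # an up-to-date answer after every tick
```

Both approaches end on the same number. The streaming version simply has an answer ready at every step, which is what a trading desk needs.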

Therefore, a Real Time Database or Real Time Data Analysis is not something new. Furthermore, these systems exist on Wall Street, so we can assume that the volumes were also ‘Big’.

However, with the explosive growth of the Internet and Social Media the quantum of Data that needs to be processed in Real Time has increased significantly.

Tomorrow’s Enterprise will demand insight not just from all the data within the enterprise but also from the growing Social ecosystem created around Enterprises by Social Destinations like Twitter, Facebook, LinkedIn etc.

Hence, it may be fair to say that the Real Time systems of today do not have the capacity to handle the Big Data Real Time demands of the Social Enterprise of tomorrow.

Big Data is about Virtual Scale-Out

Yes, the traditional in-house model meant using individual databases that could be scaled by using bigger and bigger hardware or implementing a Database Cluster.
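As a rough illustration of what scaling out (as opposed to buying bigger hardware) means, the sketch below spreads records across several nodes by hashing each record's key. The node names and the functions `node_for` and `partition` are hypothetical, and real clusters use more refined schemes, so treat this only as the basic idea.

```python
import hashlib


def node_for(key: str, nodes: list) -> str:
    """Pick a node for a key by hashing the key (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def partition(keys, nodes):
    """Group keys by the node each one is assigned to."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # made-up node names
    keys = ["customer-%d" % i for i in range(10)]
    for node, assigned in partition(keys, nodes).items():
        print(node, assigned)
```

Adding capacity then means adding another node to the list instead of replacing one machine with a bigger one, although with this naive modulo scheme many keys would move when the node count changes.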

However, with the Cloud you no longer need to incur the required Capex for buying hardware or incur significantly high license costs if you choose an Open Standards based Cloud.

This is a known mantra of the Cloud, and it extends logically to Big Data, given that Database and Storage form key layers within the Cloud.

It is important to note here how the Cloud is changing the concept of Database and Storage as being separate things.

Over time we may see just one layer in the Computing stack called the Data Layer as the Database and Storage layers collapse into one.

This cloud Data layer may very well handle all Structured and Unstructured Data, have real time processing capabilities and might at some point in the future be called the Big Data layer.

Big Data Cloud

The definition of Big Data can be best understood if we grasp the key issues behind Data i.e. how computer programs Process and Store data and how this problem has been solved technologically over the past few decades.

(We will keep elements like data Retrieval, data I/O, data Moving aside for now and handle them in a more appropriate technical post which outlines mainly the evolution of (Big) Data technology)

We will discuss Issues with Data & The Evolution of Technology around Data in Part 2 of this Post next week.

Until then, I look forward to your comments, and you can think ‘Big’ Data!

Coming Soon in this series: Big Data Simplified Part 3 of 3.

Other Posts in this Series:
Big Data Simplified – Part 2
