Getting started with bigdata

Big Data example

Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate to deal with them. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy.

A general example of big data:

One example is the data collected by the social networking site Facebook, which gathers hundreds of terabytes (TB) of data every day. This data may be images, videos, posts, updates, and so on, and it ranges from structured to unstructured. A like, share, or reaction is structured data because its format is known in advance, whereas updates and posts are unstructured data that do not follow a fixed format. All of this data together forms Big Data.
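To get a feel for what "hundreds of terabytes every day" means as a sustained rate, the following minimal sketch converts a daily volume into an average ingest rate. The daily volumes used here are hypothetical round numbers chosen for illustration, not measured figures, and the function name is an assumption; decimal units (1 TB = 10^12 bytes) are assumed.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12  # decimal terabyte (assumption)
BYTES_PER_GB = 10**9   # decimal gigabyte (assumption)


def sustained_rate_gb_per_s(tb_per_day: float) -> float:
    """Average rate in GB/s needed to absorb tb_per_day terabytes each day."""
    bytes_per_second = tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY
    return bytes_per_second / BYTES_PER_GB


# Hypothetical daily volumes, for illustration only.
for tb_per_day in (100, 500, 900):
    rate = sustained_rate_gb_per_s(tb_per_day)
    print(f"{tb_per_day} TB/day is about {rate:.2f} GB/s on average")
```

Even at the low end of this range, 100 TB/day works out to more than 1 GB arriving every second around the clock, which is why a single machine or a single traditional database is rarely enough.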

What is Big Data?

Big Data, in its most basic form, is an umbrella term characterized by several aspects of data. These aspects are:

Volume (huge quantities of data), Velocity (high data-flow speeds), Variety (structured, unstructured, and semi-structured data), and Veracity (how trustworthy the data is, which determines whether decisions based on it are sound).

Traditional relational databases struggled to handle these characteristics. The need for a new kind of system arose, and Big Data processing emerged to meet it. People understand Big Data differently; here are a few definitions given by industry leaders in the data sector:


  • “Big data exceeds the reach of commonly used hardware environments and software tools to capture, manage, and process it within a tolerable elapsed time for its user population.” (Teradata Magazine article, 2011)
  • “Big data refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.” (The McKinsey Global Institute, 2012)
  • “Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools.” (Wikipedia, 2014)
  • “Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” (Gartner, 2012)

When does data become “Big”?

[Figure: when data becomes “Big”. IOPS: Input/Output Operations Per Second]

What Comes Under Big Data?

Big data involves the data produced by different devices and applications. Given below are some of the fields that come under the umbrella of Big Data.

  • Black Box Data : The black box is a component of helicopters, airplanes, and jets. It captures the voices of the flight crew, recordings from microphones and earphones, and the performance information of the aircraft.

  • Social Media Data : Social media such as Facebook and Twitter hold information and the views posted by millions of people across the globe.

  • Stock Exchange Data : Stock exchange data holds information about the ‘buy’ and ‘sell’ decisions that customers make on shares of different companies.

  • Power Grid Data : Power grid data holds information about the power consumed by a particular node with respect to a base station.

  • Transport Data : Transport data includes the model, capacity, distance, and availability of a vehicle.

  • Search Engine Data : Search engines retrieve lots of data from different databases.

  • Sensor Data : Data from devices equipped with sensors, for example meteorological (weather and climate) data, seismic (earthquake) data, and oceanic (tides, tsunamis, etc.) data.

Thus Big Data involves huge volume, high velocity, and a wide variety of data. The data falls into three types.

1. Structured data : Mostly data from Relational Databases.

2. Semi Structured data : XML data, email data.

3. Unstructured data : Word, PDF, Text, Media Logs.
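The three types above can be illustrated with a small sketch using only the Python standard library. The table name, XML fields, and sample text are hypothetical and chosen purely for illustration; real data sets would be far larger and messier.

```python
import sqlite3
import xml.etree.ElementTree as ET

# 1. Structured: a relational table with a fixed schema (in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (user_id INTEGER, post_id INTEGER)")
conn.execute("INSERT INTO likes VALUES (?, ?)", (1, 42))
structured_row = conn.execute("SELECT user_id, post_id FROM likes").fetchone()
conn.close()

# 2. Semi-structured: XML describes itself with tags, but the set of
#    fields can differ from one record to the next.
xml_doc = "<email><from>a@example.com</from><subject>Hello</subject></email>"
root = ET.fromstring(xml_doc)
semi_structured = {child.tag: child.text for child in root}

# 3. Unstructured: free text with no schema; any structure (here, a simple
#    word count) has to be derived by the program that reads it.
text = "Had a great day at the beach with friends!"
word_count = len(text.split())

print(structured_row)
print(semi_structured)
print(word_count)
```

The structured row can be queried by column name right away, the XML record has to be walked to discover which fields it contains, and the free text yields nothing until some analysis is applied to it.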