Did you know that 90% of the data in the world today was created in the last two years? The volume of data will keep growing as the cost of storage falls. Below is the growth of data from 2005 to 2015 (forecast), according to IDC Research.
What is Big Data?
The term “big data” has become a buzzword, part technical and part marketing. Edd Dumbill, principal analyst for O’Reilly Radar, defined it in simple terms: big data is data that becomes so large that it cannot be processed using conventional methods. The size at which data counts as Big Data keeps changing, and newer tools are continuously being developed to handle it.
How much data is Big Data and how fast is it growing?
For some historical context on how systems and technology evolved, keep in mind that technology is best used where it solves a problem or pain point. For example, Ken Krugler of Bixo Labs, in his presentation “A Very Short History of Big Data”, described how the US census used Hollerith tabulating systems in 1890 to tabulate millions of pages of data, work that had previously been done manually. Hollerith’s tabulating company was later combined with three other companies to form the Computing-Tabulating-Recording Company, which is now International Business Machines (IBM).
First, how do we measure data? A byte (8 bits) is the basic unit for measuring digital information. It is important to understand the units below as we start looking at big data. Big Data is certainly not a measurement, but we should understand how much data is considered “Big”. In the list below, the terabyte range is considered the starting point of what is referred to as big data.
Gigabyte: 1024 megabytes
4.7 Gigabytes: A single DVD
Terabyte: 1024 gigabytes
1 Terabyte: About two years’ worth of non-stop MP3s. (Assumes one megabyte per minute of music)
10 Terabytes: The printed collection of the U.S. Library of Congress
Petabyte: 1024 terabytes
1 Petabyte: The amount of data stored on a stack of CDs about 2 miles high or 13 years of HD-TV video
20 Petabytes: The storage capacity of all hard disk drives created in 1995
Exabyte: 1024 petabytes
1 Exabyte: About one billion gigabytes
5 Exabytes: All words ever spoken by mankind
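The arithmetic behind the list above can be checked with a short script. This is a minimal sketch, assuming the 1024-based units used in the list; the names `UNITS`, `to_bytes` and `convert` are made up for illustration.

```python
# Binary units as used in the list above: each step is 1024 times the previous one.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte", "exabyte"]


def to_bytes(value, unit):
    """Convert a value in the given unit to bytes (1024-based)."""
    return value * 1024 ** UNITS.index(unit)


def convert(value, from_unit, to_unit):
    """Convert a value between any two units listed in UNITS."""
    return to_bytes(value, from_unit) / to_bytes(1, to_unit)


# MP3 example: 1 terabyte at an assumed one megabyte per minute of music.
minutes = convert(1, "terabyte", "megabyte")
years = minutes / (60 * 24 * 365)  # comes out close to two years

# 1 exabyte expressed in gigabytes: 1024**3, slightly over one billion.
gigabytes_per_exabyte = convert(1, "exabyte", "gigabyte")

print(f"{minutes:,.0f} minutes of music is about {years:.2f} years")
print(f"1 exabyte = {gigabytes_per_exabyte:,.0f} gigabytes")
```

With 1024-based units, an exabyte is a little more than one billion gigabytes, which is why the figure in the list is best read as an approximation.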
What are the characteristics of Big Data?
Big Data can be described by the following characteristics:
(i) Volume – The quantity of data generated is very important in this context. The size of the data determines its value and potential, and whether it can be considered Big Data at all. The name ‘Big Data’ itself refers to size, hence this characteristic.
(ii) Variety – The next aspect of Big Data is its variety: the category the data belongs to. Data analysts need to know this so that the people who work closely with the data can use it effectively to their advantage.
(iii) Velocity – The term ‘velocity’ in this context refers to how fast data is generated and processed to meet the demands and challenges that come with growth and development.
(iv) Variability – This refers to the inconsistency the data can show at times. It can be a problem for those who analyse the data, because it hampers handling and managing the data effectively.
(v) Complexity – Data management can become a very complex process, especially when large volumes of data come from multiple sources. The data needs to be linked, connected and correlated in order to grasp the information it is supposed to convey. This situation is therefore termed the ‘complexity’ of Big Data.
Examples of Big Data
Data comes mainly in two forms:
- Structured, and
- Unstructured data (there is also semi-structured data, e.g. XML)
Structured data follows a predefined format that attaches meaning to each field, whereas unstructured data has no such predefined structure. Most of the growth in data that we are referring to is unstructured data. Below are a few examples of where it comes from:
- Calls, texts, tweets, web browsing across various websites each day, and messages exchanged through several channels.
- Social media usage by several million people exchanging data in various forms, which also forms a part of Big Data.
- Card transactions for payments, made in large numbers every second across the world, which also constitute Big Data.
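To make the difference between the forms of data concrete, here is a minimal sketch using only Python's standard library. All of the sample records (card transactions, messages, a tweet-like text) are made up for illustration and are not taken from any real dataset.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row follows the same fixed schema (hypothetical card transactions).
structured = "card_id,amount,currency\n1001,25.50,USD\n1002,7.99,EUR\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total = sum(float(row["amount"]) for row in rows)

# Semi-structured: XML carries its own tags, but attributes can vary per record.
semi_structured = (
    "<messages>"
    "<msg from='ann'>hello</msg>"
    "<msg from='bob' lang='en'>hi</msg>"
    "</messages>"
)
root = ET.fromstring(semi_structured)
senders = [msg.get("from") for msg in root.findall("msg")]

# Unstructured: free text with no schema, so even a simple question such as
# "which hashtags were used?" needs ad hoc parsing.
unstructured = "Loved the keynote today! #bigdata #hadoop"
hashtags = [word for word in unstructured.split() if word.startswith("#")]

print(total, senders, hashtags)
```

The structured rows can be queried by column name straight away, while the free text has to be interpreted before anything can be extracted from it, which is part of what makes large volumes of unstructured data harder to process.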
Hope this post gave you enough information about Big Data. In future posts, we will look at applications of Big Data (i.e. Big Data analytics), careers in Big Data (from software engineer to data scientist), and Hadoop and applications.