Understanding how large Big Data is

Source: The Hitavada      Date: 25 Sep 2018 12:51:31


By Dr S B Kishor


In the digital era, organisations cannot ignore the use of big data, as every company wants to outperform its competitors and peers. If we are not able to organise this data properly every day, it can lead to Information Explosion or Overload. There are ways to overcome this problem. Read on to know more...

What is the importance of a minute in our day-to-day life in general? Most of us will give much the same answer, and that is fine. But one should not forget that a tremendous amount of data is generated on social networking sites in a single minute. A 2018 survey by Lori Lewis and Chadd Callahan of Cumulus Media revealed that in one minute, around 4.3 million videos are viewed on YouTube, 3.7 million searches are made on Google, 18 million text messages are sent, 375,000 apps are downloaded, 187 million emails are sent, 481,000 tweets are posted across the globe, approximately $862,823 is spent on online transactions, 2.4 million snaps are created, 174,000 users scroll Instagram, approximately 973,000 people log in on Facebook, and so many other things take place.

It has been found in a study that Facebook has more than two billion monthly active users. That means that in any given month, more than 25 per cent of Earth's population logs in to a Facebook account at least once.

Now, even if we assume each of us uploads and downloads only a certain amount of data, think how much data these social networking sites generate every minute. The question then arises: how do we manage all this mostly unstructured and scattered data for processing?

It is said that ninety per cent of the data in the world today has been created in the last three years. According to recent research cited by Domo, our current output of data is roughly 2.5 quintillion bytes a day, and this figure is bound to grow exponentially with the ever-increasing number of electronic devices, i.e. numerous information-sensing Internet of Things (IoT) devices such as mobile devices, remote sensors, software logs, cameras, microphones, Radio-Frequency Identification (RFID) readers, mobile GPS units and wireless sensor networks.

Big Data: In the 1990s, John Mashey popularised this term. It refers to the study and application of data sets so big and complex that traditional data-processing applications and data-management software are inadequate to deal with them, i.e. commonly used software cannot capture, curate, manage and process the data within a tolerable elapsed time. The challenges in managing big data include capturing data, data storage, data analysis, search, sharing, transfer, visualisation, querying, updating, information privacy and data sourcing. Big data can be analysed for insights that lead to better decisions and strategic business moves, and this analysis may require massively parallel software running on tens, hundreds, or even thousands of servers.

V’s of Big Data and Main Components: The concept of big data gained momentum in the early 2000s when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three V’s (Volume, Velocity and Variety). As the McKinsey Global Institute's 2011 report suggested, big data may also require specific technology and analytical methods for its transformation into value.

Volume: Data is generated via a variety of sources, including business transactions, social media and information from sensors or machine-to-machine data. Although big data does not equate to any specific volume of data, the term is often used to describe terabytes, petabytes and even exabytes of data captured over time. Storing and maintaining such big data has been eased by new technologies such as highly distributed Hadoop compute clusters and cloud computing. Amazon Web Services Elastic MapReduce is one example of a big data service in a public cloud.
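The distributed model behind Hadoop and Elastic MapReduce can be illustrated in miniature. The following is only a toy sketch of the idea, not real Hadoop code: each "mapper" counts words in its own shard of the data, and a "reducer" merges the partial counts.

```python
from collections import Counter
from functools import reduce

def map_count(shard):
    """Map step: count words in one shard of text lines."""
    counts = Counter()
    for line in shard:
        counts.update(line.lower().split())
    return counts

def reduce_counts(partials):
    """Reduce step: merge partial counts from all mappers."""
    return reduce(lambda a, b: a + b, partials, Counter())

shards = [
    ["big data is big"],           # shard handled by mapper 1
    ["data streams in big data"],  # shard handled by mapper 2
]
total = reduce_counts(map_count(s) for s in shards)
print(total["big"])   # 3
print(total["data"])  # 3
```

In a real cluster, the map calls run on different machines in parallel; the logic, however, is exactly this split-then-merge pattern.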

Variety: Data comes in all types of formats. Big data encompasses structured data (data usually stored in table format with a data type associated with each field, e.g. relational databases, OLTP systems), semi-structured data (data represented in XML files, system log files, etc.) and unstructured data (email, text, audio, video, stock-ticker data, financial transactions, results returned by a Google search, etc.). In big data, however, the main focus is on unstructured data. Big data work is closely associated with 'predictive analytics', as one needs to convert massive unstructured data into a searchable and sortable format.
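The three shapes of data differ in how much work it takes to get a value out of them. This small sketch, using made-up one-record samples, shows that structured data yields fields directly, semi-structured data needs only parsing, while unstructured text needs information extraction:

```python
import csv
import json
import re
from io import StringIO

structured = "id,name,amount\n1,Asha,250\n"                   # CSV table (structured)
semi = '{"event": "login", "user": "asha"}'                   # JSON (semi-structured)
unstructured = "Asha logged in at 09:15 and bought 3 items."  # free text (unstructured)

row = next(csv.DictReader(StringIO(structured)))        # fields named by the header
event = json.loads(semi)                                # flexible nested keys
match = re.search(r"bought (\d+) items", unstructured)  # value must be extracted

print(row["name"])      # Asha
print(event["event"])   # login
print(match.group(1))   # 3
```

The regular expression in the last step works only because we know the sentence's shape in advance; for truly unstructured data at scale, that is where the heavy analytics effort goes.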

Velocity: The speed at which data arrives and must be processed. Data streams in at unprecedented speed and must be dealt with in a timely manner. Real-time sources such as RFID tags, sensors and smart metering are driving the need to deal with torrents of data at high speed. To draw useful conclusions from this data, one may use statistical packages such as SPSS, R or MATLAB.
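High-velocity data is typically processed one record at a time rather than stored and analysed later. A minimal sketch, using hypothetical smart-meter readings: keep only a running mean that is available the moment each reading arrives.

```python
def running_mean(stream):
    """Yield the mean of all readings seen so far, one value per reading."""
    total, n = 0.0, 0
    for reading in stream:
        total += reading
        n += 1
        yield total / n  # available in real time; nothing is stored

readings = iter([21.0, 22.0, 23.0, 24.0])  # hypothetical sensor values
means = list(running_mean(readings))
print(means[-1])  # 22.5
```

The same single-pass pattern scales from one sensor to a torrent of them, which is exactly the constraint velocity imposes.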

As big data analysis expands into recent and advanced fields like Data Science, Business Intelligence, Machine Learning, Deep Learning, Artificial Intelligence, Neural Networks, Expert Systems, Bayesian Networks, Natural Language Processing and Image Processing, where analytical processes mimic perception by finding and using patterns in the collected data, velocity plays a major role.

For example, if you want to find whose smile is the best among the photos uploaded on Facebook, Instagram or WhatsApp, manipulating such big data requires a lot of processing and may need techniques from the advanced fields above to pick the best photo.

For instance, take another example: if a user has liked certain products, then immediately suggesting similar products is a strategy employed by most organisations, usually under the management term Market Basket Analysis (a modelling technique based on the theory that if you buy a certain group of items, you are more likely to buy another group of items kept together with them). As another example, when you select a song by a particular singer on YouTube, it immediately suggests other songs by the same singer. Here, data mining plays an important role. Data mining is the art and science of discovering and exploiting new, useful and profitable relationships in data. Compared with big data, a data warehouse usually helps to extract data from a variety of relational databases.
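Market Basket Analysis boils down to counting which items appear together. The sketch below, on a hypothetical transaction log, computes two standard association-rule measures: support (how often bread and butter are bought together) and confidence (how likely butter is, given bread).

```python
from collections import Counter
from itertools import combinations

# Hypothetical shop transactions: each basket is a set of items bought together.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in transactions:
    item_counts.update(basket)
    # Count every unordered item pair that co-occurs in this basket.
    pair_counts.update(frozenset(p) for p in combinations(sorted(basket), 2))

pair = frozenset({"bread", "butter"})
support = pair_counts[pair] / len(transactions)        # P(bread and butter)
confidence = pair_counts[pair] / item_counts["bread"]  # P(butter | bread)
print(support)              # 0.5
print(round(confidence, 2)) # 0.67
```

A retailer would keep only rules whose support and confidence exceed chosen thresholds; full algorithms such as Apriori do the same counting more efficiently over many items.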

Importance of Big Data
Data can be collected or fetched from many sources (black box data, social media data, stock exchange data, power grid data, search engine data, etc.). Once you own this big data, the next important question is how to utilise and analyse it so that you can:
- Reduce costs
- Reduce processing time
- Create new products based on customer trends
- Take smarter decisions
- Gain insight into market conditions
- Control the company's online reputation using web analytics tools
Conclusion: In the digital era, organisations cannot ignore the use of big data, as every company wants to outperform its competitors and peers. Peter Sondergaard, Senior Vice President at Gartner, once said that “Information is the oil of the 21st century, and analytics is the combustion engine”. If we are not able to organise this data properly every day, it can lead to Information Explosion or Overload. It has been observed that if you recorded all human communication from the dawn of time to 2003, it would take up around five billion gigabytes of storage space; surprisingly, we now create that much data every two days.

Big data helps organisations analyse and decide strategies on the basis of data to compete, innovate and capture value, which in turn helps to create new growth opportunities. In future, if we are able to derive a mechanism whereby every single thought of a human being is recorded, then converting that colossal data into images via image-processing techniques, or into a virtual reality format, will consume even more storage space than social media sites currently consume in a minute, and it will indeed require even greater technology and software to deal with it.

(The author is Chairman, Computer Science Board, Gondwana University, Gadchiroli and can be reached at [email protected])