Definitions
Data: the raw form of any content we produce. For example, if you measure the heights of ten people and record them on a sheet, that sheet contains data.
Information: the output of processing raw data. If you take the heights of those ten people and compute their arithmetic average, that average is information, because it gives a useful measure, whereas the data are just the numbers recorded on the sheet.
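A minimal sketch of the distinction in Python (the height values are invented purely for illustration):

```python
# Raw data: heights (in cm) of ten people, exactly as recorded on the sheet.
# The values are invented for illustration.
heights_cm = [172, 165, 180, 158, 175, 169, 183, 161, 177, 170]

# Processing the raw data yields information: single, useful measures.
average_cm = sum(heights_cm) / len(heights_cm)
tallest_cm = max(heights_cm)

print(f"Average height: {average_cm:.1f} cm")  # information
print(f"Tallest person: {tallest_cm} cm")      # also information
```

The list is data; the computed average is information.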
In 2011, the McKinsey Global Institute defined big data as any data set whose size exceeds the capacity of traditional database tools to capture, store, manage, and analyze it.
Big data comprises both structured information, which can make up as little as 10% of the total, and unstructured information, which makes up the rest.
Unstructured information is what humans produce: emails, videos, tweets, Facebook posts, WhatsApp chat messages, website clicks, and more.
Big data has become a reality we live in; even the Oxford English Dictionary has adopted the term, adding it alongside other new coinages such as "tweet".
How big is "big"?
What is big today will not be so tomorrow, and what is big for you may be small for someone else. Herein lies the challenge of defining "big".
At present, the limit on the size of a data set that can be processed in a reasonable amount of time is on the order of an exabyte.
Example: an Airbus A380 generates roughly a billion lines of data, about 10 terabytes, every half hour. This data comes from the engines and sensors on the plane, recording all the minute details of its flight; and remember, that is just half an hour of a single flight by a single aircraft.
Likewise, a single flight from Heathrow to Kennedy Airport produces 640 terabytes of data. Multiply that by the number of flights per day and you begin to grasp the scale of big data. By these measures, whatever we used to call big data is dwarfed.
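Taking those figures at face value, simple unit arithmetic shows the scale. In this sketch the per-aircraft numbers come from the text, while the daily flight count and average flight duration are rough assumptions added only for illustration:

```python
# Figures quoted in the text.
tb_per_half_hour = 10            # A380 engine/sensor data per half hour
tb_per_lhr_jfk_flight = 640      # one Heathrow-to-Kennedy flight

# Implied rate for the sensor stream (pure unit conversion).
tb_per_hour = tb_per_half_hour * 2
print(f"{tb_per_hour} TB per flight hour per aircraft")
print(f"{tb_per_lhr_jfk_flight} TB for one Heathrow-Kennedy flight")

# Scaling up: ~100,000 commercial flights per day and a 2-hour average
# duration are rough assumptions added here purely for illustration.
flights_per_day = 100_000
avg_flight_hours = 2
daily_tb = flights_per_day * avg_flight_hours * tb_per_hour
print(f"~{daily_tb:,} TB (~{daily_tb / 1e6:.0f} exabytes) per day worldwide")
```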
Big data is the next generation of computing that creates value by scanning and analyzing data.
Over time, the data produced by users has grown rapidly from many sources, including purchase records from supermarkets and retail markets, freight bills, banking, healthcare, and social networks.
With the development of facial- and person-recognition technologies, it will become possible to extract ever more details and information about anyone. And as more devices connect to the Internet, devices we are not used to seeing on the global network, such as cars, refrigerators, and washing machines, all of them contribute to increasing the volume of data produced.
Characteristics of Big Data
In order for data to be big, three main factors are required:
Volume: the number of terabytes of data we generate from content every day.
Variety: the diversity of the data, spanning structured, semi-structured, and unstructured forms (see the sketch after this list).
Speed: how quickly the data arrives; for example, the rate at which tweets are posted differs from the rate at which remote sensors scan for climate changes.
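A minimal sketch of what those three forms of variety look like in practice (the records themselves are invented):

```python
import json

# Structured: a fixed schema, like one row of a relational table.
structured_row = ("2024-01-15", "customer_042", 3, 24.99)

# Semi-structured: self-describing and flexible, e.g. JSON.
semi_structured = json.loads(
    '{"user": "customer_042", "clicks": [3, 7], "referrer": "email"}'
)

# Unstructured: free text (or images, audio, video) with no schema at all.
unstructured = "Loved the product, but the delivery took two weeks!"

print(structured_row, semi_structured["user"], len(unstructured))
```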
But what are the characteristics of big data?
Big data is distinguished by volume, variety, and speed. By studying the large volume of data, companies can better understand their customers. Imagine, for example, searching the purchase data of one million Walmart customers: analyzing that vast number of purchase invoices, together with the frequency and variety of purchases, yields very useful information for management and decision-makers.
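As a toy illustration of that kind of analysis, a few lines of Python with the pandas library (the invoice records below are invented) already surface purchase frequency and variety per customer:

```python
import pandas as pd

# Invented invoice records standing in for millions of real ones.
invoices = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "b", "c"],
    "category": ["food", "toys", "food", "food", "tools", "food"],
    "amount":   [12.5, 30.0, 8.0, 15.0, 45.0, 9.5],
})

# Frequency, variety, and spend per customer: the kind of summary
# that helps management understand buying behaviour.
summary = invoices.groupby("customer").agg(
    purchases=("amount", "count"),
    distinct_categories=("category", "nunique"),
    total_spend=("amount", "sum"),
)
print(summary)
```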
Varied, fast-moving data poses challenges for traditional database management tools, which were built to handle only text documents and numbers, whereas today's big data contains new types of data that cannot be ignored: images, audio clips, videos, 3D models, geolocation data, and more.
For example, most supermarkets and retail chains that issue loyalty cards do not take advantage of this data by processing it in ways that would help them better understand buyers and improve the loyalty-card model.
Likewise, the video recorded by medical devices during surgeries is not utilized as it should be; instead, it is deleted within weeks.
Today, Hadoop is one of the best-known technologies for dealing with big data: an open-source framework suited to large, varied, fast-moving data, and major companies rely on it. LinkedIn, the social network for jobs and careers, uses Hadoop to generate more than 100 billion personalized suggestions for its users every week.
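To make the model concrete, here is the canonical word-count example written in the style of Hadoop Streaming, where the mapper and reducer are plain scripts reading stdin. This is a minimal local sketch, not LinkedIn's actual pipeline; the sort step stands in for Hadoop's shuffle phase:

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map step: emit a (word, 1) pair for every word seen."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce step: sum counts per word. Input must be sorted by word,
    which Hadoop's shuffle phase guarantees on a real cluster."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Simulate map -> shuffle (sort) -> reduce locally on stdin.
    mapped = sorted(mapper(sys.stdin))
    for word, total in reducer(mapped):
        print(f"{word}\t{total}")
```

Saved as, say, wordcount.py (a hypothetical filename), it can be tested locally with `cat some_text.txt | python wordcount.py`; under Hadoop Streaming the same mapper and reducer logic runs in parallel across a cluster.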
But what is the point of big data? IBM says big data gives you the opportunity to discover important insights in your data, and Oracle says big data allows companies to understand their customers better.
Cisco estimated that by 2015, total Internet traffic would exceed 4.8 zettabytes (4.8 billion terabytes) annually.
Practical examples
The Large Hadron Collider has 150 million sensors delivering data 40 million times per second. There are nearly 600 million collisions per second, but less than 0.001% of the sensor-stream data is kept; even so, the data flow from all four collider experiments amounts to about 25 petabytes a year.
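Plugging the quoted numbers into simple arithmetic shows how aggressive that filtering is (a sketch using only figures from the text):

```python
collisions_per_second = 600_000_000   # quoted collision rate
kept_fraction = 0.001 / 100           # "less than 0.001%" of the stream

kept_per_second = collisions_per_second * kept_fraction
print(f"~{kept_per_second:,.0f} collision records kept per second")  # ~6,000

discarded = 1 - kept_fraction
print(f"{discarded:.5%} of the stream is discarded")  # 99.99900%
```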
Amazon.com handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers. Amazon relies primarily on Linux to cope with this volume of data, and it runs the world's three largest Linux databases, with capacities of 7.8, 18.5, and 24.7 terabytes.
The Walmart store chain processes more than a million commercial transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (2,560 terabytes) of data, 167 times the information in all the books in the United States Library of Congress.
Facebook processes 50 billion photos from its user base, and the FICO Falcon credit card fraud detection system protects 2.1 billion active accounts around the world.