How do big MNCs like Google, Facebook, Instagram, etc. store, manage, and manipulate thousands of terabytes of data with high speed and high efficiency?

Jayesh Gupta
3 min read · Mar 7, 2021


Today, people all over the world use smart devices to upload and download videos, music, emails, documents, photos, and much more. With the growing demand for social media platforms such as YouTube, Facebook, and Snapchat, about 70 terabytes of data is uploaded to servers every hour.

On YouTube, by the time you’ve finished watching one video, 1,000 more will have been added to the site. Assuming the growth rate of the past 10 years, and assuming no new videos are uploaded while you watch, it would take 60,000 years of non-stop viewing to watch every video on YouTube.

Big data is not a technology but a problem of the data world: both physical storage capacity and the speed at which such huge volumes can be processed are limited at present. Approximately 25,000,000 terabytes of data is created in a single day, and by a commonly cited estimate about 1.7 megabytes is created per person every second.

According to a recent report, 105 terabytes of data is uploaded to Facebook’s servers every half hour, so storing and managing data at this scale is difficult.
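To get a feel for these numbers, it can help to convert them into sustained per-second rates. The short sketch below does that arithmetic for the two upload figures quoted above; the function name `gb_per_second` is only illustrative, and decimal units (1 TB = 1,000 GB) are an assumption.

```python
# Convert the upload figures quoted in the article into average per-second rates.
# Assumption: decimal units, i.e. 1 TB = 1,000 GB.
GB_PER_TB = 1_000


def gb_per_second(terabytes: float, seconds: float) -> float:
    """Average throughput in GB/s for a volume uploaded over a time window."""
    return terabytes * GB_PER_TB / seconds


hourly_rate = gb_per_second(70, 60 * 60)        # 70 TB per hour
half_hour_rate = gb_per_second(105, 30 * 60)    # 105 TB per half hour

print(f"70 TB per hour       = {hourly_rate:.1f} GB/s")
print(f"105 TB per half hour = {half_hour_rate:.1f} GB/s")
```

Both figures work out to tens of gigabytes arriving every second, around the clock. That is more than a single disk can absorb, which is why this kind of workload is spread across many machines.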

Many software tools are available to solve this problem. Some of them are:

Hadoop — Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
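The MapReduce model mentioned above can be sketched in a few lines of plain Python. This is only a single-process illustration of the idea (map, group by key, reduce), not Hadoop's actual API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for this example, and the input lines are sample data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single result."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three steps in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


lines = ["big data needs big storage", "big clusters process data"]
counts = word_count(lines)
print(counts)
```

In a real cluster, each map call can run on a different machine against a different chunk of the input, and the grouped keys are distributed across machines for the reduce step. The logic per record stays this simple, which is what makes the model easy to scale out.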

Spark — Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Storm — Apache Storm is a free and open source distributed realtime computation system. Apache Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Apache Storm is simple, can be used with any programming language, and is a lot of fun to use!
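The difference between batch and stream processing can be illustrated with Python generators. The sketch below is not Storm code and uses no Storm API; `event_source` and `running_counts` are hypothetical names, and the event values are invented sample data. It only shows the core idea of an unbounded stream: the input never ends, so results are updated after every event instead of once at the end.

```python
import itertools
from typing import Dict, Iterable, Iterator


def event_source() -> Iterator[str]:
    """An endless stream of event names, standing in for live incoming data."""
    return itertools.cycle(["upload", "like", "upload", "share"])


def running_counts(events: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Yield an updated snapshot of per-event counts after each event arrives."""
    counts: Dict[str, int] = {}
    for event in events:
        counts[event] = counts.get(event, 0) + 1
        yield dict(counts)  # copy, so earlier snapshots are not mutated later


stream = running_counts(event_source())

# The stream never finishes, so we only look at the first six updates here.
first_six = list(itertools.islice(stream, 6))
print(first_six[-1])
```

A batch job would have to wait for the whole dataset before producing a count. Here a usable result exists after the very first event.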

Cassandra — Apache Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

MongoDB — MongoDB is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc. and licensed under the Server Side Public License.
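To show what "JSON-like documents with optional schemas" means in practice, here is a small sketch using plain Python dictionaries. It does not use MongoDB or its driver; the `find` helper is a hypothetical stand-in for a query, and the user records are invented sample data.

```python
import json
from typing import Any, Dict, List

# Two documents in the same collection with different fields:
# nothing forces them to share a fixed set of columns.
users: List[Dict[str, Any]] = [
    {"name": "Asha", "email": "asha@example.com"},
    {
        "name": "Ravi",
        "devices": ["phone", "tablet"],   # a list stored inside the document
        "address": {"city": "Jaipur"},    # a nested sub-document
    },
]


def find(collection: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Return the documents whose top-level fields match every given criterion."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


matches = find(users, name="Ravi")
print(json.dumps(matches, indent=2))
```

In a relational table, the second record would need extra columns or extra tables for the device list and the address. In a document store, each record simply carries the fields it has.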
