Notes on Big Data

The Large Hadron Collider (LHC), the world’s largest particle accelerator, used in the search for the elusive “God particle”, generates about 40 terabytes (1 TB = 10³ GB) of data per day from its four main detectors. Assuming an average size of 1 GB for a movie, that amounts to a data stream worth 40,000 movies(!) in a day. This massive amount of data is distributed to selected institutions across the world for further research. Thus, in one year, CERN pumps about 15 petabytes (1 PB = 10⁶ GB) of data into its private network and the Internet. Similarly, multiple information sources across the world are generating data at volumes that challenge the technology used to store and process it. To put things in a broader perspective, 90% of the world’s data was created in the last two years!
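
As a quick sanity check on these figures, here is a short back-of-the-envelope sketch in Python (not part of the original notes); it converts the quoted daily rate into movie-equivalents and a yearly total, using the decimal units given above and the assumed 1 GB movie size:

```python
# Back-of-the-envelope check of the LHC data-rate figures quoted above.
# Units follow the decimal convention in the text: 1 TB = 10**3 GB, 1 PB = 10**6 GB.

GB_PER_TB = 10**3
GB_PER_PB = 10**6

daily_tb = 40        # ~40 TB of data per day from the four main detectors
movie_gb = 1         # assumed average movie size of 1 GB

daily_gb = daily_tb * GB_PER_TB
movies_per_day = daily_gb / movie_gb
yearly_pb = daily_gb * 365 / GB_PER_PB

print(f"Daily volume: {daily_gb:,} GB = about {movies_per_day:,.0f} movies")
print(f"Yearly volume: {yearly_pb:.1f} PB")   # ~14.6 PB, consistent with the ~15 PB figure
```

The daily and yearly numbers in the text hang together: 40 TB/day over 365 days works out to roughly 14.6 PB, which matches the quoted 15 PB per year.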
