Massive amounts of data is being produced in everyday activities hence it is necessary to store and analyze such data. Hadoop is a popular distributed system used to store this data and MapReduce is used for performing analysis on it. Detail study and experiments led to conclusion that MapReduce job’s execution times can be lowered. A cost-effective mechanism known as collaborative caching has been proposed for efficient use of resources and system. This mechanism helps in improving the performance, reducing access latency and increasing the throughput. A new architecture called Hadoop-Collaborative Caching is proposed in order to lower the execution times. It incorporates collaborative caching, reference caching and Modified-ARC algorithm. Each of the DataNodes have their own dedicated Cache Manager that manages caching, replacement, collaborative caching and eviction. Cache is organized in to recent, frequent, recent history and frequent history. To evaluate, results obtained were compared with default configuration of Hadoop.
International Conference on Parallel and Distributed Processing Techniques and Applications, 2013