IJSRP, Volume 5, Issue 7, July 2015 Edition [ISSN 2250-3153]
Saminath.V, Sangeetha.M.S
Abstract:
Hadoop is an open-source framework for storing and processing Big Data in a distributed environment. Big Data is a collection of complex, large-volume structured and unstructured data. Hadoop stores data across clusters of machines in geographically different locations and distributes the workload using parallel computing. MapReduce is a software framework, written in Java, for analyzing large-scale data; it uses a distributed data processing model. HDFS is another component of Hadoop, used for storing large volumes of data. Like the Google File System on which it is modeled, HDFS supports immense amounts of data stored across distributed data nodes, with each node's data stored redundantly to avoid loss. This paper explains HDFS, the details of jobs in the node cluster environment, the layered component stack of the Hadoop framework, and various kinds of application development on Hadoop.