Processing XML files using XmlInputFormat in Hadoop MapReduce

Hadoop provides default input formats such as TextInputFormat, NLineInputFormat, and KeyValueTextInputFormat. When you receive a different type of file for processing, you have to create your own custom input format for your MapReduce jobs. Here I am going to show you how to process XML files in a MapReduce job by creating a custom XMLInputFormat (xmlinputformat hadoop)[…]
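The heart of a custom XmlInputFormat is its RecordReader: it scans the input for a configured start tag and emits everything up to the matching end tag as a single record, so each mapper receives one whole XML element instead of one line. Here is a minimal, language-neutral sketch of that scanning logic in Python (the tag names are hypothetical, and this assumes non-nested, well-formed records, as the real reader also does):

```python
# Sketch of the scanning logic a custom XmlInputFormat's RecordReader
# performs: find a start tag, consume up to the matching end tag, and
# emit that span as one record. Tag names here are illustrative only.

def xml_records(data, start_tag="<employee>", end_tag="</employee>"):
    """Return one string per <employee>...</employee> block.
    Assumes records are well-formed and not nested."""
    records = []
    pos = 0
    while True:
        start = data.find(start_tag, pos)
        if start == -1:
            break  # no more records
        end = data.find(end_tag, start)
        if end == -1:
            break  # record is cut off past the data we have
        end += len(end_tag)
        records.append(data[start:end])
        pos = end
    return records

data = ("<employees>"
        "<employee><id>1</id></employee>"
        "<employee><id>2</id></employee>"
        "</employees>")
print(xml_records(data))
# → ['<employee><id>1</id></employee>', '<employee><id>2</id></employee>']
```

In the Java implementation the same loop runs over the raw byte stream of an input split, with the start and end tags supplied through the job configuration.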

Reading a SequenceFile in Hadoop MapReduce

You have created a sequence file using a SequenceFile writer; now you want to check whether the sequence file was created successfully in Hadoop by reading it back from HDFS. The input to the program is the location of the sequence file in HDFS. You can run this[…]
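Conceptually, a SequenceFile is just a stream of (key, value) records, and reading it is a loop that pulls record after record until the stream is exhausted. The following Python stand-in mimics that read loop over a toy length-prefixed format; it is not Hadoop's real on-disk layout, just an illustration of the pattern behind the Java `while (reader.next(key, value))` idiom:

```python
import io
import struct

# Toy length-prefixed (key, value) record stream -- a stand-in for the
# SequenceFile read loop, NOT the real Hadoop on-disk format.

def write_records(buf, records):
    for key, value in records:
        k, v = key.encode(), value.encode()
        buf.write(struct.pack(">II", len(k), len(v)))  # key/value lengths
        buf.write(k)
        buf.write(v)

def read_records(buf):
    records = []
    while True:
        header = buf.read(8)
        if len(header) < 8:        # end of stream, like next() returning false
            break
        klen, vlen = struct.unpack(">II", header)
        key = buf.read(klen).decode()
        value = buf.read(vlen).decode()
        records.append((key, value))
    return records

buf = io.BytesIO()
write_records(buf, [("0", "first line"), ("11", "second line")])
buf.seek(0)
print(read_records(buf))
# → [('0', 'first line'), ('11', 'second line')]
```

In the real Java program, the equivalent of `read_records` is `SequenceFile.Reader` opened against the HDFS path, with the key and value types taken from the file's own header.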

Why do we need Kafka?

A large amount of data is generated by companies having any form of web-based presence and activity. Data is one of the newer ingredients in these Internet-based systems and typically includes user-activity events corresponding to logins, page visits, and clicks; social networking activities such as likes, shares, and comments; and operational and system metrics. This data[…]

What is Apache Kafka?

Apache Kafka is an open source, distributed publish-subscribe messaging system, mainly designed with the following characteristics: Persistent messaging: to derive real value from big data, no information loss can be afforded. Apache Kafka is designed with O(1) disk structures that provide constant-time performance even with very large volumes of stored messages, which[…]
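The constant-time behaviour comes from treating each partition as an append-only log: producing a message is a sequential append at the tail, and consuming is a sequential read starting from an offset, so neither operation gets slower as the log grows. A minimal in-memory sketch of that abstraction (class and method names are illustrative, not Kafka's actual API):

```python
class AppendOnlyLog:
    """Toy model of a Kafka partition: an append-only sequence of
    messages addressed by offset. Appending and reading at an offset
    do not get more expensive as the log grows."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)   # sequential write at the tail
        return len(self._messages) - 1   # offset assigned to this message

    def read(self, offset, max_messages=10):
        # Each consumer tracks its own offset; reading never deletes,
        # so multiple subscribers can replay the log independently.
        return self._messages[offset:offset + max_messages]

log = AppendOnlyLog()
for event in ["login", "page_visit", "click"]:
    log.append(event)
print(log.read(1))   # a consumer resuming from offset 1
# → ['page_visit', 'click']
```

Because reads are non-destructive and addressed by offset, the same stored messages can serve many independent consumer groups, which is the essence of Kafka's publish-subscribe model.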

Apache Spark Installation Steps

Spark Installation
Step 1: Download Spark
Step 2: Download the binaries for Hadoop 2 (HDP2, CDH5)
Step 3: tar -xvf spark-0.9.0-incubating-bin-hadoop2.tgz
Step 4: Download Scala
Step 5: Configure the environment variables:
    export SCALA_HOME=/home/notroot/lab/software/scala-2.10.4-RC3
    export PATH=$PATH:$SCALA_HOME/bin
Step 6: Build Spark. Install git: sudo apt-get[…]