The Hadoop Cluster Administration course covers the fundamental concepts of Apache Hadoop and the Hadoop Cluster. It teaches you how to deploy, manage, monitor, and secure a Hadoop Cluster. You will learn to configure backup options and to diagnose and recover from node failures in a Hadoop Cluster. The course also covers HBase Administration. It includes many challenging, practical and focused hands-on exercises. Software professionals new to Hadoop can quickly learn cluster administration through technical sessions and hands-on labs. By the end of this six-week Hadoop Cluster Administration training, you will be prepared to understand and solve real-world problems that you may come across while working on a Hadoop Cluster.

The course content, quizzes, assignments, labs, and hands-on practicals have been updated to cover new features in Hadoop 2.0, namely YARN, NameNode High Availability, HDFS Federation, Snapshots and so forth.

Course Objectives

After completing the ‘Hadoop Administration’ course, you should be able to:

    Get a clear understanding of Apache Hadoop, HDFS, Hadoop Cluster and Hadoop Administration.
    Understand Hadoop 2.0, NameNode High Availability, HDFS Federation, YARN and MapReduce v2.
    Plan and deploy a Hadoop Cluster.
    Load data and run applications.
    Configure a cluster and tune its performance.
    Manage, maintain, monitor and troubleshoot a Hadoop Cluster.
    Secure a deployment and understand backup and recovery.
    Learn what Oozie, HCatalog/Hive, and HBase administration are all about.

Who should go for this course?

Students, DBAs, System Administrators, Software Architects, Data Warehouse Professionals, IT Managers, and Software Developers interested in learning Hadoop Cluster Administration should go for this course.

Pre-requisites

This course assumes no prior knowledge of Apache Hadoop or Hadoop Cluster Administration. Good knowledge of Linux is required, as Hadoop runs on Linux. Fundamental Linux system administration skills are preferable: scripting (Perl / Bash), good troubleshooting skills, and an understanding of system capacity, bottlenecks, and the basics of memory, CPU, OS, storage, and networks.

Why Learn Hadoop Administration?
Big Data! A Worldwide Problem?

According to Wikipedia, “Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” In simpler terms, Big Data is a term given to the large volumes of data that organizations store and process. However, it is becoming very difficult for companies to store, retrieve and process this ever-increasing data. A company that gets a handle on managing its data well is in a strong position to become the next BIG success!

The problem lies in the use of traditional systems to store enormous amounts of data. Though these systems were a success a few years ago, the increasing amount and complexity of data is fast making them obsolete. The good news is Hadoop: nothing less than a panacea for companies working with BIG DATA in a variety of applications, it has become integral to storing, handling, evaluating and retrieving hundreds of terabytes or even petabytes of data.

Apache Hadoop! A Solution for Big Data!

Hadoop is an open source software framework that supports data-intensive distributed applications. It is licensed under the Apache v2 license and is therefore generally known as Apache Hadoop. Hadoop was developed based on a paper originally published by Google on its MapReduce system, and it applies concepts from functional programming. Hadoop is written in the Java programming language and is a top-level Apache project, built and used by a global community of contributors. Hadoop was created by Doug Cutting and Michael J. Cafarella. And don’t overlook the charming yellow elephant in the logo: Hadoop is named after Doug’s son’s toy elephant!
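
To make the MapReduce idea mentioned above a little more concrete, here is a minimal word-count sketch in plain Python. It is only an illustration of the map, shuffle and reduce steps running in a single process; it does not use any Hadoop API, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Fold the list of counts for one word into a single total."""
    return key, reduce(lambda a, b: a + b, values, 0)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

On a real cluster, the map and reduce functions would run in parallel on many nodes and the framework would handle the shuffle; the sketch only shows how the three steps fit together.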

Some of the leading companies using Hadoop:

The importance of Hadoop is evident from the fact that many global MNCs, such as Yahoo and Facebook, use Hadoop and consider it integral to their operations. On February 19, 2008, Yahoo! Inc. launched what it described as the world’s largest Hadoop production application. The Yahoo! Search Webmap is a Hadoop application that runs on a Linux cluster with more than 10,000 cores and generates data that is used in every Yahoo! Web search query.

Facebook, a multi-billion-dollar company, had over 1 billion active users in 2012, according to Wikipedia. Storing and managing data of such magnitude could have been a problem, even for a company like Facebook. Thanks to Apache Hadoop, it is not: Facebook uses Hadoop to keep track of every profile on the site, as well as all the related data such as images, posts, comments and videos.

Opportunities for Hadoopers are abundant, from Hadoop Developer to Hadoop Tester or Hadoop Architect, and so on. If cracking and managing BIG Data is your passion in life, then think no more: join the Hadoop online course and carve a niche for yourself.

Learning Objectives:
Overview of Hadoop Cluster Administration

Learning Objectives – In this module, you will understand what Big Data and Apache Hadoop are, how Hadoop solves Big Data problems, various data loading techniques, the basics of MapReduce, Hadoop 2.0, and the role of a Hadoop Cluster Administrator.

Topics – Introduction to Big Data, Introduction to Apache Hadoop, MapReduce Framework, A typical Hadoop Cluster, Data Loading into HDFS, Hadoop Cluster Administrator: Roles and Responsibilities.

Hadoop 2.0 and Hadoop Cluster Configuration

Learning Objectives – After this module, you will understand Hadoop 2.0 NameNode High Availability, HDFS Federation, YARN, Hadoop Cluster architecture, Hadoop Cluster setup and configuration, Hive and Pig installation, setting up Hadoop clients, and managing a cluster using Cloudera Manager.

Topics – Hadoop 2.0 New Features, Hadoop Cluster Architecture, Planning the Hadoop Cluster, Cluster Size, Hardware and Software considerations, Hadoop Installation and Initial Configuration, Deploying Hadoop in pseudo-distributed mode, deploying a complete, multi-node Hadoop cluster, Installing and Configuring Hive and Pig, Installing Hadoop Clients, Cloudera Manager.

Node roles, Data Processing, and Network configuration

Learning Objectives – In this module, you will understand in detail the multiple Hadoop server roles such as NameNode, DataNode and JournalNode, as well as the Quorum Journal Manager (QJM), MapReduce and YARN data processing, unbalanced and balanced clusters, advanced cluster configuration, and cluster network configuration.

Topics – Quorum Journal Manager (QJM), Hadoop server roles and their usage, Rack Awareness, Anatomy of Write and Read, Replication Pipeline, Data Processing: YARN, Map and Reduce, Unbalanced Cluster, Cluster Balancing, Advanced Cluster Configuration, and Cluster Network configuration.

Managing, Monitoring and Maintaining a Hadoop cluster

Learning Objectives – In this module, you will understand the Oozie Workflow Scheduler, managing and scheduling jobs, the Fair and Capacity Schedulers, Hadoop Cluster monitoring and troubleshooting, analyzing logs, and auditing.

Topics – Oozie, Managing and Scheduling Jobs, Configuring a Fair Scheduler, Configuring a Capacity Scheduler, Cluster Monitoring and Troubleshooting, Cluster Maintenance, Service and Log Management, Auditing and Alerts, Service Monitoring.

Security, Backup and Recovery

Learning Objectives – In this module, you will understand the basics of Hadoop security, managing security with Kerberos, HDFS Federation setup, how to configure backup and recovery in Hadoop, and diagnosing node failures in the cluster.

Topics – Basics of Hadoop Platform Security, Securing the Platform, Configuring Kerberos, Configuring HDFS Federation, Backup, Diagnostics and Recovery.

Upcoming Batches

Start Date      Duration   Class Days                Class Time (IST)       Price (USD)
15th Dec 2014   4 Weeks    Sat, Sun                  08:00 AM – 10:00 AM    $200.00
3rd Feb 2015    10 Days    Mon, Tue, Wed, Thu, Fri   08:00 AM – 11:00 AM    $200.00
5th Mar 2015    4 Weeks    Sat, Sun                  08:30 PM – 11:30 PM    $200.00
10th Apr 2015   4 Weeks    Sat, Sun                  08:00 AM – 10:00 AM    $200.00
24th May 2015   10 Days    Mon, Tue, Wed, Thu, Fri   08:00 AM – 10:00 AM    $200.00