Hadoop Configuration using Ansible Playbook

Aman Goyal
4 min read · Mar 23, 2021

First, let's discuss what Hadoop and Ansible are.
Hadoop:- Hadoop is an open-source software framework used for solving big data problems. It provides massive storage for any kind of data.

Ansible:- Ansible is a simple way to automate apps and IT infrastructure. It automates cloud provisioning, configuration management, application deployment, intra-service orchestration, and many other IT needs.

Objective of this blog:- 🔰 11.1 Configure Hadoop and start cluster services using an Ansible playbook

So how will we proceed? We will launch 3 VMs: one will be the Controller node, one will be the NameNode for Hadoop, and one will be the DataNode for Hadoop.

We will write an Ansible automation program on the Controller node that sets up the Hadoop cluster on the other 2 VMs. For this we only need the IP, user name, and password of those 2 machines.

So let's begin by launching the 3 VMs and getting their IPs.

Namenode IP = 192.168.0.111
Datanode IP = 192.168.0.110

Go to the Controller node (CN) and install Ansible there:

pip3 install ansible

We will need a few more things. One of them is sshpass, installed on the CN, because Ansible relies on it to log in to the Hadoop VMs with a password.

For that, first set up the EPEL yum repository by installing epel-release, then install sshpass:

yum install epel-release
yum install sshpass

Now, on the CN, create an inventory file to tell Ansible where our NameNode and DataNode are. This file holds the IPs, user name, password, and connection method.

File=ip.txt
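
As a minimal sketch, ip.txt might look like the file written by the heredoc below. The group names namenode and datanode match the groups queried later in this post. The root user and the CHANGE_ME passwords are placeholders (assumptions), so substitute the real credentials of your VMs.

```shell
# Create the inventory file ip.txt in the current directory.
# Assumptions: user "root" and the CHANGE_ME passwords are placeholders.
cat > ip.txt <<'EOF'
[namenode]
192.168.0.111 ansible_user=root ansible_password=CHANGE_ME ansible_connection=ssh

[datanode]
192.168.0.110 ansible_user=root ansible_password=CHANGE_ME ansible_connection=ssh
EOF
```

Since this file stores passwords in plain text, it is probably wise to restrict who can read it, for example with chmod 600 ip.txt.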

Now register this inventory in Ansible's config file so that Ansible knows these are our working nodes (hosts). We need to create that config file ourselves, as it is not created by default. For that, switch to root and run these commands:

mkdir -p /etc/ansible   # the directory may not exist yet
cd /etc/ansible
vi ansible.cfg          # now add the inventory details here
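
A rough sketch of what ansible.cfg might contain is shown below. It assumes the inventory was saved as /root/ip.txt (a hypothetical path, adjust it to wherever you created ip.txt). It also turns off host key checking, since password logins through sshpass tend to fail on hosts whose SSH key has not been accepted yet.

```shell
# Draft the config in the current directory so it can be reviewed first.
# Assumption: the inventory lives at /root/ip.txt; change the path if not.
cat > ansible.cfg <<'EOF'
[defaults]
inventory = /root/ip.txt
host_key_checking = False
EOF

# If you are not already inside /etc/ansible, copy the draft there as root:
#   cp ansible.cfg /etc/ansible/ansible.cfg
```

Disabling host key checking is a convenience for a lab setup like this one; on a shared network you may prefer to accept the host keys manually instead.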

Check whether Ansible picks up the inventory and whether we are able to ping the target nodes:

ansible all --list-hosts       # list all managed node IPs
ansible namenode --list-hosts  # list the IPs in the "namenode" group
ansible datanode --list-hosts  # list the IPs in the "datanode" group
ansible all -m ping            # ping every managed node

Now we will write our automation code on the Ansible controller node. I have created the code and uploaded it to GitHub. Check it out:

https://github.com/AmanGoyal31/Ansible_playbook_for_Hadoop
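
To give a feel for what such a playbook can look like, here is a hypothetical sketch. It is not the file from the repository above. It assumes Java and Hadoop are already installed on both nodes, that the hadoop commands are on the remote user's PATH, that Hadoop reads its configuration from /etc/hadoop, and that Hadoop 1.x-style property names apply. The directories /nn and /dn and port 9001 are made-up values.

```shell
# Write a hypothetical playbook sketch to hadoop_sketch.yml.
# Assumptions: /etc/hadoop config dir, /nn and /dn directories, port 9001.
cat > hadoop_sketch.yml <<'EOF'
# Hypothetical sketch, not the playbook from the linked repository.
- hosts: all
  tasks:
    - name: Tell every node where the NameNode listens
      copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <configuration>
            <property>
              <name>fs.default.name</name>
              <value>hdfs://192.168.0.111:9001</value>
            </property>
          </configuration>

- hosts: namenode
  tasks:
    - name: Create the metadata directory
      file:
        path: /nn
        state: directory
    - name: Point the NameNode at its metadata directory
      copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <configuration>
            <property>
              <name>dfs.name.dir</name>
              <value>/nn</value>
            </property>
          </configuration>
    - name: Format the NameNode only if it has not been formatted yet
      shell: echo Y | hadoop namenode -format
      args:
        creates: /nn/current
    - name: Start the NameNode daemon
      command: hadoop-daemon.sh start namenode

- hosts: datanode
  tasks:
    - name: Create the block storage directory
      file:
        path: /dn
        state: directory
    - name: Point the DataNode at its storage directory
      copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <configuration>
            <property>
              <name>dfs.data.dir</name>
              <value>/dn</value>
            </property>
          </configuration>
    - name: Start the DataNode daemon
      command: hadoop-daemon.sh start datanode
EOF
```

The first play runs on every host, while the other two target the namenode and datanode groups from the inventory. Before running a playbook like this against real machines, ansible-playbook --syntax-check hadoop_sketch.yml can catch YAML mistakes.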

Now run the playbook using this command:

ansible-playbook -v ansible_hadoop_setup.yml

Now go to the NameNode and the DataNode and run this command on each to check whether the Hadoop daemon has launched:

jps

Check the cluster report with:

hadoop dfsadmin -report

And our automation setup is done.

That’s all for this article. I hope you found the post informative. If something was missing or you think more could have been added, feel free to leave suggestions in the comments section or on LinkedIn.

You can check out my LinkedIn profile.

Thanks for giving your precious time to this article ✌ Hope you like it.
